D: 18 Apr 2024

Report postprocessing

LD-based result clumping

--clump ['zs'] ['cols='<col set desc.>] <PLINK report filename(s)...>

--clump-p1 <index variant p-value threshold>
--clump-p2 <SP2 column p-value threshold>
--clump-r2 <r^2 threshold>
--clump-kb <clump kb radius>

--clump-unphased
--clump-log10 ['input-only' | 'output-only']
--clump-log10-p1 <-log10(index variant p-value threshold)>
--clump-log10-p2 <-log10(SP2 column p-value threshold)>
--clump-bins <p-value bin boundaries...>

--clump-id-field <field name(s)...>
(alias: --clump-snp-field)
--clump-p-field <field name(s)...>
(alias: --clump-field)

--clump-a1-field [field name(s)...]
--clump-test-field [field name(s)...]
--clump-force-a1
--clump-test <test name(s)...>

--clump-allow-overlap

When there are multiple significant association p-values in the same region, LD should be taken into account when interpreting the results. The --clump command is designed to help with this.

--clump loads the named PLINK-format association report(s) (text files with a header line, a column containing variant IDs, and another column containing p-values) and groups results into LD-based clumps, writing a new report to plink2.clumps[.zst]. Multiple filenames can be separated by spaces or commas.
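The input requirements can be illustrated with a small Python sketch that reads such a report. This is not PLINK's parser; the helper names (resolve_field, read_report), the tolerance for a leading '#' on the header, and the skipping of 'NA' p-values are assumptions made for the example. The default column names ('ID' then 'SNP', 'P', and a 'TEST' value of 'ADD') follow the defaults described in this section.

```python
import io

def resolve_field(header, candidates):
    """Return the index of the first candidate column name present in header."""
    for name in candidates:  # earlier names take precedence over later ones
        if name in header:
            return header.index(name)
    raise ValueError(f"none of {list(candidates)} found in header")

def read_report(handle, id_fields=("ID", "SNP"), p_fields=("P",),
                test_field="TEST", test_value="ADD"):
    """Return a list of (variant ID, p-value) pairs from a whitespace-delimited report."""
    header = handle.readline().split()
    if header and header[0].startswith("#"):
        header[0] = header[0][1:]  # assumption: tolerate a leading '#' on the header
    id_col = resolve_field(header, id_fields)
    p_col = resolve_field(header, p_fields)
    test_col = header.index(test_field) if test_field in header else None
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if test_col is not None and fields[test_col] != test_value:
            continue  # with a TEST column, only the chosen test is considered
        if fields[p_col] == "NA":
            continue  # assumption: rows without a usable p-value are skipped
        rows.append((fields[id_col], float(fields[p_col])))
    return rows

# Made-up report contents, for illustration only.
SAMPLE = """#CHROM POS ID A1 TEST P
1 100 rs1 A ADD 1e-6
1 100 rs1 A DOMDEV 0.5
1 200 rs2 G ADD NA
1 300 rs3 T ADD 0.02
"""
print(read_report(io.StringIO(SAMPLE)))
```

Passing a different id_fields or p_fields tuple mirrors what --clump-id-field and --clump-p-field do: the first listed name found in the header wins.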

Clumps are formed around central "index variants" which, by default, must have p-value no larger than 0.0001; change this threshold with --clump-p1. Index variants are chosen greedily starting with the lowest p-value. Variants which meet the --clump-p1 threshold, but have already been assigned to another clump, do not start their own clumps.
Sites which are less than 250 kb away from an index variant and have r² larger than 0.5 with it are assigned to that index variant's clump (unless they have previously been assigned to another clump, and --clump-allow-overlap is not in effect). These two thresholds can be changed with --clump-kb and --clump-r2, respectively.
By default, the r² values computed by --clump are haplotype-based; maximum likelihood haplotype frequency estimates are applied to unphased data. Use --clump-unphased to change this to unphased r²; the resulting correlation coefficients are less accurate measures of LD, but they are more accurate measures of --glm genotype-column similarity (since --glm also doesn't use phase information).
When dosages are present, they are now used in the r² computation.
As usual, only founders are considered in the r² computation. If your dataset has a shortage of them, --make-founders may come in handy.
By default, a p-value histogram is given for each clump, with default bin boundaries 0.0001,0.001,0.01,0.05. You can control the bin boundaries with --clump-bins; provide a comma- or space-separated sequence of increasing numbers.
Sites within the clump which have association p-value smaller than 0.01 are listed in the 'SP2' column of the main report. This threshold can be adjusted with --clump-p2.
By default, variant IDs are expected to be in the 'ID' column, or if that's absent, 'SNP'. You can change this with the --clump-id-field flag, which takes a space-delimited sequence of field names to search for. With multiple field names, earlier names take precedence over later ones. (The other --clump-...-field flags work the same way.)
--clump-log10 specifies -log10(p) rather than raw p-value input/output. The 'input-only' and 'output-only' modifiers let you convert from one format to the other.
By default, p-values are expected to be in the 'P' column (or, with --clump-log10, 'LOG10_P' and 'NEG_LOG10_P' are also recognized); change this with --clump-p-field.
Multiallelic variants are effectively split in this computation. This requires the input file(s) to contain an effect-allele column; by default, this is expected to be 'A1', but you can change this with --clump-a1-field. A1 alleles aren't normally checked or reported for biallelic variants (since they usually don't affect p-values), but you can change that with --clump-force-a1.
Entries in the SP2 column are now of the form <variant ID>[(A1 allele)][(file number)]. The A1 component normally only appears for multiallelic variants; --clump-force-a1 changes this. The file-number component now only defaults to appearing when more than one input file is provided; this can be changed with "cols=+f".
By default, if there is a 'TEST' column, only lines where the test value is 'ADD' are considered. (This is a change from PLINK 1.x.) These default values can be changed with --clump-test-field and --clump-test, respectively.
By default, no variant may belong to more than one clump; remove this restriction with --clump-allow-overlap.
When variant IDs with p ≤ --clump-p1 threshold are present in a --clump input file, but missing from the main dataset, they are now written to plink2.clumps.missing_id[.zst]. When (variant ID, A1 allele) pairs with p ≤ --clump-p1 threshold in the --clump input file are ignored due to the A1 allele being missing from the main dataset (note that A1 is only checked for multiallelic variants, or when --clump-force-a1 is specified), they are written to plink2.clumps.missing_allele[.zst].
We have provisionally retired --clump's other bells and whistles; contact us if this is a problem.
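The greedy procedure described above can be sketched in Python as follows. This is a simplified reading of the text, not PLINK's implementation: the function name greedy_clump, the dictionary-based variant records, and the r2 callback (a stand-in for the real LD computation) are all hypothetical, and multiallelic splitting, founders, and dosages are ignored.

```python
def greedy_clump(variants, r2, p1=1e-4, p2=0.01, r2_min=0.5, kb=250,
                 allow_overlap=False):
    """Group variants into clumps around greedily chosen index variants.

    variants: list of dicts with keys 'id', 'chrom', 'pos', 'p'.
    r2: hypothetical callable (id_a, id_b) -> r^2 between two variants.
    """
    assigned = set()
    clumps = []
    # Index variants are chosen greedily, starting with the lowest p-value.
    for index in sorted(variants, key=lambda v: v["p"]):
        if index["p"] > p1:
            break  # index variants must have p no larger than the p1 threshold
        if index["id"] in assigned:
            continue  # already in another clump, so it does not start its own
        members = []
        for v in variants:
            if v["id"] == index["id"] or v["chrom"] != index["chrom"]:
                continue
            if abs(v["pos"] - index["pos"]) >= kb * 1000:
                continue  # must be less than kb kilobases from the index variant
            if r2(index["id"], v["id"]) <= r2_min:
                continue  # r^2 must be larger than the threshold
            if v["id"] in assigned and not allow_overlap:
                continue
            members.append(v)
        assigned.add(index["id"])
        assigned.update(v["id"] for v in members)
        clumps.append({
            "index": index["id"],
            "members": [v["id"] for v in members],
            # Only members with p below the p2 threshold are listed in SP2.
            "sp2": [v["id"] for v in members if v["p"] < p2],
        })
    return clumps

# Made-up example data: positions in bp, and a symmetric r^2 lookup table.
VARIANTS = [
    {"id": "A", "chrom": "1", "pos": 1000, "p": 1e-8},
    {"id": "B", "chrom": "1", "pos": 50000, "p": 1e-6},
    {"id": "C", "chrom": "1", "pos": 100000, "p": 0.005},
    {"id": "D", "chrom": "1", "pos": 400000, "p": 1e-5},
    {"id": "E", "chrom": "1", "pos": 450000, "p": 0.2},
    {"id": "F", "chrom": "2", "pos": 1000, "p": 0.5},
]
R2_TABLE = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.6,
    frozenset(("A", "D")): 0.9,
    frozenset(("D", "E")): 0.8,
}

def lookup_r2(a, b):
    return R2_TABLE.get(frozenset((a, b)), 0.0)

for clump in greedy_clump(VARIANTS, lookup_r2):
    print(clump)
```

In this toy data, B meets the index threshold but is absorbed into A's clump first, so it never becomes an index variant; D is in high LD with A but lies outside the 250 kb radius, so it starts a clump of its own.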

The PLINK 1.07 documentation has more discussion of these flags, including a few detailed examples.
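As a small illustration of the per-clump p-value histogram, the following Python sketch counts p-values against the default bin boundaries 0.0001, 0.001, 0.01, 0.05. The function name is hypothetical, and the treatment of a p-value exactly equal to a boundary (placed in the higher bin here) is an assumption; the text above does not specify it.

```python
from bisect import bisect_right

DEFAULT_BINS = (0.0001, 0.001, 0.01, 0.05)

def pvalue_histogram(pvalues, boundaries=DEFAULT_BINS):
    """Count p-values per bin; n boundaries give n + 1 bins.

    Bin 0 holds p below the first boundary, and the last bin holds p at or
    above the last boundary (assumed tie handling, see the note above).
    """
    if list(boundaries) != sorted(boundaries):
        raise ValueError("bin boundaries must be increasing")
    counts = [0] * (len(boundaries) + 1)
    for p in pvalues:
        counts[bisect_right(boundaries, p)] += 1
    return counts

# Made-up p-values for one clump.
print(pvalue_histogram([1e-5, 5e-4, 0.005, 0.03, 0.5, 0.2]))
```

Supplying a different increasing sequence for boundaries corresponds to what --clump-bins controls.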
