D: 5 Feb 2024

Report postprocessing

LD-based result clumping

--clump ['zs'] ['cols='<col set desc.>] <PLINK report filename(s)...>

--clump-p1 <index variant p-value threshold>
--clump-p2 <SP2 column p-value threshold>
--clump-r2 <r^2 threshold>
--clump-kb <clump kb radius>

--clump-unphased
--clump-log10 ['input-only' | 'output-only']
--clump-log10-p1 <-log10(index variant p-value threshold)>
--clump-log10-p2 <-log10(SP2 column p-value threshold)>
--clump-bins <p-value bin boundaries...>

--clump-id-field <field name(s)...>
  (alias: --clump-snp-field)
--clump-p-field <field name(s)...>
  (alias: --clump-field)

--clump-a1-field [field name(s)...]
--clump-test-field [field name(s)...]
--clump-force-a1
--clump-test <test name(s)...>

--clump-allow-overlap

When there are multiple significant association p-values in the same region, LD should be taken into account when interpreting the results. The --clump command is designed to help with this.

--clump loads the named PLINK-format association report(s) (text files with a header line, a column containing variant IDs, and another column containing p-values) and groups results into LD-based clumps, writing a new report to plink2.clumps[.zst]. Multiple filenames can be separated by spaces or commas.
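As a rough illustration of the input layout described above, the following Python sketch reads a small report and locates the ID and p-value columns by name. It is not PLINK's parser: the function names, the whitespace-delimited layout, the stripping of a leading '#' from the header, and the example rows are all assumptions made for illustration. The default field names ('ID', then 'SNP'; 'P') are the ones given in the notes in this section.

```python
def find_column(header, candidates):
    """Return the index of the first candidate name present in the header.

    Earlier names in `candidates` take precedence over later ones.
    """
    for name in candidates:
        if name in header:
            return header.index(name)
    raise KeyError(f"none of {candidates} found in header {header}")


def load_report(text, id_fields=("ID", "SNP"), p_fields=("P",)):
    """Parse a whitespace-delimited report into {variant ID: p-value}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Assumption: a leading '#' on the header line (e.g. '#CHROM') is not
    # part of the column name.
    header = [name.lstrip("#") for name in rows[0]]
    id_col = find_column(header, id_fields)
    p_col = find_column(header, p_fields)
    return {row[id_col]: float(row[p_col]) for row in rows[1:]}


# Invented example data, not real association results.
EXAMPLE = """#CHROM POS ID A1 P
1 1000 rs1 A 3e-6
1 1500 rs2 G 0.004
"""

report = load_report(EXAMPLE)
```

Passing a different tuple for id_fields or p_fields mimics, in spirit, what --clump-id-field and --clump-p-field do.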

  • Clumps are formed around central "index variants" which, by default, must have p-value no larger than 0.0001; change this threshold with --clump-p1. Index variants are chosen greedily starting with the lowest p-value. Variants which meet the --clump-p1 threshold, but have already been assigned to another clump, do not start their own clumps.
  • Sites which are less than 250 kb away from an index variant and have r2 larger than 0.5 with it are assigned to that index variant's clump (unless they have previously been assigned to another clump, and --clump-allow-overlap is not in effect). These two thresholds can be changed with --clump-kb and --clump-r2, respectively.
  • By default, the r2 values computed by --clump are haplotype-based; maximum likelihood haplotype frequency estimates are applied to unphased data. Use --clump-unphased to change this to unphased r2; the resulting correlation coefficients are less accurate measures of LD, but they are more accurate measures of --glm genotype-column similarity (since --glm also doesn't use phase information).
  • When dosages are present, they are now used in the r2 computation.
  • As usual, only founders are considered in the r2 computation. If your dataset has a shortage of them, --make-founders may come in handy.
  • By default, a p-value histogram is given for each clump, with default bin boundaries 0.0001,0.001,0.01,0.05. You can control the bin boundaries with --clump-bins; provide a comma- or space-separated sequence of increasing numbers.
  • Sites within the clump which have association p-value smaller than 0.01 are listed in the 'SP2' column of the main report. This threshold can be adjusted with --clump-p2.
  • By default, variant IDs are expected to be in the 'ID' column, or if that's absent, 'SNP'. You can change this with the --clump-id-field flag, which takes a space-delimited sequence of field names to search for. With multiple field names, earlier names take precedence over later ones. (The other --clump-...-field flags work the same way.)
  • --clump-log10 specifies -log10(p) rather than raw p-value input/output. The 'input-only' and 'output-only' modifiers let you convert from one format to the other.
  • By default, p-values are expected to be in the 'P' column (or, with --clump-log10, 'LOG10_P' and 'NEG_LOG10_P' are also recognized); change this with --clump-p-field.
  • Multiallelic variants are effectively split in this computation. This requires the input file(s) to contain an effect-allele column; by default, this is expected to be 'A1', but you can change this with --clump-a1-field. A1 alleles aren't normally checked or reported for biallelic variants (since they usually don't affect p-values), but you can change that with --clump-force-a1.
  • Entries in the SP2 column are now of the form <variant ID>[(A1 allele)][(file number)]. The A1 component normally only appears for multiallelic variants; --clump-force-a1 changes this. The file-number component now appears by default only when more than one input file is provided; this can be changed with "cols=+f".
  • By default, if there is a 'TEST' column, only lines where the test value is 'ADD' are considered. (This is a change from PLINK 1.x.) These default values can be changed with --clump-test-field and --clump-test, respectively.
  • By default, no variant may belong to more than one clump; remove this restriction with --clump-allow-overlap.
  • When variant IDs with p ≤ --clump-p1 threshold are present in a --clump input file, but missing from the main dataset, they are now written to plink2.clumps.missing_id[.zst]. When (variant ID, A1 allele) pairs with p ≤ --clump-p1 threshold in the --clump input file are ignored due to the A1 allele being missing from the main dataset (note that A1 is only checked for multiallelic variants, or when --clump-force-a1 is specified), they are written to plink2.clumps.missing_allele[.zst].
  • We have provisionally retired --clump's other bells and whistles; contact us if this is a problem.
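The greedy procedure described in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PLINK's implementation: the function name, the data layout, and the example values are hypothetical, r2 values are supplied as a precomputed lookup rather than estimated from genotypes, and the handling of overlap edge cases is one plausible reading of the text.

```python
def clump(variants, r2, p1=1e-4, p2=0.01, r2_min=0.5, kb=250,
          allow_overlap=False):
    """Greedy LD clumping sketch.

    variants: {variant ID: (chromosome, position in bp, p-value)}
    r2:       {frozenset({id_a, id_b}): r^2}; missing pairs count as 0.
    """
    assigned = set()
    clumps = []
    # Candidate index variants are visited from lowest p-value upward.
    for idx in sorted(variants, key=lambda v: variants[v][2]):
        chrom, pos, p = variants[idx]
        if p > p1:
            break  # all remaining variants fail the index threshold
        if idx in assigned:
            continue  # already in a clump, so it does not start its own
        assigned.add(idx)
        members = []
        for other, (ochrom, opos, _) in variants.items():
            if other == idx or ochrom != chrom:
                continue
            if other in assigned and not allow_overlap:
                continue
            if abs(opos - pos) >= kb * 1000:
                continue  # must be less than `kb` kilobases away
            if r2.get(frozenset((idx, other)), 0.0) <= r2_min:
                continue  # r^2 must be larger than the threshold
            members.append(other)
            assigned.add(other)
        # Members with p-value below p2 would be listed in the SP2 column.
        sp2 = [m for m in members if variants[m][2] < p2]
        clumps.append({"index": idx, "members": members, "sp2": sp2})
    return clumps


# Invented example data.
variants = {
    "rs1": ("1", 100_000, 1e-8),
    "rs2": ("1", 150_000, 1e-6),
    "rs3": ("1", 200_000, 0.02),
    "rs4": ("1", 900_000, 5e-5),
}
r2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs3")): 0.6,
    frozenset(("rs1", "rs4")): 0.9,
}
clumps = clump(variants, r2)
```

In this toy input, rs2 passes the index threshold but is absorbed into the rs1 clump first, and rs4 is in high LD with rs1 but lies outside the 250 kb radius, so it starts a clump of its own.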

The PLINK 1.07 documentation has more discussion of these flags, including a few detailed examples.
