Introduction, downloads

D: 7 Jul 2025

Report postprocessing

LD-based result clumping

--clump ['zs'] ['cols='<col set desc.>] <PLINK report filename(s)...>

--clump-p1 <index variant p-value threshold>
--clump-p2 <SP2 column p-value threshold>
--clump-r2 <r^2 threshold>
--clump-kb <clump kb radius>

--clump-unphased
--clump-log10 ['input-only' | 'output-only']
--clump-log10-p1 <-log10(index variant p-value threshold)>
--clump-log10-p2 <-log10(SP2 column p-value threshold)>
--clump-bins <p-value bin boundaries...>

--clump-id-field <field name(s)...>
(alias: --clump-snp-field)
--clump-p-field <field name(s)...>
(alias: --clump-field)

--clump-a1-field [field name(s)...]
--clump-test-field [field name(s)...]
--clump-force-a1
--clump-test <test name(s)...>

--clump-allow-overlap

--clump-range <filename>

--clump-range0 <filename>

--clump-range-border <kbs>

When there are multiple significant association p-values in the same region, LD should be taken into account when interpreting the results. The --clump command is designed to help with this.

--clump loads the named PLINK-format association report(s) (text files with a header line, a column containing variant IDs, and another column containing p-values) and groups results into LD-based clumps, writing a new report to plink2.clumps[.zst]. Multiple filenames can be separated by spaces or commas.

Clumps are formed around central "index variants" which, by default, must have p-value no larger than 0.0001; change this threshold with --clump-p1. Index variants are chosen greedily starting with the lowest p-value. Variants which meet the --clump-p1 threshold, but have already been assigned to another clump, do not start their own clumps.
Sites which are less than 250 kb away from an index variant and have r² larger than 0.5 with it are assigned to that index variant's clump (unless they have previously been assigned to another clump and --clump-allow-overlap is not in effect). These two thresholds can be changed with --clump-kb and --clump-r2, respectively.
By default, the r² values computed by --clump are haplotype-based; maximum likelihood haplotype frequency estimates are applied to unphased data. Use --clump-unphased to change this to unphased r²; the resulting correlation coefficients are less accurate measures of LD, but they are more accurate measures of --glm genotype-column similarity (since --glm also doesn't use phase information).
When dosages are present, they are now used in the r² computation.
As usual, only founders are considered in the r² computation. If your dataset has a shortage of them, --make-founders may come in handy.
By default, a p-value histogram is given for each clump, with default bin boundaries 0.0001,0.001,0.01,0.05. You can control the bin boundaries with --clump-bins; provide a comma- or space-separated sequence of increasing numbers.
Sites within the clump which have association p-value smaller than 0.01 are listed in the 'SP2' column of the main report, and their span is used for --clump-range[0]. This threshold can be adjusted with --clump-p2.
Given a gene region file with 1-based coordinates, --clump-range causes overlaps between regions and clumps to be reported. --clump-range0 does the same with 0-based input coordinates.

Overlaps are now reported in the main .clumps[.zst] file; this is a change from PLINK 1.x.
With either flag, --clump-range-border extends each region's bounds by the given number of kilobases.
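The greedy procedure described above can be sketched in Python as follows. This is a simplified single-chromosome illustration under stated assumptions, not PLINK's implementation. Variants are hypothetical (ID, bp position, p-value, founder dosages) records. r² is the unphased squared Pearson correlation of dosages, roughly what --clump-unphased measures, rather than the default haplotype-based estimate. The thresholds use the defaults and the strict "less than"/"larger than" wording given here. The clump span is assumed to cover the index variant plus its SP2 members. --clump-range0 input is assumed to be BED-style half-open.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    vid: str
    pos: int               # bp position; one chromosome assumed
    p: float               # association p-value from the report
    dosages: list[float]   # founder dosages (hypothetical input)


@dataclass
class Clump:
    index: Variant
    members: list[Variant] = field(default_factory=list)

    def sp2(self, p2: float = 0.01) -> list[Variant]:
        """Clump members with p-value smaller than p2 (the SP2 list)."""
        return [v for v in self.members if v.p < p2]


def unphased_r2(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic site; treated as uncorrelated (assumption)
    return sxy * sxy / (sxx * syy)


def greedy_clump(variants: list[Variant], p1: float = 1e-4,
                 r2_min: float = 0.5, kb: float = 250,
                 allow_overlap: bool = False) -> list[Clump]:
    """Form clumps around index variants, lowest p-value first."""
    assigned: set[str] = set()
    clumps: list[Clump] = []
    for idx in sorted(variants, key=lambda v: v.p):
        if idx.p > p1:
            break  # sorted order: no later variant can be an index variant
        if idx.vid in assigned:
            continue  # already clumped, so it does not start its own clump
        assigned.add(idx.vid)
        clump = Clump(idx)
        for v in variants:
            if v.vid == idx.vid:
                continue
            if v.vid in assigned and not allow_overlap:
                continue
            if abs(v.pos - idx.pos) >= kb * 1000:
                continue  # outside the kb radius
            if unphased_r2(idx.dosages, v.dosages) > r2_min:
                clump.members.append(v)
                assigned.add(v.vid)
        clumps.append(clump)
    return clumps


def sp2_span(clump: Clump, p2: float = 0.01) -> tuple[int, int]:
    """bp span of the index variant plus its SP2 members (assumed definition)."""
    positions = [clump.index.pos] + [v.pos for v in clump.sp2(p2)]
    return min(positions), max(positions)


def overlapping_regions(span: tuple[int, int], regions,
                        zero_based: bool = False,
                        border_kb: float = 0.0) -> list[str]:
    """Names of (name, start, end) regions overlapping a 1-based closed span."""
    lo, hi = span
    border = int(border_kb * 1000)
    hits = []
    for name, start, end in regions:
        if zero_based:
            start += 1  # assume half-open [start, end) 0-based input
        if start - border <= hi and lo <= end + border:
            hits.append(name)
    return hits
```

Sorting once by p-value and then scanning makes "chosen greedily starting with the lowest p-value" explicit. The `assigned` set is what prevents an already-clumped variant from starting its own clump.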

By default, variant IDs are expected to be in the 'ID' column, or if that's absent, 'SNP'. You can change this with the --clump-id-field flag, which takes a space-delimited sequence of field names to search for. With multiple field names, earlier names take precedence over later ones. (The other --clump-...-field flags work the same way.)
--clump-log10 specifies -log10(p) values rather than raw p-values for both input and output. The 'input-only' and 'output-only' modifiers restrict this to one side, which lets you convert from one format to the other.
By default, p-values are expected to be in the 'P' column (or, with --clump-log10, 'LOG10_P' and 'NEG_LOG10_P' are also recognized); change this with --clump-p-field.
Multiallelic variants are effectively split in this computation. This requires the input file(s) to contain an effect-allele column; by default, this is expected to be 'A1', but you can change this with --clump-a1-field. A1 alleles aren't normally checked or reported for biallelic variants (since they usually don't affect p-values), but you can change that with --clump-force-a1.
Entries in the SP2 column are now of the form <variant ID>[(A1 allele)][(file number)]. The A1 component normally only appears for multiallelic variants; --clump-force-a1 changes this. By default, the file-number component now appears only when more than one input file is provided; "cols=+f" changes this.
By default, if there is a 'TEST' column, only lines where the test value is 'ADD' are considered. (This is a change from PLINK 1.x.) These default values can be changed with --clump-test-field and --clump-test, respectively.
By default, no variant may belong to more than one clump; remove this restriction with --clump-allow-overlap.
When variant IDs with p ≤ the --clump-p1 threshold are present in a --clump input file but missing from the main dataset, they are now written to plink2.clumps.missing_id[.zst]. When (variant ID, A1 allele) pairs with p ≤ the --clump-p1 threshold in a --clump input file are ignored because the A1 allele is missing from the main dataset (note that A1 is only checked for multiallelic variants, or for all variants when --clump-force-a1 is in effect), they are written to plink2.clumps.missing_allele[.zst].
We have provisionally retired --clump's other bells and whistles; contact us if this is a problem.
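Several of the output conventions in the list above are small enough to illustrate in Python. This sketch is not PLINK code, and all function names are hypothetical. The first function is a per-clump p-value histogram using the documented default --clump-bins boundaries. It assumes a p-value equal to a boundary falls into the lower bin, and that an extra final bin collects values above the last boundary. The two converters go between raw p-values and the -log10(p) form used with --clump-log10. The last function formats SP2 entries as <variant ID>[(A1 allele)][(file number)]. It shows A1 for multiallelic variants or when --clump-force-a1 is in effect, and shows the file number when there is more than one input file or "cols=+f" is given.

```python
import bisect
import math

DEFAULT_BINS = (0.0001, 0.001, 0.01, 0.05)  # documented --clump-bins defaults


def p_histogram(pvals, bounds=DEFAULT_BINS) -> list[int]:
    """Count p-values per bin; the final bin holds values above bounds[-1].

    Assumption: a p-value equal to a boundary is counted in the lower bin.
    """
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("bin boundaries must be strictly increasing")
    counts = [0] * (len(bounds) + 1)
    for p in pvals:
        counts[bisect.bisect_left(bounds, p)] += 1
    return counts


def p_to_neg_log10(p: float) -> float:
    """Raw p-value -> -log10(p)."""
    return -math.log10(p)


def neg_log10_to_p(x: float) -> float:
    """-log10(p) -> raw p-value."""
    return 10.0 ** (-x)


def sp2_entry(vid: str, a1: str, file_num: int, multiallelic: bool,
              n_files: int, force_a1: bool = False,
              show_file: bool = False) -> str:
    """Format an SP2 entry as <variant ID>[(A1 allele)][(file number)]."""
    entry = vid
    if multiallelic or force_a1:  # multiallelic, or --clump-force-a1
        entry += f"({a1})"
    if n_files > 1 or show_file:  # >1 input file, or cols=+f
        entry += f"({file_num})"
    return entry
```

For instance, with the default boundaries, the p-values 1e-5, 1e-4, 5e-4, 0.02, and 0.5 would be counted as [2, 1, 0, 1, 1] under the boundary assumption above.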

The PLINK 1.07 documentation has more discussion of these flags, including a few detailed examples.
