D: 27 Jul 2020

Linkage disequilibrium

All of the following calculations only consider founders. If your dataset has a shortage of them, PLINK 1.9 --make-founders may come in handy.

Since two-variant r^2 only makes sense for biallelic variants, these calculations collapse each multiallelic variant down to its most common allele vs. the rest.
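The collapsing step can be pictured with a small Python sketch. This is only an illustration, not PLINK 2's internal code: the function name collapse_to_biallelic, the tuple-based genotype representation, and the use of None for missing calls are all assumptions made for the example.

```python
from collections import Counter


def collapse_to_biallelic(genotypes):
    """Recode multiallelic genotypes as counts of the most common allele.

    genotypes: list of (allele, allele) tuples, or None for a missing call
               (hypothetical representation chosen for this sketch).
    Returns (major_allele, counts), where counts[i] is the number of copies
    (0, 1 or 2) of the most common allele, or None for a missing call.
    """
    # Tally every observed allele across all non-missing calls.
    allele_counts = Counter(a for g in genotypes if g is not None for a in g)
    major_allele = allele_counts.most_common(1)[0][0]
    # Every other allele is lumped together as "the rest".
    counts = [
        None if g is None else sum(a == major_allele for a in g)
        for g in genotypes
    ]
    return major_allele, counts


genotypes = [("A", "A"), ("A", "C"), ("C", "T"), None, ("A", "T")]
major, counts = collapse_to_biallelic(genotypes)
print(major, counts)
```

Here "A" is the most common allele, so "C" and "T" are treated as a single alternative when correlations are computed.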

Variant pruning

--indep-pairwise <window size>['kb'] [step size (variant ct)]
                 <unphased-hardcall-r^2 threshold>
--indep-pairphase <window size>['kb'] [step size (variant ct)]
                  <phased-r^2 threshold>
--indep <window size>['kb'] [step size (variant ct)] <VIF threshold>

These commands produce a pruned subset of variants that are in approximate linkage equilibrium with each other, writing the IDs to plink2.prune.in (and the IDs of all excluded variants to plink2.prune.out). These files are valid input for --extract/--exclude in a future PLINK run. For backward compatibility, the pruning commands do not affect the set of variants in the current run.

Since the only output of these commands is a pair of variant-ID lists, they now error out when variant IDs are not unique.

--indep-pairwise is the simplest approach: it only considers correlations between unphased-hardcall allele counts. It takes three parameters: a required window size in variant count or kilobase (if the 'kb' modifier is present) units, an optional variant count to shift the window at the end of each step (default 1, and now required to be 1 when a kilobase window is used), and a required r^2 threshold. At each step, pairs of variants in the current window with squared correlation greater than the threshold are noted, and variants are greedily pruned from the window until no such pairs remain.
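The window-and-prune loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not PLINK 2's implementation: windows are measured in variant count only, the names r_squared and prune_pairwise are hypothetical, and the rule for which member of a correlated pair to drop (here, the later variant) is an assumption, since the text above does not specify it.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length allele-count lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic variant: treat the correlation as zero
    return cov * cov / (var_x * var_y)


def prune_pairwise(variants, window, step, threshold):
    """Greedy pairwise pruning over a sliding window (variant-count units).

    variants: list of (variant_id, allele_counts) in genomic order.
    Returns (kept_ids, removed_ids).
    """
    removed = set()
    start = 0
    while start < len(variants):
        end = min(start + window, len(variants))
        in_window = [i for i in range(start, end) if i not in removed]
        changed = True
        while changed:  # repeat until no pair in the window exceeds the threshold
            changed = False
            for pos, i in enumerate(in_window):
                for j in in_window[pos + 1:]:
                    if r_squared(variants[i][1], variants[j][1]) > threshold:
                        removed.add(j)  # assumption: drop the later variant
                        in_window.remove(j)
                        changed = True
                        break
                if changed:
                    break  # restart the scan on the updated window
        start += step
    kept = [vid for i, (vid, _) in enumerate(variants) if i not in removed]
    pruned = [vid for i, (vid, _) in enumerate(variants) if i in removed]
    return kept, pruned


variants = [
    ("rs1", [0, 1, 2, 0, 1, 2]),
    ("rs2", [0, 1, 2, 0, 1, 2]),  # identical to rs1
    ("rs3", [2, 0, 1, 1, 0, 2]),  # uncorrelated with rs1
    ("rs4", [0, 1, 2, 0, 2, 2]),  # strongly correlated with rs1
]
kept, pruned = prune_pairwise(variants, window=4, step=1, threshold=0.5)
print(kept, pruned)
```

The two returned lists play the roles of plink2.prune.in and plink2.prune.out. With a window of 3 instead of 4, rs1 and rs4 would never share a window, so rs4 would be kept.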

LD statistic reports

--ld <variant ID> <variant ID> ['dosage'] ['hwe-midp']

To inspect the relation between a single pair of variants in more detail, you can use the --ld flag, which displays observed and expected (based on MAFs) frequencies of each haplotype, as well as haplotype-based r^2 and D'.

  • By default, only hardcalls are considered in this computation; add the 'dosage' modifier to change this.
  • When unphased calls are present, and there are multiple biologically possible solutions to the haplotype frequency cubic equation, all are displayed (instead of just the maximum likelihood solution identified by --r/--r2), along with HWE exact test statistics.
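For reference, the quantities reported by --ld are related to each other by the standard two-locus formulas, sketched below in Python. This sketch assumes the haplotype frequency is already known (as with fully phased data), so it does not solve the cubic equation mentioned above. The function name ld_stats and its argument names are hypothetical, and D' is returned as a signed value here, which may differ from how PLINK 2 prints it.

```python
def ld_stats(p_ab, p_a, p_b):
    """Two-locus LD statistics from one haplotype frequency and two allele frequencies.

    p_ab: frequency of the haplotype carrying allele A at the first variant
          and allele B at the second.
    p_a, p_b: frequencies of alleles A and B.
    Returns (expected, d, d_prime, r2).
    """
    # Haplotype frequencies expected if the two variants were independent.
    expected = {
        "AB": p_a * p_b,
        "Ab": p_a * (1 - p_b),
        "aB": (1 - p_a) * p_b,
        "ab": (1 - p_a) * (1 - p_b),
    }
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # Largest magnitude D can reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return expected, d, d_prime, r2


expected, d, d_prime, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.5)
print(expected, d, d_prime, r2)
```

With both allele frequencies at 0.5, each haplotype is expected at frequency 0.25 under independence, so an observed AB frequency of 0.4 gives D of about 0.15, D' of about 0.6 and r^2 of about 0.36.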

--bad-ld

PLINK 2 cannot estimate LD effectively when very few founders are present, so it normally errors out when there are fewer than 50. If you can't solve the problem with PLINK 1.9 --make-founders, you can use --bad-ld as a last resort to force PLINK 2 to proceed.
