D: 12 May 2019

Distance and similarity matrices

Relationship/covariance

--make-rel ['cov'] ['meanimpute'] [{square | square0 | triangle}] [{zs | bin | bin4}]

--make-rel is the primary interface to PLINK's realized relationship matrix and covariance matrix calculator. (See Yang J, Lee SH, Goddard ME, Visscher PM (2011) "GCTA: A Tool for Genome-wide Complex Trait Analysis" for a discussion of relationship matrix definition and usage.)

Output formats
The 'square', 'square0', 'triangle', 'bin', and 'bin4' modifiers have the same effects as in PLINK 1.9; and as always, 'zs' requests Zstandard compression. Depending on which of these modifiers are present, the output matrix's file extension is .rel, .rel.bin, or .rel.zst.

Variance-standardization
By default, the sample covariance at each variant is divided by that variant's variance (calculated from the observed or loaded MAF). Since this variance shrinks toward zero along with MAF, dividing by it greatly amplifies noise from very-low-MAF variants, so such variants should be filtered out before performing the default computation. To disable this division and calculate a straight covariance matrix, use the 'cov' modifier.
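
To make the effect of the 'cov' modifier concrete, here is a minimal NumPy sketch of a realized relationship matrix. It assumes the GCTA-style definition (each variant centered by its mean dosage 2p and, by default, scaled by the expected variance 2p(1-p), with p estimated from the observed data), no missing calls, and an average over variants in 'cov' mode; the function name relationship_matrix is hypothetical and this is not PLINK's implementation.

```python
import numpy as np

def relationship_matrix(geno, cov=False):
    """Sketch of a realized relationship (or covariance) matrix.

    geno: (n_samples, n_variants) allele dosages in [0, 2], assumed to have
          no missing values.
    cov:  if False, scale each centered variant by sqrt(2p(1-p)) (GCTA-style
          assumption); if True, skip that scaling.
    """
    geno = np.asarray(geno, dtype=float)
    p = geno.mean(axis=0) / 2.0            # observed allele frequency per variant
    centered = geno - 2.0 * p              # subtract the mean dosage 2p
    if not cov:
        var = 2.0 * p * (1.0 - p)          # near zero for very-low-MAF variants
        if np.any(var == 0):
            raise ValueError("monomorphic variant; filter it out first")
        centered = centered / np.sqrt(var)
    n_variants = geno.shape[1]
    # Average the per-variant cross-products over all variants.
    return centered @ centered.T / n_variants
```

Because flipping which allele is counted negates every centered value at that variant, the products (and hence the matrix) do not depend on the allele choice, e.g. relationship_matrix(2 - g) matches relationship_matrix(g).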

Distributed computation
--make-rel jobs using the 'square0' or 'triangle' output shapes can be subdivided with the --parallel flag.
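
Shapes that only need the lower triangle can be split into contiguous blocks of rows, which is what makes piecewise computation possible. The sketch below shows one way to pick row boundaries so each piece covers roughly the same number of matrix entries; the function triangle_row_ranges is hypothetical, and PLINK's actual piece boundaries may differ.

```python
def triangle_row_ranges(n_samples, n_parts, include_diagonal=True):
    """Split the rows of a lower-triangular matrix into n_parts contiguous,
    roughly equal-work pieces (hypothetical partitioning rule).

    Row i (0-based) holds i + 1 entries with the diagonal, i without it.
    Returns half-open (start_row, end_row) ranges, one per piece; pieces can
    be empty if n_parts exceeds n_samples.
    """
    def entries_before(row):
        # Number of matrix entries in rows [0, row).
        if include_diagonal:
            return row * (row + 1) // 2
        return row * (row - 1) // 2

    total = entries_before(n_samples)
    bounds = [0]
    row = 0
    for k in range(1, n_parts):
        target = total * k / n_parts
        # Advance until the rows before `row` hold at least k/n_parts of the work.
        while row < n_samples and entries_before(row) < target:
            row += 1
        bounds.append(row)
    bounds.append(n_samples)
    return list(zip(bounds[:-1], bounds[1:]))
```

Later rows are longer, so the last pieces contain fewer rows than the first ones; for example, triangle_row_ranges(4, 2) gives [(0, 3), (3, 4)], i.e. 6 entries and 4 entries.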

Other notes:

  • This calculation is not LD-sensitive; if that's a problem, an alternative is Doug Speed et al.'s LDAK software.
  • Dosages are used when available.
  • For multiallelic variants, REF dosages are used.
  • By default, mean-imputation is not performed for missing values, and we generally recommend using dedicated imputation software instead. However, "--pca approx" is based on the relationship matrix with mean-imputed values, and in practice this has been good enough for --pca's usual applications when the missingness rate isn't too high. To force mean-imputation here, add the 'meanimpute' modifier.
  • Special handling of the diagonal is no longer supported.
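
Mean-imputation itself is simple; the sketch below fills each missing dosage with the mean of the observed dosages at that variant. The NaN-for-missing convention and the function name mean_impute are assumptions for illustration only.

```python
import numpy as np

def mean_impute(geno):
    """Replace missing dosages (NaN) with the per-variant mean dosage.

    geno: (n_samples, n_variants) array; NaN marks a missing call.
    Returns a new array; variants with no observed calls stay NaN.
    """
    geno = np.array(geno, dtype=float)       # copy so the input is untouched
    observed = ~np.isnan(geno)
    counts = observed.sum(axis=0)
    sums = np.where(observed, geno, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts                # NaN where a variant has no calls
    rows, cols = np.nonzero(~observed)
    geno[rows, cols] = means[cols]
    return geno
```

When allele frequencies are estimated from the observed calls, an imputed entry equals the variant's mean dosage, so its centered value is zero and it adds nothing to the cross-products for that variant.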

Exporting to GCTA

--make-grm-list ['cov'] ['meanimpute'] ['zs'] [{id-header | iid-only}]
--make-grm-bin ['cov'] ['meanimpute'] [{id-header | iid-only}]

--make-grm-list and --make-grm-bin perform the same calculation as --make-rel (so the 'cov' and 'meanimpute' modifiers have the same effect), but write the result in GCTA's text (.grm) or binary (.grm.bin) format, respectively.
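
If you want to inspect a .grm.bin file outside GCTA, the sketch below reads and writes it with NumPy. It assumes the layout GCTA documents for this file: the lower triangle of the matrix, diagonal included, stored row by row as little-endian 4-byte floats. The sample count comes from the accompanying .grm.id file (one line per sample); the per-pair variant counts live in a separate .grm.N.bin file, which this sketch ignores. The function names are hypothetical.

```python
import numpy as np

def write_grm_bin(path, matrix):
    """Write the lower triangle (diagonal included) of a symmetric matrix,
    row by row, as little-endian float32 values."""
    matrix = np.asarray(matrix)
    rows, cols = np.tril_indices(matrix.shape[0])   # row-major lower triangle
    matrix[rows, cols].astype("<f4").tofile(path)

def read_grm_bin(path, n_samples):
    """Read a .grm.bin file back into a full symmetric float64 matrix."""
    values = np.fromfile(path, dtype="<f4")
    expected = n_samples * (n_samples + 1) // 2
    if values.size != expected:
        raise ValueError(f"expected {expected} values, found {values.size}")
    out = np.zeros((n_samples, n_samples))
    rows, cols = np.tril_indices(n_samples)
    out[rows, cols] = values
    out[cols, rows] = values                        # mirror into the upper triangle
    return out
```

The size check is a cheap guard against passing the wrong sample count, since the file itself carries no header.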

These computations can be subdivided with --parallel.

KING-robust kinship estimator

When MAFs are not too low, the relationship matrix computed by --make-rel/--make-grm-list/--make-grm-bin can reliably identify close relations within a single population. Manichaikul et al.'s KING-robust estimator, however, can also be trusted on mixed-population datasets (with one uncommon exception noted below), and doesn't require MAFs at all. Therefore, we have added this computation to PLINK 2, and the relationship-based pruner is now based on KING-robust.

The exception is that KING-robust underestimates kinship when the parents are from very different populations; you may want to handle this case specially.

Note that KING kinship coefficients are scaled such that duplicate samples have kinship 0.5, not 1. First-degree relations (parent-child, full siblings) correspond to ~0.25, second-degree relations to ~0.125, and so on. It is conventional to use a cutoff of ~0.354 (the geometric mean of 0.5 and 0.25) to screen for monozygotic twins and duplicate samples, ~0.177 (the geometric mean of 0.25 and 0.125) to add first-degree relations, etc.
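
The conventional cutoffs follow a simple pattern: degree d has expected kinship 0.5^(d+1), and the cutoff between degrees d and d+1 is the geometric mean of their expected values, 0.5^(d+1.5). The sketch below computes these cutoffs and uses them to assign a degree; the function names, and treating "degree 0" as duplicates/MZ twins, are illustrative choices, not PLINK features.

```python
def king_degree_cutoffs(max_degree=3):
    """Cutoffs between adjacent relationship degrees.

    Expected kinship for degree d is 0.5 ** (d + 1): d=0 (duplicate or MZ twin)
    -> 0.5, d=1 -> 0.25, d=2 -> 0.125, ...  The cutoff separating degree d
    from degree d+1 is sqrt(0.5**(d+1) * 0.5**(d+2)) = 0.5 ** (d + 1.5).
    """
    return [0.5 ** (d + 1.5) for d in range(max_degree + 1)]

def infer_degree(kinship, max_degree=3):
    """Return the closest degree d whose cutoff the coefficient exceeds, or
    None if it falls below every cutoff (treated as unrelated here)."""
    for d, cutoff in enumerate(king_degree_cutoffs(max_degree)):
        if kinship > cutoff:
            return d
    return None
```

With the default max_degree, the cutoffs are approximately 0.354, 0.177, 0.0884, and 0.0442.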

--make-king [{square | square0 | triangle}] [{zs | bin | bin4}]
--make-king-table ['zs'] ['counts'] ['cols='<col. set descriptor>]
--king-table-filter <min. kinship coefficient>
--king-table-subset <.kin0 file> [min. kinship coefficient]

--make-king writes KING-robust coefficients in matrix form to plink2.king or plink2.king.bin, while --make-king-table writes them in table form to plink2.kin0.

  • Only autosomes are included in this computation.
  • Pedigree information is currently ignored; the between-family estimator is used for all pairs.
  • For multiallelic variants, REF allele counts are used.
  • --make-king jobs with the 'square0' or 'triangle' output shapes and all --make-king-table jobs can be subdivided with --parallel.
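
For reference, the between-family KING-robust estimator from Manichaikul et al. (2010) can be written in terms of four counts: each sample's heterozygous calls, calls where both samples are heterozygous (HETHET), and calls where they are opposite homozygotes (IBS0). The per-pair sketch below follows that formula; it assumes no missing calls and is not PLINK's vectorized implementation, and the function name is hypothetical.

```python
import numpy as np

def king_robust_between_family(g1, g2):
    """KING-robust between-family kinship estimate for one sample pair.

    g1, g2: allele counts (0, 1, 2) at the same autosomal variants; this
            sketch assumes no missing calls.
    """
    g1 = np.asarray(g1, dtype=int)
    g2 = np.asarray(g2, dtype=int)
    het1 = np.count_nonzero(g1 == 1)                  # heterozygous calls, sample 1
    het2 = np.count_nonzero(g2 == 1)                  # heterozygous calls, sample 2
    hethet = np.count_nonzero((g1 == 1) & (g2 == 1))  # both heterozygous
    ibs0 = np.count_nonzero(np.abs(g1 - g2) == 2)     # opposite homozygotes
    smaller_het = min(het1, het2)
    if smaller_het == 0:
        raise ValueError("each sample needs at least one heterozygous call")
    return 0.5 - ((het1 + het2 - 2 * hethet) / 4.0 + ibs0) / smaller_het
```

For identical genotype vectors, het1 = het2 = hethet and ibs0 = 0, so the estimate is exactly 0.5, matching the duplicate-sample scaling described above. Swapping which allele is counted does not change any of the four counts.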

In addition, with --make-king-table,

  • The 'counts' modifier causes counts rather than 0..1 frequencies to be reported in the output columns that support both.
  • --king-table-filter causes only kinship coefficients ≥ the given threshold to be reported.
  • --king-table-subset causes only sample-pairs mentioned in the given .kin0 file (and optionally passing a kinship-coefficient threshold) to be processed. This allows you to start with a screening step which considers all sample pairs but only a small number of variants scattered across the genome, and follow up with accurate kinship-coefficient computations for just the sample pairs identified as possible relations during the screening step.
  • Refer to the file format entry for other output details and optional columns. --make-king-table now covers much of PLINK 1.x --genome's functionality.
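
If you want to review the candidate pairs from a screening run before the follow-up step, a short script can pull them out of the .kin0 table. The sketch assumes a tab-delimited file whose header line starts with '#' and contains IID1, IID2, and KINSHIP columns (FID1/FID2 are used when present); check the file format entry for the exact columns your run produced. The function name is hypothetical.

```python
import csv

def related_pairs(kin0_path, min_kinship):
    """Return ID pairs from a .kin0 table whose KINSHIP is >= min_kinship.

    Each ID is a tuple: (FID, IID) if FID columns exist, else (IID,).
    """
    with open(kin0_path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        header[0] = header[0].lstrip("#")     # header line begins with '#'
        col = {name: i for i, name in enumerate(header)}
        pairs = []
        for row in reader:
            if not row:                       # skip blank lines
                continue
            if float(row[col["KINSHIP"]]) < min_kinship:
                continue
            id1 = tuple(row[col[c]] for c in ("FID1", "IID1") if c in col)
            id2 = tuple(row[col[c]] for c in ("FID2", "IID2") if c in col)
            pairs.append((id1, id2))
    return pairs
```

PLINK applies the same kind of threshold internally through --king-table-filter and --king-table-subset's optional argument, so a script like this is only needed for custom inspection.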

Relationship-based pruning

--king-cutoff [.king.bin + .king.id fileset prefix] <threshold>

If used in conjunction with a later calculation (see the order of operations page for details), --king-cutoff excludes one member of each pair of samples with kinship coefficient greater than the given threshold. Alternatively, you can invoke this on its own to write a pruned list of sample IDs to plink2.king.cutoff.in.id, and excluded IDs to plink2.king.cutoff.out.id.

PLINK tries to maximize the final sample size, but this maximum independent set problem is NP-hard, so we use a greedy algorithm which does not guarantee an optimal result. In practice, --king-cutoff does yield a maximum set whenever there aren't too many intertwined close relations, but if you want to try to beat it (or optimize a fancier function that takes the exact kinship-coefficient values into account), use the --make-king and --keep/--remove flags and patch your preferred algorithm in between.
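
As a starting point for experimenting with your own pruning rule between --make-king and --keep/--remove, the sketch below implements one simple greedy heuristic: repeatedly drop the sample with the most remaining above-threshold relations until no related pair is left. It is not guaranteed optimal and is not necessarily the exact rule (or tie-breaking) PLINK uses; the function name and inputs are hypothetical.

```python
from collections import defaultdict

def greedy_king_prune(samples, related_pairs):
    """Greedy heuristic for a large set of samples with no related pair left.

    samples: iterable of sample IDs.
    related_pairs: iterable of (a, b) pairs whose kinship exceeds the cutoff.
    Returns (kept, removed) as sets.
    """
    neighbors = defaultdict(set)
    for a, b in related_pairs:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    kept = set(samples)
    removed = set()
    # Number of related partners each sample still has among kept samples.
    degree = {s: len(neighbors[s] & kept) for s in kept}
    while kept:
        # Most-connected sample; ties broken arbitrarily but deterministically.
        worst = max(kept, key=lambda s: (degree[s], str(s)))
        if degree[worst] == 0:
            break                              # no related pairs remain
        kept.remove(worst)
        removed.add(worst)
        for other in neighbors[worst]:
            if other in kept:
                degree[other] -= 1
    return kept, removed
```

For a chain a-b-c, the heuristic removes only b and keeps a and c. A fancier rule could weight each edge by its exact kinship coefficient instead of counting neighbors.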

--king-cutoff usually computes kinship coefficients from scratch. However, you can provide a precomputed kinship-coefficient matrix as input; it must be in --make-king binary format with triangular shape, though either precision ('bin' or 'bin4') is acceptable. This saves time when experimenting with different thresholds.

(The next several pages of documentation are under development.)
