D: 12 Nov 2019

Sample-distance and similarity matrices

Relationship/covariance

--make-rel ['cov'] ['meanimpute'] [{square | square0 | triangle}]
           [{zs | bin | bin4}]

--make-rel is the primary interface to PLINK's realized relationship matrix and covariance matrix calculator. (See Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: A Tool for Genome-wide Complex Trait Analysis for discussion of relationship matrix definition and usage.)

Output formats
The 'square', 'square0', 'triangle', 'bin', and 'bin4' modifiers have the same effects as in PLINK 1.9, and as always, 'zs' requests Zstandard compression. Depending on which of these modifiers are present, the output matrix's file extension is .rel, .rel.bin, or .rel.zst.
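
As a rough illustration of the three text output shapes, the following Python sketch renders a small symmetric matrix each way. It assumes the PLINK 1.9 meanings ('square' = full symmetric matrix, 'square0' = square with the upper-right triangle zeroed, 'triangle' = lower triangle only). The function name format_matrix, the tab delimiter, and the number formatting are hypothetical choices for this example, not a description of PLINK's writer.

```python
def format_matrix(matrix, shape="triangle"):
    """Render a symmetric matrix as text lines in one of three shapes."""
    lines = []
    for i, row in enumerate(matrix):
        if shape == "square":
            cells = row  # full symmetric matrix
        elif shape == "square0":
            # keep the lower triangle, zero out the upper-right cells
            cells = row[: i + 1] + [0.0] * (len(row) - i - 1)
        elif shape == "triangle":
            cells = row[: i + 1]  # lower triangle, diagonal included
        else:
            raise ValueError(f"unknown shape: {shape}")
        lines.append("\t".join(f"{v:g}" for v in cells))
    return lines


rel = [[1.0, 0.5, 0.1],
       [0.5, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
for shape in ("square", "square0", "triangle"):
    print(shape)
    print("\n".join(format_matrix(rel, shape)))
```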

Variance-standardization
By default, the sample covariance at each variant is divided by that variant's variance (calculated from observed, or loaded, MAF). (As a consequence, very-low-MAF variants should be filtered out before performing the default computation.) To disable this and calculate a straight covariance matrix, use the 'cov' modifier.
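
To make the difference between the default and 'cov' computations concrete, here is a minimal Python sketch. It assumes the GCTA-style definition (center each dosage by twice the allele frequency, divide by 2p(1-p), average over variants); PLINK's exact normalization, missing-data handling, and performance tricks may differ. The function name relationship_matrix and its arguments are hypothetical.

```python
def relationship_matrix(dosages, cov=False):
    """dosages[v][s] is the dosage (0..2) of variant v in sample s.

    Assumes no missing values. Returns an n_samples x n_samples matrix
    averaged over variants.
    """
    n_var = len(dosages)
    n_samp = len(dosages[0])
    total = [[0.0] * n_samp for _ in range(n_samp)]
    for row in dosages:
        freq = sum(row) / (2.0 * n_samp)   # observed allele frequency
        var = 2.0 * freq * (1.0 - freq)    # expected dosage variance
        if var == 0.0 and not cov:
            raise ValueError("monomorphic variant; filter it out first")
        centered = [x - 2.0 * freq for x in row]
        scale = 1.0 if cov else var        # 'cov' skips the variance division
        for j in range(n_samp):
            for k in range(n_samp):
                total[j][k] += centered[j] * centered[k] / scale
    return [[cell / n_var for cell in r] for r in total]


toy = [[0, 1, 2, 1],
       [2, 2, 0, 0]]
grm = relationship_matrix(toy)
covm = relationship_matrix(toy, cov=True)
print(grm[0][0], covm[0][0])
```

Because the per-variant variance appears in the denominator, a variant with very low MAF can dominate the standardized sum, which is why such variants should be filtered out first.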

Distributed computation
--make-rel jobs using the 'square0' or 'triangle' output shapes can be subdivided with the --parallel flag.

Other notes:

  • This calculation is not LD-sensitive; if that's a problem, an alternative is Doug Speed et al.'s LDAK software.
  • Dosages are used when available.
  • For multiallelic variants, REF dosages are used.
  • By default, mean-imputation is not performed for missing values, and we generally recommend using dedicated imputation software instead. However, "--pca approx" is based on the relationship matrix with mean-imputed values, and in practice this has been good enough for --pca's usual applications when the missingness rate isn't too high. To force mean-imputation here, add the 'meanimpute' modifier.
  • Special handling of the diagonal is no longer supported.
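
The mean-imputation mentioned in the notes above can be pictured with the following Python sketch, which fills each missing dosage with the mean of the observed dosages at that variant. This is an illustration of the general idea only; the function name mean_impute and the use of None for missing values are assumptions made for the example.

```python
def mean_impute(row):
    """Replace None entries with the mean observed dosage for one variant."""
    observed = [x for x in row if x is not None]
    if not observed:
        raise ValueError("variant has no observed dosages")
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in row]


print(mean_impute([0, None, 2, 1]))
```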

Exporting to GCTA

--make-grm-list ['cov'] ['meanimpute'] ['zs'] [{id-header | iid-only}]
--make-grm-bin ['cov'] ['meanimpute'] [{id-header | iid-only}]

--make-grm-list and --make-grm-bin perform the same calculation as --make-rel (so the 'cov' and 'meanimpute' modifiers have the same effect), but produce a .grm or .grm.bin-format file for GCTA to process.

These computations can be subdivided with --parallel.

KING-robust kinship estimator

The relationship matrix computed by --make-rel/--make-grm-list/--make-grm-bin can be used to reliably identify close relations within a single population, provided your MAF estimates are reasonably accurate. However, Manichaikul et al.'s KING-robust estimator can also be mostly trusted on mixed-population datasets (with one uncommon exception noted below), and doesn't require MAFs at all. Therefore, we have added this computation to PLINK 2, and the relationship-based pruner is now based on KING-robust.

The exception is that KING-robust underestimates kinship when the parents are from very different populations. You may want to have some special handling of this case; --pca can help detect it.

Note that KING kinship coefficients are scaled such that duplicate samples have kinship 0.5, not 1. First-degree relations (parent-child, full siblings) correspond to ~0.25, second-degree relations correspond to ~0.125, etc. It is conventional to use a cutoff of ~0.354 (the geometric mean of 0.5 and 0.25) to screen for monozygotic twins and duplicate samples, ~0.177 to add first-degree relations, etc.
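
The cutoffs quoted above follow one pattern: each is the geometric mean of the expected kinship values for two adjacent degrees of relatedness. The Python sketch below reproduces that arithmetic. The helper names degree_cutoff and classify are hypothetical, and the classification is only as good as the kinship estimates fed into it.

```python
import math


def degree_cutoff(degree):
    """Geometric mean of the expected kinship for `degree` and `degree + 1`.

    Degree 0 = duplicate/MZ twin (0.5), 1 = first-degree (0.25), and so on.
    """
    upper = 0.5 ** (degree + 1)
    lower = 0.5 ** (degree + 2)
    return math.sqrt(upper * lower)


def classify(kinship, max_degree=3):
    """Return the closest degree whose cutoff the coefficient exceeds, or None."""
    for degree in range(max_degree + 1):
        if kinship > degree_cutoff(degree):
            return degree
    return None


print(round(degree_cutoff(0), 3), round(degree_cutoff(1), 3))  # 0.354 0.177
print(classify(0.26))  # 1, i.e. first-degree
```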

--make-king [{square | square0 | triangle}] [{zs | bin | bin4}]
--make-king-table ['zs'] ['counts'] ['rel-check'] ['cols='<col. set descrip.>]
--king-table-filter <min. kinship coefficient>
--king-table-subset <.kin0 file> [min. kinship coefficient]

--make-king writes KING-robust coefficients in matrix form to plink2.king or plink2.king.bin, while --make-king-table writes them in table form to plink2.kin0.

  • Only autosomes are included in this computation.
  • Pedigree information is currently ignored; the between-family estimator is used for all pairs.
  • For multiallelic variants, REF allele counts are used.
  • --make-king jobs with the 'square0' or 'triangle' output shapes and all --make-king-table jobs can be subdivided with --parallel.
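
For orientation, here is a small Python sketch of the between-family KING-robust estimator as it is commonly stated from Manichaikul et al.: 0.5 minus the sum of squared allele-count differences divided by four times the smaller of the two samples' heterozygote counts. It is an illustration under stated assumptions, not PLINK's implementation: the function name king_robust is hypothetical, hard-call 0/1/2 genotypes are assumed, and variants missing in either sample are simply skipped.

```python
def king_robust(g1, g2):
    """g1, g2: per-variant allele counts (0/1/2, None = missing) for two samples."""
    het1 = het2 = 0
    sq_diff = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # assumption: use only variants observed in both samples
        het1 += a == 1
        het2 += b == 1
        sq_diff += (a - b) ** 2
    min_het = min(het1, het2)
    if min_het == 0:
        raise ValueError("no heterozygous calls in at least one sample")
    return 0.5 - sq_diff / (4.0 * min_het)


# A duplicated sample has zero squared difference, so its kinship is 0.5.
print(king_robust([0, 1, 2, 1], [0, 1, 2, 1]))
```

Note that no allele frequencies enter the formula, which is what makes the estimator usable without MAFs.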

In addition, with --make-king-table,

  • The 'counts' modifier causes counts rather than 0..1 frequencies to be reported in the output columns that support both.
  • The 'rel-check' modifier causes only same-FID pairs to be reported. (The between-family KING estimator is still used.)
  • --king-table-filter causes only kinship coefficients ≥ the given threshold to be reported.
  • --king-table-subset causes only sample-pairs mentioned in the given .kin0 file (and optionally passing a kinship-coefficient threshold) to be processed. This allows you to start with a screening step which considers all sample pairs but only a small number of variants scattered across the genome (try --maf + --bp-space), and follow up with accurate kinship-coefficient computations for just the sample pairs identified as possible relations during the screening step. (This two-step approach remains practical with millions of samples!)
  • Refer to the file format entry for other output details and optional columns. --make-king-table now covers much of PLINK 1.x --genome's functionality.

See also the original KING software package, which has some useful two-step workflows directly built in, along with handy additional features like pedigree inference.

Relationship-based pruning

--king-cutoff [.king.bin + .king.id fileset prefix] <threshold>

If used in conjunction with a later calculation (see the order of operations page for details), --king-cutoff excludes one member of each pair of samples with kinship coefficient greater than the given threshold. Alternatively, you can invoke this on its own to write a pruned list of sample IDs to plink2.king.cutoff.in.id, and excluded IDs to plink2.king.cutoff.out.id.

PLINK tries to maximize the final sample size, but this maximum independent set problem is NP-hard, so we use a greedy algorithm which does not guarantee an optimal result. In practice, --king-cutoff does yield a maximum set whenever there aren't too many intertwined close relations, but if you want to try to beat it (or optimize a fancier function that takes the exact kinship-coefficient values into account), use the --make-king and --keep/--remove flags and patch your preferred algorithm in between.
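
If you do want to patch in your own pruning step, the general shape of a greedy approach is sketched below in Python. This is not PLINK's actual algorithm: it is one simple heuristic (repeatedly drop the sample with the most remaining over-threshold relations), and the function name greedy_prune and the dict-of-pairs input are assumptions made for the example.

```python
def greedy_prune(samples, kinships, threshold):
    """kinships maps (id1, id2) pairs to coefficients. Returns (kept, removed)."""
    neighbors = {s: set() for s in samples}
    for (a, b), phi in kinships.items():
        if phi > threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    removed = []
    while True:
        # choose the sample with the most remaining related partners
        worst = max(neighbors, key=lambda s: len(neighbors[s]), default=None)
        if worst is None or not neighbors[worst]:
            break  # no related pairs remain
        for other in neighbors.pop(worst):
            neighbors[other].discard(worst)
        removed.append(worst)
    kept = [s for s in samples if s in neighbors]
    return kept, removed


pairs = {("A", "B"): 0.30, ("B", "C"): 0.26, ("C", "D"): 0.05}
kept, removed = greedy_prune(["A", "B", "C", "D"], pairs, 0.177)
print(kept, removed)
```

In this toy input, B is related to both A and C, so removing B alone resolves every over-threshold pair.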

--king-cutoff usually computes kinship coefficients from scratch. However, you can provide a precomputed kinship-coefficient matrix as input (it must be in --make-king binary format with triangular shape; either precision is accepted); this is a time-saver when experimenting with different thresholds.

(The next several pages of documentation are under development.)
