S: 11 Dec 2023 (b7.2)

D: 11 Dec 2023

Identity-by-descent

These calculations are not LD-aware. It is usually a good idea to perform some form of LD-based pruning before invoking them.

--genome ['gz'] ['rel-check'] ['full'] ['unbounded'] ['nudge']
--ppc-gap <distance in kbs>
--min <minimum PI_HAT value>
--max <maximum PI_HAT value>

--genome invokes an IBS/IBD computation, and then writes a report with the following fields to plink.genome:

FID1     Family ID for first sample
IID1     Individual ID for first sample
FID2     Family ID for second sample
IID2     Individual ID for second sample
RT       Relationship type inferred from .fam/.ped file
EZ       IBD sharing expected value, based on just .fam/.ped relationship
Z0       P(IBD=0)
Z1       P(IBD=1)
Z2       P(IBD=2)
PI_HAT   Proportion IBD, i.e. P(IBD=2) + 0.5*P(IBD=1)
PHE      Pairwise phenotypic code (1, 0, -1 = AA, AU, and UU pairs, respectively)
DST      IBS distance, i.e. (IBS2 + 0.5*IBS1) / (IBS0 + IBS1 + IBS2)
PPC      IBS binomial test
RATIO    HETHET : IBS0 SNP ratio (expected value 2)

Note that there is one entry per pair of samples, so this file can be very large. The 'gz' modifier causes the output to be gzipped, while 'rel-check' removes pairs of samples with different FIDs, and --min/--max removes lines with PI_HAT values below/above the given cutoff(s).
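As a rough illustration of what 'rel-check' and --min/--max do to the report, the sketch below applies the same three filters to an already-written report in Python. The function name filter_genome_rows and the sample rows are hypothetical (the values are made up for illustration), and the code assumes a whitespace-delimited file whose header line uses the field names listed above. It is not how PLINK itself implements these options.

```python
import io

def filter_genome_rows(handle, min_pi_hat=None, max_pi_hat=None, rel_check=False):
    """Yield report rows (as dicts) that survive the requested filters."""
    header = handle.readline().split()
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        pi_hat = float(row["PI_HAT"])
        if min_pi_hat is not None and pi_hat < min_pi_hat:
            continue  # below the --min cutoff
        if max_pi_hat is not None and pi_hat > max_pi_hat:
            continue  # above the --max cutoff
        if rel_check and row["FID1"] != row["FID2"]:
            continue  # 'rel-check' drops pairs with different FIDs
        yield row

# Made-up example rows; only a subset of the report's columns is shown.
SAMPLE = """\
FID1 IID1 FID2 IID2 Z0 Z1 Z2 PI_HAT
F1 A F1 B 0.0000 1.0000 0.0000 0.5000
F1 A F2 C 0.9500 0.0500 0.0000 0.0250
F2 C F2 D 0.2500 0.5000 0.2500 0.5000
"""

kept = list(filter_genome_rows(io.StringIO(SAMPLE), min_pi_hat=0.2, rel_check=True))
print([(r["IID1"], r["IID2"]) for r in kept])
```

Filtering at report-generation time is usually preferable, since it keeps the output file small; a post-hoc filter like this is mainly useful when the full report already exists.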

The 'full' modifier causes the following fields to be added:

IBS0     Number of IBS 0 nonmissing variants
IBS1     Number of IBS 1 nonmissing variants
IBS2     Number of IBS 2 nonmissing variants
HOMHOM   Number of IBS 0 SNP pairs used in PPC test
HETHET   Number of IBS 2 het/het SNP pairs used in PPC test
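The PI_HAT and DST definitions in the field list are simple enough to recompute by hand, which can be a useful sanity check on a 'full' report. A minimal sketch follows; the function names are hypothetical, and the inputs are the Z1/Z2 and IBS0/IBS1/IBS2 values for one pair of samples.

```python
def pi_hat(z1, z2):
    """Proportion IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1

def ibs_distance(ibs0, ibs1, ibs2):
    """DST: (IBS2 + 0.5 * IBS1) / (IBS0 + IBS1 + IBS2)."""
    total = ibs0 + ibs1 + ibs2
    if total == 0:
        raise ValueError("pair has no jointly nonmissing variants")
    return (ibs2 + 0.5 * ibs1) / total

# Illustrative values only (not taken from a real dataset).
print(pi_hat(z1=0.5, z2=0.25))
print(ibs_distance(ibs0=10, ibs1=40, ibs2=50))
```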

By default, the minimum distance between informative pairs of SNPs used in the pairwise population concordance (PPC) test is 500 kb (500,000 base pairs); you can change this with the --ppc-gap flag, which takes a distance in kb.

The underlying P(IBD=0/1/2) estimator sometimes yields numbers outside the range [0,1]; by default, these are clipped. The 'unbounded' modifier turns off this clipping. Then, if PI_HAT^2 < P(IBD=2), 'nudge' adjusts the final estimates to P(IBD=0) := (1-p)^2, P(IBD=1) := 2p(1-p), and P(IBD=2) := p^2, where p is the current PI_HAT.
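The 'nudge' adjustment can be written out in a few lines. The sketch below reads the exponents as squares, i.e. (1-p)^2, 2p(1-p), and p^2, which is the only reading under which the three probabilities sum to 1. The function name is hypothetical, and the clipping step is deliberately left out because its exact form is not described here.

```python
def nudge(z0, z1, z2):
    """Apply the 'nudge' rule to one pair's (P(IBD=0), P(IBD=1), P(IBD=2))."""
    p = z2 + 0.5 * z1  # current PI_HAT
    if p * p < z2:
        # Replace the estimates with the values implied by p alone.
        return (1 - p) ** 2, 2 * p * (1 - p), p * p
    return z0, z1, z2  # condition not met: estimates are left unchanged

# Illustrative input: PI_HAT = 0.7, and 0.7^2 = 0.49 < 0.6, so the rule fires.
print(nudge(0.2, 0.2, 0.6))
```

Note that the adjusted values leave PI_HAT itself unchanged, since p^2 + 0.5 * 2p(1-p) = p.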

This estimator requires fairly accurate minor allele frequencies to work properly. Use --read-freq if you do not think your immediate dataset's empirical MAFs are representative.

--genome jobs can be subdivided with --parallel, which is substantially easier to use than PLINK 1.07 --genome-lists. (Since we are not aware of other practical applications of --genome-lists, that flag has been provisionally retired; contact us if you still need it.)

We may add more sophisticated IBD estimation routine(s) in the future if there is sufficient interest.

Runs of homozygosity

--homozyg [{group | group-verbose}] ['consensus-match'] ['extend'] ['subtract-1-from-lengths']
--homozyg-snp <min SNP count>
--homozyg-kb <min length>
--homozyg-density <max inverse density (kb/SNP)>
--homozyg-gap <max internal gap kb length>

--homozyg-het <max hets>

--homozyg-window-snp <scanning window size>
--homozyg-window-het <max hets in scanning window hit>
--homozyg-window-missing <max missing calls in scanning window hit>
--homozyg-window-threshold <min scanning window hit rate>

If any of these flags are present, a set of run-of-homozygosity reports is generated using PLINK 1.07's scanning algorithm. See the original documentation for more details.
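For readers unfamiliar with the scanning approach, here is a simplified Python sketch of the window step for a single sample: a fixed-size window slides along the variants, each window position is scored as a hit or a miss based on its het and missing-call counts, and each variant gets the fraction of windows containing it that were hits. The function names and the 'hom'/'het'/'missing' encoding are hypothetical, the default values mirror the documented --homozyg-window-* defaults, and edge handling (variants near the ends are covered by fewer windows) is an assumption. It is not PLINK's actual code.

```python
def window_hit_rates(calls, window_snp=50, window_het=1, window_missing=5):
    """Per-variant fraction of containing windows that qualify as hits.

    calls: list of 'hom', 'het', or 'missing', one per variant, in order.
    """
    n = len(calls)
    hits = [0] * n      # number of hit windows covering each variant
    windows = [0] * n   # number of windows covering each variant
    for start in range(n - window_snp + 1):
        window = calls[start:start + window_snp]
        is_hit = (window.count("het") <= window_het
                  and window.count("missing") <= window_missing)
        for i in range(start, start + window_snp):
            windows[i] += 1
            hits[i] += is_hit
    return [h / w if w else 0.0 for h, w in zip(hits, windows)]

def eligible(calls, threshold=0.05, **window_settings):
    """True for each variant whose hit rate reaches the threshold."""
    return [rate >= threshold
            for rate in window_hit_rates(calls, **window_settings)]

# Toy example with a tiny window so the behaviour is easy to follow.
toy = ["hom"] * 5 + ["het"] * 3 + ["hom"] * 5
print(eligible(toy, threshold=0.5, window_snp=3, window_het=0, window_missing=0))
```

Runs of consecutive eligible variants are the raw material for ROH calls; the segment-level minimums and bounds are applied on top of this step.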

  • You may also want to try 'bcftools roh', which uses an HMM-based detection method. (We'll include a basic port of that command in PLINK 2.0 if there is sufficient interest.)
  • If you're satisfied with all the default settings described below, just use --homozyg with no modifiers. Otherwise, --homozyg lets you change a few binary settings:
    • The 'group[-verbose]' modifier adds a report on pools of overlapping runs of homozygosity. (This is triggered by --homozyg-match as well.) 'group-verbose' also produces a detailed report for each pool.
    • With 'group[-verbose]', 'consensus-match' causes pairwise segmental matches to be called based only on the SNPs in the entire pool's consensus segment, rather than all the SNPs in the pairwise intersection.
    • Due to how the scanning algorithm works, it is possible for a reported run of homozygosity to be adjacent to a few unincluded homozygous variants. This is generally harmless, but if you wish to extend the ROH to include them, use the 'extend' modifier. (Note that the --homozyg-density bound can prevent extension, and --homozyg-gap affects which variants are considered adjacent.)
    • By default, segment bp lengths are calculated as (<end bp position> - <start bp position> + 1). This is a minor change from PLINK 1.07, which does not add 1 at the end. For testing purposes, you can use the 'subtract-1-from-lengths' modifier to apply the old formula.
  • By default, only runs of homozygosity containing at least 100 SNPs, and of total length ≥ 1000 kilobases, are noted. You can change these minimums with --homozyg-snp and --homozyg-kb, respectively.
  • By default, a ROH must have at least one SNP per 50 kb on average; change this bound with --homozyg-density.
  • By default, if two consecutive SNPs are more than 1000 kb apart, they cannot be in the same ROH; change this bound with --homozyg-gap.
  • By default, a ROH can contain an unlimited number of heterozygous calls; you can impose a limit with --homozyg-het. (This flag was silently ignored by PLINK 1.07.)
  • By default, the scanning window contains 50 SNPs; change this with --homozyg-window-snp.
  • By default, a scanning window hit can contain at most 1 heterozygous call and 5 missing calls; change these limits with --homozyg-window-het and --homozyg-window-missing, respectively.
  • By default, for a SNP to be eligible for inclusion in a ROH, the hit rate of all scanning windows containing the SNP must be at least 0.05; change this threshold with --homozyg-window-threshold.
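The segment-level defaults above (SNP count, length, density, gap, and het limit) can be summarized as a single check on a candidate segment. The Python sketch below is only an illustration of how the bounds combine: the function names are hypothetical, the density is assumed to be computed as length in kb divided by SNP count, and a real scan would split a run at an oversized gap rather than simply reject it.

```python
def segment_length_bp(start_bp, end_bp, subtract_1_from_lengths=False):
    """Segment length; the default adds 1, the old formula does not."""
    length = end_bp - start_bp + 1
    return length - 1 if subtract_1_from_lengths else length

def roh_segment_ok(positions_bp, n_het=0, min_snp=100, min_kb=1000.0,
                   max_kb_per_snp=50.0, max_gap_kb=1000.0, max_het=None,
                   subtract_1_from_lengths=False):
    """Check one candidate segment (sorted bp positions) against the bounds."""
    n_snp = len(positions_bp)
    if n_snp == 0 or n_snp < min_snp:
        return False  # --homozyg-snp
    length_kb = segment_length_bp(positions_bp[0], positions_bp[-1],
                                  subtract_1_from_lengths) / 1000.0
    if length_kb < min_kb:
        return False  # --homozyg-kb
    if length_kb / n_snp > max_kb_per_snp:
        return False  # --homozyg-density
    gaps_bp = (b - a for a, b in zip(positions_bp, positions_bp[1:]))
    if any(gap / 1000.0 > max_gap_kb for gap in gaps_bp):
        return False  # --homozyg-gap
    if max_het is not None and n_het > max_het:
        return False  # --homozyg-het (no limit unless one is given)
    return True

# 100 evenly spaced SNPs spanning bp 1 to 1,000,000 (illustrative positions).
positions = [1 + 10_101 * i for i in range(100)]
print(roh_segment_ok(positions))
print(roh_segment_ok(positions, subtract_1_from_lengths=True))
```

The second call shows why the length formula matters at the boundary: the same segment is exactly 1000 kb with the default formula and 999.999 kb with the old one.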

--homozyg-match <min overlap rate>
--pool-size <min pool size>

In a "--homozyg group[-verbose]" run, pools of overlapping ROH are formed, then pairwise allelic matches within each pool are identified, then allelic-match groups are formed based on these matches. (More precisely, each group has a reference member marked with an appended '*' in the .hom.overlap 'GRP' column, and all other members of the group have pairwise allelic matches with the reference member.) By default, a pairwise match is defined as 0.95 or greater concordance between segments across jointly homozygous variants; you can change this threshold with --homozyg-match.
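To make the pairwise match definition concrete, the Python sketch below computes concordance between two samples over a shared region, counting only variants where both samples are homozygous. The function names and the genotype encoding (a pair of allele codes, or None for a missing call) are hypothetical; the 0.95 default mirrors the documented --homozyg-match default.

```python
def homozygous_allele(call):
    """Return the allele if the call is homozygous, else None."""
    if call is None:
        return None  # missing call
    a1, a2 = call
    return a1 if a1 == a2 else None

def match_concordance(calls_a, calls_b):
    """Fraction of jointly homozygous variants with the same allele."""
    same = total = 0
    for call_a, call_b in zip(calls_a, calls_b):
        hom_a = homozygous_allele(call_a)
        hom_b = homozygous_allele(call_b)
        if hom_a is None or hom_b is None:
            continue  # not jointly homozygous; ignored
        total += 1
        same += hom_a == hom_b
    return same / total if total else None

def is_allelic_match(calls_a, calls_b, min_overlap_rate=0.95):
    rate = match_concordance(calls_a, calls_b)
    return rate is not None and rate >= min_overlap_rate

# Illustrative data: 20 jointly homozygous variants, 19 of which agree,
# plus one het call and one missing call that are skipped.
sample_a = [("A", "A")] * 19 + [("G", "G"), ("A", "G"), None]
sample_b = [("A", "A")] * 20 + [("A", "A"), ("A", "A")]
print(match_concordance(sample_a, sample_b), is_allelic_match(sample_a, sample_b))
```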

--pool-size excludes all pools with fewer than the given number of segments from the report(s).
