Introduction, downloads

D: 12 May 2019

Basic statistics

Allele frequency

--freq ['zs'] ['counts'] ['cols='<column set descriptor>] ['refbins='<comma-separated bin boundaries> | 'refbins-file='<filename>] ['alt1bins='<comma-separated bin boundaries> | 'alt1bins-file='<filename>] ['bins-only']

--freq normally writes an empirical allele frequency report to plink2.afreq. With the 'counts' modifier, an allele count/dosage report is written to plink2.acount instead.

  • By default, only founders are considered; this can be changed with --nonfounders.
  • chrM dosages are scaled to sum to 2.
  • Phenotype- and category-stratified frequency reports are no longer directly supported. However, you can use --keep-if to filter on a phenotype condition, and --loop-cats to filter on each category in turn.
  • The resulting report is valid input for --read-freq.
  • Refer to the file format entry for output details and optional columns.
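To make the founders-only default concrete, here is a minimal Python sketch of an empirical allele frequency calculation. It is not PLINK's implementation: the function name, the input layout (one allele pair per sample, None for a missing call) and the include_nonfounders flag are all hypothetical, and it covers only diploid hardcalls (no dosages, haploid chromosomes, or chrM scaling).

```python
from collections import Counter

def allele_freqs(genotypes, is_founder, include_nonfounders=False):
    """Empirical allele frequencies from diploid hardcalls (illustrative only).

    genotypes:  one (allele1, allele2) tuple per sample, or None if missing.
    is_founder: one bool per sample.
    include_nonfounders: hypothetical stand-in for the effect of --nonfounders.
    """
    counts = Counter()
    for gt, founder in zip(genotypes, is_founder):
        if gt is None:
            continue  # missing calls contribute no alleles
        if not (founder or include_nonfounders):
            continue  # default: skip nonfounders
        counts.update(gt)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}

genotypes = [("A", "A"), ("A", "G"), None, ("G", "G")]
is_founder = [True, True, True, False]
founders_only = allele_freqs(genotypes, is_founder)
everyone = allele_freqs(genotypes, is_founder, include_nonfounders=True)
```

In this toy input the last sample is a nonfounder, so its two G alleles only enter the frequencies in the second call.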

--freq can now report histograms summarizing the allele frequency spectrum. When the 'refbins=' modifier is present, its argument is interpreted as a sequence of comma-separated REF frequency/count bin boundaries, and the corresponding histogram is written to plink2.afreq.ref.bins or plink2.acount.ref.bins. Alternatively, when 'refbins-file=' is present, the named file is interpreted as a sequence of whitespace-separated bin boundaries. 'alt1bins='/'alt1bins-file=' use the same syntax, and report ALT1 frequency/count histograms to plink2.afreq.alt1.bins or plink2.acount.alt1.bins.
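As a rough illustration of how a list of bin boundaries turns into a histogram, the Python sketch below accepts both the comma-separated and the whitespace-separated boundary syntax. The function names are hypothetical, and the edge convention (a value equal to a boundary falls into the higher bin, with open-ended bins below the first and above the last boundary) is an assumption; check the file format entry for what PLINK actually reports.

```python
from bisect import bisect_right

def parse_boundaries(text):
    """Parse bin boundaries given as comma- or whitespace-separated numbers."""
    return sorted(float(tok) for tok in text.replace(",", " ").split())

def histogram(values, boundaries):
    """Count values per bin.

    Assumed convention: bin 0 holds values below the first boundary, bin i
    holds values in [boundaries[i-1], boundaries[i]), and the last bin holds
    values at or above the final boundary.
    """
    counts = [0] * (len(boundaries) + 1)
    for v in values:
        counts[bisect_right(boundaries, v)] += 1
    return counts

bounds = parse_boundaries("0.01,0.05,0.5")
ref_freqs = [0.0, 0.01, 0.03, 0.2, 0.5, 0.99]
bin_counts = histogram(ref_freqs, bounds)
```

The same histogram function works for counts instead of frequencies, since it only compares each value against the sorted boundaries.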

Genotype hardcall counts

--geno-counts ['zs'] ['cols='<column set descriptor>]

--geno-counts writes a genotype hardcall count report to plink2.gcount; refer to the file format entry for output details and optional columns. (Note that unlike --freq, this report is not restricted to founders, unless you explicitly request that with e.g. --keep-founders.)
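For readers who want to see what a hardcall count amounts to, here is a small Python sketch for one biallelic, diploid variant. The encoding (0/1/2 ALT allele copies, None for a missing call) and the dictionary keys are hypothetical and are not the column names of plink2.gcount.

```python
def geno_counts(alt_copies):
    """Tally hardcalls for one biallelic diploid variant (illustrative only).

    alt_copies: per-sample number of ALT alleles (0, 1 or 2), or None if missing.
    All samples are counted; nothing here restricts the tally to founders.
    """
    keys = ("hom_ref", "het", "hom_alt")
    counts = {"hom_ref": 0, "het": 0, "hom_alt": 0, "missing": 0}
    for g in alt_copies:
        if g is None:
            counts["missing"] += 1
        else:
            counts[keys[g]] += 1
    return counts

example_counts = geno_counts([0, 1, 1, 2, None, 0])
```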

Missing data

--missing ['zs'] [{sample-only | variant-only}] ['scols='<col. set descriptor>] ['vcols='<col. set descriptor>]

--missing produces sample-based and variant-based missing data reports (or just one of these reports, with 'sample-only'/'variant-only').
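The difference between the two reports can be sketched in a few lines of Python: one rate per sample (row) and one rate per variant (column) of the same genotype matrix. The matrix layout and function name are hypothetical, and the sketch ignores chromosome-specific details such as haploid calls.

```python
def missing_rates(matrix):
    """Return (per-sample, per-variant) missing-call rates.

    matrix[s][v] is the hardcall of sample s at variant v, or None if missing.
    """
    n_samples = len(matrix)
    n_variants = len(matrix[0]) if matrix else 0
    if n_samples == 0 or n_variants == 0:
        return [], []
    sample_rates = [sum(g is None for g in row) / n_variants for row in matrix]
    variant_rates = [
        sum(matrix[s][v] is None for s in range(n_samples)) / n_samples
        for v in range(n_variants)
    ]
    return sample_rates, variant_rates

calls = [
    [0, None, 2],
    [1, None, None],
    [0, 1, 2],
]
per_sample, per_variant = missing_rates(calls)
```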

  • This report is not restricted to founders.
  • Optional column sets support viewing of het. haploid (including mixed MT) counts; refer to the file format entries for more details.

Hardy-Weinberg equilibrium

--hardy ['zs'] ['midp'] ['redundant'] ['cols='<col. set descriptor>]

--hardy writes autosomal Hardy-Weinberg equilibrium exact test statistics to plink2.hardy, and/or chrX test statistics to plink2.hardy.x. The latter report is based on the computation described in Graffelman J, Weir BS (2016) Testing for Hardy-Weinberg equilibrium at biallelic genetic markers on the X chromosome.

  • By default, only founders are considered; this can be changed with --nonfounders.
  • For variants with k alleles where k>2, k separate 'biallelic' tests are performed, each reported on its own line. However, biallelic variants are normally reported on a single line, since the counts/frequencies would be mirror-images and the p-values would be the same. You can add the 'redundant' modifier to force biallelic variant results to be reported on two lines for parsing convenience.
  • With the 'midp' modifier, a mid-p adjustment is applied (see --hwe for discussion).
  • Since multiple case/control phenotypes can now be loaded simultaneously, this no longer automatically computes separate statistics for just controls or just cases. Call this with e.g. --keep-if to report phenotype-stratified stats.
  • Refer to the file format entries for output details and optional columns.
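The Python sketch below illustrates an autosomal exact test of the kind described above, together with one way of reading the "k separate biallelic tests" rule for multiallelic variants. It is a textbook-style enumeration over heterozygote counts, not PLINK's code. The function names are hypothetical, and the mid-p adjustment shown (subtracting half the probability of the observed configuration) is one common definition that may differ in detail from what --hardy does. The chrX test is not covered.

```python
from math import exp, lgamma, log

def hwe_exact(n_het, n_hom1, n_hom2, midp=False):
    """Two-sided exact HWE p-value for one biallelic autosomal variant."""
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        return 1.0
    n_a = 2 * min(n_hom1, n_hom2) + n_het  # minor allele count
    n_b = 2 * n - n_a                      # major allele count

    def prob(het):
        # Probability of `het` heterozygotes given the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return exp(het * log(2) + lgamma(n + 1)
                   - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                   + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    observed = prob(n_het)
    # Sum every configuration that is no more likely than the observed one.
    p = sum(q for q in (prob(h) for h in range(n_a % 2, n_a + 1, 2))
            if q <= observed * (1 + 1e-9))
    if midp:
        p -= 0.5 * observed  # assumed mid-p definition, see the note above
    return min(1.0, p)

def collapse_to_biallelic(genotypes, allele):
    """Treat `allele` versus all other alleles; returns (het, hom, other)."""
    hom = sum(1 for a, b in genotypes if a == allele and b == allele)
    het = sum(1 for a, b in genotypes if (a == allele) != (b == allele))
    return het, hom, len(genotypes) - hom - het

p_plain = hwe_exact(0, 1, 1)
p_mid = hwe_exact(0, 1, 1, midp=True)
triallelic = [(0, 1), (1, 2), (2, 2), (0, 0)]
counts_for_allele_2 = collapse_to_biallelic(triallelic, 2)
```

For a variant with k>2 alleles, calling collapse_to_biallelic once per allele and passing each result to hwe_exact gives k p-values, one per allele.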
