D: 12 Nov 2019

Basic statistics

Allele frequency

--freq ['zs'] ['counts'] ['cols='<column set descriptor>]
       ['refbins='<comma-separated bin boundaries> | 'refbins-file='<filename>]
       ['alt1bins='<comma-separated bin bounds> | 'alt1bins-file='<filename>]
       ['bins-only']

--freq normally writes an empirical allele frequency report to plink2.afreq. With the 'counts' modifier, an allele count/dosage report is written to plink2.acount instead.

  • By default, only founders are considered; this can be changed with --nonfounders.
  • Phenotype- and category-stratified frequency reports are no longer directly supported. However, you can use --keep-if to filter on a phenotype condition, and --loop-cats to filter on each category in turn.
  • The output file is valid input for --read-freq.
  • Refer to the file format entry for output details and optional columns.
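To make the founders-only default concrete, here is a minimal Python sketch of an empirical ALT allele frequency computed from hardcalls. The data layout (one ALT allele count per sample, None for a missing call, plus a per-sample founder flag), the function name, and the include_nonfounders parameter are all assumptions made for this illustration; they are not PLINK 2.0 internals, and dosages and haploid chromosomes are ignored.

```python
from typing import Optional, Sequence


def alt_freq(genotypes: Sequence[Optional[int]],
             is_founder: Sequence[bool],
             include_nonfounders: bool = False) -> Optional[float]:
    """Empirical ALT frequency for one biallelic, diploid variant.

    genotypes: ALT allele count per sample (0, 1 or 2), None if missing.
    include_nonfounders: hypothetical stand-in for the effect of --nonfounders.
    """
    alt_ct = 0
    obs_ct = 0  # number of observed (non-missing) alleles
    for g, founder in zip(genotypes, is_founder):
        if g is None or not (founder or include_nonfounders):
            continue
        alt_ct += g
        obs_ct += 2
    return alt_ct / obs_ct if obs_ct else None


genos = [0, 1, 2, None, 1]
founders = [True, True, False, True, False]
print(alt_freq(genos, founders))                            # founders only: 0.25
print(alt_freq(genos, founders, include_nonfounders=True))  # all samples: 0.5
```

In this toy data the nonfounders carry most of the ALT alleles, so the estimate doubles once they are included.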

--freq can now report histograms summarizing the allele frequency spectrum. When the 'refbins=' modifier is present, its argument is interpreted as a sequence of comma-separated REF frequency/count bin boundaries, and the corresponding histogram is written to plink2.afreq.ref.bins or plink2.acount.ref.bins. Alternatively, when 'refbins-file=' is present, the named file is interpreted as a sequence of whitespace-separated bin boundaries. 'alt1bins='/'alt1bins-file=' use the same syntax, and report ALT1 frequency/count histograms to plink2.afreq.alt1.bins or plink2.acount.alt1.bins.
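The binning described above can be sketched in Python as follows. This is an illustration only: it assumes half-open bins [start, next start) with an implicit first bin starting at 0, which may not match PLINK 2.0's handling of values that fall exactly on a boundary. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def parse_bounds(text: str, sep: Optional[str] = ",") -> List[float]:
    """Parse bin boundaries.

    sep=","  mimics the comma-separated 'refbins=' argument;
    sep=None splits on any whitespace, as for a 'refbins-file=' file body.
    """
    bounds = [float(tok) for tok in text.split(sep)]
    if any(hi <= lo for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("bin boundaries must be strictly increasing")
    return bounds


def bin_counts(values: Sequence[float],
               bounds: Sequence[float]) -> List[Tuple[float, int]]:
    """Return (bin_start, count) pairs for non-negative values.

    Assumption: bins are half-open, [start, next_start), and the first
    bin implicitly starts at 0.
    """
    starts = [0.0] + [b for b in bounds if b > 0.0]
    counts = [0] * len(starts)
    for v in values:
        counts[bisect_right(starts, v) - 1] += 1
    return list(zip(starts, counts))


ref_freqs = [0.0, 0.004, 0.01, 0.03, 0.2, 0.5, 0.97, 1.0]
for start, count in bin_counts(ref_freqs, parse_bounds("0.01,0.05,0.5")):
    print(start, count)
```

The same bin_counts function works for counts instead of frequencies; only the boundary values change.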

Genotype hardcall counts

--geno-counts ['zs'] ['cols='<column set descriptor>]

--geno-counts writes a genotype hardcall count report to plink2.gcount; refer to the file format entry for output details and optional columns. (Note that unlike --freq, this report is not restricted to founders, unless you explicitly request that with e.g. --keep-founders.)
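As a rough picture of what a hardcall count report tallies, the Python sketch below counts genotype classes for one biallelic, diploid variant across all samples, with no founder filter. The input encoding (ALT allele count per sample, None for missing) and the dictionary keys are assumptions made for this example; they are not the column names of plink2.gcount.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def genotype_counts(genotypes: Sequence[Optional[int]]) -> Dict[str, int]:
    """Tally hardcalls for one variant; every sample is included."""
    tally = Counter(genotypes)
    return {
        "hom_ref": tally[0],      # 0 ALT alleles
        "het": tally[1],          # 1 ALT allele
        "hom_alt": tally[2],      # 2 ALT alleles
        "missing": tally[None],   # no call
    }


print(genotype_counts([0, 1, 2, None, 1, 0, 0]))
```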

Sample variant-counts

--sample-counts ['zs'] ['cols='<column set descriptor>]

--sample-counts reports the number of observed variants (relative to the reference genome) per sample, subdivided into various classes.

  • This is a highly optimized implementation of the "Per-sample counts" report added by the -s flag to "bcftools stats". If your variants have been left-normalized and split, and your single-letter allele codes are restricted to {A, C, G, T, a, c, g, t}, the SNP counts reported by PLINK 2.0 and bcftools should be identical.
  • Homozygous-ALT genotypes only count as 1 variant, for consistency with bcftools.
  • To keep non-reference, non-missing counts constant through variant splits and joins, we count heterozygous ALTx/ALTy genotypes as 2 variants. This is an intentional change from bcftools.
  • Unknown-sex samples are treated as female.
  • Heterozygous haploid calls (MT included) are treated as missing.
  • As with other commands, SNPs that have not been left-normalized are counted as non-SNP non-symbolic.
  • Refer to the file format entry for output details and optional columns.
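The counting rules for homozygous-ALT and ALTx/ALTy genotypes listed above can be expressed compactly: a diploid genotype contributes one variant per distinct ALT allele it carries. The Python sketch below assumes genotypes are given as pairs of allele indices (0 = REF, 1 and up = ALT alleles) with None for a missing call; this encoding and the function names are illustrative only, and haploid calls and the sex-dependent rules are not modeled.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[int, int]]


def variant_count(genotype: Genotype) -> int:
    """Number of variants one diploid genotype contributes.

    REF/REF -> 0, REF/ALT -> 1, ALT1/ALT1 -> 1, ALT1/ALT2 -> 2, missing -> 0.
    """
    if genotype is None:
        return 0
    # Distinct non-REF alleles carried by this genotype.
    return len({allele for allele in genotype if allele != 0})


def sample_variant_total(genotypes: Sequence[Genotype]) -> int:
    """Sum of per-genotype variant counts for one sample."""
    return sum(variant_count(g) for g in genotypes)


calls = [(0, 0), (0, 1), (1, 1), (1, 2), None]
print([variant_count(g) for g in calls])
print(sample_variant_total(calls))
```

Counting ALT1/ALT2 as 2 means the total is unchanged if the multiallelic variant is split into two biallelic records, each of which would show one heterozygous call.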

Missing data

--missing ['zs'] [{sample-only | variant-only}] ['scols='<col. set descriptor>]
          ['vcols='<col. set descriptor>]

--missing produces sample-based and variant-based missing data reports (or just one of these reports, with 'sample-only'/'variant-only').

  • This report is not restricted to founders.
  • Optional column sets support viewing of het. haploid (including mixed MT) counts; refer to the file format entries for more details.
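For orientation, the two reports correspond to row-wise and column-wise missing rates over the same genotype matrix. The Python sketch below assumes a small rectangular matrix with one row per variant and one column per sample, using None for a missing call. The layout and function name are assumptions for this example, and het. haploid handling is not modeled.

```python
from typing import List, Optional, Sequence, Tuple


def missing_rates(matrix: Sequence[Sequence[Optional[int]]]
                  ) -> Tuple[List[float], List[float]]:
    """Return (per_sample_rates, per_variant_rates).

    matrix[v][s] is the hardcall for variant v in sample s, None if missing.
    All samples are used; there is no founder filter.
    """
    if not matrix or not matrix[0]:
        return [], []
    n_variants = len(matrix)
    n_samples = len(matrix[0])
    per_variant = [sum(g is None for g in row) / n_samples for row in matrix]
    per_sample = [sum(row[s] is None for row in matrix) / n_variants
                  for s in range(n_samples)]
    return per_sample, per_variant


calls = [
    [0, None, 2, 1],
    [None, None, 0, 1],
]
sample_rates, variant_rates = missing_rates(calls)
print(sample_rates)   # one rate per sample (column)
print(variant_rates)  # one rate per variant (row)
```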

Hardy-Weinberg equilibrium

--hardy ['zs'] ['midp'] ['redundant'] ['cols='<col. set descriptor>]

--hardy writes autosomal Hardy-Weinberg equilibrium exact test statistics to plink2.hardy, and/or chrX test statistics to plink2.hardy.x. The latter report is based on the computation described in Graffelman J, Weir BS (2016) Testing for Hardy-Weinberg equilibrium at biallelic genetic markers on the X chromosome.

  • By default, only founders are considered; this can be changed with --nonfounders.
  • For variants with k alleles where k>2, k separate 'biallelic' tests are performed, each reported on its own line. However, biallelic variants are normally reported on a single line, since the counts/frequencies would be mirror images and the p-values would be the same. You can add the 'redundant' modifier to force biallelic variant results to be reported on two lines for parsing convenience.
  • With the 'midp' modifier, a mid-p adjustment is applied (see --hwe for discussion).
  • Since multiple case/control phenotypes can now be loaded simultaneously, this no longer automatically computes separate statistics for just controls or just cases. Call this with e.g. --keep-if to report phenotype-stratified stats.
  • Refer to the file format entries for output details and optional columns.
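To show what an autosomal exact test with an optional mid-p adjustment computes, here is a small Python sketch for a single biallelic variant. It is not PLINK 2.0's implementation: it uses exact big-integer arithmetic rather than an optimized algorithm, it does not cover the chrX test, and its tie handling (every outcome exactly as likely as the observed one gets half weight under mid-p) is an assumption. The function name and signature are hypothetical.

```python
from fractions import Fraction
from math import factorial


def hwe_exact(hom1: int, hets: int, hom2: int, midp: bool = False) -> float:
    """Exact Hardy-Weinberg p-value from genotype counts (autosomal, biallelic).

    Conditions on the observed allele counts and sums the probabilities of
    all heterozygote counts that are no more likely than the observed one.
    """
    n = hom1 + hets + hom2
    if n == 0:
        return 1.0
    n_a = 2 * min(hom1, hom2) + hets  # copies of the rarer allele
    n_b = 2 * n - n_a                 # copies of the other allele

    def weight(h: int) -> int:
        # Unnormalized probability of h heterozygotes given n_a and n_b:
        # 2^h * n! / (aa! * h! * bb!), always an integer.
        aa = (n_a - h) // 2
        bb = (n_b - h) // 2
        return (2 ** h) * factorial(n) // (
            factorial(aa) * factorial(h) * factorial(bb))

    # The heterozygote count must have the same parity as n_a.
    weights = [weight(h) for h in range(n_a % 2, n_a + 1, 2)]
    w_obs = weight(hets)
    total = sum(weights)
    less = sum(w for w in weights if w < w_obs)
    tied = sum(w for w in weights if w == w_obs)  # includes the observed one
    tail = Fraction(less) + (Fraction(tied, 2) if midp else Fraction(tied))
    return float(tail / total)


print(hwe_exact(2, 0, 2))             # 6/70: no heterozygotes among 4 samples
print(hwe_exact(2, 0, 2, midp=True))  # 3/70: observed outcome at half weight
```

For a multiallelic variant, the same function could be applied once per allele by collapsing all other alleles into a single class, which mirrors the "k separate biallelic tests" described above.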
