D: 13 Dec 2019

Credits

PLINK 2.0 alpha was developed by Christopher Chang with support from GRAIL, Inc. and Human Longevity, Inc., and substantial input from Stanford's Department of Biomedical Data Science.

  • The .pgen file format is primarily an adaptation of ideas from SNPack (Francisco Sambo, Barbara Di Camillo, Gianna Toffolo, Claudio Cobelli) to more general contexts.
  • The KING-robust kinship estimation method was developed by Ani Manichaikul et al. at the University of Virginia.
  • The extended chrX Hardy-Weinberg equilibrium test was developed by Jan Graffelman (Universitat Politècnica de Catalunya) and Bruce Weir (University of Washington).
  • Christoph Lippert (Human Longevity), Manuel Rivas (Stanford), and Yosuke Tanigawa (Stanford) contributed significantly to development and testing of the Python and R file I/O libraries.
  • Firth logistic regression was added following discussion with Manuel Rivas. The implementation is ported from Georg Heinze's logistf R package.
  • The Zstandard compression format and libraries used by PLINK 2.0 were developed by Yann Collet and Przemysław Skibiński, with support from Facebook, Inc.
  • The fast PCA approximation is ported from Kevin Galinsky's contribution to EIGENSOFT 6.
  • The array-popcount improvement is based on "Faster Population Counts Using AVX2 Instructions" by Wojciech Muła, Nathan Kurz, and Daniel Lemire, along with Kim Walisch's libpopcnt implementation.
  • The bgzip-decompression improvement is based on libdeflate (Eric Biggers), and htslib 1.8 integration work by James Bonfield.
  • Thanks to numerous other testers at GRAIL, Human Longevity, Stanford, and elsewhere for bug reports and helpful suggestions.
