Introduction, downloads

D: 2 Aug 2017

Recent version history

What's new?

Coming next

General usage

Citation instructions

Standard data input

PLINK 1 binary (.bed)

PLINK 2 binary (.pgen)

Autoconversion behavior

VCF (.vcf{.gz})

Oxford genotype (.bgen)

Oxford haplotype (.haps)

PLINK 1 dosage

Dosage import settings

Generate random

Unusual chromosome IDs

Phenotypes

Covariates

(TBD)

Output file list

Credits

File formats

PLINK 2.00 alpha

PLINK 2.0 alpha was developed by Christopher Chang and the Human Longevity, Inc. Data Science team, with substantial input from Stanford's Department of Biomedical Data Science. (More detailed credits.)

Binary downloads

Build
Operating systemDevelopment (2 Aug)
Linux 64-bit Intel1download
Linux 32-bitdownload
OS X (64-bit)download
Windows 64-bitdownload
Windows 32-bitdownload

1: This build can still run on AMD processors, but it's statically linked to Intel MKL, so some linear algebra operations will be slow. We will try to provide an AMD Zen-optimized build as soon as supporting libraries are available.

Source code and build instructions are available on Github. (Here's another copy of the source code.)

Recent version history

2 Aug 2017: --sort-vars implemented.

25 Jul: --loop-cats now works properly with genotype-based variant filters.

24 Jul: Fixed '--pca approx' allele frequency handling bug introduced in 4 Jun build; we recommend redoing any '--pca approx' runs performed with an affected build. (Regular --pca was not affected.) --loop-cats implemented (similar to PLINK 1.x --loop-assoc, except it's not restricted to association tests). VCF export now supports 'vcf-dosage=DS-force' mode. --dummy multithread + dosage bugfix.

17 Jul: BGEN v1.2/1.3 importer memory allocation bugfix. Size of failed allocation is now logged on most out-of-memory errors.

2 Jul: Improved multithreading in BGEN v1.2/1.3 importer. Python writer can now be called with multiple variants at a time.

25 Jun: Basic BGEN v1.2/1.3 import (unphased biallelic dosages; suffices for main UK Biobank data release). --warning-errcode flag added (causes an error code to be returned to the OS on exit when at least one warning is printed).

20 Jun: --condition-list + variant filter bugfix.

5 Jun: --make-pgen memory requirement greatly reduced. End time now printed to console in most situations.

4 Jun: --hwe no longer causes a segfault when chrX is present and no gender information is available. Fixed --dummy bug.

29 May: --import-dosage format=1 bugfix.

26 May: --glm 'standard-beta' modifier replaced with --variance-standardize flag. --quantile-normalize function added. Fixed a missing-sex allele counting bug.

25 May: --hardy/--hwe works properly again when chrX is present but not at the beginning of the dataset.

22 May: Fixed major dosage data + sample-filter bug; we recommend rerunning any operations involving both dosage data and sample filtering performed with earlier plink2 builds. --score 'list-variants' modifier added.

19 May: Fixed a bug with allele frequency computation on dosage data when sample filter(s) are applied.

18 May: Many categorical phenotype-handling flags (--within, --keep-cats, --split-cat-pheno, ...) implemented. Basic phenotype-based filtering implemented (e.g. "--remove-if PHENO1 '>' 2.5"; note that unnamed phenotypes are assigned the names 'PHENO1', 'PHENO2', etc., and that the '<' and '>' characters must be quoted in most shells). --write-covar implemented. --mach-r2-filter implemented, and raw MaCH r2 values can be dumped with "--freq cols=+machr2".

11 May: --condition{-list} + --covar bugfix.

8 May: Fix quantitative phenotype/covariate loading bug introduced in 6 May build.

7 May: --import-dosage implemented.

6 May: Fixed bug which caused '0' to be treated as control instead of missing for binary phenotypes. Minor change to --glm's column headers, in preparation for multiallelic data.

2 May: --score bugfix. --maj-ref bugfix. --vcf-min-dp and '--export A-transpose' implemented.

1 May: VCF dosage import/export, --vcf-min-gq, and --read-freq implemented. --score can now work with standard errors. --autosome{-par} now works properly. SNPHWE2 and SNPHWEX functions relicensed as GPL-2+, to enable inclusion in the HardyWeinberg R package.

20 April: .sample export bugfix (didn't work if file was over 256 KB and no phenotypes were present). --dummy implemented (can now generate dosages).

19 April: --hardy/--hwe chrX bugfix (thanks to Jan Graffelman for catching the problem and validating the fix). --new-id-max-allele-len now has three modes ('error', 'missing', and 'truncate'), and the default mode is now 'error' (i.e. --set-missing-var-ids and --set-all-var-ids now error out when an allele code longer than 23 characters is encountered, instead of silently truncating). --score implemented, and extended to support variance-normalization and multiple score columns (these two features provide a simple way to project new samples onto previously computed principal components).

11 April: --pca var-wts bugfix, and --pca eigenvalue ordering bugfix. --glm linear regression and --condition{-list} support added. --geno/--mind/--missing/--genotyping-rate can now refer to missing dosages instead of just missing hardcalls (note that, when importing dosage data, dosages in (0.1, 0.9) and (1.1, 1.9) are saved but there usually won't be associated hardcalls).

20 March 2017: Initial public release.


What's new?

  • Preservation of reference alleles (without requiring constant use of --keep-allele-order), phase information, and the VCF QUAL, FILTER, and INFO fields. Use --make-pgen instead of --make-bed when importing a VCF; the fileset can then be referenced with --pfile. We will provide 1000 Genomes phase 3 downloads in the new fileset format as soon as multiallelic variants are also supported.
  • The new .pgen file format incorporates SNPack-style genotype compression, frequently reducing file sizes by 80+% with negligible computational cost. To allow users to take advantage of genotype compression without sacrificing compatibility with scripts expecting old-style .bim and .fam text files, PLINK 2.0 supports a hybrid .pgen + .bim + .fam usage mode (--make-bpgen/--bpfile). We've also provided a Python library for reading and writing .pgen files, to simplify migration to the new format. (PLINK 1 .bed files are valid .pgen files, so code written on top of the library is backward-compatible.)
  • Firth regression ('--glm firth-fallback', '--glm firth'). Standard logistic regression fails to converge, yielding 'NA' or nonsense results, when the 2x2 allele/phenotype contingency table has an empty cell ("quasi-complete separation"); this is common, and especially likely to happen with the strongest associations. Firth regression can prevent you from missing these associations. The fast 'firth-fallback' mode (only use Firth regression when there's either an empty contingency table cell or regular-logistic-regression convergence failure) gets you most of the benefit for a fraction of the computational cost.
  • '--pca approx' (equivalent to EIGENSOFT 6 fastmode with default parameters). If you have more than ten thousand samples, only need the top principal components, and can tolerate ~0.1% error in the last PC, this can save you a ton of compute time.
  • The 64-bit Linux build can handle linear algebra on matrices with more than 231 elements (so regular --pca is no longer limited to ~46000 samples), as long as your system has enough memory.
  • KING-robust kinship coefficients (--make-king, --make-king-table, --king-cutoff). These remain accurate when good population allele frequency estimates are unavailable. We have found --king-cutoff to be much more reliable than the PLINK 1.9 --rel-cutoff flag for removal of close relations.
  • Proper support for dosages (decimal allele count expected values). When .gen/.bgen files are imported, hardcalls and dosages are saved to the .pgen. Operations which naturally extend to decimals (e.g. --pca, --glm, --freq, --maf/--mac) use the dosage information when it's present, while methods that can only make use of hardcalls (e.g. KING-robust, Hardy-Weinberg exact test) simply ignore the dosages. --hard-call-threshold can now be used to change the saved hardcalls without changing the dosages.
  • Much more multithreaded code.
  • Most commands let you control which columns appear in the main output file(s).
  • Broad support for both gzipped and Zstd-compressed text input files.
  • Graffelman and Weir's extended chrX Hardy-Weinberg exact test, which takes male allele frequencies into account. We've found that this tends to identify quite a few obviously miscalled chrX variants which were not caught by the usual QC filters.
  • Oxford-style haplotype filesets can now be imported and exported (--haps, '--export haps'/'--export hapslegend').
  • Sample-major PLINK binary files can now be efficiently exported ('--export ind-major-bed'). This is close to 3 orders of magnitude faster than the previous implementation (PLINK 1.07 --make-bed + --ind-major).

Coming next

  1. BGEN v1.2 and v1.3 export, and phased data support.
  2. Multiallelic variant support.
  3. BCF2 import/export.
  4. Merge. (Once this is operational, a stable version of the .pgen specification will be provided, and PLINK 2.0 beta testing will begin.)

General usage >>