Development build: 15 Oct 2019

PLINK 2.00 alpha

PLINK 2.0 alpha was developed by Christopher Chang, with support from GRAIL, Inc. and Human Longevity, Inc., and substantial input from Stanford's Department of Biomedical Data Science. (More detailed credits.)

Binary downloads

Operating system          Development (15 Oct)
Linux AVX2 Intel (1)      download
Linux 64-bit Intel (1)    download
Linux 32-bit              download
OS X AVX2                 download
OS X 64-bit               download
Windows AVX2              download
Windows 64-bit            download
Windows 32-bit            download

1: These builds can still run on AMD processors, but they're statically linked to Intel MKL, so some linear algebra operations will be slow. We will try to provide an AMD Zen-optimized build as soon as supporting libraries are available.

Source code and build instructions are available on GitHub. (Here's another copy of the source code.)

Recent version history

15 Oct 2019: Fixed bug in 12 Oct Linux builds that caused plink2 to hang on --extract/--exclude/--snps and similar variant ID filters. Implemented --extract-fcol, which filters variants based on a TSV column (this is an extension of PLINK 1.x --qual-scores).

12 Oct: "--hwe 0" no longer removes a small number of very-low-HWE-p-value variants.

9 Oct: --pheno/--covar 'iid-only' modifier added, supporting headerless files with a single ID column. Windows BGZF compression is now multithreaded. Improved read-error messages.

6 Oct: Windows --silent bugfix. Source code now supports dynamic linking with libzstd (though performance may suffer if you don't build the multithreaded version of that library).

4 Oct: --king-table-subset + --parallel bugfix. Automatic Zstd text-file decompression was broken for a few commands by the 28 Sep build; that should work properly now.

3 Oct: Fixed BGZF decompression bugs in 28 Sep build. (This did not affect VCF -> .bed/.pgen conversion, though some rarer use cases were affected.) SID-loading bugfix.

28 Sep: Mixed-provisional-reference bugfixes. --ref-allele/--alt1-allele/--update-map/--update-name skip-count bugfix. --glm local-covar line-skipping bugfix. Automatic-rename when an input filename matches an output filename should work properly again instead of erroring out (though it should still be avoided).

10 Sep: --glm joint test p-value bug fix. (This bug only affected runs where --tests was invoked with 4 or more predictors.)

26 Aug: --read-freq now prints a warning, instead of segfaulting or entering an infinite loop, when all variants have already been filtered out.

21 Aug: Fixed --ref-from-fa/--ref-allele + VCF export interaction that caused spurious 'PR' INFO flags to be reported.

10 Aug: Open-fail and write-fail error messages now include a more detailed explanation of what went wrong. --bgen, --data, and --gen now have a 'ref-unknown' modifier for explicitly specifying that neither the first nor last allele is consistently REF.

31 Jul: --score prints an error message instead of segfaulting when an input-file line is truncated. Fixed rare --glm bug that could cause all results to be reported as 'NA' when exactly one covariate is defined. .log files print '--out' and '--d' properly again (this was broken by the 24 Jul build). --glm now has an optional output column ('err') which reports the reason for each 'NA' coefficient.

24 Jul: --d implemented.

8 Jul: --rm-dup/--sample-diff/--ld multiallelic variant bugfix.

5 Jul: --read-freq moved before usual allele frequency/count computation in order of operations. Loaded allele frequencies are no longer recomputed.

28 Jun: --king-table-subset should work properly again.

26 Jun: Fixed --glm multiallelic-variant bug that could cause one allele to be reported twice and one covariate test to be unreported, when neither 'hide-covar' nor 'intercept' was specified. Fixed issue that could cause --glm genotypic/hethom to segfault with no covariates.

17 Jun: Fixed rare underflow in --glm p-value computation which could cause an assertion failure.

27 May: Unbroke --adjust-file. "--export ind-major-bed" performance improvement.

12 May: Fixed --glm linear regression phenotype-batch handling bug that could cause a crash (or, on .bed-formatted data, generate incorrect results) on batches of size > 240.

29 Apr: BGEN 1.2/1.3 phased-dosage import bugfixes. --make-pgen + --dosage-erase-threshold without --hard-call-threshold no longer crashes.

28 Apr: PLINK 2-specific extensions to --update-ids and --update-parents simplified. --id-delim/--sample-diff 'sid' modifier for specifying that single-delimiter sample IDs should be interpreted as IID-SID changed to --iid-sid flag.

27 Apr: --haps bugfix for sample counts congruent to 17..31 (mod 32). This only affected the last few samples of the file, but if you used --haps with an earlier build, we strongly recommend rerunning it. --glm logistic regression 'SE' column renamed to LOG(OR)_SE when reporting odds ratio, to make it more obvious that the reported standard error does not use odds ratio units. --update-parents implemented.

2 Apr: Fixed --hwe bug that could cause chrY and MT variants to be improperly filtered. --glm 'pheno-ids' now works for groups of quantitative phenotypes.

1 Apr: --glm without --adjust now detects groups of quantitative phenotypes with the same "missingness pattern", and processes them together. This gives a large speed increase, but be careful with disk space: you probably want to use the 'hide-covar' modifier, and 'zs' and/or --pfilter may also be useful. --glm linear regression local-covar= bugfix.
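The grouping idea in the 1 Apr entry can be sketched as follows. This is an illustration only, not plink2's implementation: it assumes each phenotype is a list of floats with None marking a missing value, and the function name group_by_missingness is hypothetical. Phenotypes that share an identical missingness mask use the same sample subset, which is what makes it possible to process them together.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def group_by_missingness(
    phenos: Dict[str, List[Optional[float]]]
) -> List[List[str]]:
    """Group phenotype names whose per-sample missingness masks are identical."""
    groups: Dict[Tuple[bool, ...], List[str]] = defaultdict(list)
    for name, values in phenos.items():
        # True marks a sample whose value is missing for this phenotype.
        mask = tuple(v is None for v in values)
        groups[mask].append(name)
    # Every phenotype in one inner list uses the same sample subset, so
    # sample-subset-dependent work only has to be done once per group.
    return list(groups.values())


if __name__ == "__main__":
    phenos = {
        "PHENO1": [1.2, None, 0.7, 3.1],
        "PHENO2": [0.4, None, 2.2, 1.0],
        "PHENO3": [None, 1.5, 0.9, 2.0],
    }
    print(group_by_missingness(phenos))  # [['PHENO1', 'PHENO2'], ['PHENO3']]
```

Keying on the full boolean mask keeps the sketch simple; a real tool would more likely hash a packed bit array per phenotype.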

26 Mar: Minimac3-r2 computation bugfix. --glm no longer generates .id files listing all samples used for each phenotype, unless the 'pheno-ids' modifier is added. --update-ids implemented.

23 Mar: Fixed multiallelic-variant writer bug that could affect files where the largest number of alleles is 6 or 18. --minimac3-r2-filter and --freq minimac3r2 column implemented.

18 Mar: --write-covar can now be used when no covariates are loaded, if at least one phenotype is loaded and phenotype output was requested.

9 Mar: plink2 --version and --help no longer return nonzero exit codes.
A draft PGEN specification is now available.

6 Mar: Fixed allele frequency computation bug that could cause a spurious "Malformed .pgen file" error when a variant filter was active.

5 Mar: Multithreaded --extract/--exclude.

4 Mar: --tests linear-regression output bugfix.

3 Mar: Fix --glm odds-ratio printing bug introduced on 1 Mar.

2 Mar: More help text cleanup (now including online documentation).

1 Mar: --recode-allele implemented (and renamed to --export-allele for consistency). VCF import now errors out when a space-containing INFO value is imported. Brackets in command-line help text are now used in a manner more similar to other tools.

21 Feb: --glm joint tests are now based on F-statistics, for better small-sample accuracy.

20 Feb: --import-dosage-certainty now always produces a missing call, instead of falling back on the VCF GT field, when dosage certainty is inadequate. --extract-intersect flag added.

19 Feb: --glm works properly again with no covariates (it was exiting with a spurious "out of memory" error). --import-dosage-certainty now has the expected effect on single-valued dosages, instead of just genotype-probability triplets.

18 Feb: Fixed a bug that could cause --missing to crash on dosage data.

14 Feb: Command-line integer parameters can now use scientific notation.

12 Feb: Phased-dosage import bugfix.

2 Feb: --tests + --parameters bugfix.

31 Jan: --pca approx now errors out instead of reporting inaccurate results when the number of variants is too small relative to the number of PCs. --pca approx eigenvalue bugfix.

30 Jan: --glm covariate-scale error is now propagated properly, instead of producing a mysterious out-of-memory error message.

27 Jan: --tests implemented.

22 Jan: --glm now errors out and recommends adding --covar-variance-standardize when covariates vary enough in scale for numeric instability to be a major concern.

2 Jan 2019: Phased-dosage import bugfix.

27 Dec 2018: --ref-allele/--alt1-allele skipchar had been broken for the past few months; it should work properly again. Fixed a bug which occurred when importing an all-noninteger-dosage variant.

28 Oct: --keep-fam/--remove-fam bugfix.

2 Oct: Fixed bug that could occur when loading very long text lines (e.g. VCF lines longer than 5 MB).

22 Sep: Fixed rare bug that could occur when processing variants out of order. --sample-diff command implemented.

12 Sep: --normalize 'list' modifier added.

11 Sep: --rm-dup 'list' modifier added, for listing all duplicated variant IDs. (This can be run as a standalone command.)

9 Sep: Fixed rare race condition in text decompressor that could cause input lines to be skipped. (We believe this was the cause of the VCF-import "File read failure" crashes reported over the last few months.)

8 Sep: Fixed VCF-export bug that could occur when extra ##contig header lines were present. --sort-vars bugfix. --normalize now detects when post-normalization variants are no longer in sorted order, and prints a warning in that case.

7 Sep: --ld bugfix for phased multiallelic variants. --rm-dup flag added (removes duplicate-ID variants, can check for genotype/INFO/etc. equality).

4 Sep: Fixed A1_CASE_FREQ and related columns in --glm output broken by recent multiallelic update. Cleaned up a few column names in --geno-counts and --hardy output.

31 Aug: Fixed --glm bug with handling constant and all-constant-but-1 covariates.

30 Aug: AVX2 and 32-bit --export bgen-1.2/1.3 bugfixes (mainly affects missing genotypes). "--export vcf-4.2" mode added for compatibility with programs (e.g. SNPTEST) which reject VCF 4.3 files. Exported VCFs should now have more appropriate ##contig headers when PAR1 and/or PAR2 are present in the input. Left-normalization (--normalize) flag added.

26 Aug: Last column of --pca .eigenvec header line is no longer omitted.

21 Aug: Fixed --mac/--max-mac 'nref' and 'alt1' mode bugs in yesterday's build.

20 Aug: Fixed "--vcf dosage=GP" bug introduced on 7 May; if you used any build from the last three-and-a-half months to import VCF FORMAT:GP data, rerun with a newer build. "--vcf dosage=GP" now errors out with a suitable message when the file also contains a FORMAT:DS field, and a 'dosage=GP-force' option has been added to cover the rare cases where importing the GP field might still be worthwhile. --maf/--max-maf/--mac/--max-mac now let you filter on nonmajor (default), non-reference, alt1, or minor allele frequencies/counts; you can use bcftools notation for this (e.g. "--min-af 0.01:minor"), but keep the different default in mind.
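To make the difference between the filter modes in the 20 Aug entry concrete, here is a minimal sketch that computes three of them for one variant. It assumes allele frequencies are given as a list whose first entry is REF and whose remaining entries are ALT1, ALT2, and so on. The function name filter_freqs is hypothetical, and 'minor' is left out because its exact multiallelic definition should be checked in the plink2 documentation. For a biallelic variant, nonmajor is simply the minor allele frequency.

```python
from typing import Dict, Sequence


def filter_freqs(freqs: Sequence[float]) -> Dict[str, float]:
    """Frequencies a --maf-style filter could compare against a threshold.

    freqs[0] is REF; freqs[1:] are ALT1, ALT2, ... (assumed to sum to 1).
    """
    if len(freqs) < 2:
        raise ValueError("need REF plus at least one ALT allele")
    return {
        # 1 - (frequency of the most common allele); the default mode.
        "nonmajor": 1.0 - max(freqs),
        # Combined frequency of all non-reference alleles.
        "nref": 1.0 - freqs[0],
        # Frequency of the first ALT allele only.
        "alt1": freqs[1],
    }


if __name__ == "__main__":
    # Triallelic example: REF 0.40, ALT1 0.15, ALT2 0.45 (ALT2 is major).
    print(filter_freqs([0.40, 0.15, 0.45]))
```

With these frequencies, a hypothetical 0.2 cutoff would drop the variant in alt1 mode (0.15) but keep it in the default nonmajor mode (0.55), which is why the entry warns about the different default.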

18 Aug: plink2-formatted 1000 Genomes phase 3 files, with phased haplotypes and annotations included, and a few corrections to the official pedigree (determined via KING-robust analysis), can now be downloaded from the Resources page. --king-cutoff can now handle sample ID files containing a header line.

16 Aug: --glm logistic regression now supports multiallelic variants. Fixed --glm linear-regression dosage handling bug in yesterday's build.

15 Aug: --glm linear regression now supports multiallelic variants. --ld bugfix. --parameters + "--glm interaction" now works properly when a covariate is only involved as part of an interaction.

9 Aug: --make-king{,-table} singleton/monomorphic-variant optimization implemented.

7 Aug: GRM construction and --missing no longer break with multiallelic data.

6 Aug: VCF multiallelic(-phased) import and export implemented. --hwe now tests each allele separately for multiallelic variants. --min-alleles/--max-alleles filtering flags added.
(--glm doesn't support multiallelic variants yet; that update is planned for next week.)

30 Jul: --vcf-max-dp flag added.

26 Jul: --vcf-half-call should now work properly on unphased data.

25 Jul: Fixed --sort-vars/low-memory-make-pgen dosage-handling bug that could trigger unwanted hardcall thresholding. If you used a build from 14 Apr - 19 Jul 2018 to work with dosage data, the hardcalls may not have been thresholded correctly. Unfiltered dosage datasets imported by an affected build can be corrected by running --make-pgen + explicit --hard-call-threshold. Hardcall-based filters such as --geno/--mind should be rerun (after the hardcalls have been corrected).

19 Jul: --update-alleles implemented.

16 Jul: Added more multithreaded-VCF-parse debug logging code.

13 Jul: Fixed chrX/Y/MT autoremoval bug in --make-king/--make-grm/--pca.

12 Jul: Unbroke --mach-r2-filter.

3 Jul: .fam/.psam files now load properly when only the IID column is requested or present.

29 Jun: .bim/.pvar files with more than ~134 million variants load properly again (given sufficient memory).

25 Jun: "--pca approx" eigenvalues should now be (approximately) correct (they were previously double what they should have been). Fixed a few odd-sample-count export cases which were broken around 30 May.

22 Jun: Fixed a few log messages which were broken in the 19-20 Jun builds. Added debug-print code to support an ongoing investigation of a multithreaded VCF dosage-import bug. If you are encountering mysterious "File read failure" errors during VCF import, or "Malformed .pgen" errors when reading the result, adding "--threads 1" to your VCF-import command will probably solve your immediate problem; if you can also send me a .log file from the failing multithreaded run (or, even better, test data), that would be very helpful.

20 Jun: Fix GRM/PCA/score-computation bug introduced on 30 May. If you used the 30 May or an early June build for GRM/--pca/--score, you should repeat the operation(s) with this build; apologies for the error.

19 Jun: Fixed rare --ref-allele/--alt1-allele corner case which could occur when a missing allele was replaced with a very long allele.

5 Jun: VCF import uninitialized-variable bugfix. --score 'ignore-dup-ids' modifier added.

30 May: "--export haps{,legend}" bugfixes and bgzip support. "--export vcf vcf-dosage=DS" no longer exports undeclared HDS values when phase information is present. Unbreak --import-dosage + --map, for real this time.

21 May: --pgen-info command added (displays basic information about a .pgen file, such as whether it has any phase or dosage data).

20 May: Unbreak --import-dosage + --map.

17 May: --import-dosage and .gen import were broken for the last several weeks; this should be fixed now. A1 column added to --adjust output in preparation for multiallelic variants. --glm 'a0-ref' modifier renamed to 'omit-ref'.

15 May: Fixed chrX allele frequency computation bug when dosages are present. --ld modified to be based on major instead of reference alleles, to play better with multiallelic variants. --hardy header line and allele columns changed in preparation for multiallelic variant support.

8 May: --vcf dosage=HDS should now handle files with no DS field properly.

7 May: Fixed rare I/O deadlock. Improved VCF-import parallelism.

4 May: Fixed --bgen import/export when the number of dosage precision bits isn't a multiple of 8 (we previously misinterpreted the spec for those cases; sorry about that).

3 May: --bgen can now import variant records with up to 28 bits of dosage precision (though only ~15 bits will survive). "--export vcf-dosage=HDS-force" bugfix.

2 May: --vcf dosage= import no longer requires GT field to be present. Fixed potential --vcf dosage=HDS buffer overflow.

28 Apr: Fixed a --glm bug which occurred when autosomes and sex chromosome(s) were both present, or both chrX and chrY were present. If you performed a whole-genome --glm run with the 9 Feb 2018 build or later, you should rerun with the latest build. However, single-chromosome and autosome-only --glm runs were unaffected by the bug.

24 Apr: VCF phased-dosage import ("--vcf dosage=HDS") and export ("--export vcf vcf-dosage=HDS"). --pca and GRM computation now use correct variance for all-haploid genomes.

22 Apr: --export bgen-1.2/bgen-1.3 should now work for chrX/chrY/chrM; also fixed import bugs for those chromosomes.

16 Apr: --ref-from-fa contig line parsing bugfix.

14 Apr: --export bgen-1.2/bgen-1.3 implemented for autosomal diploid data. Operations like --pca which require decent allele frequencies now error out when frequencies are being estimated from fewer than 50 samples, unless you add the --bad-freqs flag. Phased dosage support implemented. Sample missingness rate in exported .sample files is now based on dosages rather than hardcalls. Non-AVX2 phase subsetting bugfix. --vcf + --psam bugfix. --vcf dosage= now ignores the hardcall when a dosage is present; instead, it's regenerated under --hard-call-threshold 0.1 (unless you specified a different threshold). --bgen 'ref-second' modifier renamed to 'ref-last', to generalize properly to multiallelic variants.

31 Mar: --export haps{,legend} should now work properly when --ref-allele/--ref-from-fa/etc. flips some alleles in the same run.

29 Mar: --set-{missing,all}-var-ids non-AVX2 bugfix. --pheno/--covar autonaming bugfix.

28 Mar: --bgen 1-bit phased haplotype import implemented.

26 Mar: --make-bed + --indiv-sort bugfix.

23 Mar: Windows builds should work properly again (the 20-21 Mar Windows builds were badly broken). --glm now supports log-pvalue output (add the 'log10' modifier), and these remain accurate below the double-precision floating point limit of p=5e-324.

21 Mar: 3-column .sample file loading works properly again. Fixed a file-reading race condition.

20 Mar: Fix possible deadlock in recent builds when loading very long lines.

19 Mar: Fix --sample segfault in recent builds. .bgen import/export speed improvement. --oxford-single-chr wasn't extended correctly in the 4 Mar build; this should be fixed now.

11 Mar: Fix --pheno segfault in last week's builds that could occur when the file didn't have a header line.

9 Mar: Fix "File write failure" bug that occurred when a single write operation was larger than 2 GB (this could occur when running --make-bed with more than 128k samples). Reduced --make-bed memory requirement.

7 Mar: Fixed potential file-reading deadlock in recent builds (23 Feb or later).

5 Mar: --glm local-covar= should work properly again.

4 Mar: --oxford-single-chr can now be used on .bgen files. --make-pgen partially-phased data handling bugfix.

26 Feb: --keep/--remove/etc. should work properly now on IID-only files with no header line.

23 Feb: Fixed alpha 2 --vcf + --id-delim bug. Improved parsing speed for compressed VCF and .pvar files.

20 Feb: "--xchr-model 1" should work properly now.

16 Feb 2018 (alpha 2): This makes the following potentially compatibility-breaking changes:

  • FID is now an optional field: if it isn't in the input .psam file, it's omitted from several output files by default (these now have 'maybefid' and 'fid' column sets, where the default set includes 'maybefid'), and treated as always-'0' by any operation which requires FID values (such as --make-bed). When exporting genomic data files, 'maybefid' also treats the column as missing if all remaining values are '0'.
  • Relatedly, when importing sample IDs from a VCF or .bgen file, the default mode is now "--const-fid 0", and no FID column will be written to disk at all. --keep, --remove, and similar commands also now have "--const-fid 0" semantics when an input line contains only one token. You can now act as if IID is the only sample ID component, if that's what makes the most sense for your workflow. Conversely, it is now necessary to explicitly use --id-delim when you want to split the VCF/.bgen sample IDs into multiple components.
  • MT is treated as a haploid chromosome again. In PLINK 1.9 and earlier plink2 builds, MT was treated as diploid-ish to avoid throwing away information about heteroplasmic mutations; as a consequence, the --glm(/--linear/--logistic) genotype column and commands like "--freq counts" used a 0..2 scale. Now that plink2 has proper support for dosages, this kludge is no longer necessary.
  • --glm's 't' column set has been renamed to 'tz', to reflect it being a T-statistic for linear regression but a Wald Z-score for logistic/Firth. The corresponding column in .glm.logistic{,.hybrid} and .glm.firth files now has 'Z_STAT' in the header line.

Also, --glm now defaults to regressing on minor instead of ALT allele dosages (this can be overridden with 'a0-ref').

The final alpha 1 build has been tagged in GitHub, and will remain downloadable from here for the next few months.

11 Feb: files now end in .id, for consistency with other output files with sample IDs and no other information. Similarly, --mind's output file now has the extension and defaults to having a header line. You can now use --no-id-header to suppress the header line (and force the columns to be FID/IID) in all .id output files.

10 Feb: --update-sex 'male0' option added, and custom column selection interface changed (now 'col-num='). --glm 'gcountcc' column names updated (now 'CASE_NON_A1_CT', 'CASE_HET_A1_CT', etc.) in preparation for switch to A1=major allele. --make-just-pvar + --ref-allele/--ref-from-fa no longer treats all initial reference alleles as provisional when the input .pvar has a header line.

9 Feb: Forcing .pvar QUAL/FILTER output when no such values are loaded no longer causes a segfault.

5 Feb: AVX2 phase-subsetting bugfix.

3 Feb: --score 'dominant' and 'recessive' modifiers added.

30 Jan: Fix .pgen writing bug which occurred when the number of variants was a multiple of 64 and the number of samples was large.

24 Jan: "--export oxford" now supports bgzipped output.

21 Jan: --glm now always reports an additional 'A1' column, indicating which allele(s) correspond to positive genotype column values. --glm column sets have been changed to revolve around A1 instead of ALT, so minor script modifications may be necessary when switching to this build.
In this build, A1 and ALT are still synonymous. This will change in alpha 2: A1 will default to the minor allele(s) to reduce multicollinearity (imitating PLINK 1.x's behavior in the absence of --keep-allele-order), though you will still have the option of forcing A1=ALT.

12 Jan: Fixed "--glm interaction" bug that occurred when multiple consecutive variants had no missing calls. We recommend redoing all --glm runs with the 'interaction' modifier which were performed with a build produced between 27 Nov 2017 and 10 Jan 2018 inclusive.

10 Jan: --adjust-file implemented (performs --adjust's multiple-testing correction on any association analysis file).

9 Jan: Added 'no-idheader' modifiers to a few commands, and made that the default for --make-grm-bin/--make-grm-list to avoid breaking interoperability.

7 Jan: --vcf can now be given a sites-only VCF when the run doesn't require genotype data. Sample ID files, such as those produced by --write-samples, now include a header line by default; this will be necessary to distinguish between FID-IID and IID-SID output in the future. (With --write-samples, you can suppress the header line by adding the 'noheader' modifier.)

5 Jan: --pheno-col-nums/--covar-col-nums implemented.

2 Jan 2018: --keep-fcol (equivalent to PLINK 1.x --filter) implemented.

19 Dec 2017: --adjust implemented. --zst-level implemented (lets you control Zstd compression level). Un-broke --rerun.

18 Dec: --extract/--exclude can now be used directly on UCSC interval-BED files (ok for coordinates to be 0-based or for no 4th column to be present). "--output-chr 26" now causes PAR1/PAR2 to be rendered as '25' (for humans), to restore interoperability with programs like ADMIXTURE which can't handle alphabetic chromosome codes. --merge-x implemented (usually needs to be combined with --sort-vars now). --pvar can usually handle 'sites-only' VCF files (e.g. those released by the gnomAD project) now. --thin, --thin-count, --thin-indiv, and --thin-indiv-count implemented.

16 Dec: Multithreaded zstd compression implemented (on Linux and OS X). --make-grm-gz renamed to --make-grm-list, and gzip mode removed.

15 Dec: Fixed --extract-if-info and --exclude-if-info's behavior for non-numeric values which start with a number. Existence-checking flags renamed to --require-info and --require-no-info for naming consistency.

13 Dec: --extract-if-info and --exclude-if-info flags added, for simple filtering on INFO key/value pairs or key existence.

11 Dec: --king-table-subset flag added. This makes it straightforward to perform two-stage relationship/duplicate detection: start with --make-king-table on a small number of higher-MAF variants scattered across the genome, and then rerun it with --king-table-subset on an appropriate subset of candidate sample pairs from the first stage. --bp-space implemented (useful for the first stage above).
The two-stage workflow was first implemented by Wei-Min Chen in a recent version of KING; contact him for citation information.

7 Dec: Fixed bug which could occur when filtering samples from a phased dataset. Windows AVX2 build now available.

28 Nov: --import-dosage 'format=infer' (this is now the default) and 'id-delim=' (needed for reimport of "--export A-transpose" data) options added. Fixed --import-dosage bug that caused it to error out on missing genotypes under format=1. --no-psam-pheno (or --no-pheno/--no-fam-pheno) can now be used to ignore all phenotypes in the sample file, while keeping the phenotype(s) in the --pheno file if one was specified.

27 Nov: Implemented fast path for --glm no-missing-genotype case (mainly affects linear regression). --make-king{,-table} can now automatically handle matrices too large to fit in memory without explicit use of --parallel. AVX2 sample filtering performance improvement. --validate bugfix.

19 Nov: Fix VCF FORMAT:GT header line parsing bug introduced in 14 Nov build.

18 Nov: --make-king{,-table} performance improvements.

16 Nov: Fixed bug in 14 Nov build that broke ##chrSet header line parsing.

14 Nov: Fixed bug that caused --export {A,AD} to hang when the number of variants was between 65 and about a thousand.

4 Nov: Linux and OS X prebuilt AVX2 binaries now available; these should work well on most machines built within the last 4 years. Fixed another Firth regression spurious NA bug. Fixed --score bug that occurred when sample filter(s) were applied simultaneously. Fixed a --ld phased-hardcall handling bug. Array-popcount upgrade in progress (thanks to recent work by Wojciech Muła, Nathan Kurz, Daniel Lemire, and Kim Walisch).

3 Nov: Fixed multipass --export {A,AD} bug. --dummy dosage-freq= now fills in hardcalls with the default --hard-call-threshold cutoff of 0.1 when --hard-call-threshold is not explicitly specified.

2 Nov: --export {A,AD} implemented (with dosage support). --dummy dosage-freq= modifier now works properly for dosage frequencies above 0.75.

16 Oct: --ref-from-fa flag implemented, to set reference alleles from a FASTA file. (Note that this may be unable to determine which allele is reference when length changes are involved, but it should always work for SNPs and multi-nucleotide polymorphisms.) --update-name implemented. Fixed column-set parsing bug in 13 Oct build.

13 Oct: Fixed --glm logistic/Firth regression bug which could produce spurious NA results.

9 Oct: Fixed --ld's handling of some dosage and haploid cases. Fixed bug which could cause --make-pgen to discard phase/dosage information when extracting a small variant subset. --geno-counts no longer double-reports chrY counts.

8 Oct: --ld implemented, with support for phased genotypes and dosages (try "--ld <var1> <var2> dosage"). Fixed tiny bgen-1.1 import bug that triggered when the number of threads exceeded the number of variants. Allele frequency computation no longer crashes on chrX when dosages are present but only hardcalls are needed.

1 Oct: Fixed GRM computation bug which sometimes caused segfaults when both dosages and missing values were present. --glm is now a bit faster when many covariates are present.

20 Sep: Firth regression Hessian matrix inversion step raised to double-precision, after last week's builds revealed that single-precision inversion could be unreliable.

15 Sep: --vif/--max-corr per-variant checks are now working. These are no longer skipped during logistic regression.

8 Sep: Alternative VCF INFO:PR fields are now tolerated. Removed debug code that slowed down yesterday's --make-pgen.

7 Sep: --score uninitialized memory bugfix. Partially-phased data handling bugfix.

6 Sep: Fix OS X stack size issue (could cause --pca and some other commands to crash in recent builds; 1 Sep build had an incomplete workaround).

4 Sep: --{,covar-}variance-standardize missing value handling bugfix. --ref-allele/--alt1-allele implemented (--a2-allele and --a1-allele are treated as aliases).

1 Sep: --{pheno,covar}-quantile-normalize missing-phenotype handling bugfix.

29 Aug: --glm 'gcountcc' column set option added (reports genotype hardcall counts, stratified by case/control status). --write-samples command added (analogous to --write-snplist).

2 Aug: --sort-vars implemented.

25 Jul: --loop-cats now works properly with genotype-based variant filters.

24 Jul: Fixed "--pca approx" allele frequency handling bug introduced in 4 Jun build; we recommend redoing any "--pca approx" runs performed with an affected build. (Regular --pca was not affected.) --loop-cats implemented (similar to PLINK 1.x --loop-assoc, except it's not restricted to association tests). VCF export now supports 'vcf-dosage=DS-force' mode. --dummy multithread + dosage bugfix.

17 Jul: BGEN v1.2/1.3 importer memory allocation bugfix. Size of failed allocation is now logged on most out-of-memory errors.

2 Jul: Improved multithreading in BGEN v1.2/1.3 importer. Python writer can now be called with multiple variants at a time.

25 Jun: Basic BGEN v1.2/1.3 import (unphased biallelic dosages; suffices for main UK Biobank data release). --warning-errcode flag added (causes an error code to be returned to the OS on exit when at least one warning is printed).

20 Jun: --condition-list + variant filter bugfix.

5 Jun: --make-pgen memory requirement greatly reduced. End time now printed to console in most situations.

4 Jun: --hwe no longer causes a segfault when chrX is present and no gender information is available. Fixed --dummy bug.

29 May: --import-dosage format=1 bugfix.

26 May: --glm 'standard-beta' modifier replaced with --variance-standardize flag. --quantile-normalize function added. Fixed a missing-sex allele counting bug.

25 May: --hardy/--hwe works properly again when chrX is present but not at the beginning of the dataset.

22 May: Fixed major dosage data + sample-filter bug; we recommend rerunning any operations involving both dosage data and sample filtering performed with earlier plink2 builds. --score 'list-variants' modifier added.

19 May: Fixed a bug with allele frequency computation on dosage data when sample filter(s) are applied.

18 May: Many categorical phenotype-handling flags (--within, --keep-cats, --split-cat-pheno, ...) implemented. Basic phenotype-based filtering implemented (e.g. "--remove-if PHENO1 '>' 2.5"; note that unnamed phenotypes are assigned the names 'PHENO1', 'PHENO2', etc., and that the '<' and '>' characters must be quoted in most shells). --write-covar implemented. --mach-r2-filter implemented, and raw MaCH r2 values can be dumped with "--freq cols=+machr2".

11 May: --condition{,-list} + --covar bugfix.

8 May: Fixed quantitative phenotype/covariate loading bug introduced in 6 May build.

7 May: --import-dosage implemented.

6 May: Fixed a bug that caused '0' to be treated as control instead of missing for binary phenotypes. Minor change to --glm's column headers, in preparation for multiallelic data.

2 May: --score bugfix. --maj-ref bugfix. --vcf-min-dp and "--export A-transpose" implemented.

1 May: VCF dosage import/export, --vcf-min-gq, and --read-freq implemented. --score can now work with standard errors. --autosome{,-par} now works properly. SNPHWE2 and SNPHWEX functions relicensed as GPL-2+, to enable inclusion in the HardyWeinberg R package.

20 April: .sample export bugfix (didn't work if file was over 256 KB and no phenotypes were present). --dummy implemented (can now generate dosages).

19 April: --hardy/--hwe chrX bugfix (thanks to Jan Graffelman for catching the problem and validating the fix). --new-id-max-allele-len now has three modes ('error', 'missing', and 'truncate'), and the default mode is now 'error' (i.e. --set-missing-var-ids and --set-all-var-ids now error out when an allele code longer than 23 characters is encountered, instead of silently truncating). --score implemented, and extended to support variance-normalization and multiple score columns (these two features provide a simple way to project new samples onto previously computed principal components).
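The following is a minimal sketch of a variance-normalized linear score with several score columns, computed for a single sample. It assumes each dosage x is standardized as (x - 2p)/sqrt(2p(1 - p)), then multiplied by each weight column and summed. The helper name and argument layout are hypothetical, and PLINK's own output scaling (sums vs. averages) may differ. If the weight columns are per-variant principal-component weights (e.g. as written by --pca var-wts), the resulting scores are projections of the sample onto those components, up to a scaling factor.

```python
import math


def linear_scores(dosages, freqs, weights, variance_standardize=True):
    """Hypothetical helper: one score per weight column for one sample.

    dosages: alt-allele dosage (0..2) per variant
    freqs:   alt-allele frequency p per variant (must satisfy 0 < p < 1)
    weights: per-variant list of weights, one entry per score column
    """
    n_cols = len(weights[0])
    scores = [0.0] * n_cols
    for x, p, w_row in zip(dosages, freqs, weights):
        if variance_standardize:
            # Center on the expected dosage 2p, scale by its binomial SD.
            x = (x - 2 * p) / math.sqrt(2 * p * (1 - p))
        for j, w in enumerate(w_row):
            scores[j] += w * x
    return scores


# Two variants, two score columns.
print(linear_scores([2, 0], [0.5, 0.5], [[1.0, 0.5], [1.0, -0.5]]))
```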

11 April: --pca var-wts bugfix, and --pca eigenvalue ordering bugfix. --glm linear regression and --condition{,-list} support added. --geno/--mind/--missing/--genotyping-rate can now refer to missing dosages instead of just missing hardcalls (note that, when importing dosage data, dosages in (0.1, 0.9) and (1.1, 1.9) are saved but there usually won't be associated hardcalls).
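The note about dosages in (0.1, 0.9) and (1.1, 1.9) implies that a hardcall is only kept when a dosage lies close to an integer genotype. A minimal sketch of that rule follows. The threshold of 0.1 is inferred from those intervals, and the inclusive boundary is our assumption. --hard-call-threshold changes the threshold.

```python
def hardcall(dosage, threshold=0.1):
    """Nearest integer genotype (0, 1, 2), or None if the dosage is too far from it.

    Sketch only: the default threshold of 0.1 and the inclusive comparison
    are assumptions, not a description of PLINK's exact implementation.
    """
    nearest = round(dosage)
    if abs(dosage - nearest) <= threshold:
        return nearest
    return None  # dosage is saved, but there is no associated hardcall


print([hardcall(d) for d in (0.05, 0.5, 1.0, 1.3, 1.95)])
```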

20 March 2017: Initial public release.

What's new?

  • Preservation of reference alleles (without requiring constant use of --keep-allele-order), phase information, and the VCF QUAL, FILTER, and INFO fields. Use --make-pgen instead of --make-bed when importing a VCF; the fileset can then be referenced with --pfile.
  • The new .pgen file format incorporates SNPack-style genotype compression, frequently reducing file sizes by 80+% with negligible computational cost. Note that this captures some major patterns that are missed by the usual general-purpose compression algorithms: our 1000 Genomes phase 3 downloads are 70+% smaller than the gzipped originals (and remain 45+% smaller after .pgen un-archiving), without throwing away any relevant information.
  • To allow users to take advantage of genotype compression without sacrificing compatibility with scripts expecting old-style .bim and .fam text files, PLINK 2.0 also supports a hybrid .pgen + .bim + .fam usage mode (--make-bpgen/--bpfile). We've also provided a Python library for reading and writing .pgen files, to simplify migration to the new format. (PLINK 1 .bed files are valid .pgen files, so code written on top of the library is backward-compatible.)
  • Firth regression ("--glm firth-fallback", "--glm firth"). Standard logistic regression fails to converge, yielding 'NA' or nonsense results, when the 2x2 allele/phenotype contingency table has an empty cell ("quasi-complete separation"); this is common, and especially likely to happen with the strongest associations. Firth regression can prevent you from missing these associations. The fast 'firth-fallback' mode (only use Firth regression when there's either an empty contingency table cell or regular-logistic-regression convergence failure) gets you most of the benefit for a fraction of the computational cost.
  • "--pca approx" (equivalent to EIGENSOFT 6 fastmode with default parameters). If you have more than ten thousand samples, only need the top principal components, and can tolerate ~1% error in the last PC, this can save you a ton of compute time.
  • The 64-bit Linux build can handle linear algebra on matrices with more than 2^31 elements (so regular --pca is no longer limited to ~46000 samples), as long as your system has enough memory.
  • KING-robust kinship coefficients (--make-king, --make-king-table, --king-cutoff). These remain accurate when good population allele frequency estimates are unavailable. We have found --king-cutoff to be much more reliable than the PLINK 1.9 --rel-cutoff flag for removal of close relations.
  • Proper support for dosages (decimal allele count expected values). When .gen/.bgen files are imported, hardcalls and dosages are saved to the .pgen. Operations which naturally extend to decimals (e.g. --pca, --glm, --freq, --maf/--mac) use the dosage information when it's present, while methods that can only make use of hardcalls (e.g. KING-robust, Hardy-Weinberg exact test) simply ignore the dosages. --hard-call-threshold can now be used to change the saved hardcalls without changing the dosages.
  • Much more multithreaded code.
  • Most commands let you control which columns appear in the main output file(s).
  • Broad support for both gzipped and Zstd-compressed text input files.
  • Graffelman and Weir's extended chrX Hardy-Weinberg exact test, which takes male allele frequencies into account. We've found that this tends to identify quite a few obviously miscalled chrX variants which were not caught by the usual QC filters.
  • Oxford-style haplotype filesets can now be imported and exported (--haps, "--export haps"/"--export hapslegend").
  • Sample-major PLINK binary files can now be efficiently exported ("--export ind-major-bed"). This is close to 3 orders of magnitude faster than the previous implementation (PLINK 1.07 --make-bed + --ind-major).
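KING-robust kinship coefficients are computed from counts of shared heterozygotes and opposite homozygotes. Below is a minimal sketch of the symmetric KING-robust formula from the KING paper (Manichaikul et al., 2010): kinship = (N_Aa,Aa - 2 * N_AA,aa) / (N_Aa(i) + N_Aa(j)). We assume this form for illustration only. The helper name is hypothetical, and PLINK's implementation details (e.g. missing-data handling) may differ. On this scale, duplicate samples score 0.5, and first-degree relatives score roughly 0.25.

```python
def king_robust(geno_i, geno_j):
    """Hypothetical helper: symmetric KING-robust kinship for one sample pair.

    Genotypes are alt-allele counts (0, 1, 2), or None for missing.
    """
    het_het = 0  # N_Aa,Aa: both samples heterozygous
    opp_hom = 0  # N_AA,aa: opposite homozygotes (IBS0)
    het_i = het_j = 0
    for a, b in zip(geno_i, geno_j):
        if a is None or b is None:
            continue  # use only variants called in both samples
        het_i += int(a == 1)
        het_j += int(b == 1)
        het_het += int(a == 1 and b == 1)
        opp_hom += int(abs(a - b) == 2)
    denom = het_i + het_j
    if denom == 0:
        return float("nan")  # undefined without heterozygous calls
    return (het_het - 2 * opp_hom) / denom


print(king_robust([0, 1, 2, 1], [0, 1, 2, 1]))  # identical genotypes
```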

Coming next

  1. More multiallelic variant handling functions, and multiallelic dosage support.
  2. BCF2 import/export.
  3. Merge. (Once this is operational, a stable version of the .pgen specification will be provided, and PLINK 2.0 beta testing will begin.)
