D: 20 Mar 2017
PLINK 2.00 alpha
PLINK 2.0 alpha was developed by Christopher Chang and the Human
Longevity, Inc. Data Science team, with substantial input from Stanford's
Department of Biomedical Data Science. (More detailed credits.)
1: This build can still run on AMD processors, but it's statically linked to Intel MKL, so some linear algebra operations will be slow. We will try to provide an AMD Zen-optimized build as soon as supporting libraries are available.
Source code and build instructions are available on GitHub.
20 March 2017: Initial public release.
- Preservation of reference alleles (without requiring constant use of
--keep-allele-order), phase information, and the VCF QUAL, FILTER, and
INFO fields. Use --make-pgen instead of --make-bed when importing a VCF;
the fileset can then be referenced with --pfile. We will provide 1000
Genomes phase 3 downloads in the new fileset format as soon as
multiallelic variants are also supported.
- The new .pgen file format incorporates SNPack-style
genotype compression, frequently reducing file sizes by 80+% with
negligible computational cost. To allow users to take advantage of
genotype compression without sacrificing compatibility with scripts
expecting old-style .bim and .fam text files, PLINK 2.0 supports a hybrid
.pgen + .bim + .fam usage mode (--make-bpgen/--bpfile). We've also
provided a Python library for reading and writing .pgen files, to
simplify migration to the new format. (PLINK 1 .bed files are valid
.pgen files, so code written on top of the library is
- Firth regression ('--glm firth-fallback', '--glm firth'). Standard
logistic regression fails to converge, yielding 'NA' or nonsense results,
when the 2x2 allele/phenotype contingency table has an empty cell
("quasi-complete separation"); this is common, and especially likely
to happen with the strongest associations. Firth regression can
prevent you from missing these associations. The fast 'firth-fallback'
mode (only use Firth regression when there's either an empty contingency
table cell or regular-logistic-regression convergence failure) gets you
most of the benefit for a fraction of the computational cost.
- '--pca approx' (equivalent to EIGENSOFT 6 fastmode with default
parameters). If you have more than ten thousand samples, only need the
top principal components, and can tolerate ~0.1% error in the last PC,
this can save you a ton of compute time.
- The 64-bit Linux build can handle linear algebra on matrices with
more than 2^31 elements (so regular --pca is no longer limited
to ~46000 samples), as long as your system has enough memory.
- KING-robust kinship coefficients (--make-king, --make-king-table,
--king-cutoff). These remain accurate when good population allele
frequency estimates are unavailable. We have found --king-cutoff to be
much more reliable than the PLINK 1.9 --rel-cutoff flag for removal of
- Proper support for dosages (decimal allele count expected values).
When .gen/.bgen files are imported, hardcalls and dosages are
saved to the .pgen. Operations which naturally extend to decimals (e.g.
--pca, --glm, --freq, --maf/--mac) use the dosage information when it's
present, while methods that can only make use of hardcalls (e.g.
KING-robust, Hardy-Weinberg exact test) simply ignore the dosages.
--hard-call-threshold can now be used to change the saved hardcalls without changing the dosages.
- Much more multithreaded code.
- Most commands let you control which columns appear in the main output
file(s). For example, the help text for --make-king-table is
  --make-king-table <zs> <counts> <cols=[column set descriptor]>
    Similar to --make-king, except results are reported in the original .kin0
    text table format (with minor changes, e.g. row order is more friendly to
    incremental addition of samples), and --king-table-filter can be used to
    restrict the report to high kinship values.
    Supported column sets are:
      maybesid: SID1/SID2, if at least one value is nonmissing. Must be used
      sid: Force SID1/SID2 even when all values are missing.
      nsnp: Number of variants considered (autosomal, neither call missing).
      hethet: Proportion/count of considered call pairs which are het-het.
      ibs0: Proportion/count of considered call pairs which are opposite homs.
      ibs1: HET1_HOM2 and HET2_HOM1 proportions/counts.
      kinship: KING-robust between-family kinship estimator.
    The default is id,maybesid,nsnp,hethet,ibs0,kinship. hethet/ibs0/ibs1
    values are proportions unless the 'counts' modifier is present. If id is
    omitted, a .kin0.id file is also written.
A "column set descriptor" is either a comma-separated sequence of column
set names (e.g. 'cols=id,nsnp,hethet,ibs0,ibs1,kinship' would add
HET1_HOM2 and HET2_HOM1 columns, while ensuring that SID columns do not
appear), or a comma-separated sequence of column set names where every
name is preceded by a plus or minus (in which case the column sets are
added/subtracted from the default, e.g. 'cols=+ibs1,-maybesid' is a
shorter way to add HET1_HOM2/HET2_HOM1 and exclude SID1/SID2).
- What's SID, you ask? It's an optional third sample ID component
which can be used to distinguish samples from the same individual. (A
cautionary note: SID support isn't well-tested yet. But this should
- And what's 'zs'? That requests Zstandard compression of the
main output file. All PLINK 2.0 input text files are permitted to be
gzip- or Zstd-compressed. When working disk space is limited and you
still need to generate gzipped output, you can start with
Zstd-compressed output and then e.g. pipe the output of
--zst-decompress to pigz.
- Graffelman and Weir's extended chrX
Hardy-Weinberg exact test, which takes male allele frequencies into
account. We've found that this tends to identify quite a few obviously
miscalled chrX variants which were not caught by the usual QC
- Oxford-style haplotype filesets can now be imported and exported
(--haps, '--export haps'/'--export hapslegend').
- Sample-major PLINK binary files can now be efficiently exported
('--export ind-major-bed'). This is close to 3 orders of magnitude
faster than the previous implementation (PLINK 1.07 --make-bed +
- Linear regression.
- BGEN v1.2 and v1.3 import/export.
- Multiallelic variant support.
- BCF2 import/export.
- Merge. (Once this is operational, a stable version of the .pgen
specification will be provided, and PLINK 2.0 beta testing will
The rest of the documentation should be available by early April. Meanwhile,
"plink2 --help [flag name]" should provide most of the information you
need; feel free to ask for further clarification in plink2-users.
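
The column set descriptor rules described above for --make-king-table can be illustrated with a small sketch. This is not PLINK's own parser: the function name resolve_cols is hypothetical, the set names and default are copied from the help text, and returning the result in a fixed canonical order is an assumption made only to keep the output easy to compare.

```python
# Column set names and the default come from the --make-king-table help text.
KNOWN_SETS = ["id", "maybesid", "sid", "nsnp", "hethet", "ibs0", "ibs1", "kinship"]
DEFAULT_SETS = ["id", "maybesid", "nsnp", "hethet", "ibs0", "kinship"]


def resolve_cols(descriptor, known=KNOWN_SETS, default=DEFAULT_SETS):
    """Resolve a 'cols=' value into a list of column sets (hypothetical helper).

    Either every name carries a +/- prefix (modify the default), or none
    does (replace the default outright). Mixing the two forms is rejected.
    """
    names = [n.strip() for n in descriptor.split(",")]
    signed = [n.startswith(("+", "-")) for n in names]
    if any(signed) and not all(signed):
        raise ValueError("either every name has a +/- prefix, or none does")

    if all(signed):
        # Relative form: start from the default and add/subtract sets.
        selected = set(default)
        for n in names:
            name = n[1:]
            if name not in known:
                raise ValueError(f"unknown column set: {name!r}")
            if n[0] == "+":
                selected.add(name)
            else:
                selected.discard(name)
    else:
        # Absolute form: exactly the listed sets.
        selected = set(names)
        unknown = selected - set(known)
        if unknown:
            raise ValueError(f"unknown column set(s): {sorted(unknown)}")

    # Assumption: report sets in the order of KNOWN_SETS.
    return [s for s in known if s in selected]


print(resolve_cols("id,nsnp,hethet,ibs0,ibs1,kinship"))
print(resolve_cols("+ibs1,-maybesid"))
```

Both calls select the same column sets, which matches the statement that 'cols=+ibs1,-maybesid' is a shorter way to write the explicit form.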
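
The "~46000 samples" figure quoted for regular --pca follows from the 2^31-element matrix limit if one assumes the limiting matrix is square, with one row and one column per sample. The short calculation below checks that arithmetic; the 8-bytes-per-element memory estimate assumes double-precision storage, which the notes above do not state.

```python
import math

MAX_ELEMENTS = 2**31  # element limit mentioned for the older linear algebra path

# Largest n such that an n x n matrix stays within the element limit.
max_samples = math.isqrt(MAX_ELEMENTS)


def matrix_bytes(n_samples, bytes_per_element=8):
    """Memory for an n x n matrix; 8 bytes/element assumes doubles."""
    return n_samples * n_samples * bytes_per_element


print(max_samples)                          # 46340
print(matrix_bytes(max_samples) / 2**30)    # just under 16 GiB
```

Beyond that sample count the matrix no longer fits in 2^31 elements, which is why the note adds "as long as your system has enough memory": the requirement grows quadratically with the number of samples.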
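
The 'firth-fallback' rule described above (switch to Firth regression only when the contingency table has an empty cell or ordinary logistic regression fails to converge) amounts to a simple decision. The sketch below only illustrates that decision; the helper name use_firth and its inputs are hypothetical, and it does not reproduce how PLINK actually detects either condition.

```python
def use_firth(table, logistic_converged):
    """Return True if a firth-fallback style run would switch to Firth.

    table: 2x2 allele/phenotype counts, e.g. [[a, b], [c, d]].
    logistic_converged: whether ordinary logistic regression converged.
    """
    has_empty_cell = any(count == 0 for row in table for count in row)
    return has_empty_cell or not logistic_converged


# An empty cell triggers the fallback even if the fit reported convergence.
print(use_firth([[120, 80], [0, 15]], logistic_converged=True))    # True
# No empty cell and a converged fit: keep the regular logistic result.
print(use_firth([[120, 80], [30, 15]], logistic_converged=True))   # False
```

Since most variants fall into the second case, the more expensive Firth fit runs only for the minority that need it, which is where the computational saving comes from.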
Output file list