PLINK 1.90 beta

This is a comprehensive update to Shaun Purcell's PLINK command-line program, developed by Christopher Chang with support from the NIH-NIDDK's Laboratory of Biological Modeling, the Purcell Lab at Mount Sinai School of Medicine, and others. (What's new?) (Credits.) (Methods paper.)

Binary downloads

Builds

Operating system [1] | Stable (beta 3.45, 13 Jan) | Development (13 Jan) | Old [2] (v1.07)
Linux 64-bit | download | download | download
Linux 32-bit | download | download | download
OS X (64-bit) | download | download | download
Windows 64-bit | download | download | download
Windows 32-bit | download | download

1: Solaris is no longer explicitly supported, but it should be able to run the Linux binaries.
2: These are just mirrors of the binaries posted at http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml.

Source code, compilation instructions, and the like are on the developer page.

The following documented PLINK 1.07 flags are not supported by 1.90 beta 3:

  • --qual-geno-scores [3]
  • --segment [4]
  • --dfam
  • --p2, --genedrop
  • --hap, --hap-window, --hap-snps [5]
  • --proxy-assoc, --proxy-impute [5]
  • --cnv-list, --cfile, --gfile
  • --R
  • --id-dict, --id-match [6]
  • --compress, --decompress [7]

Continue using PLINK 1.07 for most of these operations. However, be aware that

  • green flags are supported by the latest development build. You're more than welcome to try the new implementations; just make sure to check some of your results against PLINK 1.07 if you do so.
  • red flags will not be supported by the final 1.90 release. We recommend migrating away from PLINK 1 here, but if that's not practical, keep a permanent copy of the PLINK 1.07 binary around and modify the relevant lines of your scripts to explicitly refer to it.

3: We believe this now has almost no practical value, since the file format it expects is too different from VCF.
4: This was not fully developed in PLINK 1.07, and has been superseded by other IBD analysis packages.
5: PLINK's haplotype phasing and imputation algorithms are obsolete. Future PLINK versions will be able to import phase and dosage information emitted by other programs; the haplotype association commands will be reintroduced when that functionality is in place. Until then, BEAGLE 3.3 should be more accurate than PLINK for case/control haplotype association.
6: Free database software handles these operations in a more flexible and powerful manner.
7: Just use gzip/gunzip for this. Or better, pigz/unpigz.

Recent version history

13 January 2017: --ld-snps now works properly with other variant filters. --set-mixed-mt-missing flag added.

17 November 2016: chrX and chrY are no longer mishandled with --chr-set.

13 November: --geno/--missing now treat heterozygous haploid calls as nonmissing; this is consistent with --mind, and PLINK 1.07's behavior with --recode/--make-bed active. They're now inconsistent with what PLINK 1.07 did without --recode/--make-bed; use --make-bed + --set-hh-missing first if you want the old behavior. '--ci 0.95' no longer produces 'L94'/'U94' column names when plink gets unlucky with rounding. Minor --exclude/--extract/--flip bugfix for input files with no newline at the end. --qfam{-within} bugfix. --dosage 'sex' modifier now works properly when some samples have been filtered out (e.g. because some phenotypes were missing).

20 September: VCF export no longer erases an alt allele absent from the immediate dataset, due to a retroactive revision to the VCF specification. Fixed '--model gen' permutation testing bug.

10 September: --remove-cluster-names + --family bugfix. --indep-pairwise/--indep-pairphase consistency improvements (MAF comparison now takes floating point imprecision into account; a few pairwise comparisons are no longer improperly skipped when the window is kb-based).

16 August: --merge-x 'no-fail' modifier works properly again. '--hard-call-threshold random' bugfix for probabilities with less than 4-digit precision.

5 August: --silent Windows bugfix. --r/--r2 'd' and 'dprime-signed' modes added. --ld-window-cm flag added. --output-missing-genotype now works properly with --make-bed. --genome + --read-freq bugfix.

7 June: --23file X/Y/MT chromosome conversion bugfix. --pheno bugfix (if phenotype was quantitative, but first value was nonnumeric, it was treated as '0' instead of missing in recent builds).

16 May: Set test bugfix. '--R debug' bugfix.

31 March: '--split-x b38' now works properly. 32-bit set-test bugfix. --cluster + --within bugfix.

25 March: --recode beagle bugfix. --gene bugfix. '--vcf-half-call reference' mode added.

15 March: Fixed --update-alleles bug introduced in 24 Feb build.

13 March: Fixed .lgen loading bug introduced in 24 Feb build. Fixed use-after-free bug in extra chromosome name cleanup code. --allow-no-{samples,vars,covars} added to stable build.

24 February: Contig limit raised to ~65000. --annotate/--gene-report bugfix for 3-4 column case. --flip-scan bugfix.

3 February: Number-to-string encoding bugfix (occasionally affected numbers ≥ 10^6 saved in the .bim centimorgan column).

27 January 2016: Speed improvement for operating on a subset of samples. Fix minor --1 backward compatibility break.

24 December 2015: Fix --dosage + --extract/--exclude bug introduced in 4 November build. Minor --test-missing permutation bugfix. --vcf-min-qual bugfix for .bcf files.

16 December: '--meta-analysis + study qt' now reports regression betas instead of odds ratios in the study-specific columns. '--score header' no longer forgets to append .nopred to the problem list filename.

13 December: --allow-no-samples and --allow-no-vars flags added. --dosage now suppresses regression results for very-low-MAF variants in the same manner as it does in PLINK 1.07. --lgen flag now supported. --gxe works properly again (it was inadvertently disabled a few months ago). --hardy now produces 'nan' results for chrY/chrM variants (like PLINK 1.07) instead of entirely omitting them. --hwe 'observation counts' warning is no longer triggered by chrY variants, and is now more informative when only triggered by chrX variants.

26 November: --logistic can now report intercepts. --logistic adaptive permutation bugfix. If you used adaptive permutation with --logistic in the past, we recommend that you redo the run with the latest build.

22 November: --indep{-pairwise,-pairphase} kb-based windows now work properly with sample/variant filters. Several --indep-pairphase bugfixes. --all-pheno no longer causes exit code 127 to be returned on successful runs. --recode-allele segfault bugfix. --meta-analysis now only considers the first appearance when a variant appears multiple times in the same file. --no-const-covar and --meta-analysis-report-dups flags added.

4 November: Fixed '--r{2} square0' bug which occasionally caused a line break to be missing in the middle of the output file. Fixed --genome + --parallel missing-data handling bug.

17 October: Set-handling bugfixes. --{fast-}epistasis + variant filtering bugfix. --score 'double-dosage' modifier added.

3 September: --{b}merge no longer crashes when two sample IDs are mostly identical but have different capitalization and different parental IDs/phenotypes. Errors and warnings are now printed to stderr instead of stdout (yes, this is overdue). --dfam implemented. --vcf now accepts '*' deletion-overlap alt allele codes (thanks to John Wallace).

15 July: --freq case-control mode added, and analogous --dosage case-control-freqs modifier added. Fixed --homozyg .summary bug that occurred when variant filters were present. Revised heterozygous haploid warning to be clearer. --mendel 'summaries-only' modifier added. --R missing value bugfixes. --dosage 'skip2' now works properly.

29 June: --attrib{-indiv} now handles multiple negative match conditions in the same manner as PLINK 1.07. --clump + variant filter bugfix. --lasso is now memory-efficient.

25 June: Fixed --snps bug that could arise when two named variants had positions differing by exactly 1.

17 June: X chromosome Mendelian error checking bugfixes. '--dosage Zout' bugfix.

13 June: --mds-plot switched from eigendecomposition-based algorithm back to SVD, and the matrix diagonal is now properly double-centered. (Update, 15 June: you can now use the --mds-plot 'eigendecomp' modifier in the development build to request the eigendecomposition algorithm, with the centering bug fixed. It does have the virtue of being several times faster than SVD.)

29 May: --vcf now won't error out on GATK 3.4 symbolic deletion alleles. --recode 'gen-gz' modifier (for gzipping of Oxford-format .gen output) added. --pca header bugfix. --meta-analysis long allele code bugfix.

20 May: --make-grm-bin + --parallel is now permitted by the command-line parser. Fixed bug in 11 May build which broke parsing of --gen + --sample.

11 May: Fix --meta-analysis bug introduced in 18 April build. "Options in effect:" printed to standard output again, by popular demand.

3 May: Merge + chromosome filter bugfix.

18 April: --linear + --tests bugfix. --dosage + --exclude bugfix.

9 April: --fst now works properly with variant filters.

30 March: Fixed recent --dosage association report bug which caused null characters to appear on 'NA' lines. --linear/--logistic sex modifier + --parameters bugfix. (Update, 2 April: Linux binaries should no longer fail with "kernel too old" on RHEL 6 and similar systems.)

18 March: --meta-analysis no-map bugfix, QFAM sibship handling bugfix.

12 March: --mendel + --mendel-duos bugfix, --linear/--logistic + --adjust works properly again (--adjust was reporting "zero valid tests" when nothing was actually wrong). Missing genotype calls at otherwise monomorphic loci in multichar-allele .ped files are now converted properly (they were previously encoded as a pair of '0' alleles, instead of a missing call). QFAM test closed for repairs. --distance-wts partially added to development build (GRM variant weighting coming soon).

5 March: --fast-epistasis 32-bit missing data handling bugfix. --thin-indiv and --thin-indiv-count added to development build (courtesy of Masahiro Kanai).

2 March: VCF 'PR' header line is no longer malformed. (Existing malformed VCFs generated by January-February builds can be fixed by adding a '>' at the end of that line.) Fixed a set-handling bug that could affect sets containing the dataset's last variant.
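The one-character repair mentioned in the 2 March entry is easy to script. The following is a minimal sketch in Python, assuming the malformed line is an '##INFO=<ID=PR,...' meta-information line that is intact apart from its missing closing '>'. The function name and the prefix test are illustrative assumptions, not taken from PLINK.

```python
def fix_pr_header_line(line: str) -> str:
    """Append the missing '>' to a malformed PR INFO header line.

    Assumption: the affected line starts with '##INFO=<ID=PR,' and is
    otherwise intact. Every other line is returned unchanged.
    """
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line ending
    if body.startswith("##INFO=<ID=PR,") and not body.endswith(">"):
        body += ">"
    return body + ending
```

Applying this function to each line of an affected VCF header leaves well-formed lines untouched, so it is safe to run on files that have already been repaired.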

26 February: '--recode vcf' header modified for compatibility with VCFtools. Compact 1000 Genomes phase 1 data files posted on the resources page.

10 February: Low-memory --r/--r2 matrix output bugfix. --recode vcf can now generate bgzipped output. More gzip/bgzip multithreading.

2 February: --bcf now handles BCFv2.2 nonzero missing genotype and end-of-vector values (emitted by e.g. bcftools 1.1) properly. --mendel-multigen is no longer blocked by the command-line parser when a family-based association test is run. Oxford-format import now tolerates identical A1/A2 allele codes. --missing, --freq, --hardy, and --het now support gzipped output (add the 'gz' modifier). --R added to development build.

15 January: --keep/--remove now works properly on newly-updated IDs when --update-ids is in the same run. --mac/--max-mac added to development build.

12 January: --dosage + --covar + --sex now works properly, and an inaccurate warning about the PLINK 1.07 implementation has been removed. --dosage logistic regression no longer reports improper p-values for very small samples.

11 January 2015 (beta 3): --epistasis implemented, with heavy optimization for quantitative traits. '--recode vcf{-fid/-iid}' now flags reference alleles as "possibly not based on a real reference genome" unless --real-ref-alleles is also specified, and sets ALT alleles to '.' when they are not present in the immediate dataset. --vcf-min-gq and --vcf-min-gp no longer error out when a genotype entry has fewer fields than expected, since the VCF specification explicitly states that this is an acceptable way to represent trailing missing values. --make-bed no longer segfaults when resorting a file too large to fit in memory.

20 December 2014: --dummy, --show-tags, --neighbour, --mh/--bd/--mh2/--homog and --clump-field bugfixes. --q-score-range implemented for dosage data. --split-x/--merge-x 'no-fail' modifier added to support data conversion scripts.

13 December: Case/control --hardy/--hwe no longer randomly excludes too many controls from control-only stage of test. (This was mostly harmless, since the all-samples test still worked, and the bug was only likely to occur when --hardy/--hwe was in the same run as other filters like --exclude/--extract. But if you used the UNAFF rows in the --hardy report for anything, you should rerun --hardy with the latest build.) set-test multiple-testing correction now counts nonempty sets with zero significant variants.

11 December: --dosage format=3 + --score bugfix. 'beagle-nomap' option added to --recode. --list-duplicate-vars flag added to development build.

7 December: --adjust now applies to the joint test statistic instead of the additive effect when the former is computed, and no longer reports linearly genomic-controlled p-values when they would be invalid. Some numeric stability improvements for small p-values. --logistic permutation test and --assoc/--model set-test bugfixes. --linear/--logistic set-test and --dosage 'sex' modifier implemented. --dosage + --sex error message now explains that PLINK 1.07 did not handle this flag combination properly.

25 November: Fixed --exclude/--extract memory management bug in 2 November build. --score now supports dosage data. --write-dosage no longer writes incorrect IDs when sample filters are applied. --dosage 'noheader' can now be used with 'list' when each batch has only one file. --assoc/--model set-test implemented. --set-missing-var-ids extended to permit inclusion of allele names, and --set-missing-snp-ids/--set-missing-nonsnp-ids retired. UNADJ column values are now correct in '--adjust gc' reports.

2 November: Improved variant ID lookup speed. Single-precision binary matrix output (e.g. --make-grm-bin) is now based on double-precision internal computations, and the 'bin' + 'single-prec' modifier combination has been replaced with 'bin4'. Binary --distance output bugfix.

15 October: Fixed a bug in --vcf's handling of variants with 10 or more alternate alleles. --dosage Zout no longer segfaults at the end. Merger no longer scrambles centimorgan coordinates.

26 September: Fixed --nonfounders bug which broke X chromosome MAF computation. --vcf-min-gp now tolerates '?' GP values.

20 September: Fixed non-strict --biallelic-only bug when handling multiallelic variants. If you have used --biallelic-only without 'strict' on VCF files with triallelic variants, we strongly recommend rerunning the operation with the latest build. Fixed quantitative trait --assoc bug that caused it to write each output line twice, and --linear/--logistic mishandling of some datasets with heterozygous haploid calls. --vcf-min-gp added to development build.

18 September: --no-fid bugfix. Presence of nonnumeric phenotype strings (e.g. 'NA') no longer forces the phenotype to be treated as quantitative. --output-missing-phenotype now accepts nonnumeric strings. '--dosage list' now works properly with multiple batches. --oxford-single-chr flag added to allow loading of single-chromosome .gen files with ignorable SNP ID field values. --genome no longer fails to report some parent-offspring relationships. --update-alleles was still missing a few matches involving missing genotypes; this has been fixed.

8 September: --write-dosage bugfix for 2730+ samples. --{b}merge/--merge-list now usually errors out when combined with a filter flag that wouldn't take effect. --meta-analysis now supports weighted Z-score-based analysis.

2 September: --dosage 'noheader' should now work properly when some phenotypes are missing. --vcf + --vcf-filter now parses semicolon-delimited FILTER fields correctly. --vcf-min-gq flag added to development build. --exclude/--extract no longer have terrible performance when both the main dataset and the --exclude/--extract variant list have millions of '.' entries.
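To illustrate what parsing a semicolon-delimited FILTER field involves, here is a minimal Python sketch. It assumes that '.' and 'PASS' count as passing and that a variant with failed filters is kept only when every failed filter is in a caller-supplied ignore set. The function name and the `ignored` parameter are hypothetical, and this is not PLINK's actual code.

```python
def passes_vcf_filter(filter_field, ignored=()):
    """Return True if a VCF FILTER value should be treated as passing.

    '.' (no filters applied) and 'PASS' pass. Otherwise the field is a
    semicolon-delimited list of failed filters, and the variant passes
    only if every listed filter is in `ignored` (an assumed policy).
    """
    if filter_field in (".", "PASS"):
        return True
    failed = filter_field.split(";")
    return all(name in ignored for name in failed)
```

Splitting on ';' before comparing is the important step: treating 'q10;s50' as a single filter name would never match an ignore list containing 'q10' and 's50' separately.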

28 August: --recode HV{-1chr} now always uses '0' as the missing phenotype code, since Haploview does not accept -9. --write-covar + --with-phenotype no longer segfaults on case/control phenotypes, and multichar allele .ped loader no longer segfaults on nonstandard tri/quadallelic variants. --meta-analysis added to development build.

17 August: Fixed a recent library function bug which broke --filter, --within, and a few other flags.

14 August: Fixed a typo in the 11 August build which caused covariate loading and a few other functions to hang.

11 August: --linear/--logistic no longer uses a buggy Huber-White standard error estimator when clusters are defined. --output-chr now works properly with --make-bed when the input .bim is unsorted, and a very long allele code in an unsorted .bim no longer causes --make-bed to segfault.

1 August: --dosage logistic regression bugfix. --make-just-bim and --make-just-fam flags added. .bim/.fam files can now be processed without an accompanying .bed under some circumstances. --recode-allele now works properly with A-transpose mode.

18 July: Text filesets with both multi-character allele codes and an unsorted .map file no longer cause the autoconverter to crash. Malformed files generated by some old merges no longer cause segfaults. --pfilter should now consistently filter out 'NA' entries. --dosage chromosome code output bug fixed. --read-freq now loads A1 allele codes when they're missing from the main dataset, instead of erroring out in that situation. --show-tags added to development build.

4 July: Fixed two merge bugs which potentially caused data in the last few samples of an input or merged fileset to be mishandled. If you have merged filesets generated with earlier PLINK 1.90 alpha/beta builds, and more than two samples were involved, we suggest redoing the merge with a more recent build. (Most merges were unaffected, but better safe than sorry.)

3 July (beta 2): .ped file parser now properly handles --missing-genotype flag. --mh2 implemented.

1 July: Fix recently introduced (sorry about that) --data/--gen/--sample command-line parsing bug. --indep no longer misreports the number of pruned variants when there is extensive multicollinearity. --fst, --homog, and --oxford-pheno-name flags added.

27 June: '--recode oxford' no longer dumps incorrect IDs when used in the same run as a sample filter. --fill-missing-a2 flag added. --recode 'A-transpose' and 'include-alt' modifiers added. '--het small-sample' mode added. Basic Cochran-Mantel-Haenszel and Breslow-Day tests added to development build.

20 June: Fixed --mendel zero chromosome code/segfault bug. --merge-list no longer requires a reference fileset. --fast-epistasis + --parallel bugfix. --indep-pairphase and QFAM test completed in development build.

10 June: --merge-mode 1 now correctly merges missing calls with a single nonmissing call. --r/--r2 chromosome boundary handling bugfixes. '0X'/'0Y'/'0M' chromosome codes emitted by Oxford tools are now recognized, and also supported by --output-chr. --vcf-half-call flag added to govern handling of '0/.' VCF GT values, and default behavior is now 'error' mode to force a conscious decision. --dosage completed in development build.
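The half-call handling choices can be sketched as a small Python function. The 'error' mode is described above; 'reference' was added in a later build. The 'haploid' and 'missing' mode names follow the PLINK documentation rather than this changelog, and the function itself is only an illustration of the semantics, not PLINK's implementation.

```python
def resolve_half_call(gt, mode="error"):
    """Resolve a VCF GT value such as '0/.' under a half-call mode.

    Returns a tuple of allele indexes, or None for a missing call.
    """
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    known = [a for a in alleles if a != "."]
    if len(known) == len(alleles):  # fully called, e.g. '0/1'
        return tuple(int(a) for a in alleles)
    if not known:  # fully missing, e.g. './.'
        return None
    # From here on, the value is a half-call such as '0/.' or './1'.
    if mode == "error":
        raise ValueError(f"half-call {gt!r} encountered")
    if mode == "missing":
        return None
    if mode == "haploid":
        return tuple(int(a) for a in known)
    if mode == "reference":
        return tuple(0 if a == "." else int(a) for a in alleles)
    raise ValueError(f"unknown mode {mode!r}")
```

Defaulting to 'error' mirrors the stated intent of forcing a conscious decision: a caller has to pick one of the other modes explicitly before half-calls are accepted.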

5 June: --ld-snp-list and multipass --r/--r2 bugfixes. Nonstandard '0/.' and similar VCF GT field values are now processed as if they did not have the trailing '/.', instead of causing a segfault. (Handling of this case may be configurable in the future, stay tuned.) --linear/--logistic permutation bugfix (permutation success count array was not initialized to all-zero). --dosage linear regression added to development build.

3 June: Gzipper no longer deletes the output file when being asked to append (this was causing big n-pass calculations such as --r2 gz to only keep the output of the last pass). Fixed --genome double-missing-call handling bug, and a --r/--r2 mixed autosomal/nonautosomal data handling bug. '--dosage occur' and --write-dosage added to development build.

28 May: Fixed a few nonstandard chromosome name-related segfaults. '--r2 dprime' missing data handling bugfix. Corrected misnamed --filter-attrib{-indiv} flags to --attrib{-indiv}, fixed a positive matching bug, and added support for gzipped attribute files. '--recode A{D} tab' no longer emits spaces in the header line. --blocks now has a 'no-pheno-req' modifier which removes the unnecessary phenotype requirement. --annotate added to development build.

24 May: Sample-major to variant-major .bed transposition bugfix. Merger now provides an informative error message when given a sample-major .bed file, and does not log equal-position warnings when multiple variants have bp coordinate 0 (since that's often used to indicate that the variant is unlocalized). --make-bed no longer crashes on tiny nonzero centimorgan coordinates. Contig limit raised to ~5000, to support draft mosquito genomes. --gene-report added to development build.

20 May: 32-bit X/Ychr MAF calculation bugfix. --qual-scores added to development build.

13 May: --logistic + sex covariate bugfix. --snp without --window no longer behaves like --snp + '--window 0' when other variants share the same bp coordinate. --pca now errors out instead of returning all-zero eigenvalues/vectors when samples with no genotype data are present.

11 May (beta 1): Fast third-party --logistic code integrated (see credits page for details). Logging now permits gPLINK to consistently detect output files. --output-chr added. --family and --make-perm-pheno implemented. Within-cluster permutation bugfix. --linear/--logistic dominant/recessive models and covariate interactions work now. --tests bugfix.

2 May: --clump-verbose no longer reports negative r2 values when phase flips.

1 May: --clump-verbose + --clump-range bugfix.

26 April: Fixed merge bug in 21 and 25 April builds.

25 April: Old sample-major PLINK binary files are now detected correctly. Malformed input error messages now include line numbers.

15 April: --bcf now adds 1 to variant coordinates, since its coordinates are defined to be 0-based while VCF is 1-based.
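The coordinate shift is a plain off-by-one conversion. A minimal Python sketch, with a hypothetical function name:

```python
def bcf_pos_to_vcf_pos(bcf_pos: int) -> int:
    """Convert a 0-based BCF coordinate to the 1-based VCF convention."""
    if bcf_pos < 0:
        raise ValueError("BCF coordinates are non-negative")
    return bcf_pos + 1
```

For example, the first base of a chromosome is stored as 0 in BCF and reported as 1 in VCF text.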

14 April: --check-sex/--impute-sex no longer silently considers nonmissing Y genotype counts by default. --bcf now treats missing variant IDs as if they were equal to '.', instead of erroring out. Basic parent-of-origin test implemented.

8 April: Centimorgan position loading bugfix. --mendel error description formatting bugfix. --lambda bugfix. --adjust now prints and logs estimated genomic control lambda value. --keep{-fam}/--remove{-fam} input files with duplicate IDs now just trigger a warning instead of an error. --check-sex/--impute-sex now has a 'y-only' mode.

5 April: --not-chr bugfix. '--recode fastphase' no longer defaults to 0/1 allele codes (though they can still be requested with e.g. '--recode 01 fastphase'). --hardy2/--hwe2 now invoke the mid-p adjusted versions of --hardy and --hwe, to reflect the original chi-square test's lack of conservative bias.

4 April: Fixed --hardy bug that caused chromosome names and marker IDs to sometimes be merged on case/control data. --check-sex/--impute-sex now use Y chromosome data when it's present. --flip-subset, --flip-scan, and basic exact binomial test --tdt implemented.

28 March: Fixed --bcf bug that caused it to fail whenever there were multiple FORMAT fields. --score missing phenotype output bugfix.

27 March: Fixed --bcf header line parsing bug (loading should no longer fail when the GT header line appears after a non-PASS INFO or FILTER line, and --vcf-filter should now work with BCF2 files). --split-x 'hg20' build code corrected to 'hg38'. --ibc Fhat2/Fhat3 bugfix. --het and --set-me-missing implemented.

26 March: --hardy/--hwe X chromosome case/control bugfix. --extract/--exclude now considers every token in a file, instead of just the first on each line (this was undocumented PLINK 1.07 behavior).
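The difference between the two ID-list parsing behaviours can be shown in a few lines of Python. This is only a sketch of the idea; the function names are hypothetical and the code is not PLINK's.

```python
def ids_every_token(text):
    """Collect every whitespace-delimited token in the file contents."""
    ids = set()
    for line in text.splitlines():
        ids.update(line.split())
    return ids


def ids_first_token_only(text):
    """Collect only the first token on each nonblank line."""
    ids = set()
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            ids.add(tokens[0])
    return ids
```

With an input such as "rs1 rs2" on one line and "rs3" on the next, the first function yields all three IDs, while the second silently drops rs2.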

25 March: --bcf no longer fails to load newer bcftools-generated files with 'IDX=' toward the end of the GT meta-information line. File import shortcuts (--vcf + --out without --make-bed, etc.) now error out when a filter flag that wouldn't take effect (e.g. --extract, --hwe, --snps-only...) is specified. --me and --mendel implemented; --mendel-duos and --mendel-multigen flags added to extend their functionality. Fixed PLINK 1.07 --mendel issue where genotypes would be set to missing before scanning was complete (i.e. if there were overlapping trios, PLINK 1.07 could fail to report some errors).

22 March: --ld-snp-list long file bugfix, PLINK 1.07 --score Y chromosome handling bugfix.

20 March: --condition-list command line parsing bugfix, --recode beagle bugfix, --make-founders bugfix, sample filtering bugfix for --regress-distance and --recode lgen/list/rlist. Basic --score implementation. Ueki/Cordell joint-effects test now skips marker pairs with fewer than 5 observations in any contingency table cell (where cases and controls are considered separately); this threshold is adjustable with --je-cellmin.
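The cell-count rule for the joint-effects test can be sketched as follows. This Python example assumes each marker pair is summarized by two genotype-count tables, one for cases and one for controls, each given as a list of rows; the function name and the table layout are illustrative assumptions.

```python
def pair_passes_cellmin(case_table, control_table, cellmin=5):
    """True if every cell of both contingency tables has >= cellmin counts.

    Cases and controls are checked separately, so a sparse cell in
    either table is enough to skip the marker pair.
    """
    return all(
        count >= cellmin
        for table in (case_table, control_table)
        for row in table
        for count in row
    )
```

The `cellmin` default of 5 matches the threshold stated above; passing a different value corresponds to what --je-cellmin adjusts.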

17 March: --vcf-idspace-to flag added to improve handling of VCF/BCF2 sample ID spaces. --blocks is now more customizable.

15 March: Multiple solutions to the haplotype frequency cubic equation (which arises when evaluating Lewontin's D-prime) should now always be handled correctly; there were a few corner cases which were mishandled before. Markers with identical bp coordinates no longer cause 1.9 --blocks to yield different results than 1.07 --blocks.

14 March: --blocks implemented. D-prime computations ('--r2 dprime', --ld, --blocks, --clump) involving variants on the X chromosome now appropriately downweight males relative to females. --vcf and --bcf now handle sample ID spaces in a reasonable manner.

7 March: Fixed minor --genome bug that clipped Z2 estimate to 0 instead of 1 when it was too large. --all-pheno and --loop-assoc now print case/control counts for each phenotype.

6 March: Fixed --make-bed bug that threw away major allele codes of monomorphic loci when the markers were unsorted and no minor allele code was present.

28 February: --a1-allele/--a2-allele 'fix' in 26 February build was backwards; this is no longer the case. If you used --a1-allele/--a2-allele from that build on a dataset with monomorphic loci and missing allele codes, you should download the latest build and rerun your pipeline from that point forward; sorry about the mistake. (We hope we were the only actual victims of this.) VCF importer now supports variants with 10+ alternate alleles.

27 February: VCF generator no longer segfaults sometimes on the X chromosome. The strict VCF allele code check now just issues a warning, since some pipelines actually depend on violating the official spec.

26 February: --a2-allele succeeds instead of giving 'Impossible allele assignment' warnings when the A1 allele code is unset, and vice versa for --a1-allele. Variant IDs in --a1-allele/--a2-allele 'Impossible allele assignment' warnings are no longer strangely truncated. VCF 'N' reference allele now handled in a saner manner (converted to and back from missing).

25 February: --ci no longer prints incorrect confidence intervals with --linear/--logistic. --linear 'intercept' modifier and --thin-count added.

24 February: Fixed .tped loading bug when original file was not fully sorted. VCF generator now forces the A2 (reference) allele to always be known, and outputs '.' instead of '0' when the A1 (alternate) allele is unknown; VCF allele codes are also forced to either only contain characters in {A,C,G,T,N,a,c,g,t,n} or start with '<'.
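The allele code rule stated above can be written as a short check. This is a minimal Python sketch with hypothetical names; it covers only the character rule and deliberately does not handle the '.' placeholder used for an unknown alternate allele.

```python
VALID_BASES = frozenset("ACGTNacgtn")


def is_valid_vcf_allele(code):
    """Check an allele code against the rule described in the text.

    A code is accepted if it starts with '<' (symbolic allele) or
    consists solely of characters in {A,C,G,T,N,a,c,g,t,n}.
    The '.' placeholder is a separate case and is rejected here.
    """
    if not code:
        return False
    return code.startswith("<") or all(ch in VALID_BASES for ch in code)
```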

20 February: --r/--r2 bugfixes.

19 February: Fixed errors that occurred when using disallowed (via --chr-set, --dog, etc.) X/Y/XY/MT chromosome codes with --allow-extra-chr. --dog now permits mitochondrial data.

15 February: Fixed --hardy segfault on datasets with no controls. --recode-allele now works properly (it was changing the header line without flipping allele counts before). --clump-best implemented.

14 February: Fixed '--genome full' IBS0 column printing bug in 11 February build. --set-missing-{snp/nonsnp/var}-ids flags changed to use @ instead of ^ to mark the chromosome code's position, since ^ is a reserved Windows shell character. --cm-map flag changed to also use @ for the chromosome code. LD-based result clumping (--clump) is now supported. --r/--r2 'in-phase' modifier.
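As an illustration of how such an ID template expands, here is a minimal Python sketch. The '@' placeholder for the chromosome code comes from the entry above; using '#' for the base-pair position follows the PLINK documentation and is an assumption relative to this changelog. The function name is hypothetical.

```python
import re


def fill_variant_id(template, chrom, bp):
    """Expand '@' to the chromosome code and '#' to the bp position.

    Substitution is done in a single pass so that characters inside
    the substituted values are never re-expanded.
    """
    values = {"@": str(chrom), "#": str(bp)}
    return re.sub(r"[@#]", lambda m: values[m.group(0)], template)
```

A template such as '@:#' then gives IDs like '1:12345', which is a common way to name variants that have no identifier of their own.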

11 February: Fixed --geno bug introduced in 23 January build (missing-phenotype samples were being partially thrown out for no reason). --set-missing-snp-ids and --set-missing-nonsnp-ids flags introduced, to handle the case of overlapping SNPs and indels being defined as separate variants at the same coordinate. --cow now permits mitochondrial data. Distance/relationship matrix calculations no longer waste a huge amount of time on thread creation and destruction when hundreds of cores are present.

8 February: Thread limit temporarily decreased to 23, since higher numbers result in too much thread creation/destruction overhead for now. (This multithread efficiency issue will be solved in the near future.) Also see the major bugfix above.

7 February: --genome + --parallel bugfix. --update-parents was broken in the 6 February build; it should now be fixed, and now permits very long lines (so it's usable on .ped files).

6 February: IDs and pedigree information are no longer incorrect when --genome is used with sample filtering flags. --extract/--exclude now support set range files (--range). Optional 'chr' chromosome prefixes may now be partly or entirely capitalized. --pca var-wts modifier, --pca-cluster-names/--pca-clusters projections. Gzipped and binary Oxford genotype files can now be directly imported. --update-sex now takes a column parameter (so it can be pointed directly at .ped/.fam files now).

30 January: Fixed --snps bug introduced in 23 Jan build. --pca, --set-missing-var-ids, --test-mishap. --aec is now acceptable shorthand for --allow-extra-chr. --rel-cutoff moved earlier in order of operations (so e.g. --rel-cutoff --make-bed works).

26 January: --file now handles nonstandard chromosome names in .map files properly. --make-grm-bin + --parallel output filename fix. --test-missing max(T) permutation test.

24 January: --covar-name fixed, hopefully for good. Fixed contig name handling bug which slipped into 23 Jan builds. Fisher mid-p adjustment + permutation test bugfix. --test-missing adaptive permutation test.

23 January: Fixed a bug with --missing + sample filters, as well as a minor PLINK 1.07 --missing + --within bug. Contig limit raised to ~2500. --zero-cluster, --oblig-missing. --geno and --mind should now treat sex chromosomes the same way as PLINK 1.07.

16 January: Oxford-format loader now accepts tab-delimited text files, because apparently that is a thing. --hwe interface improved (inappropriate default p-value removed, and a warning is now printed whenever observation counts vary by more than 10%).

14 January: Fixed .bim sorting bug when some loci were simultaneously being filtered out. --check-sex and --impute-sex implemented. Hardy-Weinberg and Fisher's exact tests now support mid-p adjustments. Proper handling of ambiguous sex codes. Y chromosome 'nonmissing nonmale genotypes' warning no longer gets sexes backwards (oops). Mitochondrial DNA no longer required to be haploid (though there are no plans to support full polyploidy). --split-x and --merge-x flags added to simplify handling of X chromosome pseudo-autosomal region.

9 January: --all-pheno now includes phenotype IDs in output filenames when possible (instead of 'P1', 'P2', etc.). --hardy + variant filter bugfix. --linear/--logistic covariate handling bugfix. Support for SHAPEIT recombination maps added (--cm-map).

7 January 2014: --indep{-pairwise} speed improvement in no-missing-call case. Cluster permutation and --covar-name range handling bugfixes. --snps-only filter added. Linux/OS X thread limit raised to 1023.

23 December 2013: Oxford-format loader no longer requires 'missing' to be lowercase. Basic --test-missing. Fixed a bug which sometimes came up when using --ibs-test or association analysis commands while filtering out samples.

21 December: Covariate loader bugfix.

20 December: --condition-list bugfix, --bcf + --vcf-filter bugfix. --make-rel/--make-grm-gz/--make-grm-bin/--ibc work properly again when input file has some major alleles in the A1 position. (Also fixed the --distance segfault introduced in the 18 Dec build; sorry about that.)

11 December: Fixed loading of .map files missing a centimorgan column. --fast-epistasis set-by-set works properly with two sets now.

9 December: --r/--r2 dprime bugfixes, --fast-epistasis uninitialized N_SIG bugfix. Basic variant set file I/O, --fast-epistasis set-based tests.

5 December: --make-bed position sort and Oxford-format loading bugfixes. 32-bit --r/--r2/--fast-epistasis bugfix. --fast-epistasis now supports extended version of BOOST test (missing data permitted, df properly adjusted in the face of e.g. zero homozygous minor observations). --r/--r2/--ld finished. --gplink flag supported.

26 November: --filter-nonfounders segfault fix.

25 November: --biallelic-only segfault fix when 'list' modifier was not specified. BCF2 (either uncompressed or BGZF-compressed) and Oxford-formatted data can now be directly imported. --fast-epistasis now supports the Ueki-Cordell joint effects test, and fills the 3x3 contingency tables more quickly when no missing markers are present (increasing the speedup factor to ~60C in that case). --fast-epistasis + --parallel works.

17 November: --r/--r2 bugfix, --fast-epistasis, --recode oxford. (The --fast-epistasis implementation is roughly 40C times as fast as PLINK 1.07, where C is the number of processor cores, and it also employs a more accurate variance estimator.) A bit of dead wood trimmed to make way for better implementations (--regress-pcs, dosage distance calculator); let us know if you want those functions to return sooner rather than later.

12 November: Basic VCF text loader. Add check for matching input and output filenames when using --rel-cutoff in batch mode.

10 November: --r/--r2 square matrices. Fixed --keep/--remove/--extract/--exclude bug introduced in a recent build. --genome IBD sharing calculation bugfix. --indep{-pairwise} results should no longer be slightly discordant with PLINK 1.07 when missing data is present (standard deviations were previously calculated once per site, they're now recalculated for every pair).

8 November: --indep{-pairwise} corner case bugfix. --filter-attrib{-indiv}.

3 November: .ped loader should no longer run out of memory if a single very long allele is present.

1 November: The main loading sequence and most functions should now handle very long allele names. .ped multi-character allele loading and --a1-allele/--a2-allele bugfixes.

28 October: Fixed some merge and multichar allele handling bugs. --filter can now match against more than one value. Cluster membership filters added (--keep-clusters, --keep-cluster-names, --remove-clusters, --remove-cluster-names).

19 October: --lasso now supports covariates.

17 October: --pheno missing quantitative phenotype bugfix, minor --lasso bugfix.

16 October: Basic LASSO implementation (--lasso). --linear/--logistic X chromosome bugfix.

14 October: --missing. Various --linear/--logistic bugfixes. --linear/--logistic permutation tests now correctly announce when they are employing multiple threads.

13 October: --linear/--logistic. Nonautosomal chromosome filtering bugfix.


What's new?

Unprecedented speed
Thanks to heavy use of bitwise operators, sequential memory access patterns, multithreading, and higher-level algorithmic improvements, PLINK 1.9 is much, much faster than PLINK 1.07 and other popular software. Several of the most demanding jobs, including identity-by-state matrix computation, distance-based clustering, LD-based pruning, haplotype block identification, and association analysis max(T) permutation tests, now complete hundreds or even thousands of times as quickly, and even the most trivial operations tend to be 5-10x faster due to I/O improvements.
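
As a rough illustration of why bitwise operators help, the sketch below packs genotypes into a single integer and compares two samples with one XOR plus a population count, instead of one comparison per variant. The unary 2-bit code, the function names, and the sample data are assumptions made for this example; this is not PLINK's on-disk encoding or its actual distance routine, and missing calls are ignored.

```python
# Illustrative unary 2-bit code (NOT PLINK's on-disk encoding):
# 0 copies -> 0b00, 1 copy -> 0b01, 2 copies -> 0b11.
# With this code, the number of differing bits between two genotypes
# equals the absolute difference of their allele counts.
UNARY_CODE = {0: 0b00, 1: 0b01, 2: 0b11}


def pack_genotypes(allele_counts):
    """Pack a list of allele counts (0, 1 or 2) into one integer, 2 bits each."""
    packed = 0
    for index, count in enumerate(allele_counts):
        packed |= UNARY_CODE[count] << (2 * index)
    return packed


def packed_distance(packed_a, packed_b):
    """Sum of per-variant allele-count differences via XOR + popcount."""
    return bin(packed_a ^ packed_b).count("1")


def naive_distance(counts_a, counts_b):
    """Reference implementation: one subtraction per variant."""
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b))


# Hypothetical data for two samples at six variants.
sample_a = [0, 1, 2, 2, 0, 1]
sample_b = [2, 1, 0, 1, 0, 0]
distance = packed_distance(pack_genotypes(sample_a), pack_genotypes(sample_b))
```

In a compiled language the same idea processes 32 genotypes per 64-bit word with a single hardware popcount, which is where most of the gain over a per-genotype loop would come from.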

We hasten to add that the vast majority of ideas contributing to PLINK 1.9's performance were developed elsewhere; in several cases, we have simply ported little-known but outstanding implementations without significant further revision (even while possibly uglifying them beyond recognition; sorry about that, Roman...). See the credits page for a partial list of people to thank. On a related note, if you are aware of an implementation of a PLINK command which is substantially better than what we currently do, let us know; we'll be happy to switch to their algorithm and give them credit in our documentation and papers.

Nearly unlimited scale
The main genomic data matrix no longer has to fit in RAM, so bleeding-edge datasets containing millions of variant calls from exome- or whole-genome sequencing of tens of thousands of samples can be processed on ordinary desktops (and this processing will usually complete in a reasonable amount of time). In addition, several key sample x sample and variant x variant matrix computations (including the GRM mentioned below) can be cleanly split across computing clusters (or serially handled in manageable chunks by a single computer).
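
One way to split a symmetric sample x sample computation into independent jobs is to hand each job a contiguous band of rows from the lower triangle, chosen so that every band contains a similar number of cells. The following is a minimal sketch of that idea; the function name and the exact balancing rule are assumptions, and PLINK's own partitioning may differ.

```python
def triangle_row_chunks(sample_count, job_count):
    """Split rows of a lower-triangular matrix into contiguous chunks.

    Row i (0-based) holds i + 1 cells (diagonal included), so later rows
    cost more. Boundaries are placed so that each chunk covers roughly
    total_cells / job_count cells.
    Returns a list of half-open (start_row, end_row) ranges, one per job.
    """
    total_cells = sample_count * (sample_count + 1) // 2
    boundaries = [0]
    row = 0
    for job in range(1, job_count):
        target = total_cells * job // job_count
        # Advance until the rows before `row` cover at least `target` cells.
        while row * (row + 1) // 2 < target:
            row += 1
        boundaries.append(row)
    boundaries.append(sample_count)
    return list(zip(boundaries[:-1], boundaries[1:]))


# Hypothetical example: 10 samples split across 3 jobs.
chunks = triangle_row_chunks(10, 3)
```

Each job would then compute only the rows in its own range, and the per-job outputs could be concatenated in order to recover the full triangle.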

Command-line interface improvements
We've standardized how the command-line parser works, migrated from the original 'everything is a flag' design toward a more organized flags + modifiers approach (while retaining backwards compatibility), and added a thorough command-line help facility.

Additional functions
In 2009, GCTA didn't exist. Today, there is an important and growing ecosystem of tools supporting the use of genetic relationship matrices in mixed model association analysis and other calculations; our contributions are a fast, multithreaded, memory-efficient --make-grm-gz/--make-grm-bin implementation which runs on OS X and Windows as well as Linux, and a closer-to-optimal --rel-cutoff pruner.

There are other additions here and there, such as cluster-based filters which might make a few population geneticists' lives easier, and a coordinate-descent LASSO. New functions are not a top priority for now (reaching 95%+ backward compatibility, and supporting dosage/phased/triallelic data, are more important...), but we're willing to take time off from just working on the program core if you ask nicely.

Future development

  • Most remaining PLINK 1.07 flags which have not been rendered obsolete by more accurate free software.
  • Better options for managing multiple samples associated with the same individual.
  • A new core binary file format supporting phased data, limited precision dosage data, and more than two alleles per locus, with automatic conversion to and from VCF/BCF2 and similar formats. Note that we do not currently plan to support VCF in full generality; there are nontrivial decisions to be made about what to keep and what to leave out. If you have opinions on the design of this format, you are encouraged to email us or comment in the Google group.
  • Calculations which account for the additional information in the new file format.
  • We will eventually look into GPU acceleration of the most commonly used slow operations.

Limitations

PLINK's primary job is management and analysis of position-based SNP-like data for thousands of samples, and it is optimized for this setting. Here are a few things PLINK will probably never be able to do, since they are serious jobs best handled with fundamentally different data structures than the ones PLINK is built around.

  • General analysis of structural variation. There are common subcases, such as small indels, which can sometimes be treated like SNPs, and PLINK 1.07 also had a small specialized CNV analysis module which we aren't dropping. But modern whole-exome and whole-genome sequencing technologies are capable of detecting exotic deviations from reference which are neither SNP- nor CNV-like, and these deviations can be clinically relevant. You should use a more flexible platform, such as PLINK/SEQ (which was explicitly designed to handle corner cases beyond PLINK's reach, and also has good metadata handling facilities), to investigate these.
  • Anything relating to raw reads. As mentioned above, we plan to add support for probabilistic genotype calls in 2015-16, but you will still need to use other program(s) to generate those calls from SAM/BAM/etc. files.
  • Directly expose a graphical user interface. Use a program like gPLINK for this. (If you wish to update gPLINK, we'd of course be happy to support your efforts.)
  • Handle read-only queries, especially on a small subset of the samples, at near-optimal speed. The PLINK 1 binary file format is a simple, compact rectangular matrix. Aside from the unavoidable choice of major dimension (the format is "variant-major" and sorted by genomic position, so operations on small genomic regions are especially efficient, while operations on small sample subsets don't get much of a speedup), the format is workflow-agnostic; read-only operations are relatively fast, and writing a new fileset is also relatively fast. If you are done with data filtering/merging/etc. and will just perform read-only operations in the future, you can reorganize your data in a way which is slow to write but allows some queries to be even faster. This is the main idea behind Ryan Layer's GQT software; its use of a "sample-major" data representation and a different (MAF-based) variant order make it especially complementary to PLINK. If you'll be performing queries on genomic regions, you may want to look at Heng Li's BGT.
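
To make the variant-major trade-off concrete, here is a small sketch of where genotypes live in a PLINK 1 binary file, assuming the standard layout of a 3-byte header followed by one block of 2-bit genotypes per variant (four samples per byte, lowest-order bits first). The function names are hypothetical helpers, not part of PLINK.

```python
# The third header byte, 0x01, marks variant-major order.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def bytes_per_variant(sample_count):
    """Each genotype takes 2 bits, so 4 samples fit in a byte (rounded up)."""
    return (sample_count + 3) // 4


def variant_block_range(variant_index, sample_count):
    """Half-open byte range of one variant's genotype block in the file."""
    width = bytes_per_variant(sample_count)
    start = len(BED_MAGIC) + variant_index * width
    return start, start + width


def genotype_location(variant_index, sample_index, sample_count):
    """Byte offset and bit shift of a single sample's 2-bit genotype."""
    start, _ = variant_block_range(variant_index, sample_count)
    return start + sample_index // 4, 2 * (sample_index % 4)


# Hypothetical fileset with 10 samples: every variant occupies 3 bytes.
block = variant_block_range(2, 10)
location = genotype_location(2, 5, 10)
```

Under this layout, a range of adjacent variants is one contiguous byte range, while a single sample's genotypes are spread over one byte in every variant block. That is why region queries are cheap and small-sample-subset queries see little speedup.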

Note to testers

We are primarily interested in the following three types of feedback during the current testing phase:

  1. Bug reports, obviously. Unexpected and unwanted incompatibilities with PLINK 1.07's interface count as bugs. (When making a bug report, please include a supporting .log file.)
  2. Remarks on our documentation. We'll try to rewrite/redesign any parts that are confusing.
  3. New feature requests. This can range from little bits of interface cleanup, to full calculations which cannot currently be performed at satisfactory speed/scale with existing software.

Comments should generally be made in the plink2-users Google group.
