Introduction, downloads

S: 11 Dec 2023 (b7.2)

D: 11 Dec 2023

Credits

PLINK 1.9 is developed, tested, and documented primarily by Christopher Chang at GRAIL, Inc., Carson Chow and Shashaank Vattikuti at the NIH-NIDDK's Laboratory of Biological Modeling, Laurent Tellier at the BGI Cognitive Genomics Lab, and James Lee at the University of Minnesota, with additional funding from the Purcell Lab at Brigham & Women's Hospital.

  • All previous versions of PLINK are the work of Shaun Purcell at Brigham & Women's Hospital and Harvard University. Since our update started as an independent project, its level of compatibility with PLINK 1.07 would have been all but impossible to achieve if PLINK were not a free and open-source program.
  • GCTA is the work of Jian Yang et al. at the University of Queensland. We also greatly appreciate their release of the GCTA 1.2 source code under GPLv3 terms.
  • Thanks to Stephen Hsu at the BGI-CGL for motivating the initial weighted distance calculation.
  • Thanks to Sanja Franić at VU University Amsterdam for early testing.
  • Thanks to Mike Keehan for additional testing and a bugfix.
  • Thanks to Masahiro Kanai for improving the robustness of the VCF parser, fixing some other plink_data.c bugs, and adding some filtering flags.
  • The SSE2 population count algorithm used in many of PLINK 1.9's inner loops is based on work and discussion by Andrew Dalke, Robert Harley, Cédric Lauradoux, Terje Mathisen, and Kim Walisch.
  • The Hardy-Weinberg equilibrium and Fisher exact tests are based on an algorithm developed by Jan Wigginton and Gonçalo Abecasis at the University of Michigan Center for Statistical Genetics.
  • The Hardy-Weinberg equilibrium test 'midp' option was added due to work by Jan Graffelman and Victor Moreno.
  • The parallel gzip implementation was developed by Mark Adler at the Caltech/NASA Jet Propulsion Laboratory.
  • The BGZF library was developed by Bob Handsaker, Petr Danecek, Heng Li, and John Marshall.
  • PLINK 1.9's permutation procedures extend work by Brian Browning (PRESTO) and Roman Pahl (PERMORY).
  • PLINK 1.9's fast epistasis test implements methods developed by Xiang Wan et al. in BOOST and Masao Ueki, Heather Cordell, and Richard Howey in CASSI.
  • The logistic regression algorithm is based on the winning submission of Pascal Pons in the GWAS Speedup crowdsourcing contest run in April 2013 by Babbage Analytics & Innovation and TopCoder, who have donated the results to be used in PLINK 2. The contest was designed by Po-Ru Loh; subsequent analysis and code preparation were performed by Andrew Hill, Ragu Bharadwaj, and Scott Jelinsky. A manuscript is in preparation by these authors and Iain Kilty, Kevin Boudreau, Karim Lakhani, and Eva Guinan.
  • Thanks to David Fischer for GitHub hygiene improvements.
  • Thanks to numerous PLINK 1.9 alpha testers for bug reports and helpful suggestions.
