Introduction, downloads

S: 11 Dec 2023 (b7.2)

D: 11 Dec 2023

General usage

Getting started

After downloading and unzipping PLINK 1.9, you should see the main PLINK 1.9 binary, the GPLv3 license, the prettify utility for generating clean space-delimited text tables, and the small files toy.ped and toy.map. Try the command:

./plink --file toy --freq --out toy_analysis

You should see something like:

PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to toy_analysis.log.
Options in effect:
  --file toy
  --freq
  --out toy_analysis

4096 MB RAM detected; reserving 2048 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (2 variants, 2 people).
--file: toy_analysis-temporary.bed + toy_analysis-temporary.bim +
toy_analysis-temporary.fam written.
2 variants loaded from .bim file.
2 people (2 males, 0 females) loaded from .fam.
2 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Calculating allele frequencies... done.
Total genotyping rate is 0.75.
--freq: Allele frequencies (founders only) written to toy_analysis.frq .

(If it fails, you might have downloaded the wrong package for your machine. Double-check the Downloads table; if you're still stumped, our plink2-users Google group may help.)

Okay, what did my command mean? And what just happened?

PLINK 1.9 parses each command line as a collection of flags (each of which starts with two dashes [1]), plus parameters (which immediately follow a flag, and never start with a dash unless that dash is immediately followed by a digit) for those flags. So the command above included three flags: --file, --freq, and --out. They specify the following three things, which are part of almost every PLINK run:

  • Input data: "--file toy" tells PLINK to use the genomic data in the text files toy.ped and toy.map. You'll see several other ways to specify input data on the next page.
  • Calculation(s) [2] to perform: --freq tells PLINK to generate an allele frequency report. The full range of supported calculations is summarized under "Main functions" in the sidebar, and the formats of all reports are described in the file formats appendix.
  • An output file prefix: We'll elaborate on this in a moment.

So this particular combination makes PLINK calculate allele frequencies in toy.ped + toy.map, and write a report to toy_analysis.frq.

If you have PLINK 1.07 installed, try running the same command with it: you should get exactly the same report, down to the last byte. We are aiming for this level of concordance across almost all PLINK 1.07 commands where it might be wanted.

1: Actually, that was a lie. With the exceptions of --1 and --23file, PLINK 1.9 allows you to use a single dash in front of each flag. In exchange for saving you some keystrokes, please do yourself a favor and avoid filenames that begin with a dash.
2: PLINK 1.9 is usually less strict than PLINK 1.07 when it comes to allowing multiple calculations in a single run. See the order of operations page for details.
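
The flag/parameter rule described above, including the single-dash exception from the first footnote, can be sketched as a small shell function. This is only an illustration of the stated rule, not PLINK's actual parser; classify_token is a hypothetical helper name.

```shell
# Hypothetical helper (not part of PLINK): classify one command-line token
# using the rule described in the text.
classify_token() {
  case "$1" in
    -[0-9]*) echo "parameter" ;;  # dash immediately followed by a digit, e.g. -0.5
    -*)      echo "flag" ;;       # --freq, -freq, and also --1 / --23file
    *)       echo "parameter" ;;  # anything that does not start with a dash
  esac
}

# Classify the tokens of the example command, plus two edge cases.
for token in --file toy --freq --out toy_analysis -0.5 --1; do
  printf '%-14s %s\n' "$token" "$(classify_token "$token")"
done
```

Under this rule a token such as -1 is read as a negative number rather than as a flag, which is consistent with --1 and --23file being the two flags that cannot be shortened to a single dash.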

Interpreting our flag usage summaries

The rest of this documentation has many one-line summaries describing the parameter sets accepted by particular flags, followed by discussions of flag functionality and the effects of optional parameters. We use the following conventions in our one-line usage summaries (these were adjusted in March 2019 to be more consistent with community norms):

  • <angle brackets> denote a required parameter, where the text between the brackets describes its nature.
  • ['square brackets + single-quotes'] denotes an optional modifier. Use the EXACT text in the quotes; e.g. "--freq gz" is valid given the summary

--freq [{counts | case-control}] ['gz']

  • [{bar|separated|braced|bracketed|values}] denotes a collection of mutually exclusive optional modifiers (again, the exact text must be used). When there are no outer square brackets, one of the choices must be selected.
  • ['quoted_text='<description of value>] denotes an optional modifier that must begin with the quoted text, and be followed by a value with no whitespace in between. '|' may also be used here to indicate mutually exclusive options. E.g. "--assoc perm" and "--assoc mperm=10000" are both valid, but "--assoc perm mperm=10000" is invalid, given the summary

--assoc ['perm' | 'mperm='<value>] ...

  • [square brackets without quotes or braces] denote an optional parameter, where the text between the brackets describes its nature.
  • An ellipsis (...) indicates that you can enter multiple parameters of the specified type.
  • Background color summarizes degree of similarity to previously existing functionality. Green signals perfect compatibility: you can use the basic flag in exactly the same manner as you previously have in PLINK 1.07/GCTA/etc. (Note that green does not guarantee the absence of additional options.) Yellow signals slightly different functionality and/or command-line usage, and blue signals that the flag is new to PLINK 1.9.
  • If parts of our current implementation are known or strongly suspected to be incomplete, that is signaled with red text. So red text on a green background indicates that we plan to provide perfect compatibility, but we have more coding and/or testing to do before we get there.

If you're already familiar with PLINK, this should help you skim over stuff you already know. If there are just one or two flags you need to look up, you can quickly find what you need in the sidebar; try the search box if the correct page isn't immediately apparent.
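
As a rough illustration of how to read these summaries, the sketch below checks a modifier list against the single fragment ['perm' | 'mperm='<value>]. It is an assumption-laden toy: check_perm_modifiers is a hypothetical function, it ignores every other modifier that the real --assoc summary allows after the ellipsis, and it is not how PLINK validates its command line.

```shell
# Hypothetical checker (not part of PLINK) for the one summary fragment
#   ['perm' | 'mperm='<value>]
# It accepts zero or one modifier and knows nothing about other --assoc options.
check_perm_modifiers() {
  seen=0
  for mod in "$@"; do
    case "$mod" in
      perm|mperm=?*) seen=$((seen + 1)) ;;  # exact text, or 'mperm=' plus a value
      *) echo "invalid"; return 1 ;;        # anything else is outside this fragment
    esac
  done
  if [ "$seen" -le 1 ]; then
    echo "valid"
  else
    echo "invalid"   # 'perm' and 'mperm=' are mutually exclusive
    return 1
  fi
}

check_perm_modifiers perm
check_perm_modifiers mperm=10000
check_perm_modifiers perm mperm=10000 || true
```

Because the value must follow 'mperm=' with no whitespace, a bare "mperm=" token is also rejected by this sketch.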

For the newer bioinformaticians out there, here's our first full flag description.

Setting the output file prefix

--out <prefix>

By default, the output files generated by PLINK all have names of the form 'plink.<one of these extensions>'. This is fine for a single run, but as soon as you make more use of PLINK, you'll start causing results from previous runs to be overwritten.

Therefore, you usually want to choose a different output file prefix for each run. --out causes 'plink' to be replaced with the prefix you provide. E.g. in the example above, "--out toy_analysis" caused PLINK to create a file named toy_analysis.frq instead of plink.frq.
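
One simple way to keep runs from overwriting each other is to derive the prefix from the calculation being performed. The loop below is a dry-run sketch: plink_cmd is a hypothetical helper that only prints the command it would run, and the toy_<calculation> naming scheme is an arbitrary choice made for this example.

```shell
# Hypothetical helper: print (not run) a PLINK command for one calculation,
# using an output prefix derived from the calculation name.
plink_cmd() {
  echo "./plink --file toy --$1 --out toy_$1"
}

# Dry run over three calculations listed under "Basic statistics".
for calc in freq missing hardy; do
  plink_cmd "$calc"
done
```

Running the printed commands instead of echoing them would leave each report and log under its own prefix (toy_freq.frq and toy_freq.log for the first one), rather than all of them sharing the default 'plink' prefix.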

Since the prefix is a required parameter, invoking --out without it will cause PLINK to quit during command line parsing:

[chrchang:~/plink-ng]$ ./plink --file toy --freq --out
PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Error: Missing --out parameter.
For more information, try "plink --help <flag name>" or "plink --help | more".

In the rest of this documentation, we will continue highlighting full command lines in purple, default parameter values in orange, and sample parameter values you can freely change in green.

Citation instructions

If you use PLINK 1.9 in any published work, please cite both the software (as an electronic resource/URL):

Package : PLINK [version]
Authors : Shaun Purcell, Christopher Chang
URL     : www.cog-genomics.org/plink/1.9/

and the manuscript(s) describing the methods you used. Our primary methods paper is:

Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4.

PLINK 1.9 includes implementations of many analyses that were developed by other teams. The original sources are summarized below.
