D: 28 Oct 2018


General usage

Getting started

First, if plink and/or plink2 are not installed on your system, download and unzip the appropriate binaries (v1.9, v2.0). (Or clone from GitHub and recompile.) As alpha and beta testing continue, plink2 will become increasingly usable on its own, but for now it's better to think of it as a supplement to rather than a replacement for v1.9.

Then you can verify that both programs are functional with the following pair of commands:

./plink --dummy 2 2 --freq --make-bed --out toy_data

./plink2 --bfile toy_data --freq --out test2

You should see something like:

PLINK v1.90b6.4 64-bit (7 Aug 2018)            www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to toy_data.log.
Options in effect:
  --dummy 2 2
  --freq
  --make-bed
  --out toy_data

16384 MB RAM detected; reserving 8192 MB for main workspace.
Dummy data (2 people, 2 SNPs) written to toy_data-temporary.bed +
toy_data-temporary.bim + toy_data-temporary.fam .
2 variants loaded from .bim file.
2 people (0 males, 2 females) loaded from .fam.
2 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 2 founders and 0 nonfounders present.
Calculating allele frequencies... done.
--freq: Allele frequencies (founders only) written to toy_data.frq .
2 variants and 2 people pass filters and QC.
Among remaining phenotypes, 1 is a case and 1 is a control.
--make-bed to toy_data.bed + toy_data.bim + toy_data.fam ... done.

PLINK v2.00a2 AVX2 (21 Aug 2018)               www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to test2.log.
Options in effect:
  --bfile toy_data
  --freq
  --out test2

Start time: Tue Aug 21 21:38:28 2018
16384 MiB RAM detected; reserving 8192 MiB for main workspace.
Using up to 8 compute threads.
2 samples (2 females, 0 males; 2 founders) loaded from toy_data.fam.
2 variants loaded from toy_data.bim.
1 binary phenotype loaded (1 case, 1 control).
Calculating allele frequencies... done.
--freq: Allele frequencies (founders only) written to test2.afreq .
End time: Tue Aug 21 21:38:28 2018

(Remove the './' prefix if the program was installed earlier, or if you've added it to the system PATH.) If either command fails, verify that you downloaded the correct binaries for your machine, and consult the plink2-users Google group if you're still stuck.

Okay, what did these commands mean? And what just happened?

PLINK parses each command line as a collection of flags (each of which starts with two dashes[1]), plus parameters (which immediately follow a flag, and never start with a dash unless that dash is immediately followed by a digit) for those flags. The first command included four flags: --dummy, --freq, --make-bed, and --out. They specify the following three things, which are part of almost every PLINK run:

  • Input data: '--dummy 2 2' tells PLINK 1.9 to generate a new random dataset with 2 samples and 2 variants. You'll see several other ways to specify input data on the next page.
  • Operation(s) to perform: --freq tells PLINK to generate an allele frequency report, and --make-bed tells PLINK to save the data in PLINK 1 binary format. The full range of supported operations is summarized under 'Main functions' in the sidebar, and the formats of all reports are described in the file format appendices (v1.9, v2.0).
  • An output file prefix: We'll elaborate on this in a moment.

So this particular combination makes PLINK 1.9 generate a new 2x2 dataset, write an allele frequency report to toy_data.frq, and save the dataset to toy_data.bed + .bim + .fam. Similarly, the second command makes PLINK 2.0 write its own allele frequency report to test2.afreq.
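To make the parsing rule concrete, here is a minimal Python sketch that groups a command line into flags and their parameters. It is an illustration under stated assumptions, not PLINK's actual parser: the helper names `is_flag` and `parse_plink_args` are hypothetical, it accepts a single leading dash as described in the footnote below, and it does not check whether a flag name exists or received the right number of parameters.

```python
def is_flag(token: str) -> bool:
    """Return True if `token` starts a new flag rather than being a parameter."""
    if token.startswith("--"):
        return True
    # A single leading dash is also accepted, unless a digit follows it:
    # then the token is a parameter such as a negative number ("-0.5").
    return len(token) > 1 and token[0] == "-" and not token[1].isdigit()


def parse_plink_args(argv: list[str]) -> list[tuple[str, list[str]]]:
    """Group command-line tokens into (flag name, [parameters]) pairs."""
    parsed: list[tuple[str, list[str]]] = []
    for token in argv:
        if is_flag(token):
            parsed.append((token.lstrip("-"), []))
        elif parsed:
            parsed[-1][1].append(token)  # parameter belongs to the latest flag
        else:
            raise ValueError(f"parameter {token!r} does not follow any flag")
    return parsed


if __name__ == "__main__":
    cmd = "--dummy 2 2 --freq --make-bed --out toy_data".split()
    print(parse_plink_args(cmd))
```

For the first command above, this yields `[('dummy', ['2', '2']), ('freq', []), ('make-bed', []), ('out', ['toy_data'])]`. Treating "dash followed by a digit" as a parameter is what lets negative numbers be passed without quoting.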

1: Actually, that was a lie. With the exceptions of --1 and --23file, PLINK 1.9 and 2.0 allow you to use a single dash in front of each flag. In exchange for saving you some keystrokes, please do yourself a favor and avoid filenames that begin with a dash.

The allele frequency reports are different?...

You may have noticed that the file extensions of the v1.9 and v2.0 allele frequency reports aren't the same, and there are several formatting differences between the two files, though they clearly contain the same information. This is true for many commands; PLINK 2.0 cannot generally be used as a drop-in replacement for previous PLINK versions. We realize this can be a major annoyance, and will continue maintaining v1.9 for a long time to come for those who need full backward compatibility. However, v2.0's reports are better-standardized (header lines preceded by '#', tab-delimited, column headers are consistent with VCF, etc.) and more flexible (lots of optional column sets); hopefully, this'll make your life easier and be worth some minor transitional headaches.
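Because v2.0 reports follow these conventions, a generic reader needs only a few lines. The Python sketch below assumes just what the paragraph states: a header line beginning with '#' and tab-delimited fields. The function name is hypothetical, skipping '##' lines is an extra assumption borrowed from VCF rather than a documented rule for every report, and all values are returned as strings.

```python
def read_plink2_report(text: str) -> list[dict[str, str]]:
    """Parse a '#'-headed, tab-delimited report into one dict per data row.

    Lines starting with '##' are skipped (an assumption borrowed from VCF);
    the first remaining line must be the '#'-prefixed column header.
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("##")]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("expected a header line starting with '#'")
    header = lines[0][1:].split("\t")  # drop the leading '#'
    rows = []
    for ln in lines[1:]:
        fields = ln.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}: {ln!r}")
        rows.append(dict(zip(header, fields)))
    return rows


if __name__ == "__main__":
    # Toy input with made-up columns and values, not real PLINK output.
    toy = "#CHROM\tID\tVALUE\n1\tsnp1\t0.25\n1\tsnp2\t0.75\n"
    for row in read_plink2_report(toy):
        print(row["ID"], row["VALUE"])
```

Splitting on tabs only (rather than any whitespace) is what the tab-delimited convention buys you: fields may then contain spaces without breaking the parse.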

Interpreting our flag usage summaries

The rest of this documentation has many one-line summaries describing the parameter sets accepted by particular flags, followed by discussions of flag functionality and the effects of optional parameters. We use the following conventions in our one-line usage summaries:

  • [square brackets] denote a required parameter, where the text between the brackets describes its nature.
  • <angle brackets> denote an optional modifier (or if '|' is present, a set of mutually exclusive optional modifiers). To invoke one, you need to use the EXACT text given in our summary, e.g. '--freq counts' is valid given the summary

--freq <counts> ...

  • There's one exception to the angle brackets/exact text rule: when a modifier name in angle brackets ends with '=[value]', '[value]' designates a variable parameter. E.g. '--glm perm' and '--glm mperm=10000' are both valid given the summary

--glm <perm | mperm=[value]> ...

  • {curly braces} denote an optional parameter, where the text between the braces describes its nature.
  • An ellipsis (...) indicates that you can enter many parameters of the specified type.
  • Many PLINK 2.0 commands accept a "column set descriptor". For example, the help text for --make-king-table is

    --make-king-table <zs> <counts> <cols=[column set descriptor]>
      Similar to --make-king, except results are reported in the original .kin0
      text table format (with minor changes, e.g. row order is more friendly to
      incremental addition of samples), and --king-table-filter can be used to
      restrict the report to high kinship values.
      Supported column sets are:
        maybefid: FID1/FID2, if that column was in the input.   Requires 'id'.
        id: IID1/IID2 (column headers are actually 'ID1'/'ID2' to match KING).
        maybesid: SID1/SID2, if that column was in the input. Requires 'id'.
        sid: Force SID1/SID2 even when SID was absent in the input.
        nsnp: Number of variants considered (autosomal, neither call missing).
        hethet: Proportion/count of considered call pairs which are het-het.
        ibs0: Proportion/count of considered call pairs which are opposite homs.
        ibs1: HET1_HOM2 and HET2_HOM1 proportions/counts.
        kinship: KING-robust between-family kinship estimator.
      The default is maybefid,id,maybesid,nsnp,hethet,ibs0,kinship.
      hethet/ibs0/ibs1 values are proportions unless the 'counts' modifier is
      present.  If id is omitted, a .kin0.id file is also written.

    A valid descriptor is either
    • a comma-separated sequence of column set names (e.g. 'cols=maybefid,id,nsnp,hethet,ibs0,ibs1,kinship' would add HET1_HOM2 and HET2_HOM1 columns, while ensuring that SID columns do not appear), or
    • a comma-separated sequence of column set names where every name is preceded by a plus or minus (in which case the column sets are added/subtracted from the default, e.g. 'cols=+ibs1,-maybesid' is a shorter way to add HET1_HOM2/HET2_HOM1 and exclude SID1/SID2).
  • Background color summarizes degree of similarity to PLINK 1.9. Green signals maximal compatibility: there will usually be a minor difference in output file formats, but all information in the PLINK 1.9 output file will also be present in the PLINK 2.0 output file when the same flag and modifiers are used. (Note that green does not guarantee the absence of additional options.) Yellow signals slightly different functionality and/or command-line usage, and blue signals that the flag is new to PLINK 2.0.
  • If parts of our current implementation are known or strongly suspected to be incomplete, that is signaled with red text. So red text on a green background indicates that we plan to provide perfect compatibility, but we have more coding and/or testing to do before we get there.
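The two descriptor forms can be modeled in a few lines. The Python sketch below uses the --make-king-table column sets and default quoted above; the function name, the error messages, and the choice to report sets in the order listed in the help text are assumptions, and dependencies such as maybefid's "Requires 'id'" are not checked.

```python
KING_TABLE_SETS = ("maybefid", "id", "maybesid", "sid", "nsnp",
                   "hethet", "ibs0", "ibs1", "kinship")
KING_TABLE_DEFAULT = ("maybefid", "id", "maybesid", "nsnp",
                      "hethet", "ibs0", "kinship")


def resolve_cols(descriptor: str,
                 known: tuple[str, ...] = KING_TABLE_SETS,
                 default: tuple[str, ...] = KING_TABLE_DEFAULT) -> list[str]:
    """Turn a 'cols=' descriptor into the list of enabled column sets."""
    names = descriptor.split(",")
    signed = [name[:1] in ("+", "-") for name in names]
    if all(signed):
        # Signed form: adjust the default by adding (+) or removing (-) sets.
        chosen = set(default)
        for item in names:
            sign, name = item[0], item[1:]
            if name not in known:
                raise ValueError(f"unknown column set {name!r}")
            if sign == "+":
                chosen.add(name)
            else:
                chosen.discard(name)
    elif not any(signed):
        # Unsigned form: the list replaces the default entirely.
        unknown = [name for name in names if name not in known]
        if unknown:
            raise ValueError(f"unknown column set(s) {unknown}")
        chosen = set(names)
    else:
        raise ValueError("cannot mix signed and unsigned column set names")
    # Report enabled sets in help-text order, whatever order they were given in.
    return [name for name in known if name in chosen]


if __name__ == "__main__":
    print(resolve_cols("+ibs1,-maybesid"))
    print(resolve_cols("maybefid,id,nsnp,hethet,ibs0,ibs1,kinship"))
```

Both calls return `['maybefid', 'id', 'nsnp', 'hethet', 'ibs0', 'ibs1', 'kinship']`, matching the claim that the signed descriptor is a shorter way of writing the unsigned one.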

If you're already familiar with PLINK, this should help you skim over stuff you already know. If there are just one or two flags you need to look up, you can quickly find what you need in the sidebar; try the search box if the correct page isn't immediately apparent.

For the newer bioinformaticians out there, here's our first full flag description.

Setting the output file prefix

--out [prefix]

By default, the output files generated by PLINK 2.0 all have names of the form 'plink2.xyz', where '.xyz' is one of the extensions in the output file list. This is fine for a single run, but once you start running PLINK repeatedly in the same directory, later runs will overwrite the results of earlier ones.

Therefore, you usually want to choose a different output file prefix for each run. --out causes 'plink2' to be replaced with the prefix you provide. E.g. in the example above, '--out test2' caused PLINK 2 to create a file named test2.afreq instead of plink2.afreq.

Since the prefix is a required parameter, invoking --out without it will cause PLINK 2 to quit during command line parsing:

[chrchang:~/plink-ng]$./plink2 --bfile toy_data --freq --out
PLINK v2.00a2 AVX2 (21 Aug 2018)               www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Error: Missing --out parameter.
For more info, try 'plink2 --help [flag name]' or 'plink2 --help | more'.
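The prefix rule, including the error for a bare --out, can be sketched in Python as follows. This is a minimal model, not PLINK's code: the function names are hypothetical, and it only reproduces the default 'plink2' prefix, its replacement by the --out parameter, and the "Missing --out parameter." case shown above. The message for extra parameters is my own wording.

```python
from typing import Optional

DEFAULT_PREFIX = "plink2"


def output_prefix(out_params: Optional[list[str]]) -> str:
    """Return the output prefix; out_params is None when --out was not given."""
    if out_params is None:
        return DEFAULT_PREFIX
    if not out_params:
        # Matches the error shown above for a bare --out.
        raise ValueError("Missing --out parameter.")
    if len(out_params) > 1:
        # Hypothetical wording; the usage summary allows exactly one prefix.
        raise ValueError("--out takes exactly one parameter.")
    return out_params[0]


def output_file(out_params: Optional[list[str]], extension: str) -> str:
    """Build e.g. 'plink2.afreq' or 'test2.afreq' from the prefix rule."""
    return f"{output_prefix(out_params)}.{extension}"


if __name__ == "__main__":
    print(output_file(None, "afreq"))     # no --out given
    print(output_file(["test2"], "afreq"))
    print(output_file(["test2"], "log"))  # the log file uses the same prefix
```

Note that the log file follows the same rule, which is why the PLINK 1.9 run above logged to toy_data.log.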

In the rest of this documentation, we will continue highlighting full command lines in purple, default parameter values in orange, and sample parameter values you can freely change in green.

Citation instructions

If you use PLINK 2.0 in any published work, please cite both the software (as an electronic resource/URL):

Package : PLINK [version]
Authors : Shaun Purcell, Christopher Chang
URL     : www.cog-genomics.org/plink/2.0/

and the manuscript(s) describing the methods you used. Our primary methods paper is:

Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4.

PLINK 2.0 includes implementations of many analyses that were developed by other teams. The original sources are summarized below.
