S: 11 Dec 2023 (b7.2)

D: 11 Dec 2023

Resources

Genotype data

See the PLINK 2 Resources page for 1000 Genomes phase 3. PLINK 2 --make-bed can be used to convert those files to PLINK 1 binary format.

If you only need phase 1, it is available from the following download:

1000 Genomes phase 1 (hosted by GigaDB, Aspera download available there)

Refer to the 1000 Genomes website for additional sample information, data usage rules, and citation instructions.

HapMap phase 2

See the PLINK 1.07 resources page.

Teaching materials and example dataset

These files were created by Shaun Purcell for PLINK 1.02 (+ gPLINK + Haploview), but everything except the haplotypic analysis still works with PLINK 1.90.

  • Tutorial data: example.zip (BWH mirror), which contains the following six files:
    • wgas1.ped (sample whole-genome .ped data file)
    • wgas1.map (corresponding .map file)
    • extra.ped (sample follow-up regional genotyping .ped file)
    • extra.map (corresponding .map file)
    • pop.cov (population membership variable)
    • command-list.txt (command list for 2nd part of practical)
    • The BWH mirror file also contains an old Windows plink.exe, and gPLINK/Haploview .jar files.
  • Teaching materials: teaching.zip (BWH mirror), which contains the following two files:
    • practical-1-slides.ppt
    • practical-2-notes.doc

Everything should be fairly self-explanatory after looking through the PowerPoint file and Word document.

Gene range lists

These lists are valid input for flags such as --make-set, "--extract range", "--annotate ranges", and --gene-report.

They contain one gene per row, with the following four columns:

  1. Chromosome code
  2. Start of gene (base-pair units, 1-based)
  3. End of gene (this position is included in the interval)
  4. Gene ID
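To make the column layout concrete, here is a minimal Python sketch of a reader for this format. It is not part of PLINK: the names GeneRange, parse_gene_ranges, and covers are hypothetical, the rows are assumed to be whitespace-separated, and the example rows are made up rather than taken from the distributed lists.

```python
from typing import Iterable, List, NamedTuple


class GeneRange(NamedTuple):
    chrom: str    # chromosome code, kept as text (e.g. "1", "X")
    start: int    # start of gene, base-pair units, 1-based
    end: int      # end of gene; this position is included in the interval
    gene_id: str


def parse_gene_ranges(lines: Iterable[str]) -> List[GeneRange]:
    """Parse four-column rows; whitespace delimiter is an assumption."""
    ranges = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 columns, got {len(fields)}"
            )
        chrom, start, end, gene_id = fields
        ranges.append(GeneRange(chrom, int(start), int(end), gene_id))
    return ranges


def covers(gene: GeneRange, chrom: str, pos: int) -> bool:
    """True if a 1-based position lies inside the gene's closed interval."""
    return gene.chrom == chrom and gene.start <= pos <= gene.end


# Made-up rows for illustration only.
example = ["1 1000 2000 GENE_A", "X 500 900 GENE_B"]
genes = parse_gene_ranges(example)
```

Because the end position is part of the interval, covers(genes[0], "1", 2000) is true in this sketch, while position 2001 falls outside the gene.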

Our files were generated from UCSC Table Browser RefSeq track data in May 2014 with the following pipeline:

tail -n +2 ucscdl-hgxx | awk '{print $3 " " $5 " " $6 " " $13}' | cut -c 4- | grep -E '^.{1,2}\ ' | awk '{print $4 " " $1 " " $2 " " $3}' | nsort | interval_merge > glist-hgxx

where

  • nsort is a variant of the Unix sort utility which implements "natural sort"; and
  • interval_merge merges overlapping intervals associated with the same gene ID, inserts XY pseudoautosomal region entries when appropriate, and reorders the fields.

(Source code for both of these auxiliary programs is in the GitHub repository.)
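For readers who want a feel for the merging step without reading that source, the following Python sketch illustrates the general idea of merging overlapping intervals that share a gene ID. It is not the actual interval_merge program: the function name merge_gene_intervals is hypothetical, intervals are grouped by chromosome and gene ID (an assumption), only intervals sharing at least one base are merged (also an assumption), and the XY pseudoautosomal insertion and field reordering are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chromosome code, start, end, gene ID); end is inclusive
Interval = Tuple[str, int, int, str]


def merge_gene_intervals(rows: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping intervals that share a chromosome and gene ID."""
    grouped: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end, gene_id in rows:
        grouped[(chrom, gene_id)].append((start, end))

    merged: List[Interval] = []
    for (chrom, gene_id), spans in grouped.items():
        spans.sort()  # order by start, then end
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Shares at least one base with the current run: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end, gene_id))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end, gene_id))
    return merged


# Made-up input for illustration only.
rows = [
    ("1", 100, 200, "GENE_A"),
    ("1", 150, 300, "GENE_A"),
    ("1", 400, 500, "GENE_A"),
    ("1", 180, 250, "GENE_B"),
]
merged_rows = merge_gene_intervals(rows)
```

In this sketch the first two GENE_A rows collapse into a single 100-300 entry, while the GENE_B row is left alone even though it overlaps GENE_A, since merging is done per gene ID.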

Functional SNP attributes

This file contains nonsense, missense, frameshift, and splice annotations from dbSNP build 129, and is designed to be used with the --annotate and --attrib flags.

SNP attributes (dbSNP build 129): snp129.attrib.gz (BWH mirror)

We plan to assemble an updated version of this file; let us know if there's anything you want us to add, or if you have thoughts on filtering out probable low-quality dbSNP entries.
