Introduction, downloads

S: 11 Dec 2023 (b7.2)

D: 11 Dec 2023

Distributed computation

--parallel <1-based current job index> <total job pieces>

--parallel causes PLINK to complete only one part of a job; the job index is appended to the main output filename. (If the main output file is gzipped, the file extension will instead be of the form <usual extension before .gz>.<1-based index>.gz.)
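The naming rule above can be sketched as a small shell helper. This is only an illustration of the rule as stated; piece_name is a hypothetical function name, not part of PLINK, and the two filenames are sample inputs.

```shell
# piece_name <main output filename> <1-based job index>
# Hypothetical helper: prints the filename a --parallel piece should have
# according to the rule described above.
piece_name() {
  case "$1" in
    *.gz) printf '%s.%s.gz\n' "${1%.gz}" "$2" ;;  # index goes before .gz
    *)    printf '%s.%s\n' "$1" "$2" ;;           # index is simply appended
  esac
}

piece_name result.dist.bin 1   # prints result.dist.bin.1
piece_name plink.grm.gz 2      # prints plink.grm.2.gz
```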

Use Unix cat on the resulting files to assemble the full computation result. (For gzipped files, it is safe to do this either before or after decompression.) For example:

[chrchang:~/plink-ng]$ plink --bfile test_data --distance triangle bin --parallel 1 2 --out result
PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to result.log.
Options in effect:
  --bfile test_data
  --distance triangle bin
  --out result
  --parallel 1 2

4096 MB RAM detected; reserving 2048 MB for main workspace.
100000 variants loaded from .bim file.
1000 people (1000 males, 0 females) loaded from .fam.
1000 phenotype values loaded from .fam.
Using up to 2 threads (change this with --threads).
Before main variant filters, 1000 founders and 0 nonfounders present.
Calculating allele frequencies... done.
100000 variants and 1000 people pass filters and QC.
Phenotype data is quantitative.
Distance matrix calculation complete.
IDs written to result.dist.id .
Distances (allele counts) written to result.dist.bin.1 .
[chrchang:~/plink-ng]$ plink --bfile test_data --distance triangle bin --parallel 2 2 --out result
PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to result.log.
Options in effect:
  --bfile test_data
  --distance triangle bin
  --out result
  --parallel 2 2

100000 variants loaded from .bim file.
1000 people (1000 males, 0 females) loaded from .fam.
1000 phenotype values loaded from .fam.
Using up to 2 threads (change this with --threads).
Before main variant filters, 1000 founders and 0 nonfounders present.
Calculating allele frequencies... done.
100000 variants and 1000 people pass filters and QC.
Phenotype data is quantitative.
Distance matrix calculation complete.
Distances (allele counts) written to result.dist.bin.2 .
[chrchang:~/plink-ng]$ cat result.dist.bin.1 result.dist.bin.2 > result.dist.bin
[chrchang:~/plink-ng]$ plink --bfile test_data --read-dists result.dist.bin --regress-distance
PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --bfile test_data
  --read-dists result.dist.bin
  --regress-distance

4096 MB RAM detected; reserving 2048 MB for main workspace.
100000 variants loaded from .bim file.
1000 people (1000 males, 0 females) loaded from .fam.
1000 phenotype values loaded from .fam.
Using up to 2 threads (change this with --threads).
Before main variant filters, 1000 founders and 0 nonfounders present.
Calculating allele frequencies... done.
100000 variants and 1000 people pass filters and QC.
Phenotype data is quantitative.
--read-dists: 499500 values loaded.
Phenotype stdev: 1.01927
Regression slope (y = genomic distance, x = avg phenotype): -17.9796
Regression slope (y = avg phenotype, x = genomic distance): -4.39263e-05
Setting d=63 for jackknife.
Jackknife s.e.: 7.03741
Jackknife s.e. (y = avg phenotype): 1.72336e-05

This sequence of commands writes the first half of the (triangular binary) distance matrix to result.dist.bin.1, the second half to result.dist.bin.2, assembles the full triangular binary matrix file with cat, and then loads the full matrix for analysis with --regress-distance.
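With more than nine pieces, a shell glob such as result.dist.bin.* expands in lexicographic order, so piece 10 would be concatenated before piece 2. The following is a minimal sketch of a helper that concatenates the pieces in numeric index order instead; assemble_pieces is a hypothetical name, and the sketch assumes every piece was written with the same --out prefix and that index order is the intended assembly order, as in the two-piece example above.

```shell
# assemble_pieces <main output filename> <total job pieces>
# Hypothetical helper: concatenates <name>.1 ... <name>.N into <name>,
# in numeric index order.
assemble_pieces() {
  out=$1
  total=$2
  i=1
  # Check that every piece exists before writing anything.
  while [ "$i" -le "$total" ]; do
    if [ ! -f "$out.$i" ]; then
      echo "missing piece: $out.$i" >&2
      return 1
    fi
    i=$((i + 1))
  done
  : > "$out"   # start from an empty output file
  i=1
  while [ "$i" -le "$total" ]; do
    cat "$out.$i" >> "$out"
    i=$((i + 1))
  done
}

# Example call, once all 12 pieces are present:
#   assemble_pieces result.dist.bin 12
```

Checking for missing pieces first avoids silently producing a truncated matrix when one of the distributed jobs failed.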

Currently, the --r/--r2, --distance, --genome, --make-rel, --make-grm-gz/--make-grm-bin, --epistasis, and --fast-epistasis flags directly support distributed computation. For most matrix computations, either the 'square0' or 'triangle' output shape must be used.

--write-var-ranges <block ct>

Many simpler jobs can be distributed by providing an appropriate range to --snps on each machine. To facilitate this, --write-var-ranges divides the set of variants into equal-size blocks, and writes block boundaries to plink.var.ranges. (Sizes will vary by 1 if the total variant count is not divisible by the requested block count.)
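As a rough sketch of how the block boundaries might be turned into per-machine commands, the helper below prints one PLINK command line per block without running anything. It rests on several assumptions that should be checked against the file format documentation: that plink.var.ranges has one header line followed by one line per block containing the first and last variant ID, and that those IDs contain no hyphens, so they can be passed to --snps as a <first>-<last> range. The function name print_range_jobs and the block<N> output prefixes are hypothetical.

```shell
# print_range_jobs <var.ranges file> <bfile prefix> [extra PLINK flags...]
# Hypothetical helper: prints one command per block; nothing is executed.
print_range_jobs() {
  ranges=$1
  bfile=$2
  shift 2
  # NR > 1 skips the assumed header line; $1 and $2 are the assumed
  # first and last variant IDs of each block.
  awk -v bfile="$bfile" -v extra="$*" 'NR > 1 && NF >= 2 {
    printf "plink --bfile %s --snps %s-%s %s --out block%d\n", bfile, $1, $2, extra, NR - 1
  }' "$ranges"
}

# Example call:
#   print_range_jobs plink.var.ranges test_data --freq
```

Each printed line can then be submitted to a different machine, and the per-block reports combined afterward.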
