
D: 18 Apr 2024

Distributed computation

--parallel <1-based current job index> <total job pieces>

--parallel causes PLINK to complete only one piece of a job; the 1-based job index is appended to the main output filename (e.g. split.king.bin.1 for the first piece). (If the main output file is Zstd-compressed, the file extension will instead be of the form <usual extension before .zst>.<1-based index>.zst.)

Use Unix cat on the resulting files, in job-index order, to assemble the full computation result. (For compressed files, it is safe to do this either before or after decompression.) For example:

[chrchang:~/plink-ng]$ plink2 --pfile test_data --make-king triangle bin4 --parallel 1 2 --out split
PLINK v2.00a4 AVX2 (1 Jan 2023)                www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to split.log.
Options in effect:
  --make-king triangle bin4
  --out split
  --parallel 1 2
  --pfile test_data

Start time: Sun Jan  1 22:14:44 2023
16384 MiB RAM detected; reserving 8192 MiB for main workspace.
Using up to 8 compute threads.
2504 samples (1271 females, 1233 males; 2497 founders) loaded from
test_data.psam.
1105538 variants loaded from test_data.pvar.
2 categorical phenotypes loaded.
--make-king pass 1/1: Scanning for rare variants... done.
940487 variants handled by initial scan (165051 remaining).
--make-king pass 1/1: Writing... done.
--make-king: 1105538 variants processed.
Results written to split.king.bin.1 and split.king.id .
End time: Sun Jan  1 22:14:46 2023
[chrchang:~/plink-ng]$ plink2 --pfile test_data --make-king triangle bin4 --parallel 2 2 --out split
PLINK v2.00a4 AVX2 (1 Jan 2023)                www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to split.log.
Options in effect:
  --make-king triangle bin4
  --out split
  --parallel 2 2
  --pfile test_data

Start time: Sun Jan  1 22:15:40 2023
16384 MiB RAM detected; reserving 8192 MiB for main workspace.
Using up to 8 compute threads.
2504 samples (1271 females, 1233 males; 2497 founders) loaded from
test_data.psam.
1105538 variants loaded from test_data.pvar.
2 categorical phenotypes loaded.
--make-king pass 1/1: Scanning for rare variants... done.
937450 variants handled by initial scan (168088 remaining).
--make-king pass 1/1: Writing... done.
--make-king: 1105538 variants processed.
Results written to split.king.bin.2 and split.king.id .
End time: Sun Jan  1 22:15:42 2023
[chrchang:~/plink-ng]$ cat split.king.bin.1 split.king.bin.2 > split.king.bin
[chrchang:~/plink-ng]$ plink2 --pfile test_data --king-cutoff split 0.0883883 --out final
PLINK v2.00a4 AVX2 (1 Jan 2023)                www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to final.log.
Options in effect:
  --king-cutoff split 0.0883883
  --out final
  --pfile test_data

Start time: Sun Jan  1 22:17:46 2023
16384 MiB RAM detected; reserving 8192 MiB for main workspace.
Using up to 8 compute threads.
2504 samples (1271 females, 1233 males; 2497 founders) loaded from
test_data.psam.
2 categorical phenotypes loaded.
--king-cutoff: 556 constraints loaded.
--king-cutoff: Excluded sample IDs written to final.king.cutoff.out.id , and
2295 remaining sample IDs written to final.king.cutoff.in.id .
End time: Sun Jan  1 22:17:46 2023

This sequence of commands writes the first half of the (triangular binary, single-precision) kinship matrix to split.king.bin.1, the second half to split.king.bin.2, assembles the full triangular binary kinship matrix with cat, and then prunes close relations with it (--king-cutoff).
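When a job is split into more than a few pieces, the per-piece commands and the final cat step can be scripted. The following is a minimal sketch, not part of PLINK itself: TOTAL, PREFIX, and the two helper function names are illustrative assumptions, and the plink2 flags are the ones from the example above. The commands are only printed here, so that each line can be run locally or handed to a cluster scheduler.

```shell
#!/bin/sh
# Sketch: print the commands for every piece of a --parallel job, and
# join the resulting pieces in job-index order.
# TOTAL, PREFIX, and the helper names are illustrative, not PLINK features.

TOTAL=3
PREFIX=split

# Print one plink2 command per piece. Remove 'echo' to run the pieces
# directly, or submit each printed line as a separate cluster job.
print_piece_commands() {
  i=1
  while [ "$i" -le "$TOTAL" ]; do
    echo plink2 --pfile test_data --make-king triangle bin4 \
      --parallel "$i" "$TOTAL" --out "$PREFIX"
    i=$((i + 1))
  done
}

# Concatenate <out>.1 .. <out>.TOTAL into <out>.
# An explicit counter is used instead of a glob such as "$out".* because
# glob expansion is lexicographic and would place piece 10 before piece 2.
assemble_pieces() {
  out="$1"
  : > "$out"
  i=1
  while [ "$i" -le "$TOTAL" ]; do
    piece="$out.$i"
    [ -f "$piece" ] || { echo "missing $piece" >&2; return 1; }
    cat "$piece" >> "$out"
    i=$((i + 1))
  done
}

print_piece_commands
```

Once all pieces have been written, `assemble_pieces "$PREFIX.king.bin"` would produce the combined file that --king-cutoff reads in the example above.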

Currently, the --make-rel, --make-grm-list/--make-grm-bin, and --make-king[-table] commands directly support distributed computation. For most matrix computations, either the 'square0' or 'triangle' output shape must be used.
