D: 18 Mar 2024

Command-line help

--help [flag name/prefix...]

When invoked with no parameters, --help provides a summary of all PLINK flags, starting with the main functions. This output is long (over 1500 lines), so we recommend piping it through a terminal pager such as Unix less or more, or redirecting it to a file, e.g.

plink2 --help > plink2-help.txt

Alternatively, you can provide one or more flag names or prefixes, in which case PLINK displays information on only the referenced flags, e.g.

[chrchang:~/plink-ng]$ plink2 --help abcd gneome
PLINK v2.00a4 AVX2 (1 Jan 2023)                www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3

--make-king-table ['zs'] ['counts'] ['rel-check'] ['cols='<col set descrip.>]
  Similar to --make-king, except results are reported in KING's original
  .kin0 text table format (with minor changes, e.g. row order is more
  friendly to incremental addition of samples), --king-table-filter can be
  used to restrict the report to high kinship values, and the 'rel-check'
  modifier can be used to restrict to same-FID pairs.
  Supported column sets are:
    maybefid: FID1/FID2, if that column was in the input.  Requires 'id'.
    fid: Force FID1/FID2 even when FID was absent in the input.
    id: IID1/IID2.
    maybesid: SID1/SID2, if that column was in the input.  Requires 'id'.
    sid: Force SID1/SID2 even when SID was absent in the input.
    nsnp: Number of variants considered (autosomal, neither call missing).
    hethet: Proportion/count of considered call pairs which are het-het.
    ibs0: Proportion/count of considered call pairs which are opposite homs.
    ibs1: HET1_HOM2 and HET2_HOM1 proportions/counts.
    kinship: KING-robust between-family kinship estimator.
  The default is maybefid,id,maybesid,nsnp,hethet,ibs0,kinship.
  hethet/ibs0/ibs1 values are proportions unless the 'counts' modifier is
  present.  If id is omitted, a .kin0.id file is also written.

No help entry for 'abcd'.

More precisely, for each parameter you pass to --help, PLINK first searches for an exact flag name or keyword match. If it finds none, it searches for exact prefix matches; if that also fails, it searches for Damerau-Levenshtein distance 1 matches. Note the 'gneome' misspelling above: 'genome' is a keyword for --make-king-table, since --make-king-table includes much of the functionality of PLINK 1.x's --genome command.
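The three-stage search order described above can be sketched in Python. This is only an illustration, not PLINK's actual implementation: the function names and the small HYPOTHETICAL_INDEX table are invented for the example, and the index is assumed to map each flag name or keyword (without leading dashes) to the help entries it refers to.

```python
from typing import Dict, List


def is_dl_distance_one(a: str, b: str) -> bool:
    """True if exactly one insertion, deletion, substitution, or
    adjacent transposition turns a into b."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    # Locate the first position where the two strings differ.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Single substitution at position i.
        if a[i + 1:] == b[i + 1:]:
            return True
        # Adjacent transposition of positions i and i+1.
        return (i + 1 < len(a)
                and a[i] == b[i + 1] and a[i + 1] == b[i]
                and a[i + 2:] == b[i + 2:])
    # Lengths differ by one: drop one character from the longer string.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer[i + 1:] == shorter[i:]


def lookup_help(query: str, index: Dict[str, List[str]]) -> List[str]:
    """Return matching help entries: exact match first, then prefix
    matches, then distance-1 matches. Later stages run only if the
    earlier ones found nothing."""
    if query in index:
        return sorted(set(index[query]))
    for matches in (
        lambda key: key.startswith(query),
        lambda key: is_dl_distance_one(query, key),
    ):
        found = {entry for key in index if matches(key) for entry in index[key]}
        if found:
            return sorted(found)
    return []


# Hypothetical index, for illustration only.
HYPOTHETICAL_INDEX = {
    "make-king": ["--make-king"],
    "make-king-table": ["--make-king-table"],
    "genome": ["--make-king-table"],  # keyword, per the text above
    "pca": ["--pca"],
}

for q in ("abcd", "gneome"):
    hits = lookup_help(q, HYPOTHETICAL_INDEX)
    print(hits if hits else f"No help entry for '{q}'.")
```

With this toy index, 'gneome' reaches --make-king-table only at the third stage (one adjacent transposition away from 'genome'), while 'abcd' fails all three stages.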

If --help is used with other flags (other than --script and --rerun), it causes everything before it on the command line to be ignored, and everything after it to be treated as --help parameters. This is convenient when you've forgotten exactly how a flag works while in the middle of typing a long command: you can put your help request at the end of the unfinished command, and then retrieve your unfinished command line with the up arrow (in most shells, anyway).

[chrchang:~/plink-ng]$ plink2 --pfile test_data --hwe 1e-10 midp keep-fewhet --help --pca
PLINK v2.00a4 AVX2 (1 Jan 2023)                www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
--help present, ignoring other flags.

--pca [count] [{approx | meanimpute}] ['scols='<col set descriptor>]
--pca [{allele-wts | biallelic-var-wts}] [count] [{approx | meanimpute}]
      ['vzs'] ['scols='<col set descriptor>] ['vcols='<col set descriptor>]
  Extracts top principal components from the variance-standardized
  relationship matrix.
  * It is usually best to perform this calculation on a variant set in
    approximate linkage equilibrium, with no very-low-MAF variants.
  * By default, 10 PCs are extracted; you can adjust this by passing a
    numeric parameter.  (Note that 10 is lower than the PLINK 1.9 default of
    20; this is due to the randomized algorithm's memory footprint growing
    quadratically w.r.t. the PC count.)
  * The 'approx' modifier causes the standard deterministic computation to be
    replaced with the randomized algorithm originally implemented for
    Galinsky KJ, Bhatia G, Loh PR, Georgiev S, Mukherjee S, Patterson NJ,
    Price AL (2016) Fast Principal-Component Analysis Reveals Convergent
    Evolution of ADH1B in Europe and East Asia.  This can be a good idea when
    you have >5k samples, and is almost required with >50k.
  * The randomized algorithm always uses mean imputation for missing genotype
    calls.  For comparison purposes, you can use the 'meanimpute' modifier to
    request this behavior for the standard computation.
  * 'scols=' can be used to customize how sample IDs appear in the .eigenvec
    file.  (maybefid, fid, maybesid, and sid supported; default is
    maybefid,maybesid.)
  * The 'allele-wts' modifier requests an additional one-line-per-allele
    .eigenvec.allele file with PCs expressed as allele weights instead of
    sample weights.  When it's present, 'vzs' causes the .eigenvec.allele
    file to be Zstd-compressed.
    'vcols=' can be used to customize the report columns; supported column
    sets are:
      chrom: Chromosome ID.
      pos: Base-pair coordinate.
      (ID is always present, and positioned here.)
      ref: Reference allele.
      alt1: Alternate allele 1.
      alt: All alternate alleles, comma-separated.
      (A1 is always present, and positioned here.)
      ax: Non-A1 alleles, comma-separated.
      (PCs are always present, and positioned here.)
    Default is chrom,ref,alt.
  * For datasets with no multiallelic variants, the 'biallelic-var-wts'
    modifier requests the old .eigenvec.var format, which only reports
    weights for major alleles.  (These weights are 2x the corresponding
    .eigenvec.allele weights.)  Supported column sets are:
      chrom: Chromosome ID.
      pos: Base-pair coordinate.
      (ID is always present, and positioned here.)
      ref: Reference allele.
      alt1: Alternate allele 1.
      alt: All alternate alleles, comma-separated.
      maj: Major allele.
      nonmaj: All nonmajor alleles, comma-separated.
      (PCs are always present, and positioned here. Signs are w.r.t. the
      major, not necessarily reference, allele.)
    Default is chrom,maj,nonmaj.
[chrchang:~/plink-ng]$ plink2 --pfile test_data --hwe 1e-10 midp keep-fewhet --pca ...
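
The way --help partitions a command line, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions rather than PLINK's actual argument parser: split_at_help is a hypothetical helper, and the sketch deliberately does not model the --script and --rerun exceptions.

```python
from typing import List, Optional, Tuple


def split_at_help(argv: List[str]) -> Optional[Tuple[List[str], List[str]]]:
    """Split an argument list at the first '--help'.

    Returns (ignored, help_params): the arguments before --help, which
    are ignored, and the arguments after it, which are treated as --help
    parameters. Returns None when --help is absent.
    Assumption: --script and --rerun handling is omitted.
    """
    if "--help" not in argv:
        return None
    i = argv.index("--help")
    return argv[:i], argv[i + 1:]


# The unfinished command from the transcript above.
argv = ["--pfile", "test_data", "--hwe", "1e-10", "midp", "keep-fewhet",
        "--help", "--pca"]
ignored, help_params = split_at_help(argv)
print("ignored:", ignored)
print("help parameters:", help_params)
```

Here everything up to keep-fewhet lands in the ignored list, and only --pca is passed on as a help query.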
