Introduction, downloads

D: 3 Dec 2024

Pairwise diffs

Filesets

--pgen-diff <.pgen/.bed filename> <.pvar/.bim> <.psam/.fam>
            ['include-missing'] ['zs'] ['dosage' | 'dosage='<tolerance>]
            ['cols='<column set descriptor>]
--pgen-diff <.pgen + .pvar + .psam prefix> ['vzs'] ['include-missing'] ['zs']
            ['dosage' | 'dosage='<tolerance>] ['cols='<col set descriptor>]

--pgen-diff compares the samples and variants present in both filesets (after the usual sample and variant filters are applied) and reports unphased genotype/dosage differences to plink2.pdiff.

  • If chrX or chrY is present, sex must be defined and consistent. Nonmales are not included in the comparison on chrY.
  • Variants are only compared if their IDs and positions match. An error is reported if any such match is not unique.
  • The 'vzs' modifier works as with --pfile.
  • By default, comparisons are based on genotype hardcalls. Use the 'dosage' modifier to compare dosages instead; 'dosage='<tolerance> also sets a tolerance in [0, 0.5).
  • By default, if one genotype is missing and the other isn't, that doesn't count as a difference; this can be changed with the 'include-missing' modifier.
  • Refer to the file format entry for output details and optional columns.
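
The variant-matching and difference rules in the list above can be summarized in a short Python sketch. This is not PLINK's implementation, and several details are assumptions: variants are reduced to hypothetical (ID, position) keys, hardcalls are modeled as unphased allele-index pairs, dosages as single ALT dosages, and a dosage difference is assumed to count only when it exceeds the tolerance. Sex handling on chrX/chrY is not modeled, and the function names are illustrative rather than part of any PLINK API.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Hypothetical variant key: (variant ID, base-pair position).
VariantKey = Tuple[str, int]


def match_variants(a: Sequence[VariantKey], b: Sequence[VariantKey]) -> List[Tuple[int, int]]:
    """Pair variant indices whose ID and position both agree.

    Raises ValueError if a shared key occurs more than once in either list,
    mirroring the documented "match is not unique" error.
    """
    count_a, count_b = Counter(a), Counter(b)
    shared = count_a.keys() & count_b.keys()
    for key in shared:
        if count_a[key] > 1 or count_b[key] > 1:
            raise ValueError(f"non-unique variant match: {key}")
    index_b = {key: j for j, key in enumerate(b)}
    return [(i, index_b[key]) for i, key in enumerate(a) if key in shared]


def _missing_differs(x: object, y: object, include_missing: bool) -> bool:
    # Exactly one missing value counts only with include-missing;
    # two missing values never count as a difference.
    return include_missing and (x is None) != (y is None)


def genotypes_differ(
    g1: Optional[Tuple[int, int]],
    g2: Optional[Tuple[int, int]],
    include_missing: bool = False,
) -> bool:
    """Compare unphased hardcalls given as allele-index pairs; None = missing."""
    if g1 is None or g2 is None:
        return _missing_differs(g1, g2, include_missing)
    # Unphased comparison: (0, 1) and (1, 0) are the same genotype.
    return sorted(g1) != sorted(g2)


def dosages_differ(
    d1: Optional[float],
    d2: Optional[float],
    tolerance: float = 0.0,
    include_missing: bool = False,
) -> bool:
    """Compare ALT dosages; differences up to `tolerance` are ignored (assumed)."""
    if not 0.0 <= tolerance < 0.5:
        raise ValueError("tolerance must lie in [0, 0.5)")
    if d1 is None or d2 is None:
        return _missing_differs(d1, d2, include_missing)
    return abs(d1 - d2) > tolerance
```

For example, match_variants([("rs1", 100), ("rs2", 200)], [("rs2", 200), ("rs3", 300)]) returns [(1, 0)], and genotypes_differ((0, 1), None) is False unless include_missing=True.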

Samples

--sample-diff ['id-delim='<char>] ['dosage' | 'dosage='<tolerance>]
              ['include-missing'] [{pairwise | counts-only}] ['zs']
              ['fname-id-delim='<c>] ['cols='<column set descriptor>]
              ['counts-cols='<column set descriptor>]
              {base= | ids=}<sample ID> <other sample ID(s)...>
--sample-diff ['id-delim='<char>] ['dosage' | 'dosage='<tolerance>]
              ['include-missing'] [{pairwise | counts-only}] ['zs']
              ['fname-id-delim='<c>] ['cols='<column set descriptor>]
              ['counts-cols='<column set descriptor>]
              file=<ID-pair file>
  (alias: --sdiff)

--sample-diff reports discordances and discordance counts between pairs of samples. If chrX or chrY is present, sex must be defined and consistent.

  • There are three ways to specify which sample pairs to compare.
    • To compare a single baseline sample against some others, start the (space-delimited) sample ID list with 'base='.
    • To perform an all-vs.-all comparison between the samples you name, start it with 'ids=' instead.
    • To compare sample pairs listed in a file (one pair per line), use 'file='.
    Note that 'base='/'ids='/'file=' must be positioned after all modifiers.
  • Sample IDs are interpreted as if they were in a VCF header line, with 'id-delim=' having the usual effect.
  • By default, comparisons are based on hardcalls. Use 'dosage' to compare dosages instead; 'dosage='<tolerance> also sets a tolerance in [0, 0.5).
  • By default, if one genotype is missing and the other isn't, that doesn't count as a difference; this can be changed with 'include-missing'.
  • By default, a single main report is written to plink2[.<base ID>].sdiff[.zst], and a discordance-count summary is written to plink2.sdiff.summary.
    • To write separate pairwise plink2.<ID1>.<ID2>.sdiff[.zst] report files for each compared ID pair, add the 'pairwise' modifier.
    • To omit the main report, add the 'counts-only' modifier. (Note that if you're only interested in nonmissing autosomal biallelic hardcalls, --make-king-table is a more efficient way to compute just the counts.)
  • By default, if an output filename includes a multipart sample ID, its parts are delimited by '_'; use 'fname-id-delim=' to change the delimiter.
  • Refer to the file format entries for other output details and optional columns.
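
The three pair-selection modes and the 'pairwise' report naming can be sketched in Python as below. It only illustrates the bookkeeping, not PLINK's code, and rests on assumptions: sample IDs are plain strings (or, for filenames, hypothetical lists of ID parts such as FID and IID), 'ids=' is assumed to visit each unordered pair once, and the 'file=' layout of two whitespace-separated IDs per line is assumed. All function names are hypothetical.

```python
import itertools
from typing import Iterable, List, Sequence, Tuple

SamplePair = Tuple[str, str]


def pairs_from_base(ids: Sequence[str]) -> List[SamplePair]:
    """'base=' mode: compare the first (baseline) ID against each of the rest."""
    if len(ids) < 2:
        raise ValueError("'base=' needs a baseline ID plus at least one other ID")
    base, *others = ids
    return [(base, other) for other in others]


def pairs_from_ids(ids: Sequence[str]) -> List[SamplePair]:
    """'ids=' mode: all-vs.-all, each unordered pair once (assumed)."""
    return list(itertools.combinations(ids, 2))


def pairs_from_file(lines: Iterable[str]) -> List[SamplePair]:
    """'file=' mode: assumes two whitespace-separated IDs per nonblank line."""
    pairs: List[SamplePair] = []
    for line_num, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # this sketch skips blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_num}: expected 2 IDs, found {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


def pairwise_report_name(
    id1_parts: Sequence[str],
    id2_parts: Sequence[str],
    fname_id_delim: str = "_",
    zst: bool = False,
) -> str:
    """Build plink2.<ID1>.<ID2>.sdiff[.zst], joining multipart IDs."""
    name = (
        f"plink2.{fname_id_delim.join(id1_parts)}"
        f".{fname_id_delim.join(id2_parts)}.sdiff"
    )
    return name + ".zst" if zst else name
```

For instance, pairs_from_ids(["A", "B", "C"]) yields three pairs, whereas pairs_from_base(["A", "B", "C"]) yields only ("A", "B") and ("A", "C"); with 'fname-id-delim=-', pairwise_report_name(["fam1", "s1"], ["fam2", "s2"], "-") gives plink2.fam1-s1.fam2-s2.sdiff.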
