D: 3 Mar 2023

Pairwise diffs

Filesets

--pgen-diff <.pgen/.bed filename> <.pvar/.bim> <.psam/.fam>
            ['include-missing'] ['zs'] ['dosage' | 'dosage='<tolerance>]
            ['cols='<column set descriptor>]
--pgen-diff <.pgen + .pvar + .psam prefix> ['vzs'] ['include-missing'] ['zs']
            ['dosage' | 'dosage='<tolerance>] ['cols='<col set descriptor>]

--pgen-diff compares overlapping samples and variants between two filesets (after applying the usual sample and variant filters), and reports unphased genotype/dosage differences to plink2.pdiff.

  • If chrX or chrY is present, sex must be defined and consistent. Nonmales are not included in the comparison on chrY.
  • Variants are only compared if their IDs and positions match. An error is reported if any such match is not unique.
  • The 'vzs' modifier works as with --pfile.
  • By default, comparisons are based on genotype hardcalls. Use the 'dosage' modifier to compare dosages instead; you can combine this with a tolerance in [0, 0.5).
  • By default, if one genotype is missing and the other isn't, that doesn't count as a difference; this can be changed with the 'include-missing' modifier.
  • Refer to the file format entry for output details and optional columns.
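
The hardcall/dosage and missing-genotype rules above can be sketched in a few lines of Python. This is an illustration only, not plink2's implementation: the function name genotypes_differ, the use of None for a missing genotype, and the choice to count a dosage gap as a difference only when it strictly exceeds the tolerance are all assumptions.

```python
from typing import Optional


def genotypes_differ(a: Optional[float], b: Optional[float],
                     tolerance: float = 0.0,
                     include_missing: bool = False) -> bool:
    """Hypothetical helper: decide whether two genotype values differ.

    a, b: hardcalls (0, 1, 2) or dosages; None stands for "missing".
    tolerance: mirrors 'dosage=<tolerance>', which must lie in [0, 0.5).
    include_missing: mirrors the 'include-missing' modifier.
    """
    if not 0.0 <= tolerance < 0.5:
        raise ValueError("tolerance must be in [0, 0.5)")
    if a is None and b is None:
        # Assumption: two missing genotypes never count as a difference.
        return False
    if a is None or b is None:
        # One side missing: a difference only when 'include-missing' is set.
        return include_missing
    # Assumption: the gap must strictly exceed the tolerance to count.
    return abs(a - b) > tolerance
```

With the defaults this behaves like a hardcall comparison; passing a nonzero tolerance corresponds to 'dosage=<tolerance>', and include_missing=True corresponds to 'include-missing'.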

Samples

--sample-diff ['id-delim='<char>] ['dosage' | 'dosage='<tolerance>]
              ['include-missing'] [{pairwise | counts-only}] ['zs']
              ['fname-id-delim='<c>] ['cols='<column set descriptor>]
              ['counts-cols='<column set descriptor>]
              {base= | ids=}<sample ID> <other sample ID(s)...>
--sample-diff ['id-delim='<char>] ['dosage' | 'dosage='<tolerance>]
              ['include-missing'] [{pairwise | counts-only}] ['zs']
              ['fname-id-delim='<c>] ['cols='<column set descriptor>]
              ['counts-cols='<column set descriptor>]
              file=<ID-pair file>
  (alias: --sdiff)

--sample-diff reports discordances and discordance counts between pairs of samples. If chrX or chrY is present, sex must be defined and consistent.

  • There are three ways to specify which sample pairs to compare.
    • To compare a single baseline sample against some others, start the (space-delimited) sample ID list with 'base='.
    • To perform an all-vs.-all comparison between the samples you name, start it with 'ids=' instead.
    • To compare sample pairs listed in a file (one pair per line), use 'file='.
    Note that 'base='/'ids='/'file=' must be positioned after all modifiers.
  • Sample IDs are interpreted as if they were in a VCF header line, with 'id-delim=' having the usual effect.
  • By default, comparisons are based on hardcalls. Use 'dosage' to compare dosages instead; you can combine this with a tolerance in [0, 0.5).
  • By default, if one genotype is missing and the other isn't, that doesn't count as a difference; this can be changed with 'include-missing'.
  • By default, a single main report is written to plink2[.<base ID>].sdiff[.zst], and a discordance-count summary is written to plink2.sdiff.summary.
    • To write separate pairwise plink2.<ID1>.<ID2>.sdiff[.zst] report files for each compared ID pair, add the 'pairwise' modifier.
    • To omit the main report, add the 'counts-only' modifier. (Note that, if you're only interested in nonmissing autosomal biallelic hardcalls, --make-king-table provides a more efficient way to compute just counts.)
  • By default, if an output filename has a multipart sample ID, the parts will be delimited by '_'; use 'fname-id-delim=' to change it.
  • Refer to the file format entries for other output details and optional columns.
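
The 'base=' and 'ids=' pair-selection modes and the 'pairwise' output names described above can be sketched as follows. The helper names, the made-up sample IDs, the ordering of pairs, and the representation of a multipart sample ID as a tuple of parts are all assumptions made for illustration; the sketch mirrors the documented behavior and is not plink2's own code.

```python
from itertools import combinations


def pairs_for_base(base, others):
    """'base=' mode: compare one baseline sample against each other sample."""
    return [(base, other) for other in others]


def pairs_for_ids(ids):
    """'ids=' mode: all-vs.-all comparison among the named samples."""
    return list(combinations(ids, 2))


def pairwise_report_name(id1_parts, id2_parts, fname_id_delim="_",
                         zs=False, prefix="plink2"):
    """Name of one per-pair report written under the 'pairwise' modifier.

    id1_parts, id2_parts: tuples holding the parts of each sample ID
    (hypothetical representation). Parts are joined with '_' unless
    'fname-id-delim=' changes the delimiter. prefix is the default output
    prefix shown in the text above.
    """
    id1 = fname_id_delim.join(id1_parts)
    id2 = fname_id_delim.join(id2_parts)
    name = f"{prefix}.{id1}.{id2}.sdiff"
    # '.zst' is appended only when the 'zs' modifier is given.
    return name + ".zst" if zs else name
```

For n named samples, 'ids=' yields n(n-1)/2 pairs while 'base=' yields one pair per non-baseline sample, so 'base=' is the cheaper choice when only comparisons against a single reference sample are needed.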
