Introduction, downloads

S: 11 Dec 2023 (b7.2)

D: 11 Dec 2023

Copy number analysis

Rare copy number variants

The following commands operate on .cnv + .fam filesets. (Most development has been postponed until PLINK 2.0's merge command is available for testing; many functions are incomplete.)

Filtering

--cnv-del
--cnv-dup

--cnv-kb <minimum size in kb>
--cnv-max-kb <maximum size in kb>

--cnv-score <minimum score>
--cnv-max-score <maximum score>
--cnv-sites <minimum sites>
--cnv-max-sites <maximum sites>

--cnv-del filters out all CNVs with more than one copy, while --cnv-dup filters out all CNVs with fewer than three copies.
--cnv-kb excludes all segments shorter than the given length; --cnv-max-kb filters out long segments. (To remove a minor internal inconsistency concerning how segment length is defined, PLINK 1.9 "--cnv-kb x.001" is equivalent to PLINK 1.07 "--cnv-kb x".)
--cnv-score excludes all CNVs with confidence score below the given threshold; --cnv-max-score filters out high scores.
--cnv-sites excludes all segments with fewer than the given number of probes, while --cnv-max-sites filters out high probe counts.
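
The type, size, score, and probe-count filters above can be read as simple per-segment predicates. The following Python sketch illustrates that reading only; it is not PLINK's implementation. The Segment class, its field names, and the passes() helper are hypothetical, and segment length is assumed to count both endpoints.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: int    # first base pair (assumed inclusive)
    end: int      # last base pair (assumed inclusive)
    copies: int   # copy number of the segment
    score: float  # confidence score
    sites: int    # number of probes

def length_kb(seg):
    # Assumption: length counts both endpoints.
    return (seg.end - seg.start + 1) / 1000.0

def passes(seg, only_del=False, only_dup=False, min_kb=None, max_kb=None,
           min_score=None, max_score=None, min_sites=None, max_sites=None):
    """Return True if the segment survives every requested filter."""
    if only_del and seg.copies > 1:      # --cnv-del: drop more than one copy
        return False
    if only_dup and seg.copies < 3:      # --cnv-dup: drop fewer than three copies
        return False
    if min_kb is not None and length_kb(seg) < min_kb:       # --cnv-kb
        return False
    if max_kb is not None and length_kb(seg) > max_kb:       # --cnv-max-kb
        return False
    if min_score is not None and seg.score < min_score:      # --cnv-score
        return False
    if max_score is not None and seg.score > max_score:      # --cnv-max-score
        return False
    if min_sites is not None and seg.sites < min_sites:      # --cnv-sites
        return False
    if max_sites is not None and seg.sites > max_sites:      # --cnv-max-sites
        return False
    return True
```

For example, passes(Segment(1000, 50999, 1, 12.5, 8), only_del=True, min_kb=50) evaluates to True under these assumptions, since the segment has one copy and spans exactly 50 kb.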

--cnv-intersect <region file>
--cnv-exclude <region file>
--cnv-subset <region name file>

--cnv-intersect causes only segments which overlap at least one of the regions in the given file to be included in the analysis, while --cnv-exclude excludes all segments which overlap a region.

Each line of the region file is expected to have the following 3-4 fields in front:

Chromosome code
Start of range (base-pair units)
End of range
Region identifier (only needed with --cnv-subset)

Given a file with one region name per line, --cnv-subset causes only regions named in that file to be loaded with --cnv-intersect/-exclude.
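
As an illustration of the region file layout described above, here is a minimal Python sketch of a reader. It assumes whitespace-delimited fields and silently skips lines with fewer than three fields; both are assumptions, and parse_regions() is a hypothetical helper rather than part of PLINK.

```python
def parse_regions(lines, subset_names=None):
    """Parse region lines into (chrom, start, end, name) tuples.

    lines        : iterable of text lines from a region file
    subset_names : optional set of region names to keep, mimicking the
                   effect described for --cnv-subset
    """
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: ignore blank or short lines
        chrom = fields[0]
        start, end = int(fields[1]), int(fields[2])
        # The fourth field is only needed when a subset is requested.
        name = fields[3] if len(fields) > 3 else None
        if subset_names is not None and name not in subset_names:
            continue
        regions.append((chrom, start, end, name))
    return regions
```

Any fields after the fourth are ignored here, matching the statement that only the leading fields are expected to have this meaning.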

--cnv-overlap <x>

--cnv-region-overlap <x>
--cnv-union-overlap <x>
--cnv-disrupt

These flags modify the behavior of --cnv-{intersect,exclude}.

Given a segment of length n, --cnv-overlap redefines 'intersection' to require a minimum of xn base pairs. --cnv-region-overlap instead requires a minimum of xr base pairs, where r is the region's length, while --cnv-union-overlap has the most stringent requirement: <intersection length> / <union length> ≥ x.

--cnv-disrupt causes only segments with an endpoint in a region to be included/excluded.
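
The overlap criteria and the --cnv-disrupt rule can be sketched in Python as follows. This is an illustration under stated assumptions, not PLINK's code: intervals are (start, end) pairs with inclusive base-pair coordinates on the same chromosome, the function names and the mode strings are hypothetical, and the union criterion is read as a minimum required ratio (intersection / union at least x), which is the reading under which it is the most stringent of the three.

```python
def span(iv):
    # Length in base pairs, counting both endpoints (assumption).
    return iv[1] - iv[0] + 1

def intersection_len(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo + 1)

def overlaps(seg, region, mode="any", x=0.0):
    """Decide whether seg 'intersects' region under the chosen criterion."""
    inter = intersection_len(seg, region)
    if inter == 0:
        return False
    if mode == "any":        # default: any shared base pair
        return True
    if mode == "segment":    # --cnv-overlap: at least x * n base pairs
        return inter >= x * span(seg)
    if mode == "region":     # --cnv-region-overlap: at least x * r base pairs
        return inter >= x * span(region)
    if mode == "union":      # --cnv-union-overlap: intersection / union >= x
        union = span(seg) + span(region) - inter
        return inter / union >= x
    raise ValueError("unknown mode: " + mode)

def disrupts(seg, region):
    """--cnv-disrupt: True if either endpoint of seg lies inside region."""
    return (region[0] <= seg[0] <= region[1]) or (region[0] <= seg[1] <= region[1])
```

With seg = (100, 199) and region = (150, 349), the intersection is 50 bp and the union is 250 bp, so x = 0.5 passes the segment criterion but fails the region and union criteria. A segment that fully spans a region overlaps it but does not disrupt it.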

--cnv-freq-exclude-above <k>
--cnv-freq-exclude-below <k>
--cnv-freq-exclude-exact <k>
--cnv-freq-include-exact <k>

--cnv-freq-overlap [x]

--cnv-freq-method2 [x]

--cnv-freq-exclude-above excludes all segments where any portion is included in more than k total segments. Similarly, --cnv-freq-exclude-below excludes segments where no portion is included in k or more total segments; --cnv-freq-exclude-exact excludes segments for which there is a portion which is included in at least k total segments but no portion is included in more; and --cnv-freq-include-exact is the reverse of --cnv-freq-exclude-exact.

These can be combined with --cnv-freq-overlap, which forces each 'portion' to be at least xn base pairs, where n is the segment length. Alternatively, --cnv-freq-method2 causes k to instead be compared against the number of segments (not excluding the original segment) where <intersection length> / <union length> ≥ x. For both --cnv-freq-overlap and --cnv-freq-method2, if x is zero or not given, it's treated as an infinitesimal positive value.
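
To make the default frequency filters concrete (no --cnv-freq-overlap or --cnv-freq-method2, so any shared base pair counts), the Python sketch below computes, for each segment, the largest number of segments covering any single position within it, and then applies the four rules. It is a simple quadratic-time illustration, not PLINK's algorithm. It assumes inclusive coordinates, a single chromosome, and that a segment counts toward its own total; max_depth() and freq_filter() are hypothetical names.

```python
def max_depth(i, segs):
    """Largest number of segments covering any one position of segs[i]."""
    s, e = segs[i]
    # Coverage only increases at segment starts, so the maximum over [s, e]
    # is reached at s itself or at some other start inside the interval.
    candidates = [s] + [sj for sj, _ in segs if s <= sj <= e]
    return max(sum(1 for sj, ej in segs if sj <= p <= ej) for p in candidates)

def freq_filter(segs, k, mode):
    """Return the segments kept under one of the four frequency rules."""
    kept = []
    for i, seg in enumerate(segs):
        d = max_depth(i, segs)
        if mode == "exclude-above":    # drop if some portion is in more than k
            ok = d <= k
        elif mode == "exclude-below":  # drop if no portion is in k or more
            ok = d >= k
        elif mode == "exclude-exact":  # drop if the maximum is exactly k
            ok = d != k
        elif mode == "include-exact":  # keep only if the maximum is exactly k
            ok = d == k
        else:
            raise ValueError("unknown mode: " + mode)
        if ok:
            kept.append(seg)
    return kept
```

For the segments (1, 10), (5, 15), (8, 20), and (30, 40), positions 8 through 10 are covered three times, so the first three segments have a maximum depth of 3 and the last has a depth of 1.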

Refer to the PLINK 1.07 documentation for more discussion of these segment filtering flags.

--cnv-make-map ['short']

--cnv-exclude-off-by-1

--cnv-make-map generates the .cnv.map file needed by all other PLINK CNV analysis commands to proceed. It now automatically runs when needed, so it is unnecessary to explicitly invoke this (though you may still want to for performance reasons).

When automatically triggered, the new .cnv.map file is generated before any segmental filters are applied, to avoid nasty surprises when different filters are being applied to different analysis steps. It is put in the same directory and assigned the same prefix as the .cnv file, except when that might destroy existing data.

In contrast, when --cnv-make-map is explicitly invoked, segmental filters are applied first, and the filename is determined by the global output prefix instead of the input .cnv filename.

If there is a segment starting at bp x and ending at bp y, the resulting .cnv.map will have entries at positions x and (y+1). If the .cnv.map file was created via an explicit --cnv-make-map invocation without the 'short' modifier, it will also have an entry at position y (this is needed by PLINK 1.07, but not PLINK 1.9).
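
The position rule above can be sketched in Python as follows. This only illustrates which positions end up in the map; it does not reproduce the .cnv.map file format. The function name, its parameters, and the use of integer chromosome codes are assumptions made for the example.

```python
def cnv_map_positions(segments, explicit=False, short=False):
    """Collect the (chrom, position) entries implied by a list of segments.

    segments : iterable of (chrom, x, y), with x the start and y the end bp
    explicit : True if --cnv-make-map was invoked explicitly
    short    : True if the 'short' modifier was given
    """
    positions = set()
    for chrom, x, y in segments:
        positions.add((chrom, x))
        positions.add((chrom, y + 1))
        # The extra entry at y is only written by an explicit invocation
        # without 'short' (needed by PLINK 1.07, not by PLINK 1.9).
        if explicit and not short:
            positions.add((chrom, y))
    return sorted(positions)
```

For two segments on chromosome 1 spanning 100 to 200 and 150 to 200, the automatically triggered form yields positions 100, 150, and 201, while an explicit invocation without 'short' also yields 200.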

For compatibility with .cnv.map files generated by third-party scripts according to the PLINK 1.07 documentation, if there is a .cnv.map entry at position y but not (y+1), .cnv segments (of length > 1) ending at position y are normally still loaded, and treated as if they had ended at position (y-1) instead. Since this can lead to slightly inaccurate results, a warning will be printed when this happens. To exclude such segments, use the --cnv-exclude-off-by-1 flag.

--cnv-write ['freq']

This command causes a new .cnv fileset to be generated, with all requested filters applied. With the --cnv-freq-method2 flag and the 'freq' modifier, an extra 'FREQ' field is written to the .cnv with each overlap parameter.

Analysis

--cnv-check-no-overlap

This checks for within-sample CNV overlaps (which shouldn't happen). If any are present, they are reported in plink.cnv.overlap.
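
A check of this kind can be sketched with a sort and a single pass per sample and chromosome. The Python below is an illustration only: it assumes inclusive coordinates (so two segments sharing one base pair count as overlapping), uses a single hypothetical sample_id in place of PLINK's sample identifiers, and does not reproduce the .cnv.overlap report format. Each later segment is compared against the earlier segment that reaches furthest right, so every overlapping segment is flagged at least once, but not every overlapping pair is listed.

```python
from collections import defaultdict

def within_sample_overlaps(segments):
    """Find overlapping segments belonging to the same sample.

    segments : iterable of (sample_id, chrom, start, end)
    Returns a list of ((sample_id, chrom), earlier_interval, later_interval).
    """
    by_key = defaultdict(list)
    for sample_id, chrom, start, end in segments:
        by_key[(sample_id, chrom)].append((start, end))

    found = []
    for key, intervals in by_key.items():
        intervals.sort()
        furthest = None  # earlier interval with the largest end so far
        for iv in intervals:
            if furthest is not None and iv[0] <= furthest[1]:
                found.append((key, furthest, iv))
            if furthest is None or iv[1] > furthest[1]:
                furthest = iv
    return found
```

An empty result corresponds to the expected case in which no sample has overlapping CNV segments.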

--cnv-indiv-perm <permutation count>

...

--cnv-test [{1sided | 2sided}] <permutation count>
--cnv-test-region <permutation count>

--cnv-test-window <window size, in kb>

...

--cnv-enrichment-test [permutation count]

--cnv-count <region file>

...

Common copy number polymorphisms

These commands operate on .gvar + .fam + .map filesets.

(No commands yet.)
