Introduction, downloads

S: 18 Aug 2024 (b7.4)

D: 18 Aug 2024


Allelic scoring

--score <filename> [variant ID col.] [allele col.] [score col.] ['header'] [{sum | no-sum}] [{no-mean-imputation | center}] ['include-cnt'] ['double-dosage']

It's useful at times to apply a simple linear scoring system to all your genotypes; for example, this approach can be used to estimate genetic load or to apply additive effect estimates for a quantitative trait. The --score flag performs this function, writing results to plink.profile (unless --q-score-range is also present; see below).

The input file should have one line per scored variant. By default, the variant ID is read from column 1, an allele code is read from the following column, and the score associated with the named allele is read from the column after the allele column; you can change these positions by passing column numbers to --score. E.g.

--score my.scores 2 4

reads variant IDs from column 2, allele codes from column 4, and scores from column 5, while

--score my.scores 3 2 1

reads variant IDs from column 3, allele codes from column 2, and scores from column 1.
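The column-position rules above can be illustrated with a small Python sketch that reads a score file into (variant ID, allele, score) triples. The function name parse_score_file is hypothetical, and this is not PLINK's parser. It assumes whitespace-delimited fields, skips blank lines, and assumes that when only the variant ID column is given, the allele column defaults to the one right after it (one reading of "the following column").

```python
from typing import Iterable, List, Optional, Tuple


def parse_score_file(
    lines: Iterable[str],
    varid_col: int = 1,
    allele_col: Optional[int] = None,
    score_col: Optional[int] = None,
    header: bool = False,
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: return (variant ID, allele, score) triples.

    Column numbers are 1-based, as in the --score parameters.
    """
    if allele_col is None:
        # Assumption: the default allele column follows the variant ID column.
        allele_col = varid_col + 1
    if score_col is None:
        # The score column defaults to the one after the allele column.
        score_col = allele_col + 1
    rows: List[Tuple[str, str, float]] = []
    header_pending = header
    for line in lines:
        fields = line.split()  # tabs and spaces both act as delimiters
        if not fields:
            continue  # blank lines are skipped
        if header_pending:
            header_pending = False  # 'header': drop the first nonempty line
            continue
        rows.append(
            (fields[varid_col - 1], fields[allele_col - 1], float(fields[score_col - 1]))
        )
    return rows


# Equivalent of "--score my.scores 2 4": IDs from column 2, alleles from 4, scores from 5.
print(parse_score_file(["chr1 rs1 100 A 0.5"], 2, 4))  # [('rs1', 'A', 0.5)]
```

Passing explicit column numbers for all three fields, as in "--score my.scores 3 2 1", bypasses both defaults.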

In addition,

  • The 'header' modifier causes the first nonempty line of the input file to be ignored; otherwise, --score assumes there is no header line.
  • By default, final scores are averages of valid per-allele scores. The 'sum' modifier causes sums to be reported instead. (This cannot be used with 'no-mean-imputation'. Also, for backward compatibility, 'sum' is automatically on with dosage data unless 'no-sum' is specified.)
  • By default, copies of the unnamed allele contribute zero to score, while missing genotypes contribute an amount proportional to the loaded (via --read-freq) or imputed allele frequency. To throw out missing observations instead (decreasing the denominator in the final average when this happens), use the 'no-mean-imputation' modifier.
  • Alternatively, you can use the 'center' modifier to shift all scores to mean zero. For example, if the minor allele is assigned score 4.5, and its loaded/imputed frequency is 0.2, 'center' makes minor allele observations contribute +3.6 instead of +4.5 to score, and major allele observations contribute -0.9 instead of 0.
  • This command can be used with dosage data. By default, the 'CNT' column is omitted from the output file in this case; use 'include-cnt' to keep it. Also, note that scores are multiplied by 0..1 dosages, not 0..2 diploid allele counts, unless the 'double-dosage' modifier is present.
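The arithmetic behind the default averaging, 'sum', mean imputation, 'no-mean-imputation', and 'center' can be sketched for a single sample as follows. This is a simplified illustration under stated assumptions, not PLINK's implementation: the function name allelic_score is hypothetical, only hard-called diploid genotypes are modeled (no dosage data, so 'double-dosage' and 'include-cnt' are ignored), and the averaging denominator is assumed to be two alleles per genotype that is counted (called or mean-imputed).

```python
import math
from typing import Optional, Sequence


def allelic_score(
    counts: Sequence[Optional[int]],
    scores: Sequence[float],
    freqs: Sequence[float],
    use_sum: bool = False,
    no_mean_imputation: bool = False,
    center: bool = False,
) -> float:
    """Hypothetical helper: score one sample.

    counts[i]: copies (0, 1, 2) of variant i's named allele, or None if missing.
    scores[i]: per-allele score of the named allele.
    freqs[i]:  loaded (--read-freq) or imputed frequency of the named allele.
    """
    if use_sum and no_mean_imputation:
        raise ValueError("'sum' cannot be combined with 'no-mean-imputation'")
    if center and no_mean_imputation:
        raise ValueError("'center' and 'no-mean-imputation' are alternatives")
    total = 0.0
    allele_ct = 0
    for count, score, freq in zip(counts, scores, freqs):
        if count is None:
            if no_mean_imputation:
                continue  # drop the observation; it also leaves the denominator
            copies = 2.0 * freq  # mean imputation: expected named-allele copies
        else:
            copies = float(count)
        contribution = copies * score  # unnamed-allele copies contribute 0
        if center:
            # Subtract the expected contribution so each variant has mean zero.
            contribution -= 2.0 * freq * score
        total += contribution
        allele_ct += 2  # assumption: two alleles per counted genotype
    if use_sum:
        return total
    return total / allele_ct if allele_ct else math.nan
```

With score 4.5 and frequency 0.2, 'center' makes a heterozygote contribute 3.6 + (-0.9) = 2.7, matching the per-allele figures in the 'center' bullet; a missing genotype contributes exactly 0 after centering.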

--score can be used on --lasso's output as follows:

plink --bfile mydata --score plink.lasso 2 header sum

--q-score-range <range file> <data file> [variant ID col.] [data col.] ['header']

To apply --score to subset(s) of variants in the primary score list based on ranges of some key quantity (e.g. p-value), you can use --q-score-range. The first parameter should be the name of a file with range labels in the first column, lower bounds in the second column, and upper bounds in the third column, such as

S1  0.00 0.01
S2  0.00 0.20
S3  0.10 0.50

(Lines with too few entries or nonnumeric values in the second or third column are ignored, so it's generally safe to include a header line in this file.) This would cause three score reports to be generated: plink.S1.profile would only consider variants with key quantity values in [0, 0.01], plink.S2.profile would only consider [0, 0.2], and plink.S3.profile would only consider [0.1, 0.5].

The second file should contain a variant ID and the key quantity on each nonempty line (except possibly the first). By default, variant IDs are assumed to be in column 1 and the quantity in the following column; you can change these positions in the same way as with --score. The 'header' modifier causes the first nonempty line of the second file to be skipped.
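The two --q-score-range inputs can be illustrated with the Python sketch below, which reads the range file, reads the key-quantity file, and lists the variants each range would score (bounds are inclusive, as in the [0, 0.01] example). The function names are hypothetical and this is not PLINK's code. Skipping data-file lines with a missing or nonnumeric key quantity (such as "NA") is an assumption; PLINK's handling of such lines may differ.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def parse_ranges(lines: Iterable[str]) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: read (label, lower, upper) triples from a range file."""
    ranges = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # too few entries
        try:
            lower, upper = float(fields[1]), float(fields[2])
        except ValueError:
            continue  # nonnumeric bounds, e.g. a header line
        ranges.append((fields[0], lower, upper))
    return ranges


def parse_key_values(
    lines: Iterable[str],
    varid_col: int = 1,
    data_col: Optional[int] = None,
    header: bool = False,
) -> Dict[str, float]:
    """Hypothetical helper: map variant ID -> key quantity; columns are 1-based."""
    if data_col is None:
        data_col = varid_col + 1  # default: the column after the variant ID
    values: Dict[str, float] = {}
    header_pending = header
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header_pending:
            header_pending = False  # 'header': skip the first nonempty line
            continue
        try:
            values[fields[varid_col - 1]] = float(fields[data_col - 1])
        except (ValueError, IndexError):
            continue  # assumption: unusable lines are skipped
    return values


def variants_per_range(
    ranges: List[Tuple[str, float, float]], values: Dict[str, float]
) -> Dict[str, Set[str]]:
    """For each range label, the variants whose key quantity lies in [lower, upper]."""
    return {
        label: {vid for vid, q in values.items() if lower <= q <= upper}
        for label, lower, upper in ranges
    }
```

Each resulting set corresponds to one report (plink.S1.profile, plink.S2.profile, ...), computed over only those variants from the primary score list.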
