Introduction, downloads

S: 11 Dec 2023 (b7.2)

D: 11 Dec 2023

R plugin functions

--R <R script filename> ['debug']

(Not supported on Windows.)

PLINK is designed to interoperate well with R: almost all built-in commands generate tabular reports that are easy to load and postprocess in it. With the Rserve package (preferably version 1.7 or later) and PLINK's --R flag, you can also apply R functions directly to PLINK binary data, without the need to write your own I/O code.

--R loads the given R script, which must define a function of the form

Rplink <- function(PHENO,GENO,CLUSTER,COVAR)

where

  • PHENO is a vector of phenotypes (length N)
  • GENO is a matrix of genotypes (N rows, m columns; 0/1/2/'NA' additive coding, like "--recode A")
  • CLUSTER is a vector of numeric cluster IDs (length N, all-zero when no clusters are defined), and
  • COVAR is a matrix of covariates (N rows, C columns).

(N is the number of samples with nonmissing phenotype values (after filtering); C, which can be zero, is the number of covariates; and m is the number of variants in the current data block, which is usually smaller than the total number in the dataset.)

For each variant, PLINK expects this function to return a numeric vector of values of the form

c(length(r), r)

where r is that variant's vector of results; the r vectors for different variants are permitted to have different lengths. The PLINK 1.07 documentation contains several detailed examples. If this basic interface is insufficient for your needs, you may find the PLINK/SEQ R package to be more helpful.
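
As a rough illustration of the return format, here is a minimal sketch of a plugin script. It assumes the per-variant c(length(r), r) pieces are concatenated in column order into one numeric vector, which is how the PLINK 1.07 examples appear to work. The statistics computed (mean additive genotype and nonmissing count) and the helper name per_variant are arbitrary choices for illustration, and the toy data at the bottom exists only so the function can be tried outside PLINK.

```r
# Hypothetical plugin: for each variant, report the mean additive genotype
# and the number of nonmissing genotype calls.
Rplink <- function(PHENO, GENO, CLUSTER, COVAR) {
  # Build the result vector r for one variant (one column of GENO),
  # then prefix it with its length.
  per_variant <- function(g) {
    r <- c(mean(g, na.rm = TRUE), sum(!is.na(g)))
    c(length(r), r)
  }
  # Concatenate the c(length(r), r) pieces for all variants in this block
  # (assumed layout; see the lead-in above).
  pieces <- lapply(seq_len(ncol(GENO)), function(j) per_variant(GENO[, j]))
  as.numeric(unlist(pieces))
}

# Toy block for trying the function by hand: N = 4 samples, m = 2 variants,
# no clusters, C = 0 covariates. matrix() fills column by column.
PHENO <- c(1, 2, 2, 1)
GENO <- matrix(c(0, 1, 2, NA,
                 2, 2, 1, 0), nrow = 4, ncol = 2)
CLUSTER <- rep(0, 4)
COVAR <- matrix(numeric(0), nrow = 4, ncol = 0)

out <- Rplink(PHENO, GENO, CLUSTER, COVAR)
print(out)
```

Only the Rplink definition belongs in the script passed to --R; PLINK supplies PHENO, GENO, CLUSTER and COVAR itself for each block of variants.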

On a normal --R run, results are written to plink.auto.R. If you want to look at the R commands PLINK sends, add the 'debug' modifier; this causes them to be logged to plink.debug.R (without being executed).

Connecting elsewhere

--R-port <port number>

--R-host <host>
--R-socket <socket>

By default, --R tries to connect to a local Rserve instance on port 6311. You can change this as follows:

  • --R-port sets the port number.
  • --R-host lets you connect to a remote host, while --R-socket specifies a socket name.
