Introduction, downloads

S: 11 Dec 2023 (b7.2)

D: 11 Dec 2023

Miscellany

Tabs vs. spaces

By default, old flags usually produce space-delimited output with an attempt at equal column widths (see footnote 1), while new flags produce tab-delimited output. When a report is not formatted the way you want, the Unix tr command and our prettify utility may come in handy. (Some systems also have a column utility which is similar to prettify.)

tr, with no flags, replaces all instances of one character with another character on a one-for-one basis. So,

cat plink.dist | tr '\t' ' ' > plink.dist.spaces

makes plink.dist.spaces a copy of plink.dist with all tabs converted to spaces, and

cat plink.dist.spaces | tr ' ' '\t' > plink.dist2

converts spaces to tabs instead.

When converting spaces to tabs, you'll frequently want to collapse strings of consecutive spaces down to single tabs; this is what tr's -s flag is for. E.g.

cat plink.genome | tr -s ' ' '\t' > plink.genome.tabs

(To also strip leading and trailing spaces, you can use something like the two-sed pipeline mentioned in the help text below.)
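
As a minimal sketch of the conversions above, the following builds a small made-up report, collapses each run of spaces to a single tab, and strips leading and trailing whitespace with one sed call instead of two. The file names sample.txt and sample.tabs and the file contents are hypothetical.

```shell
# Hypothetical sample: two space-padded rows, like an old-style PLINK report.
printf '  FID   IID    PHENO\n  fam1  ind1      2\n' > sample.txt

# Collapse runs of spaces to single tabs, then strip any leading or
# trailing whitespace (including the tab produced by the leading spaces).
tr -s ' ' '\t' < sample.txt \
  | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' > sample.tabs
```

Reading the input with a redirect (< sample.txt) is equivalent to the "cat ... |" form used above; either works.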

Finally, it can be useful to expand single tabs to multiple spaces in a column-aligned manner, e.g. to make a report easier to read or edit. Not all systems provide a nice way to do this, so PLINK 1.9 is distributed with the prettify utility for the job.

[chrchang:~/plink-ng]$ prettify
prettify v1.04 (21 Feb 2014)   Christopher Chang (chrchang@alumni.caltech.edu)

Takes a tab-and/or-space-delimited text table, and generates a space-delimited
pretty-printed version.  Multibyte character encodings are not currently
supported.

  prettify {flag(s)...} [input filename] {output filename}

  -i, --inplace      : Replace the input instead of writing to a new file.
  -s, --spacing [ct] : Set number of spaces between columns (default 2).
  -r, --ralign       : Make right sides of columns line up, instead of left.
  -l, --leading      : Add space(s) before the first column.
  -e, --extend-short : Use spaces to extend lines with fewer columns.
  -t, --trailing     : Add space(s) after the last column.
  -f, --force-eoln   : Force last line to be terminated by a newline.
  -n, --noblank      : Remove blank lines.

If no output filename is provided (and --inplace isn't in effect), results are
dumped to standard output.

To perform the simplest reverse conversion (multiple spaces to one tab), you
can use
  cat [input filename] | tr -s ' ' '\t' > [output filename]
For one-to-one conversion between spaces and tabs, omit the "-s". And
to strip leading and trailing tabs and spaces, try
  cat [in] | sed 's/^[[:space:]]*//g' | sed 's/[[:space:]]*$//g' > [out]
[chrchang:~/plink-ng]$ prettify -r plink.genome.tabs plink.genome2

It is not actually necessary to first convert spaces to tabs if you just wish to clean up a misaligned space-delimited file.

[chrchang:~/plink-ng]$ prettify -ir plink.genome
[chrchang:~/plink-ng]$ diff plink.genome plink.genome2
[chrchang:~/plink-ng]$

As with many other Unix programs, '-ir' is acceptable shorthand for "-i -r".

1: PLINK 1.07's pretty-printing logic is a bit buggy, but changing output formats can be less safe than just leaving things as they are. So, for now, we've decided to make it easy for you to realign reports on your own instead. (However, we plan to convert practically all functions over to tab-delimited output in PLINK 2.0.)

Flag/parameter reuse

--script <filename>

--script loads the specified text file and applies all the command-line flags and parameters contained within. This is handy if you use the same QC filters across multiple runs and datasets.
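
For instance, a shared set of QC filters could be kept in one file and reused across filesets, roughly as sketched below. The file name qc_flags.txt, the thresholds, and the cohort1/cohort2 prefixes are all hypothetical, and writing the flags on a single whitespace-separated line is an assumption about the file layout rather than a documented requirement.

```shell
# Hypothetical flags file; the thresholds are illustrative only.
printf '%s\n' '--maf 0.01 --geno 0.05 --hwe 1e-6' > qc_flags.txt

# Apply the same filters to several (hypothetical) filesets.
# The guard skips the run when plink or the input fileset is absent.
for prefix in cohort1 cohort2; do
  if command -v plink >/dev/null 2>&1 && [ -e "$prefix.bed" ]; then
    plink --bfile "$prefix" --script qc_flags.txt \
      --make-bed --out "${prefix}_qc"
  fi
done
```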

--rerun [log file]

--rerun loads the specified PLINK 1.9 log (defaulting to plink.log) and causes all commands to be rerun. The same parameter(s) will be used for each flag, except when the same flag is included on the current command line with different parameter(s).

Version information

--version

--version makes PLINK print its version number and then exit.

Console output suppression

--silent
--gplink

--silent prevents PLINK from printing regular output to the console. (The usual logging will still occur, and error output is not suppressed.)

--gplink currently has a similar effect, but it should only be used by gPLINK. (If gPLINK is updated in the future, its developers may change this flag's behavior.)

System resource usage

--memory <main workspace size, in MB>

By default, PLINK 1.9 tries to reserve half of your system's RAM for its main workspace. If this amount is insufficient for your current job, or if it causes unwanted interference with other running processes (e.g. you're using GNU parallel to run single-threaded instances of PLINK on each chromosome simultaneously), you can use --memory to adjust this behavior.

32-bit PLINK limits workspace size to roughly 2 GB.

There are a few items (most notably, multi-character allele names) which are saved outside the main workspace. As a result, there are corner cases where decreasing the --memory parameter may enable a run to complete. (This situation is unlikely, since PLINK 1.9 explicitly reserves 64 MB of non-workspace memory.)
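
The GNU parallel scenario mentioned above might look roughly like the following sketch. It assumes that splitting a fixed workspace budget evenly across concurrent jobs is appropriate for your data; the helper per_job_mb, the 16000 MB budget, and the fileset prefix "data" are hypothetical.

```shell
# Hypothetical helper: split a total workspace budget (in MB) evenly
# across a number of concurrent jobs.
per_job_mb() {
  total_mb=$1
  jobs=$2
  echo $(( total_mb / jobs ))
}

mem=$(per_job_mb 16000 4)   # 16000 MB shared by 4 concurrent jobs

# One single-threaded PLINK instance per autosome, 4 at a time.
# The guard skips the run when the tools or the input fileset are absent.
if command -v parallel >/dev/null 2>&1 \
   && command -v plink >/dev/null 2>&1 && [ -e data.bed ]; then
  parallel -j 4 plink --bfile data --chr {} --memory "$mem" --threads 1 \
    --make-bed --out data_chr{} ::: $(seq 1 22)
fi
```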

--threads <max>
  (aliases: --thread-num, --num_threads)

By default, multithreaded PLINK functions employ about as many concurrent threads as your system has available logical cores. (More precisely, PLINK currently sets the maximum thread count to sysconf(_SC_NPROCESSORS_ONLN), minus 1 if that number is greater than 8. This is a bit arbitrary, but we've found it to work well in practice so far.) Occasionally, you'll want to change this number—perhaps sysconf() is reporting an inaccurate number (not uncommon with AMD processors), or some of your cores are already fully occupied with other tasks. This can be done with --threads.

--threads has one known limitation: some BLAS/LAPACK linear algebra operations are multithreaded in a way that PLINK cannot control. If this is problematic, you should recompile against single-threaded BLAS/LAPACK.
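
To see what the default rule described above works out to on a given machine, you can reproduce it in the shell. This is only a sketch of the stated rule, not PLINK's actual code; default_threads is a hypothetical helper, and getconf _NPROCESSORS_ONLN is assumed to be available (it is on typical Linux and macOS systems).

```shell
# Stated rule: use all online logical cores, minus 1 if there are more than 8.
default_threads() {
  n=$1
  if [ "$n" -gt 8 ]; then
    echo $(( n - 1 ))
  else
    echo "$n"
  fi
}

# Fall back to 1 if getconf cannot report the core count.
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "online cores: $ncores, default max threads: $(default_threads "$ncores")"
```

If the reported core count looks wrong, or some cores are busy, pass the number you actually want to --threads.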

Name range delimiter

--d <delimiter>

By default, PLINK commands accepting multiple name ranges (e.g. --snps, --covar-name, --lasso-select-covars, --ld-snps) expect ranges to be denoted with a single dash, with no space on either side of the dash. E.g. in

--snps rs1111-rs2222, rs3333, rs4444

'rs1111-rs2222' denotes all variants between rs1111 and rs2222 inclusive. --d lets you designate a non-dash character for this purpose, which can be essential if your IDs contain dashes. E.g.

--d : --snps SNP_A-8395068:SNP_A-8303431

tells --snps to act on all variants between SNP_A-8395068 and SNP_A-8303431 inclusive.

Reproducible pseudorandom number sequences

--seed <integer...>

--perm-batch-size <value>

--seed initializes the pseudorandom number generator with the given seed(s). Each seed must be a 32-bit unsigned integer (i.e. between 0 and 4294967295 inclusive).

When performing a permutation test on a quantitative trait, using --linear/--logistic, or conducting a set-based test, --perm-batch-size sets the number of permutations in each pass. (The current default is 512 across all systems, but we may vary it in a system- and/or dataset-dependent fashion in the future for performance reasons.) Due to the technical details of how PLINK generates permutations when employing multiple threads, you may need to use --perm-batch-size, --threads, and --seed together to ensure reproducible results. (For case/control --assoc/--model permutation tests, --threads + --seed is currently adequate.)
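
A reproducible permutation run might therefore pin all three settings, as in the sketch below. The helper valid_seed, the seed value, and the fileset prefix "data" are hypothetical; the helper only checks the 32-bit unsigned range stated above.

```shell
# Hypothetical check: accept only integers between 0 and 4294967295.
valid_seed() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-digit input
  esac
  [ "${#1}" -le 10 ] && [ "$1" -le 4294967295 ]
}

seed=12345
# Fix seed, thread count, and batch size together for a --linear permutation test.
# The guard skips the run when plink or the input fileset is absent.
if valid_seed "$seed" && command -v plink >/dev/null 2>&1 && [ -e data.bed ]; then
  plink --bfile data --linear perm --seed "$seed" --threads 4 \
    --perm-batch-size 512 --out repro_run
fi
```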

Note that you may also need to retrieve an older version of PLINK in order to reproduce a run.

Faster but less reproducible linear algebra

--native

By default, when the same plink binary is run with the same flags, workspace size, thread count, and random seed, the results should be reproducible across machines with different processors. (This was not necessarily true on Linux before 19 Oct 2020.) To allow Intel MKL to use processor-dependent code paths that can yield slightly different linear algebra results, add the --native flag.

P-value underflow

--output-min-p <threshold>

By default, p-values too small to be represented by an ordinary floating-point number are reported as '0' or 'INF'. This can create problems for log(p) plots and the like. One workaround is --output-min-p, which prevents PLINK from reporting non-empirical p-values below the given threshold. (Other reported statistics are not affected, so you can e.g. infer the true p-value from the reported Z-statistic.)
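
As a rough illustration of recovering an underflowed p-value from a Z-statistic, the sketch below computes log10(p) directly. It assumes a two-sided test against a standard normal and uses the leading-term tail approximation p ≈ 2·φ(|Z|)/|Z|, which is only reasonable for large |Z|. The helper log10_p_from_z is hypothetical, and the test your report actually uses may differ.

```shell
# Hypothetical helper: approximate log10 of a two-sided normal p-value
# for a large |Z|, without ever forming p itself.
log10_p_from_z() {
  awk -v z="$1" 'BEGIN {
    a = (z < 0) ? -z : z              # |Z|
    pi = atan2(0, -1)
    # ln p ~ ln 2 - Z^2/2 - ln|Z| - (1/2) ln(2*pi)
    ln_p = log(2) - a * a / 2 - log(a) - 0.5 * log(2 * pi)
    printf "%.1f\n", ln_p / log(10)
  }'
}

log10_p_from_z 40   # prints -349.1, i.e. p is roughly 10^-349
```

A value like this is too small for an ordinary double-precision number, which is why working on the log scale is convenient for plotting.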

Reliable logging

--debug

Normally, PLINK 1.9 does not force log entries to be written to disk immediately. However, when PLINK crashes unexpectedly (e.g. via segmentation fault), this may cause the log to be incomplete. --debug prevents this from happening.

Redundant flags

The following PLINK 1.07 flags have been retired, since they are redundant with omnipresent utilities. (Talk to your system administrator if no programs for handling these operations appear to be installed on your machine.)

--compress (use e.g. "gzip <filename>")
--decompress (use e.g. "gunzip <filename>")
--id-dict/--id-match (free database software handles this in a more flexible and powerful manner)
