D: 18 Mar 2024

Errors and warnings

When PLINK detects that something is nonstandard and/or wrong, it will usually display and log a message to that effect. In order of increasing severity, there are three classes of such messages: 'Note', 'Warning', and 'Error'.
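To triage a finished run, it can help to count the messages of each class in the .log file. The following is a minimal Python sketch, not part of PLINK; it assumes that each message begins a log line with "Note:", "Warning:" or "Error:", and the function name summarize_log and the log excerpt are hypothetical.

```python
import re
from collections import Counter

# Assumption: each message starts a log line with one of these prefixes.
PREFIX = re.compile(r"^(Note|Warning|Error):")
SEVERITY = {"Note": 0, "Warning": 1, "Error": 2}

def summarize_log(text):
    """Count messages per class and return the most severe class seen (or None)."""
    counts = Counter()
    for line in text.splitlines():
        match = PREFIX.match(line)
        if match:
            counts[match.group(1)] += 1
    worst = max(counts, key=SEVERITY.get, default=None)
    return counts, worst

# Hypothetical log excerpt; the message texts are quoted from this page.
example_log = """\
Note: No phenotype data present.
Warning: Variants are not sorted by position. Consider rerunning with the
--sort-vars flag added to remedy this.
Error: No variants remaining after main filters.
"""

counts, worst = summarize_log(example_log)
print(dict(counts), worst)
```

Continuation lines of a wrapped message carry no prefix, so each message is counted once.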

Notes address situations where nothing is actually wrong, but there's something PLINK thought you might want to know. Common Notes include:

  • "--xyz flag deprecated. Use ..."
    When this is a Note, it indicates that the interface for a PLINK 1.x command you're using has been redesigned, and the new interface probably exposes some handy additional options, but you are free to continue doing things the way you always have. There are no plans to drop backwards compatibility in PLINK 2.0.
  • "No phenotype data present."
    Not a problem if you aren't performing any association analysis, or if you're explicitly loading phenotype data with --pheno when necessary.

A Warning indicates that something is likely to be wrong, but it's not fatal. (Unless you want it to be fatal; see --warning-errcode.) Common Warnings include:

  • "--xyz flag deprecated. Use ..."
    When this is a Warning, it indicates that, although the flag still works for now, backward compatibility is likely to be dropped before PLINK 2.0 is complete. If you expect to reuse the same script next year, updating is recommended.
  • "No output requested. Exiting." (followed by basic usage information)
    This happens when you specify input file(s) but don't say what should be done with them; work out which operation you meant to request, then rerun your command.
  • "--make-pgen input and output filenames match. Appending '~' to input filenames."
    Since PLINK 2 does not keep all input data in memory simultaneously, it's frequently necessary for it to rename input files when they conflict with output filenames; otherwise the following could happen:
    1. First block of input data loaded and filtered
    2. Filtered data written to new output file; input file is deleted in the process
    3. Attempt to load second block of input data fails; PLINK errors out, and worse, most of the input data has been lost
    PLINK 2 follows the convention, introduced by GNU Emacs, of using appended tilde characters to designate automatic backup files. (Note that these backup files are fair game for clobbering by future automatic backups; when you want them to serve as 'real' backups, rename them.)
  • "--merge-par should not be used with VCF export. (The VCF export routine automatically converts PAR1/PAR2 chromosome codes to X, while using the PAR boundaries to get male ploidy right; --merge-par causes VCF export to get male ploidy wrong.)"
    As advertised. For the same reason, when pseudoautosomal regions are present, and you have sex information, make sure chrX has been split with --split-par before VCF export.
  • "At least one VCF allele code violates the official specification; other tools may not accept the file. (Valid codes must either start with a '<', only contain characters in {A,C,G,T,N,a,c,g,t,n}, be an isolated '*', or represent a breakend.)"
    This can happen when you try to convert e.g. 23andMe files with nonstandard allele codes to VCF format. The usual solution is to filter out these variants with e.g. "--snps-only just-acgt". You can try to standardize some of the allele codes (see the "Fixed fields" subsection of the VCF specification for a description of the correct representation), but at least in the 23andMe case, it may not be possible to do this reliably with publicly available information.
  • "Variants are not sorted by position. Consider rerunning with the --sort-vars flag added to remedy this."
    Unlike previous versions, PLINK 2 does not default to sorting all variants by position when generating a new fileset. This significantly speeds up some simple workflows. However, sorting is still a good idea more often than not (quite a few operations do still require sorted variant records, after all), so PLINK 2 nags you about it.
  • "--hwe observation counts vary by more than 10%. Consider using --geno, and/or applying different p-value thresholds to distinct subsets of your data."
    This is discussed in the --hwe documentation.
  • "Skipping --glm regression on phenotype 'PHENO1', since genotype/covariate scales vary too widely for numerical stability of the current implementation. Try rescaling your covariates with e.g. --covar-variance-standardize."
    "Year of birth" and similar covariates had a nasty habit of causing PLINK 1.x --linear/--logistic jobs to produce a flood of 'NA' results: they interacted poorly with both built-in multicollinearity diagnostics and the actual linear algebra operations used to solve the regression. Fortunately, a simple linear rescaling of the covariates was enough to make the problem go away. --glm detects this scenario and tells you about the rescaling solution up-front.
    (Yes, --glm could be modified to automatically variance-standardize these covariates, and then automatically convert the results back to the original scale. However, this gets complicated when interaction testing comes into play. We may eventually implement this, but we think it is best to stick to the current simpler and good-enough solution until all other major PLINK 2 features are working.)
    This warning, along with the next several --glm phenotype-skipping warnings, will be upgraded to errors in alpha 3.
  • "Skipping chrX in --glm regression on phenotype 'PHENO1', since correlation between covariates 'SEX' and 'Sex' is too high (CORR_TOO_HIGH)."
    By default, --glm includes sex (from the .fam/.psam file) as a covariate on chrX, and nowhere else. This is problematic if you're adding your own sex covariate from a --covar file. See the --glm documentation on the 'sex' and 'no-x-sex' modifiers.
  • "Skipping --glm regression on phenotype 'PHENO1', since correlation between covariates 'xyz' and 'abc' is too high (CORR_TOO_HIGH). You may want to remove redundant covariates and try again."
    "Skipping --glm regression on phenotype 'PHENO1', since covariate correlation matrix could not be inverted (VIF_INFINITE). You may want to remove redundant covariates and try again."
    These tell you about linear dependence in your covariate set.
    • The first of these messages is straightforward to address: if both covariates in question are sex, see the --glm documentation on the 'sex' and 'no-x-sex' modifiers (mentioned under the previous warning); otherwise, just remove one of the named covariates.
    • The most common causes of the second message involve categorical covariates. Linear dependence is especially likely when more than one categorical covariate is present: for example, 'batch' and 'center' are distinct categorical covariates, several batches were all genotyped at the same center, and no other samples were genotyped at that center. A workaround is to fuse the categorical covariates into a single composite covariate ('batch_center' in this example). Another cause is a split categorical covariate where the split did not omit any category; in that case, remove the largest category (this corresponds to "--split-cat-pheno omit-most"). Otherwise, you may need to do more work to identify and break the linear dependence; you can do this systematically with e.g. R, or just try removing less-important covariates.
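The 'batch'/'center' scenario behind VIF_INFINITE can be reproduced with a small rank check. The following is a minimal Python sketch, not PLINK's own diagnostic: the sample labels are made up, and the helper names one_hot, matrix_rank and design are hypothetical. It builds indicator columns for each categorical covariate (omitting one category, as a split would), and compares the rank of the design matrix to its column count.

```python
from fractions import Fraction

def one_hot(labels, omit):
    """Indicator columns for every category except `omit`, one row per sample."""
    cats = sorted(set(labels) - {omit})
    return [[1 if lab == c else 0 for c in cats] for lab in labels]

def matrix_rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def design(*blocks):
    """Intercept column followed by the given covariate blocks, row by row."""
    return [[1] + [x for block_row in rows for x in block_row]
            for rows in zip(*blocks)]

# Made-up data: batches b1 and b2 were genotyped at center c1, and nothing
# else was genotyped there; batch b3 was genotyped at center c2.
batch = ["b1", "b1", "b2", "b2", "b3", "b3"]
center = ["c1", "c1", "c1", "c1", "c2", "c2"]

# Separate covariates: the 'center=c2' column duplicates 'batch=b3'.
separate = design(one_hot(batch, "b1"), one_hot(center, "c1"))
print(matrix_rank(separate), len(separate[0]))  # rank 3 < 4 columns

# Fused composite covariate: one category per observed (batch, center) pair.
fused = [f"{b}_{c}" for b, c in zip(batch, center)]
fused_design = design(one_hot(fused, "b1_c1"))
print(matrix_rank(fused_design), len(fused_design[0]))  # rank 3 == 3 columns
```

A rank below the column count means at least one covariate column is a linear combination of the others, which is the condition the fused covariate removes here.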

Finally, an Error is a fatal problem that causes PLINK to terminate immediately. Common Errors include:

  • "No input dataset."
    The inverse of "No output requested"; rerun your command with the appropriate input flag(s). (This is classified as an Error instead of a Warning because the basic usage info message is less likely to help you fix the problem.)
  • "--xyz conflicts with another input flag."
    You specified multiple input flags of different types. PLINK requires exactly one main input fileset; naming two or more is not allowed.
  • "Failed to open <filename> : No such file or directory."
    You probably mistyped a filename, included a file extension when you shouldn't have (e.g. --bfile), or failed to include a file extension when you should have (e.g. --vcf).
  • "Out of memory."
    This is most likely to happen with very large variant sets, lots of long, fully-spelled-out indels... and stupid mistakes on our part (e.g. we just verified you did have enough memory, but we forgot a 'not' in our code and did not have a test for that code branch). If you suspect the latter, post your .log file to the plink2-users Google group.
  • "No samples remaining after main filters."
    "No variants remaining after main filters."
    The filtering flags you specified caused every last sample or every last variant to be excluded from the analysis. Check for things done backwards (e.g. --extract where --exclude was intended), mistyped thresholds, and mismatched IDs (e.g. rsIDs in your main dataset and position-based IDs in the --extract file, or vice versa, or two different position-based ID schemes). --set-all-var-ids is likely to help with mismatched IDs.
  • "Unrecognized flag ('--flag-recognized-by-plink1.9')."
    PLINK 2.0 won't be complete for a while to come. If possible, just use PLINK 1.9 for the affected operation (after exporting to PLINK 1 binary format if necessary) for now. If that would not yield a sufficiently accurate result, it is reasonable to describe your use case to the plink2-users Google group; no guarantees, but we try to prioritize features that we know researchers are waiting on, so your comment won't be ignored.
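For the "No variants remaining after main filters" case above, a quick check is whether the dataset and the --extract file share any variant IDs at all. The following is a rough Python sketch, not a PLINK feature; it assumes both ID lists have already been read into memory, and the names id_style and diagnose_extract, the rsID pattern, and the example IDs are all hypothetical.

```python
import re

# Assumption: rsIDs look like "rs" followed by digits; anything else is "other".
RSID = re.compile(r"^rs\d+$")

def id_style(variant_id):
    """Rough guess at the ID scheme: 'rsid' or 'other'."""
    return "rsid" if RSID.match(variant_id) else "other"

def diagnose_extract(dataset_ids, extract_ids):
    """Return (number of shared IDs, hint string or None)."""
    shared = set(dataset_ids) & set(extract_ids)
    if shared:
        return len(shared), None
    styles_a = {id_style(v) for v in dataset_ids}
    styles_b = {id_style(v) for v in extract_ids}
    if styles_a and styles_b and styles_a.isdisjoint(styles_b):
        return 0, "ID schemes differ (rsIDs vs. non-rsIDs)"
    return 0, "no shared IDs; check for typos or differing ID templates"

# Made-up IDs for illustration.
dataset_ids = ["rs123", "rs456"]
extract_ids = ["1:1000:A:G", "1:2000:C:T"]
print(diagnose_extract(dataset_ids, extract_ids))
```

If the hint reports differing schemes, regenerating IDs under one consistent scheme on both sides is the kind of fix that --set-all-var-ids is meant for.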
