D: 28 Oct 2018

File format reference

This page describes specialized PLINK 2.0 input and output file formats which are identifiable by file extension. (Most extensions not listed here have very simple one-entry-per-line or two-entry-per-line text formats.)

Unless otherwise specified, all multicolumn text files generated by PLINK 2.0 are tab-delimited, with one header line starting with '#'. In the column summaries, columns which are present unless removed by the column set descriptor are bolded, and columns which only appear under some flag/modifier combination(s) are italicized.
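As a rough illustration of the convention above, the following Python sketch reads a tab-delimited report whose single header line starts with '#'. The function name `read_plink2_report` and the sample rows are made up for illustration; they are not part of PLINK.

```python
import csv
import io

def read_plink2_report(handle):
    """Yield one dict per data row of a tab-delimited report with a '#' header line."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # The header line starts with '#'; strip it from the first column name.
    header[0] = header[0].lstrip("#")
    for row in reader:
        yield dict(zip(header, row))

# Hypothetical two-variant excerpt in .afreq style (values invented).
example = io.StringIO(
    "#CHROM\tID\tREF\tALT\tALT_FREQS\tOBS_CT\n"
    "1\trs1\tA\tG\t0.25\t200\n"
    "1\trs2\tC\tT\t0.5\t198\n"
)
rows = list(read_plink2_report(example))
print(rows[0]["CHROM"], rows[1]["ALT_FREQS"])
```

Looking columns up by header name rather than by position keeps such a reader working when the column set descriptor adds or removes columns.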

Jump to: .acount | .adjusted | .afreq | .bed | .bgen | .bim | .bins | .cov | .eigenvec | .fam | .gcount | .gen | .glm.firth | .glm.linear | .glm.logistic{.hybrid} | .grm | .grm.N.bin | .grm.bin | .haps | .hardy | .hardy.x | .*.id | .kin0 | .king | .legend | .pgen | .psam | .pvar | .raw | .rel | .sample | .sdiff | .sdiff.summary | .smiss | .sscore | .traw | .vcf | .vmiss


.acount, .afreq (allele count/frequency report)

Produced by --freq.

A text file with a header line, and then one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
'REF_FREQ'/'REF_CT' | reffreq | Reference allele frequency/dosage
'ALT1_FREQ'/'ALT1_CT' | alt1freq | Alternate allele 1 frequency/dosage
'ALT_FREQS'/'ALT_CTS' | altfreq, alteq, alteqz | Comma-sep. freqs/dosages for all alts
'ALT_NUM_{FREQS,CTS}' | altnumeq | Comma-sep. freqs/dosages for all alts
'FREQS'/'CTS' | freq, eq, eqz | Comma-sep. freqs/dosages for all alleles
'NUM_FREQS'/'NUM_CTS' | numeq | Comma-sep. freqs/dosages for all alleles
'MACH_R2' | machr2 | MaCH imputation quality metric
OBS_CT | nobs | Number of allele observations

.adjusted (basic multiple-testing corrections)

Produced by --adjust and --adjust-file.

A text file with a header line, and then one line per valid variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
A1 | a1 | Tested allele
UNADJ | unadj | Unadjusted p-value
GC | gc | Devlin & Roeder (1999) genomic control corrected p-value (additive model only)
QQ | qq | P-value quantile
BONF | bonf | Bonferroni correction
HOLM | holm | Holm-Bonferroni (1979) adjusted p-value
SIDAK_SS | sidakss | Šidák single-step adjusted p-value
SIDAK_SD | sidaksd | Šidák step-down adjusted p-value
FDR_BH | fdrbh | Benjamini & Hochberg (1995) step-up false discovery control
FDR_BY | fdrby | Benjamini & Yekutieli (2001) step-up false discovery control

Variants are sorted in p-value order. (Thus, if the QQ field is present, its values just increase linearly.)
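To make a few of the corrections listed above concrete, here is a minimal Python sketch of the textbook Bonferroni, Holm step-down, and Benjamini-Hochberg step-up adjustments, applied to p-values already sorted in increasing order. These are the standard definitions, not PLINK's own code, and the function names are hypothetical.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]

def holm(sorted_pvals):
    """Holm step-down: running maximum of (n - rank) * p over increasing p."""
    n = len(sorted_pvals)
    adjusted, running_max = [], 0.0
    for rank, p in enumerate(sorted_pvals):
        running_max = max(running_max, min(1.0, (n - rank) * p))
        adjusted.append(running_max)
    return adjusted

def fdr_bh(sorted_pvals):
    """Benjamini-Hochberg step-up: running minimum of n * p / rank from the largest p down."""
    n = len(sorted_pvals)
    adjusted, running_min = [0.0] * n, 1.0
    for idx in range(n - 1, -1, -1):
        running_min = min(running_min, sorted_pvals[idx] * n / (idx + 1))
        adjusted[idx] = running_min
    return adjusted

# Invented p-values, sorted in increasing order like the rows of an .adjusted file.
pvals = [0.01, 0.02, 0.03, 0.5]
print(bonferroni(pvals))
print(holm(pvals))
print(fdr_bh(pvals))
```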

.bed (PLINK 1 binary biallelic genotype table)

PLINK 1's preferred way to represent genotype calls. Must be accompanied by .bim and .fam files. Loaded with --bfile, and generated by --make-bed.

Do not confuse this with the UCSC Genome Browser's BED format, which is totally different. (It is safe to change a PLINK 1 .bed file's extension to .pgen and use --bpfile to load it.)

See the PLINK 1.9 documentation for a detailed description of the usual variant-major form, along with an example. PLINK 2 can also efficiently export the sample-major form ("--export ind-major-bed"); it has third byte equal to zero instead of one, but is otherwise analogous.
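The third-byte distinction can be checked with a few lines of Python. This sketch assumes the two leading magic bytes 0x6c 0x1b given in the PLINK 1.9 .bed description; the function name `bed_layout` is hypothetical.

```python
# First two bytes of a PLINK 1 .bed file (assumption taken from the PLINK 1.9 documentation).
BED_MAGIC = b"\x6c\x1b"

def bed_layout(first_bytes):
    """Classify a .bed file as variant-major or sample-major from its first three bytes."""
    if len(first_bytes) < 3 or first_bytes[:2] != BED_MAGIC:
        raise ValueError("missing PLINK 1 .bed magic number")
    mode = first_bytes[2]
    if mode == 1:
        return "variant-major"
    if mode == 0:
        return "sample-major"
    raise ValueError(f"unrecognized third byte: {mode:#04x}")

# In practice the bytes would come from open(path, "rb").read(3).
print(bed_layout(b"\x6c\x1b\x01"))
print(bed_layout(b"\x6c\x1b\x00"))
```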


.bgen (Oxford variant info + genomic data binary file)

Native binary file format for Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST. BGEN v1.1 files should always be accompanied by a .sample file. Loaded with --bgen, and produced by "--export bgen-1.1", "--export bgen-1.2", and "--export bgen-1.3".

Refer to http://www.well.ox.ac.uk/~gav/bgen_format/ for a detailed description of the format.


.bim (PLINK extended MAP file)

Variant information file accompanying a .bed or biallelic .pgen binary genotype table. (--make-just-bim can be used to update just this file.)

A text file with no header line, and one line per variant with the following six fields:

  1. Chromosome code
  2. Variant ID
  3. Position in centimorgans (safe to use dummy value of '0')
  4. Base-pair coordinate (1-based; limited to 2^31 - 2)
  5. ALT allele code
  6. REF allele code

A few notes:

  • When .bed files are involved, the ALT and REF allele codes will sometimes be swapped, since that's PLINK 1.x's default behavior whenever the true REF allele is less common than the ALT allele in the current dataset. If that's a problem, you can use --ref-allele to swap them back.
  • It is safe to change a .bim file's extension to .pvar and use --pfile to load it.
  • Variants with negative bp coordinates are ignored by PLINK.
  • PLINK 1.9 and 2.0 permit the centimorgan column to be omitted. (However, omission is not recommended if the .bim file needs to be read by other software.)
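The field order and the optional centimorgan column can be handled as in the following Python sketch. It assumes whitespace-delimited fields; `BimRecord` and `parse_bim_line` are illustrative names, not PLINK identifiers.

```python
from typing import NamedTuple

class BimRecord(NamedTuple):
    chrom: str
    variant_id: str
    cm: str
    pos: int
    alt: str
    ref: str

def parse_bim_line(line):
    """Parse one .bim line, accepting the 5-column form with the centimorgan column omitted."""
    fields = line.split()
    if len(fields) == 5:
        # Centimorgan column omitted: fill in the dummy value '0'.
        fields.insert(2, "0")
    if len(fields) != 6:
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}")
    chrom, variant_id, cm, pos, alt, ref = fields
    return BimRecord(chrom, variant_id, cm, int(pos), alt, ref)

# Invented example lines.
full = parse_bim_line("1\trs123\t0\t10583\tA\tG")
short = parse_bim_line("1 rs456 20000 T C")
print(full.alt, full.ref, short.cm, short.pos)
```

A caller that wants to mirror the note about negative coordinates can skip any record whose `pos` is below zero.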

.bins (allele count or frequency histogram)

A text file with a header line, followed by one line per [start, end) histogram bin with the following two fields:

Header | Contents
BIN_START | Start of bin
OBS_CT | Number of variants in the bin

The end of the current bin interval is the next line's BIN_START value (or positive infinity if there is no next line).
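The rule for recovering each bin's end can be sketched in Python as follows; `bin_intervals` is a hypothetical helper and the counts are invented.

```python
import math

def bin_intervals(rows):
    """Turn (BIN_START, OBS_CT) rows into (start, end, count) half-open intervals."""
    intervals = []
    for i, (start, count) in enumerate(rows):
        # The end of a bin is the next row's BIN_START, or +infinity for the last row.
        end = rows[i + 1][0] if i + 1 < len(rows) else math.inf
        intervals.append((start, end, count))
    return intervals

rows = [(0.0, 10), (0.01, 5), (0.05, 2)]
print(bin_intervals(rows))
```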

.cov (covariate table)

Produced by --write-covar, --make-{b}pgen/--make-bed, and --export when covariates have been loaded/specified. Valid input for --covar.

A text file with a header line, and one line per sample with the following columns:

Header | Column set | Contents
FID | maybefid, fid | Family ID
IID | (required) | Individual ID
SID | maybesid, sid | Source ID
PAT | maybeparents, parents | Paternal individual ID
MAT | maybeparents, parents | Maternal individual ID
SEX | sex | Sex (1 = male, 2 = female, 'NA' = unknown)
PHENO1 | pheno1 | All-missing phenotype column, if none loaded
[Pheno name], ... | pheno1, phenos | Phenotype value(s) (only first if just 'pheno1')
[Covar name], ... | (required) | Covariate values

(Note that --covar can also be used with files lacking a header row.)

.eigenvec, .eigenvec.var (principal components)

Produced by --pca. Accompanied by an .eigenval file, which contains one eigenvalue per line.

The .eigenvec file is a text file with a header line and between 1+V and 3+V columns per sample, where V is the number of requested principal components. The first columns contain the sample ID, and the rest are principal component weights in the same order as the .eigenval values (with column headers 'PC1', 'PC2', ...).

With the 'var-wts' modifier, an .eigenvec.var file is also generated. It's a text file with a header line, followed by one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
MAJ | maj | Major allele
NONMAJ | nonmaj | All nonmajor alleles, comma-separated
PC1, PC2, ... | (required) | Principal component variant weights; signs are w.r.t. the major allele

.fam (PLINK sample information file)

Sample information file accompanying a .bed or biallelic .pgen binary genotype table. (--make-just-fam can be used to update just this file.)

A text file with no header line, and one line per sample with the following six fields:

  1. Family ID ('FID')
  2. Individual ID ('IID'; cannot be '0')
  3. Individual ID of father ('0' if father isn't in dataset)
  4. Individual ID of mother ('0' if mother isn't in dataset)
  5. Sex code ('1' = male, '2' = female, '0' = unknown)
  6. Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)

.gcount (genotype count report)

Produced by --geno-counts.

A text file with a header line, and then one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
HOM_REF_CT | homref | Homozygous-ref count
HET_REF_ALT1_CT | refalt1 | Heterozygous ref-alt1 count
HET_REF_ALT_CTS | refalt | Comma-separated het ref-altx counts
HOM_ALT1_CT | homalt1 | Homozygous-alt1 count
TWO_ALT_GENO_CTS | altxy | Comma-separated altx-alty counts, in (1/1)-(1/2)-(2/2)-(1/3)-... order
DIPLOID_GENO_CTS | xy | Similar to altxy, except reference allele included
HAP_REF_CT | hapref | Haploid-ref count
HAP_ALT1_CT | hapalt1 | Haploid-alt1 count
HAP_ALT_CTS | hapalt | Comma-separated haploid-altx counts
HAP_CTS | hap | Similar to hapalt, except ref also included
GENO_NUM_CTS | numeq | "0/0=[hom ref ct],0/1=[het ref-alt1],...,0=[hap ref]" etc.; zero-counts are omitted
MISSING_CT | missing | Number of missing genotypes
OBS_CT | nobs | Number of (nonmissing) genotype observations

.gen (Oxford text genotype file format)

Native text genotype file format for Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST. Should always be accompanied by a .sample file. Imported with --data/--gen, and produced by "--export oxford".

A text file with no header line, and one line per variant with 3N+5 fields where N is the number of samples. Each line stores information for a single SNP. The first five fields are:

  1. Chromosome code (can be ignored with --oxford-single-chr)
  2. Variant ID
  3. Base-pair coordinate
  4. Allele 1 (usually minor, use 'ref-first' when importing to treat as REF)
  5. Allele 2 (usually major, use 'ref-last' when importing to treat as REF)

Each subsequent triplet of values then indicates the likelihoods of the homozygote A1, heterozygote, and homozygote A2 genotypes at this variant, respectively, for one sample. If they add up to less than one, the remainder is a no-call probability weight.

The PLINK 2 binary format can represent allele count expected values, but it does not distinguish between e.g. {P(hom-ref)=0.28, P(het)=0.52, P(hom-alt)=0.2} and {P(hom-ref)=0.08, P(het)=0.92, P(hom-alt)=0}, and it ignores the no-call probability weight (though "0 0 0" will be correctly converted to a missing call). The --import-dosage-certainty flag can be used during import to replace some of the most uncertain genotype calls with missing values.
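The arithmetic behind that example can be checked with a short Python sketch. It treats allele 1 as REF, computes the expected allele 2 count as P(het) + 2 * P(hom-A2), and maps an all-zero triplet to a missing value. The function name is hypothetical, and the sketch does not try to reproduce how PLINK's importer treats partial no-call weights.

```python
def gen_expected_a2_counts(line):
    """Return the expected allele 2 count per sample for one .gen line (None = missing)."""
    fields = line.split()
    if len(fields) < 5 or (len(fields) - 5) % 3 != 0:
        raise ValueError("expected 3N+5 fields")
    probs = [float(x) for x in fields[5:]]
    counts = []
    for i in range(0, len(probs), 3):
        p_hom_a1, p_het, p_hom_a2 = probs[i:i + 3]
        if p_hom_a1 == 0 and p_het == 0 and p_hom_a2 == 0:
            counts.append(None)  # "0 0 0" denotes a missing call
        else:
            counts.append(p_het + 2 * p_hom_a2)
    return counts

# Invented line with three samples; the first two use the triplets from the text.
line = "1 rs1 1000 A G 0.28 0.52 0.2 0.08 0.92 0 0 0 0"
print(gen_expected_a2_counts(line))
```

The first two samples yield the same expected count, which is why the two probability triplets cannot be told apart once only the expected value is stored.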


.glm.firth, .glm.logistic, .glm.logistic.hybrid (logistic/Firth regression association statistics)

Produced by --glm with a case/control phenotype.

A text file with a header line, and then one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
A1 | (required) | Counted allele in regression (note 1)
AX | ax | Non-A1 alleles, comma-separated
A1_CT | a1count | Total A1 allele count (can be decimal with dosage data)
ALLELE_CT | totallele | Allele observation count
A1_CASE_CT | a1countcc | A1 count in cases
A1_CTRL_CT | a1countcc | A1 count in controls
CASE_ALLELE_CT | totallelecc | Case allele observation count
CTRL_ALLELE_CT | totallelecc | Control allele observation count
CASE_NON_A1_CT | gcountcc | Case genotypes with 0 copies of A1
CASE_HET_A1_CT | gcountcc | Case genotypes with 1 copy of A1
CASE_HOM_A1_CT | gcountcc | Case genotypes with 2 copies of A1
CTRL_NON_A1_CT | gcountcc | Control genotypes with 0 copies of A1
CTRL_HET_A1_CT | gcountcc | Control genotypes with 1 copy of A1
CTRL_HOM_A1_CT | gcountcc | Control genotypes with 2 copies of A1
A1_FREQ | a1freq | A1 allele frequency
A1_CASE_FREQ | a1freqcc | A1 allele frequency in cases
A1_CTRL_FREQ | a1freqcc | A1 allele frequency in controls
MACH_R2 | machr2 | MaCH imputation quality metric
FIRTH? | firth | Reports whether Firth reg. was used ('firth-fallback' only)
TEST | test | Test identifier
OBS_CT | nobs | Number of samples in the regression
BETA | beta | Regression coefficient (for A1 allele)
OR | orbeta | Odds ratio (for A1 allele)
SE | se | Standard error of log-odds (i.e. beta)
L## | ci | Bottom of symmetric approx. confidence interval (with --ci)
U## | ci | Top of symmetric approx. confidence interval (with --ci)
Z_{,OR_CHISQ_}STAT | tz | Chi-square stat for joint test, Wald Z-score for logistic/Firth regression
P | p | Asymptotic p-value (or -log10(p)) for Z/chisq-stat

.glm.linear (linear regression association statistics)

Produced by --glm with a quantitative phenotype.

A text file with a header line, and then one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
A1 | (required) | Counted allele in regression (note 1)
AX | ax | Non-A1 alleles, comma-separated
A1_CT | a1count | Total A1 allele count (can be decimal with dosage data)
ALLELE_CT | totallele | Allele observation count
A1_FREQ | a1freq | A1 allele frequency
MACH_R2 | machr2 | MaCH imputation quality metric
TEST | test | Test identifier
OBS_CT | nobs | Number of samples in the regression
BETA | beta, orbeta | Regression coefficient (for A1 allele)
SE | se | Standard error of beta
L## | ci | Bottom of symmetric approx. confidence interval (with --ci)
U## | ci | Top of symmetric approx. confidence interval (with --ci)
T_{,OR_CHISQ_}STAT | tz | Chi-square stat for joint test; t-statistic for linear regression
P | p | Asymptotic p-value (or -log10(p)) for T/chisq-stat

1: For multiallelic variants, this column may contain multiple comma-separated alleles when the result doesn't depend on which allele is A1.


.grm (GCTA text relationship matrix)

Produced by --make-grm-gz.

A text file with no header line, and one line per pair of samples (not necessarily distinct) with the following four fields:

  1. 1-based index of first sample in .grm.id file
  2. 1-based index of second sample in .grm.id file
  3. Number of observations (variants where neither sample has a missing call)
  4. Relationship value

.grm.N.bin, .grm.bin (GCTA 1.1+ triangular binary relationship matrix)

Produced by --make-grm-bin.

These files contain single-precision (4-byte) floating point values. Using 1-based matrix indices, the first value in each file is the (1, 1) relationship value (.grm.bin) or observation count (.grm.N.bin); the second and third values are the (2, 1) and (2, 2) relationships/counts; the fourth through sixth values are the (3, 1), (3, 2) and (3, 3) relationships/counts in that order; and so on.

Note that .grm.bin files generated by GCTA versions before 1.1 have a different format.
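The triangular layout described above can be unpacked into a full symmetric matrix as in this Python sketch. It assumes little-endian byte order, which the text does not state, and `read_grm_bin` is a hypothetical name.

```python
import struct

def read_grm_bin(data, n_samples):
    """Expand lower-triangular 4-byte floats (row by row, diagonal included) into a square matrix."""
    n_values = n_samples * (n_samples + 1) // 2
    if len(data) != 4 * n_values:
        raise ValueError("file size does not match the sample count")
    # '<' = little-endian (an assumption); 'f' = single-precision float.
    values = struct.unpack(f"<{n_values}f", data)
    matrix = [[0.0] * n_samples for _ in range(n_samples)]
    k = 0
    for i in range(n_samples):
        for j in range(i + 1):
            # Stored order is (1,1), (2,1), (2,2), (3,1), (3,2), (3,3), ... in 1-based indices.
            matrix[i][j] = matrix[j][i] = values[k]
            k += 1
    return matrix

# Invented 3-sample example; real data would come from open(path, "rb").read().
data = struct.pack("<6f", 1.0, 0.5, 1.0, 0.25, 0.125, 1.0)
grm = read_grm_bin(data, 3)
print(grm)
```

The sample count has to come from the accompanying .grm.id file, since the binary file itself carries no header.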


.haps (Oxford phased haplotype file)

Reference panel haplotype file format for IMPUTE2. Must be accompanied by a .legend file when no variant info header columns are present. Imported with --haps, and produced by "--export haps"/"--export hapslegend".

A text file with no header line, and either 2N+5 or 2N fields where N is the number of samples. In the former case, the first five columns are:

  1. Chromosome code
  2. Variant ID
  3. Base-pair coordinate
  4. Allele 0 (usually minor, use 'ref-first' when importing to treat as REF)
  5. Allele 1 (usually major, use 'ref-last' when importing to treat as REF)

This is followed by a pair of 0/1-valued haplotype columns for the first sample, then a pair of haplotype columns for the second sample, etc. (For male samples on chrX, the second column may contain dummy '-' entries.)
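A line in this layout can be split into per-sample haplotype pairs as sketched below in Python. The helper name is hypothetical, fields are assumed to be whitespace-delimited, and a dummy '-' entry is mapped to None.

```python
def parse_haps_line(line, has_variant_info=True):
    """Split a .haps line into (variant info fields or None, list of haplotype pairs)."""
    fields = line.split()
    info = None
    if has_variant_info:
        # 2N+5 form: chromosome, variant ID, position, allele 0, allele 1.
        info, fields = fields[:5], fields[5:]
    if len(fields) % 2 != 0:
        raise ValueError("expected an even number of haplotype columns")
    pairs = []
    for i in range(0, len(fields), 2):
        first, second = fields[i], fields[i + 1]
        pairs.append((int(first), None if second == "-" else int(second)))
    return info, pairs

# Invented chrX line with three samples; the second sample has a dummy second column.
info, pairs = parse_haps_line("X rs1 1000 A G 0 1 1 - 0 0")
print(info, pairs)
```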


.hardy (Hardy-Weinberg equilibrium exact test report)

Produced by --hardy when autosomal diploid variants are present.

A text file with a header line, and one line per autosomal diploid variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
A1 | (required) | Tested allele
AX | ax | Non-A1 alleles, comma-separated
HOM_A1_CT | gcounts | Homozygous-A1 genotype count
HET_A1_CT | gcounts | Heterozygous-A1 genotype count
TWO_AX_CT | gcounts | # of nonmissing calls with no A1 copies
GCOUNTS | gcount1col | gcounts values in a single comma-separated column
O(HET_A1) | hetfreq | Observed het-A1 frequency
E(HET_A1) | hetfreq | Expected het-A1 frequency
'P'/'MIDP' | p | Hardy-Weinberg equilibrium exact test (mid)p-value

.hardy.x (Graffelman-Weir extended chrX HWE test report)

Produced by --hardy when chrX variants are present.

A text file with a header line, and one line per chrX variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
A1 | (required) | Tested allele
AX | ax | Non-A1 alleles, comma-separated
FEMALE_HOM_A1_CT | gcounts | Female homozygous-A1 genotype count
FEMALE_HET_A1_CT | gcounts | Female heterozygous-A1 genotype count
FEMALE_TWO_AX_CT | gcounts | # of nonmissing female calls with no A1 copies
MALE_A1_CT | gcounts | Male A1 allele count
MALE_AX_CT | gcounts | Male non-A1 allele count
GCOUNTS | gcount1col | gcounts values in a single comma-separated column
O(FEMALE_HET_A1) | hetfreq | Observed het-A1 frequency
E(FEMALE_HET_A1) | hetfreq | Expected het-A1 frequency
FEMALE_A1_FREQ | sexaf | Female A1 allele frequency
MALE_A1_FREQ | sexaf | Male A1 allele frequency
FEMALE_ONLY_(MID)P | femalep | Old female-only HWE exact test (mid)p-value
'P'/'MIDP' | p | Graffelman-Weir HWE test (mid)p-value

.id (Sample ID list)

When generated by PLINK 2, this is a text file which may or may not have a header line. Without a header line (the default for .grm.id files; can be forced for other .id files with --no-id-header), a single column holds IIDs, and two columns hold FID/IID. Otherwise, there's one line per sample after the header line with the following columns:

Header | Contents
FID | Family ID (present iff .psam or --update-ids file has it)
IID | Individual ID (always present)
SID | Source ID (present iff .psam or --update-ids file has it)

.kin0 (KING-robust kinship coefficient report)

Produced by --make-king-table.

A text file with a header line, and one line per sample pair with kinship coefficient no smaller than the --king-table-filter value. When --king-table-filter is not specified, all sample pairs are included. The following columns are present:

Header | Column set | Contents
FID1 | maybefid, fid | FID of first sample in current pair
ID1 | id | IID of first sample in current pair
SID1 | maybesid, sid | SID of first sample in current pair
FID2 | maybefid, fid | FID of second sample in current pair
ID2 | id | IID of second sample in current pair
SID2 | maybesid, sid | SID of second sample in current pair
NSNP | nsnp | Number of variants considered (autosomal, neither call missing)
HETHET | hethet | Proportion/count of considered call pairs which are het-het
IBS0 | ibs0 | Proportion/count of considered call pairs which are opposite homs
HET1_HOM2 | ibs1 | Proportion/count of sample 1 het, sample 2 hom
HET2_HOM1 | ibs1 | Proportion/count of sample 1 hom, sample 2 het
KINSHIP | kinship | KING-robust between-family kinship estimate

.king (KING-robust text kinship coefficient matrix)

Produced by --make-king.

A tab-delimited file that is either lower-triangular (excluding the diagonal) or square. If square, the upper-right triangle may be either zeroed out or the mirror-image of the lower-left triangle, depending on whether the 'square0' or 'square' modifier was used.


.legend (Oxford single-chromosome variant information file)

Single-chromosome variant information file accompanying a bare .haps reference panel haplotype file. Imported with --legend, and produced by "--export hapslegend".

A text file with a header line, and one line per variant with the following four columns:

Header | Contents
id | Variant ID
position | Base-pair coordinate
a0 | Allele 0 (usually minor, use 'ref-first' to treat as REF)
a1 | Allele 1 (usually major, use 'ref-last' to treat as REF)

.pgen (PLINK 2 binary genotype table)

PLINK 2's preferred way to represent genotype calls. Must be accompanied by .pvar/.bim and .psam/.fam files. Loaded with --pfile/--bpfile, and generated with --make-pgen/--make-bpgen and all import commands.

This starts with a magic number, followed by an index describing the positions and storage types of each variant in the file, followed by the actual variant records. It can be read in a sequential manner as long as the index can be kept in memory, but cannot be written purely sequentially if the storage types aren't all identical (since it's necessary to backfill the index).

A complete description will be provided with the first PLINK 2.0 beta release; until then, pgenlib_internal.h is the best documentation.


.psam (PLINK 2 sample information file)

Sample information file accompanying a .pgen binary genotype table. (--make-just-psam can be used to update just this file.)

A text file which usually has at least one header line, where only the last header line starts with '#FID' or '#IID'. This final header line specifies the columns in the .psam file; the following intermediate column headers are recognized:

  1. IID (individual ID; required)
  2. SID (source ID, when there are multiple samples for the same individual)
  3. PAT (individual ID of father, '0' if unknown)
  4. MAT (individual ID of mother, '0' if unknown)
  5. SEX ('1' = male, '2' = female, 'NA'/'0' = unknown)

(FID must either be the first column, or absent. If it's absent, all FID values are now assumed to be '0'.) Any other value is treated as a phenotype/covariate name.

If no header line is present, the columns are assumed to be in .fam file order (FID, IID, PAT, MAT, SEX, PHENO1).
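The header rules above can be sketched in Python as follows. The function name `psam_columns` is hypothetical, fields are assumed to be whitespace-delimited, and the recognized set includes FID alongside the five headers listed.

```python
RECOGNIZED = {"FID", "IID", "SID", "PAT", "MAT", "SEX"}
FAM_ORDER = ["FID", "IID", "PAT", "MAT", "SEX", "PHENO1"]

def psam_columns(lines):
    """Return (column names, phenotype/covariate names) for the lines of a .psam file."""
    columns = None
    for line in lines:
        if line.startswith("#FID") or line.startswith("#IID"):
            # Final header line: drop the leading '#' and split into column names.
            columns = line[1:].split()
            break
        if not line.startswith("#"):
            break  # reached data without seeing a column header line
    if columns is None:
        columns = list(FAM_ORDER)  # no header line: .fam column order
    extra = [name for name in columns if name not in RECOGNIZED]
    return columns, extra

# Invented examples.
with_header = psam_columns(["#IID\tSEX\tBMI\n", "s1\t1\t22.5\n"])
no_header = psam_columns(["fam1 s1 0 0 1 2\n"])
print(with_header, no_header)
```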


.pvar (PLINK 2 variant information file)

Variant information file accompanying a .pgen binary genotype table. (--make-just-pvar can be used to update just this file.)

A text file which usually has at least one header line, where only the last header line starts with '#CHROM'. This final header line specifies the columns in the .pvar file; the following intermediate column headers are recognized:

  1. POS (base-pair coordinate)
  2. ID (variant ID; required)
  3. REF (reference allele)
  4. ALT (alternate alleles, comma-separated)
  5. QUAL (phred-scaled quality score for whether the locus is variable at all)
  6. FILTER ('PASS', '.', or semicolon-separated list of failing filter codes)
  7. INFO (semicolon-separated list of flags and key-value pairs, with types declared in header)
  8. FORMAT (terminates header line parsing)
  9. CM (centimorgan position)

In particular, a VCF file, or a trimmed VCF file with all columns past the 5th (or 6th, etc.) removed, is valid input for anything expecting a .pvar-format file.

The following VCF-style header lines are also recognized:

  1. "##INFO=<ID=PR,Number=0,Type=Flag...": Indicates the INFO:PR flag, which marks 'provisional' reference alleles (i.e. imported from a file which does not consistently track which allele is reference and which are alternates), is present. (This information is also present in .pgen files, and the loader reports an error when the .pvar and .pgen flags don't match.)
  2. "##chrSet=...": Explicitly specifies the chromosome set. E.g. --make-pgen + --dog will cause "##chrSet=<autosomePairCt=38,X,Y,XY,M>" to be written to the .pvar header, and as a consequence it isn't necessary to include the --dog flag when loading the new fileset.

When no header line is present, the columns are assumed to be in .bim file order (CHROM, ID, CM, POS, ALT, REF; or if only 5 columns are present, CM is assumed to be omitted).
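A reader for these rules might look like the Python sketch below. It is only an interpretation of the text: it cuts the column list at FORMAT (reading "terminates header line parsing" that way), falls back to .bim order when no '#CHROM' line appears, and uses the hypothetical name `pvar_columns`.

```python
def pvar_columns(lines):
    """Return the column names implied by the lines of a .pvar-format file."""
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line[1:].split()
            if "FORMAT" in columns:
                # Assumption: FORMAT and everything after it are not .pvar columns.
                columns = columns[:columns.index("FORMAT")]
            return columns
        if not line.startswith("#"):
            # No header line: .bim order, with CM omitted if only 5 columns are present.
            if len(line.split()) == 5:
                return ["CHROM", "ID", "POS", "ALT", "REF"]
            return ["CHROM", "ID", "CM", "POS", "ALT", "REF"]
    raise ValueError("no header or data line found")

# Invented examples: a VCF-style header and a headerless .bim-style line.
vcf_like = pvar_columns([
    "##fileformat=VCFv4.3\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
])
bim_like = pvar_columns(["1 rs1 0 1000 G A\n"])
print(vcf_like, bim_like)
```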


.raw (additive + dominant component file)

Produced by "--export A" and "--export AD"; suitable for loading from R. This format cannot be loaded by PLINK.

A text file with a header line, and then one line per sample with V+6 (for "--export A") or 2V+6 (for "--export AD") fields, where V is the number of variants. The header line does not contain a preceding '#'. The first six fields are:

FID | Family ID
IID | Individual ID
PAT | Paternal individual ID
MAT | Maternal individual ID
SEX | Sex (1 = male, 2 = female, 0 = unknown)
PHENOTYPE | First active non-categorical phenotype (missing value if none)

This is followed by one or two fields per variant:

[Variant ID]_[counted allele] | Allelic dosage (missing = 'NA', haploid scaled to 0..2)
[Variant ID]_HET | Dominant component (1 = het). Requires "--export AD".

If 'include-alt' was specified, the header line also names alternate allele codes in parentheses, e.g. 'rs5939319_G(/A)'.

.rel (text relationship matrix)

Produced by --make-rel.

Contents are identical to those of a .grm/.grm.bin file. Possible shapes are essentially the same as for .king files; the only difference is that .king files have an omitted or constant-0.5 diagonal while .rel files do not.


.sample (Oxford sample information file)

Sample information file accompanying a .gen or .bgen genotype dosage file, or a .haps phased reference panel. Loaded with --data/--sample, and produced by --export in several cases.

The .sample space-delimited files emitted by --export have two header lines, and then one line per sample with 4+ fields:

First header line | Second header line | Subsequent contents
ID_1 | 0 | Family ID
ID_2 | 0 | Individual ID
missing | 0 | Missing call frequency
sex | D | Sex code ('1' = male, '2' = female, '0' = unknown)
[Pheno name], ... | 'B'/'D'/'P' | Binary ('0' = control, '1' = case), discrete (categorical, positive integers), or continuous phenotype; missing values represented by 'NA'

The full specification for this format is on the Oxford statistical genetics website.


.sdiff (sample-pair discordance report)

Produced by --sdiff.

A text file with a header line, and then one line per discordance with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT | alt | All alternate alleles, comma-separated
FID1 | maybefid, fid | FID of first sample in current pair
IID1 | id | IID of first sample in current pair
SID1 | maybesid, sid | SID of first sample in current pair
FID2 | maybefid, fid | FID of second sample in current pair
IID2 | id | IID of second sample in current pair
SID2 | maybesid, sid | SID of second sample in current pair
'GT1'/'DS1' | geno | Genotype/dosage of first sample
'GT2'/'DS2' | geno | Genotype/dosage of second sample

.sdiff.summary (sample-pair discordance count summary)

Produced by --sdiff.

A text file with a header line, and then one line per sample pair with the following columns:

Header | Column set | Contents
FID1 | maybefid, fid | FID of first sample in current pair
IID1 | (required) | IID of first sample in current pair
SID1 | maybesid, sid | SID of first sample in current pair
FID2 | maybefid, fid | FID of second sample in current pair
IID2 | (required) | IID of second sample in current pair
SID2 | maybesid, sid | SID of second sample in current pair
OBS_CT | nobs | Number of genotype/dosage pairs considered
IBS_OBS_CT | nobsibs | Number of diploid hardcall-pairs
IBS0_CT | ibs0 | # of diploid hardcall-pairs with no matching alleles
IBS1_CT | ibs1 | # of diploid hardcall-pairs with exactly 1 matching allele
IBS2_CT | ibs2 | # of diploid hardcall-pairs with 2 matching alleles
HALFMISS_CT | halfmiss | # of genotype/dosage pairs with exactly 1 missing call
DIFF_CT | diff | # of genotype/dosage discordances

.smiss (sample-based missing data report)

Produced by --missing.

A text file with a header line, and then one line per sample with the following columns:

Header | Column set | Contents
FID | maybefid, fid | Family ID
IID | (required) | Individual ID
SID | maybesid, sid | Source ID
MISS_PHENO1 | misspheno1 | First active phenotype missing (Y/N), Y if none
[Pheno name], ... | missphenos | Y/N column for each loaded phenotype
MISSING_DOSAGE_CT | nmissdosage | Number of missing dosages
MISSING_CT | nmiss | Number of missing hardcalls, not counting het haploids
MISSING_AND_HETHAP_CT | nmisshh | Number of missing hardcalls, counting het haploids
HETHAP_CT | hethap | Number of heterozygous haploid hardcalls
OBS_CT | nobs | Denominator (# variants for males, otherwise excludes chrY)
F_MISS_DOSAGE | fmissdosage | Missing dosage rate
F_MISS | fmiss | Missing hardcall rate, not counting het haploids
F_MISS_AND_HETHAP | fmisshh | Missing hardcall rate, counting het haploids

When dosages are present, MISSING_DOSAGE_CT will typically be slightly lower than MISSING_CT, since hardcalls normally aren't saved for dosages in (0.1, 0.9) or (1.1, 1.9).
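The relationship between dosages and saved hardcalls can be illustrated with the Python sketch below. The 0.1 threshold is inferred from the intervals quoted above, and `hardcall_from_dosage` is a hypothetical helper rather than PLINK's implementation; values exactly on an interval boundary are subject to floating-point rounding here.

```python
def hardcall_from_dosage(dosage, threshold=0.1):
    """Return the nearest integer call if the dosage is within `threshold` of it, else None."""
    nearest = round(dosage)
    if abs(dosage - nearest) <= threshold:
        return nearest
    return None  # too uncertain: dosage kept, hardcall missing

# Invented dosages: three near an integer, two in the excluded intervals.
dosages = [0.05, 0.5, 1.03, 1.5, 1.95]
calls = [hardcall_from_dosage(d) for d in dosages]
print(calls)
```

In this example every dosage is present, yet two hardcalls are missing, which is the direction of the count difference the note describes.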


.sscore (sample scores)

Produced by --score.

A text file with a header line, and then one line per sample with the following columns:

Header | Column set | Contents
FID | maybefid, fid | Family ID
IID | (required) | Individual ID
SID | maybesid, sid | Source ID
PHENO1 | pheno1 | All-missing phenotype column, if none loaded
[Pheno name], ... | pheno1, phenos | Phenotype value(s) (only first if just 'pheno1')
NMISS_ALLELE_CT | nmissallele | Number of nonmissing alleles
DENOM | denom | Denominator used for score average
NAMED_ALLELE_DOSAGE_SUM | dosagesum | Sum of named allele dosages
[Score name]_AVG, ... | scoreavgs | Score averages
[Score name]_SUM, ... | scoresums | Score sums

.traw (variant-major additive component file)

Produced by "--export A-transpose"; suitable for loading from R. Loaded with --import-dosage.

A text file with a header line without a leading '#', and then one line per variant with the following N+6 fields (where N is the number of samples):

CHR | Chromosome code
SNP | Variant identifier
(C)M | Position in morgans or centimorgans
POS | Base-pair coordinate
COUNTED | Counted allele (now defaults to REF)
ALT | Other allele(s), comma-separated
[FID]_[IID]... | Allelic dosages (missing = 'NA', haploid scaled to 0..2)

.vcf (1000 Genomes Project text Variant Call Format)

Variant information + sample ID + genotype call text file. Imported with --vcf, and produced by "--export vcf".

Note that, while PLINK 2.0 supports a much larger subset of the VCF standard than PLINK 1.9, it still isn't appropriate for general-purpose VCF handling. Instead, the goal is to provide a very useful complement to bcftools. For example, PLINK 2.0 does not save per-call read depths, so any data management or analysis which requires them to be kept around should be done with bcftools or a similarly general tool; but once you're done with variant calling/imputation and are ready to treat your data as a single matrix of hardcalls or dosages with missing entries, PLINK 2.0 is much more efficient.

The VCFv4.3 files emitted by --export start with the following three header lines:

  1. ##fileformat=VCFv4.3
  2. ##fileDate=[yyyymmdd date]
  3. ##source=PLINKv2.00

This is usually followed by all the VCF header lines (if any) present in the loaded .pvar file, a "##chrSet=" chromosome set description when appropriate, and additional "##contig="/INFO:PR/FORMAT:GT header lines when necessary to make the file conform to the VCF standard.

Next comes a tab-delimited header line with the following N+9 fields (where N is the number of samples), and one tab-delimited line per variant with the same fields:

#CHROM | Chromosome code
POS | Base-pair coordinate
ID | Variant identifier
REF | Reference allele (missing = 'N')
ALT | All alternate alleles, comma-separated (missing = '.')
QUAL | Phred-scaled quality score for whether the locus is variable at all
FILTER | 'PASS', '.', or semicolon-separated list of failing filter codes
INFO | Semicolon-separated list of flags and key-value pairs, with types declared in header
FORMAT | 'GT' (signaling the presence of genotype calls)
[Sample ID], ... | Genotype calls ('/'-separated if diploid, 0=ref, 1=alt, '.'=missing)

Allele codes are supposed to either start with '<', only contain characters in the set {A,C,G,T,N,a,c,g,t,n}, or represent a breakend. --export issues a warning if an allele code does not satisfy this restriction.

The full VCFv4.3 specification is in the hts-specs GitHub repository.


.vmiss (variant-based missing data report)

Produced by --missing.

A text file with a header line, and then one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
MISSING_DOSAGE_CT | nmissdosage | Number of missing dosages
MISSING_CT | nmiss | Number of missing hardcalls, not counting het haploids
MISSING_AND_HETHAP_CT | nmisshh | Number of missing hardcalls, counting het haploids
HETHAP_CT | hethap | Number of heterozygous haploid hardcalls
OBS_CT | nobs | Denominator (# males on chrY, otherwise # samples)
F_MISS_DOSAGE | fmissdosage | Missing dosage rate
F_MISS | fmiss | Missing hardcall rate, not counting het haploids
F_MISS_AND_HETHAP | fmisshh | Missing hardcall rate, counting het haploids
F_HETHAP | fhethap | Heterozygous haploid rate

When dosages are present, MISSING_DOSAGE_CT will typically be slightly lower than MISSING_CT, since hardcalls normally aren't saved for dosages in (0.1, 0.9) or (1.1, 1.9).