File format reference

This page describes specialized PLINK 2.0 input and output file formats which are identifiable by file extension. (Most extensions not listed here have very simple one-entry-per-line or two-entry-per-line text formats.)

Unless otherwise specified, all multicolumn text files generated by PLINK 2.0 are tab-delimited, with one header line starting with '#'. In the column summaries, columns which are present unless removed by the column set descriptor are boldface, and columns which only appear under some data/flag/modifier combination(s) are italicized.

Jump to: .acount | .adjusted | .afreq | .allele.no.snp | .bcf | .bed | .bgen | .bim | .bins | .clumps | .cov | .eigenvec{,.allele|.var} | .fam | .fst.summary | .fst.var | .gcount | .gen | .geno | .glm.firth | .glm.linear | .glm.logistic[.hybrid] | .grm | .grm.N.bin | .grm.bin | .haps | .hardy | .hardy.x | .het | .*.id | .ind | .kin0 | .king[.bin] | .legend | .map | .*mendel | .pdiff | .ped | .pgen{,.pgi} | .phy | .psam | .pvar | .raw | .rel[.bin] | .sample | .scount | .sdiff | .sdiff.summary | .sexcheck | .smiss | .snp | .sscore | .ssf.tsv | .svd.pheno | .svd.pheno_wts | .tfam | .tped | .traw | .vcf | .vcor | .vcor{1|2}[.bin] | .vmiss | .vscore | .vscore.bin

.acount, .afreq (allele count/frequency report)

Produced by --freq.

A text file with a header line, and then one line per variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
'REF_FREQ'/'REF_CT'	reffreq	Reference allele frequency/dosage
'ALT1_FREQ'/'ALT1_CT'	alt1freq	Alternate allele 1 frequency/dosage
'ALT_FREQS'/'ALT_CTS'	altfreq, alteq, alteqz	Comma-separated freqs/dosages for all alts; 'eq' requests '1=<ALT1 value>,2=<ALT2 value>,...' formatting with zero-values omitted, 'eqz' includes zeroes
'ALT_NUM_{FREQS,CTS}'	altnumeq	Comma-separated freqs/dosages for all alts
'FREQS'/'CTS'	freq, eq, eqz	Comma-separated freqs/dosages for all alleles
'NUM_FREQS'/'NUM_CTS'	numeq	Comma-separated freqs/dosages for all alleles
MACH_R2	machr2	MaCH imputation quality metric
MINIMAC3_R2	minimac3r2	Minimac3 phased-dosage imputation quality metric; inaccurate unless phased dosages were imported with e.g. "--vcf dosage=HDS" (dosage=DS is not enough)
OBS_CT	nobs	Number of allele observations

.adjusted (basic multiple-testing corrections)

Produced by --adjust[-file].

A text file with a header line, and then one line per tested allele with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
A1	a1	Tested allele
[NEG_LOG10_]UNADJ	unadj	Unadjusted p-value
*[NEG_LOG10_]GC*	gc	Devlin & Roeder (1999) genomic control corrected p-value (additive model only)
QQ	qq	P-value quantile
[NEG_LOG10_]BONF	bonf	Bonferroni correction
[NEG_LOG10_]HOLM	holm	Holm-Bonferroni (1979) adjusted p-value
[NEG_LOG10_]SIDAK_SS	sidakss	Šidák single-step adjusted p-value
[NEG_LOG10_]SIDAK_SD	sidaksd	Šidák step-down adjusted p-value
[NEG_LOG10_]FDR_BH	fdrbh	Benjamini & Hochberg (1995) step-up false discovery control
[NEG_LOG10_]FDR_BY	fdrby	Benjamini & Yekutieli (2001) step-up false discovery control

Entries are sorted in increasing p-value order. (Thus, if the QQ field is present, its values just increase linearly.)
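
As a rough illustration of two of the corrections listed above, the sketch below applies the textbook Bonferroni and Holm step-down definitions to p-values that are already sorted in increasing order, matching the row order of this file. The function names are hypothetical, and PLINK's own implementation may differ in numerical details.

```c
#include <stddef.h>

/* Bonferroni: multiply the p-value by the number of tests m, capped at 1. */
double bonferroni_adjust(double p, size_t m) {
  double adj = p * (double)m;
  return (adj > 1.0) ? 1.0 : adj;
}

/* Holm step-down for p-values sorted in increasing order.
   With 0-based i, adj[i] is the running maximum of min(1, (m - i) * p[i]),
   which keeps the adjusted values nondecreasing. */
void holm_adjust_sorted(const double* sorted_p, size_t m, double* adj) {
  double running_max = 0.0;
  for (size_t i = 0; i < m; ++i) {
    double cur = sorted_p[i] * (double)(m - i);
    if (cur > 1.0) {
      cur = 1.0;
    }
    if (cur > running_max) {
      running_max = cur;
    }
    adj[i] = running_max;
  }
}
```

Because the running maximum is carried forward, a later entry can never receive a smaller adjusted value than an earlier one.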

.allele.no.snp (allele mismatch report)

Produced by --update-alleles when there are too many mismatches between the loaded alleles for a variant and the old-allele column(s) of the --update-alleles input file.

A text file with no header line, and one line per mismatching variant with the following three fields:

Variant identifier
Expected allele #1 (from --update-alleles input file)
Remaining expected alleles, comma-separated; or "." if none

.bcf (binary Variant Call Format)

Variant information + sample ID + genotype call binary file. Imported with --bcf, and produced by "--export bcf".

Refer to the hts-specs GitHub repository for a detailed description of the format. "--export bcf" uses binary encoding v2.2.

.bed (PLINK 1 binary biallelic genotype table)

PLINK 1's preferred way to represent genotype calls. Must be accompanied by .bim and .fam files. Loaded with --bfile, and generated by --make-bed.

Do not confuse this with the UCSC Genome Browser's BED format, which is totally different. (It is safe to change a PLINK 1 .bed file's extension to .pgen and use --bpfile to load it.)

See the PLINK 1.9 documentation for a detailed description of the usual variant-major form, along with an example. PLINK 2 can also efficiently export the sample-major form ("--export ind-major-bed"); it has third byte equal to zero instead of one, but is otherwise analogous.
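
The variant-major and sample-major forms can be told apart from the first three bytes of the file. The sketch below assumes the two magic bytes 0x6c 0x1b given in the PLINK 1.9 documentation (they are not stated on this page) and uses the third byte as described above; the enum and function names are hypothetical.

```c
#include <stddef.h>

enum bed_layout {
  BED_INVALID = -1,
  BED_SAMPLE_MAJOR = 0,  /* third byte is zero */
  BED_VARIANT_MAJOR = 1  /* third byte is one */
};

/* Classifies a .bed file from its first len bytes (at least 3 are needed). */
enum bed_layout classify_bed_header(const unsigned char* buf, size_t len) {
  if (len < 3 || buf[0] != 0x6c || buf[1] != 0x1b) {
    return BED_INVALID;
  }
  if (buf[2] == 0x01) {
    return BED_VARIANT_MAJOR;
  }
  if (buf[2] == 0x00) {
    return BED_SAMPLE_MAJOR;
  }
  return BED_INVALID;
}
```

A caller would typically read the first three bytes with fread() and pass them in before interpreting the rest of the file.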

.bgen (Oxford variant info + genomic data binary file)

Native binary file format for Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST. BGEN v1.1 files should always be accompanied by a .sample file. Loaded with --bgen, and produced by "--export bgen-1.{1,2,3}".

Refer to https://www.chg.ox.ac.uk/~gav/bgen_format/ for a detailed description of the format.

.bim (PLINK extended MAP file)

Variant information file accompanying a .bed or biallelic .pgen binary genotype table. (--make-just-bim can be used to update just this file.)

A text file with no header line, and one line per variant with the following six fields:

Chromosome code
Variant ID
Position in centimorgans (safe to use dummy value of '0')
Base-pair coordinate (1-based; limited to 2³¹-2)
ALT ('A1' in PLINK 1.x) allele code
REF ('A2' in PLINK 1.x) allele code

A few notes:

Yes, the ALT column comes before the REF column in a .bim file.
When .bed files are involved, the ALT and REF allele codes will sometimes be swapped, since that's PLINK 1.x's default behavior whenever the true REF allele is less common than the ALT allele in the current dataset. If that's a problem, you can use --ref-allele to swap them back.
It is safe to change a .bim file's extension to .pvar and use --pfile to load it.
Variants with negative bp coordinates are ignored by PLINK.
PLINK 1.9 and 2.0 permit the centimorgan column to be omitted. (However, omission is not recommended if the .bim file needs to be read by other software.)
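
A minimal sketch of reading one .bim line, covering both the six-field layout and the five-field layout without the centimorgan column. The struct, the function name, and the fixed 255-character field limit are assumptions made for brevity (real allele codes can be longer); note that ALT is read before REF.

```c
#include <stdio.h>
#include <stdlib.h>

struct bim_record {
  char chrom[256];
  char id[256];
  double cm;      /* 0 when the centimorgan column is omitted */
  long bp;        /* 1-based base-pair coordinate */
  char alt[256];  /* 'A1' in PLINK 1.x */
  char ref[256];  /* 'A2' in PLINK 1.x */
};

/* Parses one whitespace-delimited .bim line.
   Returns 1 on success, 0 if the line has fewer than five fields. */
int parse_bim_line(const char* line, struct bim_record* out) {
  char f2[256];
  char f3[256];
  char f4[256];
  char f5[256];
  int n = sscanf(line, "%255s %255s %255s %255s %255s %255s",
                 out->chrom, out->id, f2, f3, f4, f5);
  if (n == 6) {
    out->cm = strtod(f2, NULL);
    out->bp = strtol(f3, NULL, 10);
    snprintf(out->alt, sizeof(out->alt), "%s", f4);
    snprintf(out->ref, sizeof(out->ref), "%s", f5);
    return 1;
  }
  if (n == 5) {
    /* Centimorgan column omitted: fields shift left by one. */
    out->cm = 0.0;
    out->bp = strtol(f2, NULL, 10);
    snprintf(out->alt, sizeof(out->alt), "%s", f3);
    snprintf(out->ref, sizeof(out->ref), "%s", f4);
    return 1;
  }
  return 0;
}
```

The sketch does no validation of the numeric fields; a production reader would also check for overlong tokens and negative coordinates.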

.bins (allele count or frequency histogram)

A text file with a header line, followed by one line per [start, end) histogram bin with the following two fields:

Header	Contents
BIN_START	Start of bin
OBS_CT	Number of variants in the bin

The end of the current bin interval is the next line's BIN_START value (or positive infinity if there is no next line).
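
Since each bin is the half-open interval from its own BIN_START to the next line's BIN_START, a value can be assigned to a bin with a simple scan. The following sketch assumes the BIN_START values have already been loaded into an increasing array; the function name is hypothetical.

```c
#include <stddef.h>

/* Returns the 0-based index of the [start, end) bin containing x,
   or -1 if x is below the first BIN_START. The last bin has no upper limit. */
long find_bin(const double* bin_starts, size_t bin_ct, double x) {
  long idx = -1;
  for (size_t i = 0; i < bin_ct; ++i) {
    if (bin_starts[i] > x) {
      break;
    }
    idx = (long)i;
  }
  return idx;
}
```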

.clumps (reprocessed LD-clumped reports)

Produced by --clump.

A text file with a header line, and one line per index variant (lowest p-values first) with the following fields:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
A1	maybea1, a1	Tested allele
F	maybef, f	1-based file number
[NEG_LOG10_]P	(required)	Index variant p-value (or -log10(p))
TOTAL	total	Number of other variants in clump
*CLUMP_FIRST_POS*	maybebounds, bounds	POS of first member with p < --clump-p2 threshold
*CLUMP_LAST_POS*	maybebounds, bounds	POS of last member with p < --clump-p2 threshold
NONSIG	bins	Number of clumped variants with p ≥ [highest p-value boundary]
S<bin boundary>, ...	bins	Number of clumped variants with [lower boundary] ≤ p < [this boundary]
SP2	sp2	Comma-delimited IDs, and possibly A1 allele and/or file number, of members with p < --clump-p2 threshold.
RANGES	(with --clump-range[0])	Comma-separated list of overlapping ranges

S<bin boundary> columns are in decreasing-p-value order, and the bin-boundary component of the column names no longer omits the leading "0.".

.cov (covariate table)

Produced by --write-covar, --make-[b]pgen/--make-bed, and --export when covariates have been loaded/specified. Valid input for --covar.

A text file with a header line, and one line per sample with the following columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
IID	(required)	Individual ID
*SID*	maybesid, sid	Source ID
PAT	maybeparents, parents	Paternal individual ID
MAT	maybeparents, parents	Maternal individual ID
SEX	sex	Sex (1 = male, 2 = female, 'NA' = unknown)
PHENO1	pheno1	All-missing phenotype column, if none loaded
<Pheno name>, ...	pheno1, phenos	Phenotype value(s) (only first if just 'pheno1')
<Covar name>, ...	(required)	Covariate values

(Note that --covar can also be used with files lacking a header row.)

.eigenvec, .eigenvec.allele, .eigenvec.var (principal components)

Produced by --pca. Accompanied by an .eigenval file, which contains one eigenvalue per line.

The .eigenvec file is a text file with a header line and between 1+V and 3+V columns per sample, where V is the number of requested principal components. The first columns contain the sample ID, and the rest are principal component scores in the same order as the .eigenval values (with column headers 'PC1', 'PC2', ...).

With the 'allele-wts' modifier, an .eigenvec.allele file is also generated. It's a text file with a header line, followed by one line per allele with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
A1	(required)	Current allele
AX	ax	Other alleles, comma-separated
PC1, PC2, ...	(required)	Principal component allele scores

Alternatively, with the 'biallelic-var-wts' modifier, an old-style .eigenvec.var file is generated. It's a text file with a header line, followed by one line per variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
MAJ	maj	Major allele
NONMAJ	nonmaj	All nonmajor alleles, comma-separated
PC1, PC2, ...	(required)	Principal component variant weights; signs are w.r.t. the major allele

.fam (PLINK sample information file)

Sample information file accompanying a .bed or biallelic .pgen binary genotype table. (--make-just-fam can be used to update just this file.)

A text file with no header line, and one line per sample with the following six fields:

Family ID ('FID')
Individual ID ('IID'; cannot be '0')
Individual ID of father ('0' if father isn't in dataset)
Individual ID of mother ('0' if mother isn't in dataset)
Sex code ('1' = male, '2' = female, '0' = unknown)
Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)

.fst.summary (all-population-pairs Wright's F_ST report)

Produced by --fst.

A text file with a header line, and then one line per population-pair with the following columns:

Header	Column set	Contents
POP1	(required)	First population ID
POP2	(required)	Second population ID
OBS_CT	nobs	Number of variants with valid F_ST estimates
'HUDSON_FST'/'WC_FST'	(required)	Between-population F_ST estimate
SE	(required)	Standard error of F_ST estimate, if blocksize= specified

.fst.var (per-variant Wright's F_ST report for one population pair)

Produced by --fst when 'report-variants' is specified. A separate file is generated for each population pair.

A text file with a header line, and then one line per autosomal variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
OBS_CT	nobs	Number of (nonmissing) genotype observations across population pair
POP1_ALLELE_CT	nallele	Number of nonmissing allele observations in first population
POP2_ALLELE_CT	nallele	Number of nonmissing allele observations in second population
FST_NUMER	fstfrac	Numerator of F_ST estimate
FST_DENOM	fstfrac	Denominator of F_ST estimate
'HUDSON_FST'/'WC_FST'	fst	Wright's F_ST estimate

.gcount (genotype count report)

Produced by --geno-counts.

A text file with a header line, and then one line per variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
HOM_REF_CT	homref	Homozygous-ref count
HET_REF_ALT1_CT	refalt1	Heterozygous ref-alt1 count
HET_REF_ALT_CTS	refalt	Comma-separated het ref-altx counts
HOM_ALT1_CT	homalt1	Homozygous-alt1 count
TWO_ALT_GENO_CTS	altxy	Comma-separated altx-alty counts, in (1/1)-(1/2)-(2/2)-(1/3)-... order
DIPLOID_GENO_CTS	xy	Similar to altxy, except reference allele included
HAP_REF_CT	hapref	Haploid-ref count
HAP_ALT1_CT	hapalt1	Haploid-alt1 count
HAP_ALT_CTS	hapalt	Comma-separated haploid-altx counts
HAP_CTS	hap	Similar to hapalt, except ref also included
GENO_NUM_CTS	numeq	"0/0=<hom ref ct>,0/1=<het ref-alt1>,...,0=<hap ref>" etc.; zero-counts are omitted; '.' if all genotypes missing
MISSING_CT	missing	Number of missing genotypes
OBS_CT	nobs	Number of (nonmissing) genotype observations

.gen (Oxford text genotype file format)

Native text genotype file format for Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST. Should always be accompanied by a .sample file. Imported with --data/--gen, and produced by "--export oxford[-v2]".

A text file with no header line, and one line per variant with either 3N+5 or 3N+6 fields where N is the number of samples. Each line stores information for a single SNP.

In the 3N+5 case (corresponding to the original specification), the first five fields are:

"SNP ID"
rsID (treated by PLINK as the main variant ID)
Base-pair coordinate
Allele 1 (usually minor, use 'ref-first' when importing to treat as REF)
Allele 2 (usually major, use 'ref-last' when importing to treat as REF)

Unless the chromosome code was declared with --oxford-single-chr (in which case the SNP ID column is ignored), PLINK has no choice but to assume that the "SNP ID" column actually stores chromosome codes. (This is the convention when PLINK exports a 5-leading-column .gen file.)

The newer 3N+6 column flavor has a dedicated chromosome column in front. This was not supported by PLINK 1.9 or 2.0 before 16 Apr 2021.

Each subsequent triplet of values then indicates the likelihoods of the homozygote A1, heterozygote, and homozygote A2 genotypes at this variant, respectively, for one sample. If they add up to less than one, the remainder is a no-call probability weight.

The PLINK 2 binary format can represent allele count expected values, but it does not distinguish between e.g. {P(hom-ref)=0.28, P(het)=0.52, P(hom-alt)=0.2} and {P(hom-ref)=0.08, P(het)=0.92, P(hom-alt)=0}, and it ignores the no-call probability weight (though "0 0 0" will be correctly converted to a missing call). The --import-dosage-certainty flag can be used during import to replace some of the most uncertain genotype calls with missing values.
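
The two example triplets above are indistinguishable because they have the same expected ALT allele count: 0.52 + 2(0.2) = 0.92 and 0.92 + 2(0) = 0.92. The sketch below computes that expected count from one triplet and flags the all-zero "0 0 0" case. It illustrates the arithmetic only and is not a description of PLINK's import code; the function names are hypothetical.

```c
/* Returns 1 if the triplet is "0 0 0", i.e. a missing call. */
int triplet_is_missing(double p_hom_ref, double p_het, double p_hom_alt) {
  return (p_hom_ref == 0.0) && (p_het == 0.0) && (p_hom_alt == 0.0);
}

/* Expected number of ALT allele copies: one per het, two per hom-alt.
   The hom-ref probability contributes nothing to the count. */
double expected_alt_count(double p_hom_ref, double p_het, double p_hom_alt) {
  (void)p_hom_ref;
  return p_het + 2.0 * p_hom_alt;
}
```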

.geno (EIGENSOFT PACKEDANCESTRYMAP or TGENO binary genotype format)

Native binary genotype file format for EIGENSOFT and ADMIXTOOLS. Should always be accompanied by .ind and .snp files. Imported with --eigfile/--eiggeno. The original variant-major PACKEDANCESTRYMAP form is produced by "--export eig", while the sample-major TGENO form is produced by "--export eigt".

A PACKEDANCESTRYMAP file has V+1 blocks of max(48, ⌈N/4⌉) bytes each, where V is the number of variants and N is the number of samples. The first block is a header, starting with a space/tab-delimited string with the following entries:

"GENO"
Number of samples
Number of variants
hasharr(sample_ids, N) in hex
hasharr(variant_ids, V) in hex

and followed by null bytes. In C, the hasharr() function can be defined as follows:

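#include <stdint.h>

uint32_t hashone(const char* id);  // defined below

// Folds the per-ID hashes of all n IDs into one 32-bit value.
uint32_t hasharr(const char* const* ids, uint32_t n) {
  uint32_t hash = 0;
  for (uint32_t i = 0; i < n; ++i) {
    hash = (hash * 17) ^ hashone(ids[i]);
  }
  return hash;
}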

where hashone() is:

// Hashes one null-terminated ID, one byte at a time.
uint32_t hashone(const char* id) {
  uint32_t hash = 0;
  for (uint32_t i = 0; ; ++i) {
    unsigned char cur_char = id[i];
    if (cur_char == 0) {
      return hash;
    }
    hash = (hash * 23) + cur_char;
  }
}

Each remaining block corresponds to one marker in the .snp file. The high-order two bits of a block's first byte store the first sample's genotype code, using the following encoding:

00	Zero copies of REF allele (i.e. homozygous-ALT)
01	One copy of REF allele
10	Two copies of REF allele
11	Missing genotype

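The next three samples are stored in successively lower-order bit pairs of the first byte, the fifth sample is stored in the high-order two bits of the second byte, and so on. Trailing bits in the block are always zero.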

A TGENO file has a 48-byte header, starting with a space/tab-delimited string; the first field in that string is "TGENO", and the other four fields match those in the PACKEDANCESTRYMAP header. This is followed by N blocks of max(48, ⌈V/4⌉) bytes each, each corresponding to one sample in the .ind file. Encoding is otherwise the same as with PACKEDANCESTRYMAP.
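
The PACKEDANCESTRYMAP block layout described above can be sketched as follows: one helper for the block size, one for the byte offset of a variant's block, and one that extracts a sample's 2-bit code from a block already read into memory. All three function names are hypothetical, and indices are 0-based.

```c
#include <stddef.h>

/* Bytes per block: max(48, ceil(N/4)), where N is the sample count. */
size_t pam_block_size(size_t sample_ct) {
  size_t packed = (sample_ct + 3) / 4;
  return (packed > 48) ? packed : 48;
}

/* Byte offset of a variant's block; block 0 is the header. */
size_t pam_variant_offset(size_t variant_idx, size_t sample_ct) {
  return (variant_idx + 1) * pam_block_size(sample_ct);
}

/* 2-bit genotype code for one sample within a variant block.
   The first sample of each byte sits in the two high-order bits.
   0/1/2 = number of REF copies, 3 = missing genotype. */
unsigned int pam_genotype_code(const unsigned char* block, size_t sample_idx) {
  unsigned int shift = 6 - 2 * (unsigned int)(sample_idx % 4);
  return (block[sample_idx / 4] >> shift) & 3u;
}
```

For example, a first byte of 0x1b (binary 00 01 10 11) holds codes 0, 1, 2 and 3 for the first four samples.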

.glm.firth, .glm.logistic[.hybrid] (logistic/Firth regression association statistics)

Produced by --glm with a case/control phenotype.

A text file with a header line, and then one line per variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
A1	(required)	Counted allele¹ in regression
OMITTED	omitted	Omitted allele
A1_CT²	a1count	Total A1 allele count (can be decimal with dosage data)
ALLELE_CT²	totallele	Allele observation count
A1_CASE_CT²	a1countcc	A1 count in cases
A1_CTRL_CT²	a1countcc	A1 count in controls
CASE_ALLELE_CT²	totallelecc	Case allele observation count
CTRL_ALLELE_CT²	totallelecc	Control allele observation count
CASE_NON_A1_CT	gcountcc	Case genotypes with 0 copies of A1
CASE_HET_A1_CT	gcountcc	Case genotypes with 1 copy of A1
CASE_HOM_A1_CT	gcountcc	Case genotypes with 2 copies of A1
CTRL_NON_A1_CT	gcountcc	Control genotypes with 0 copies of A1
CTRL_HET_A1_CT	gcountcc	Control genotypes with 1 copy of A1
CTRL_HOM_A1_CT	gcountcc	Control genotypes with 2 copies of A1
A1_FREQ	a1freq	A1 allele frequency
A1_CASE_FREQ	a1freqcc	A1 allele frequency in cases
A1_CTRL_FREQ	a1freqcc	A1 allele frequency in controls
MACH_R2	machr2	MaCH imputation quality metric
*FIRTH?*	firth	Reports whether Firth reg. was used ('firth-fallback' only)
TEST	test	Test identifier
OBS_CT	nobs	Number of samples in regression
BETA	beta	Regression coefficient (for A1 allele)
OR	orbeta	Odds ratio (for A1 allele)
[LOG(OR)_]SE	se	Standard error of log-odds (i.e. beta)
*L##*	ci	Bottom of symmetric approx. confidence interval (with --ci)
*U##*	ci	Top of symmetric approx. confidence interval (with --ci)
Z_[OR_F_]STAT	tz	F-statistic for joint test, Wald Z-score for logistic/Firth regression
[NEG_LOG10_]P	p	Asymptotic p-value (or -log10(p)) for Z/chisq-stat
ERRCODE	err	When result is 'NA', an error code describing the reason

All statistics are computed across just the samples used in the regression.

.glm.linear (linear regression association statistics)

Produced by --glm with a quantitative phenotype.

A text file with a header line, and then one line per variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
A1	(required)	Counted allele¹ in regression
OMITTED	omitted	Omitted allele
A1_CT²	a1count	Total A1 allele count (can be decimal with dosage data)
ALLELE_CT²	totallele	Allele observation count
A1_FREQ	a1freq	A1 allele frequency
MACH_R2	machr2	MaCH imputation quality metric
TEST	test	Test identifier
OBS_CT	nobs	Number of samples in regression
BETA	beta, orbeta	Regression coefficient (for A1 allele)
SE	se	Standard error of beta
*L##*	ci	Bottom of symmetric approx. confidence interval (with --ci)
*U##*	ci	Top of symmetric approx. confidence interval (with --ci)
T_[OR_F_]STAT	tz	F-statistic for joint test; t-statistic for linear regression
[NEG_LOG10_]P	p	Asymptotic p-value (or -log10(p)) for T/chisq-stat
ERRCODE	err	When result is 'NA', an error code describing the reason

All statistics are computed across just the samples used in the regression.

1: For multiallelic variants, this column may contain multiple comma-separated alleles when the result doesn't depend on which allele is A1.
2: For males on chrX, these values are normally computed as if males were diploid, since that's the encoding used in the regression. The exception is when "--xchr-model 1" is specified, where male 0..1 values coexist with female 0..2 values in the regression. In that case, these columns will also be based on the mixed male 0..1, female 0..2 scaling.
To be clear, --glm only uses this 0..2 haploid coding on chrX, to put males and females on an equal footing in a world where X-inactivation is common. chrY/chrM use 0..1 coding.

.grm (GCTA text relationship matrix)

Produced by --make-grm-list.

A text file with no header line, and one line per pair of samples (not necessarily distinct) with the following four fields:

1-based index of first sample in .grm.id file
1-based index of second sample in .grm.id file
Number of observations (variants where neither sample has a missing call)
Relationship value

.grm.N.bin, .grm.bin (GCTA 1.1+ triangular binary relationship matrix)

Produced by --make-grm-bin.

These files contain single-precision (4-byte) floating point values. Using 1-based matrix indices, the first value in each file is the (1, 1) relationship value (.grm.bin) or observation count (.grm.N.bin); the second and third values are the (2, 1) and (2, 2) relationships/counts; the fourth through sixth values are the (3, 1), (3, 2) and (3, 3) relationships/counts in that order; and so on.
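
The ordering above implies that, for 1-based indices with column ≤ row, entry (row, column) is value number row(row-1)/2 + column - 1 counting from zero. A small sketch of that lookup, with hypothetical function names:

```c
#include <stdint.h>

/* 0-based position of the (row, col) entry in the lower-triangular
   sequence; row and col are 1-based, and are swapped if col > row. */
uint64_t grm_value_index(uint64_t row, uint64_t col) {
  if (col > row) {
    uint64_t tmp = row;
    row = col;
    col = tmp;
  }
  return row * (row - 1) / 2 + (col - 1);
}

/* Byte offset of that entry, given 4-byte single-precision values. */
uint64_t grm_byte_offset(uint64_t row, uint64_t col) {
  return 4 * grm_value_index(row, col);
}
```

The same offset applies to both the .grm.bin and .grm.N.bin files, since they share one layout.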

Note that .grm.bin files generated by GCTA versions before 1.1 have a different format.

.haps (Oxford phased haplotype file)

Reference panel haplotype file format for IMPUTE2. Must be accompanied by a .legend file when no variant info header columns are present. Imported with --haps, and produced by "--export haps[legend]".

A text file with no header line, and one line per variant with either 2N+5 or 2N fields, where N is the number of samples. In the former case, the first five columns are:

Chromosome code
Variant ID
Base-pair coordinate
Allele 0 (usually minor, use 'ref-first' when importing to treat as REF)
Allele 1 (usually major, use 'ref-last' when importing to treat as REF)

This is followed by a pair of 0/1-valued haplotype columns for the first sample, then a pair of haplotype columns for the second sample, etc. (For male samples on chrX, the second column may contain dummy '-' entries; otherwise, missing genotype calls are not permitted.)
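
Because 2N+5 is always odd and 2N is always even, the parity of a line's field count is enough to tell the two layouts apart. The sketch below is one way a reader might use that observation; the function name and return convention are hypothetical, and this is not necessarily how PLINK detects the layout.

```c
/* Infers the sample count N from the number of fields on a .haps line.
   Sets *has_info_cols to 1 for the 2N+5 layout and 0 for the bare 2N layout.
   Returns -1 if the field count fits neither layout with N >= 1. */
long haps_sample_count(long field_ct, int* has_info_cols) {
  if (field_ct >= 7 && (field_ct % 2) == 1) {
    *has_info_cols = 1;
    return (field_ct - 5) / 2;
  }
  if (field_ct >= 2 && (field_ct % 2) == 0) {
    *has_info_cols = 0;
    return field_ct / 2;
  }
  return -1;
}
```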

.hardy (Hardy-Weinberg equilibrium exact test report)

Produced by --hardy when autosomal diploid variants are present.

A text file with a header line, and one line per autosomal diploid variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
A1	(required)	Tested allele
AX	ax	Non-A1 alleles, comma-separated
HOM_A1_CT	gcounts	Homozygous-A1 genotype count
HET_A1_CT	gcounts	Heterozygous-A1 genotype count
TWO_AX_CT	gcounts	# of nonmissing calls with no A1 copies
GCOUNTS	gcount1col	gcounts values in a single comma-separated column
O(HET_A1)	hetfreq	Observed heterozygous-A1 frequency
E(HET_A1)	hetfreq	Expected heterozygous-A1 frequency
[NEG_LOG10_][MID]P	p	Hardy-Weinberg equilibrium exact test [mid-]p-value (or -log10(p))

.hardy.x (Graffelman-Weir extended chrX HWE test report)

Produced by --hardy when chrX variants are present.

A text file with a header line, and one line per chrX variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
A1	(required)	Tested allele
AX	ax	Non-A1 alleles, comma-separated
FEMALE_HOM_A1_CT	gcounts	Female homozygous-A1 genotype count
FEMALE_HET_A1_CT	gcounts	Female heterozygous-A1 genotype count
FEMALE_TWO_AX_CT	gcounts	# of nonmissing female calls with no A1 copies
MALE_A1_CT	gcounts	Male A1 allele count
MALE_AX_CT	gcounts	Male non-A1 allele count
GCOUNTS	gcount1col	gcounts values in a single comma-separated column
O(FEMALE_HET_A1)	hetfreq	Observed het-A1 frequency
E(FEMALE_HET_A1)	hetfreq	Expected het-A1 frequency
FEMALE_A1_FREQ	sexaf	Female A1 allele frequency
MALE_A1_FREQ	sexaf	Male A1 allele frequency
FEMALE_ONLY_[NEG_LOG10_][MID]P	femalep	Old female-only HWE exact test [mid-]p-value (or -log10(p))
[NEG_LOG10_][MID]P	p	Graffelman-Weir HWE test [mid-]p-value (or -log10(p))

.het (method-of-moments F coefficient estimates)

Produced by --het.

A text file with a header line, and one line per sample with the following columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
IID	(required)	Individual ID
*SID*	maybesid, sid	Source ID
O(HOM)	hom	Observed number of homozygous genotypes
E(HOM)	hom	Expected number of homozygous genotypes
O(HET)	het	Observed number of heterozygous genotypes
E(HET)	het	Expected number of heterozygous genotypes
OBS_CT	nobs	Number of (nonmissing, non-monomorphic) autosomal genotype observations
F	f	Method-of-moments F coefficient estimate
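
The F column can be related to the other columns through the usual method-of-moments formula, F = (O(HOM) - E(HOM)) / (OBS_CT - E(HOM)), which is the definition given in the PLINK 1.9 documentation for --het. The sketch below assumes PLINK 2 reports the same quantity; the function name is hypothetical.

```c
/* Method-of-moments inbreeding coefficient from one .het row.
   Returns 1 and stores the estimate in *f_out, or returns 0 when the
   denominator is zero and no estimate is defined. */
int het_f_estimate(double obs_hom, double exp_hom, double obs_ct,
                   double* f_out) {
  double denom = obs_ct - exp_hom;
  if (denom == 0.0) {
    return 0;
  }
  *f_out = (obs_hom - exp_hom) / denom;
  return 1;
}
```

With O(HOM) = 80, E(HOM) = 70 and OBS_CT = 100, this gives (80 - 70) / (100 - 70) = 1/3.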

.id (Sample ID list)

When generated by PLINK 2, this is a text file which may or may not have a header line. Without a header line (the default for .grm.id files; can be forced for other .id files with --no-id-header), a single-column file contains IIDs, and a two-column file contains FID/IID pairs. Otherwise, there's one line per sample after the header line with the following columns:

Header	Contents
FID	Family ID (present iff .psam or --update-ids file has it)
IID	Individual ID (always present)
SID	Source ID (present iff .psam or --update-ids file has it)

.ind (EIGENSOFT sample information file)

Sample information file accompanying an EIGENSOFT .geno binary genotype table. Loaded with --eigfile/--eigind, and produced by --export eig[t].

A text file with no header line, and one line per sample with the following three fields:

Sample ID (max 39 characters)
Sex code ('M' = male, 'F' = female, 'U' = unknown)
Label (binary phenotype represented as 'Case'/'Control')

.kin0 (KING-robust kinship coefficient report)

Produced by --make-king-table.

A text file with a header line, and one line per sample pair with kinship coefficient no smaller than the --king-table-filter value. When --king-table-filter is not specified, all sample pairs are included. The following columns are present:

Header	Column set	Contents
*FID1*	maybefid, fid	FID of first sample in current pair
ID1	id	IID of first sample in current pair
*SID1*	maybesid, sid	SID of first sample in current pair
*FID2*	maybefid, fid	FID of second sample in current pair
ID2	id	IID of second sample in current pair
*SID2*	maybesid, sid	SID of second sample in current pair
NSNP	nsnp	Number of variants considered (autosomal, neither call missing)
HETHET	hethet	Proportion/count of considered call pairs which are het-het
IBS0	ibs0	Proportion/count of considered call pairs which are opposite homs
HET1_HOM2	ibs1	Proportion/count of sample 1 het, sample 2 hom
HET2_HOM1	ibs1	Proportion/count of sample 1 hom, sample 2 het
KINSHIP	kinship	KING-robust between-family kinship estimate

.king[.bin] (KING-robust kinship coefficient matrix)

Produced by --make-king. Accompanied by a .king[.bin].id file containing sample IDs.

If text, a tab-delimited file that is either lower-triangular (excluding the diagonal) or square. If it's square, the upper-right triangle may be either zeroed out or the mirror-image of the lower-left triangle, depending on whether the 'square0' or 'square' modifier was used.

The binary format is semantically identical; it just has nothing but single- (4-byte) or double-precision (8-byte) floating point values, instead of text+delimiters+linebreaks.
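
As an illustration of the text shapes described above, the following minimal Python sketch expands a lower-triangular text .king file into a full symmetric matrix. It assumes that line k of the triangle holds exactly k tab-delimited values (so N samples give N-1 lines), and it fills the omitted diagonal with 0.5; the function name expand_king_triangle is hypothetical and is not part of PLINK.

```python
def expand_king_triangle(text, diagonal=0.5):
    """Expand a diagonal-free lower triangle into a square symmetric matrix.

    Assumption: line k (1-based) holds exactly k tab-delimited values.
    """
    rows = [[float(x) for x in line.split("\t")]
            for line in text.splitlines() if line.strip()]
    n = len(rows) + 1  # the first sample has no line of its own
    square = [[0.0] * n for _ in range(n)]
    for i in range(n):
        square[i][i] = diagonal
    for i, row in enumerate(rows, start=1):
        if len(row) != i:
            raise ValueError(f"triangle line {i} has {len(row)} values, expected {i}")
        for j, value in enumerate(row):
            square[i][j] = value  # lower-left triangle
            square[j][i] = value  # mirror image, as with the 'square' shape
    return square
```

Sample order must still be taken from the accompanying .king.id file.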

.legend (Oxford single-chromosome variant information file)

Single-chromosome variant information file accompanying a bare .haps reference panel haplotype file. Imported with --legend, and produced by "--export hapslegend".

A text file with a header line, and one line per variant with the following four columns:

Header	Contents
id	Variant ID
position	Base-pair coordinate
a0	Allele 0 (usually minor, use 'ref-first' to treat as REF)
a1	Allele 1 (usually major, use 'ref-last' to treat as REF)

.map (PLINK 1 text fileset variant information file)

Variant information file accompanying a .ped text pedigree + genotype table.

A text file with no expected header line, and one line per variant with the following 3-4 fields:

Chromosome code. PLINK 1.9 and 2.0 also permit contig names here, but most older programs do not.
Variant ID
Position in centimorgans (optional; safe to use dummy value of '0')
Base-pair coordinate (1-based; limited to 2³¹-2)

All lines must have the same number of columns (so either no lines contain the centimorgans column, or all of them do).

Lines starting with '#' are supposed to be treated as comments, but this was not consistently supported by PLINK 1.9 and 2.0 before Aug 2024.
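
The rules above (3 or 4 fields, a consistent column count, '#' comment lines, and the base-pair limit) can be sketched in Python as follows. This is an illustrative reader, not PLINK's own parser; it assumes whitespace-delimited fields, and the name parse_map is hypothetical.

```python
MAX_BP = 2**31 - 2  # base-pair coordinate limit stated above


def parse_map(text):
    """Return a list of (chrom, variant_id, cm, bp); cm is None for 3-column files."""
    variants = []
    width = None
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) not in (3, 4):
            raise ValueError(f"line {line_no}: expected 3 or 4 fields, got {len(fields)}")
        if width is None:
            width = len(fields)
        elif len(fields) != width:
            raise ValueError(f"line {line_no}: inconsistent number of columns")
        cm = float(fields[2]) if width == 4 else None
        bp = int(fields[-1])
        if bp > MAX_BP:
            raise ValueError(f"line {line_no}: base-pair coordinate exceeds 2^31-2")
        variants.append((fields[0], fields[1], cm, bp))
    return variants
```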

.mendel, .imendel, .fmendel, .lmendel (Mendel error reports)

Produced by --mendel.

The .mendel file is a text file with a header line, and one line per error with the following columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
KID	(required)	Kid's IID
*KID_SID*	maybesid, sid	Kid's source-ID
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	id	Variant ID
CODE	code	Numeric error code
ERROR	error	Description of error (no longer contains spaces)

Note that '*/*' or '*' in the error description does not (necessarily) refer to a missing genotype call; instead, it means the Mendel error is unrelated to that parent.

The .lmendel file has a header line, and one line per variant with the following four columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	id	Variant ID
N	(required)	Number of Mendel errors

The .imendel file has a header line, and one subsection per nuclear family. Each subsection contains one line per family member with the following columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
IID	(required)	Individual ID
*SID*	maybesid, sid	Source ID
N	(required)	Number of errors implicating this sample (only considering nuclear family)

Samples may appear more than once in this file.

Finally, the .fmendel file has a header line, and one line per nuclear family with the following five columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
PAT	(required)	Paternal IID (0 if missing)
MAT	(required)	Maternal IID (0 if missing)
CHLD	(required)	Number of offspring samples in nuclear family
N	(required)	Number of Mendel errors in nuclear family

.pdiff (two-fileset genotype/dosage discordance report)

Produced by --pgen-diff.

A text file with a header line, and then one line per discordance with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	id	Variant ID
REF	ref	Reference allele
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
*FID*	maybefid, fid	Family ID
IID	(required)	Individual ID
*SID*	maybesid, sid	Source ID
'GT1'/'DS1'	geno	Genotype/dosage of first sample
'GT2'/'DS2'	geno	Genotype/dosage of second sample

.ped (PLINK 1/MERLIN/Haploview sample-major text genotype table)

Pedigree information + genotype call text file. Must be accompanied by a .map file. Loaded with --pedmap, and produced by "--export ped". This format is simultaneously highly inefficient, even relative to other text formats, and limited in scope (unobserved minor allele codes can't be stored); continued use is strongly discouraged.

Contains no header line, and one line per sample with 2V+6 fields where V is the number of variants. The first six fields are the same as those in a .fam file. The seventh and eighth fields are allele calls for the first variant in the .map file ('0' = no call); the 9th and 10th are allele calls for the second variant; and so on. All variants must be biallelic (or monomorphic, or all-missing).

If all alleles are single-character, PLINK 1.9 and 2.0 will correctly parse the more compact "compound genotype" variant of this format, where each genotype call is represented as a single two-character string. This does not require the use of an additional loading flag. You can produce such a file with "--export compound-genotypes".
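
To make the field layout concrete, here is a minimal Python sketch that splits one .ped line into its six .fam-style fields and V genotype calls, accepting either the standard 2V+6 layout or the compound-genotype layout. It assumes whitespace-delimited input with all six leading fields present; parse_ped_line and its variant_ct parameter are hypothetical names.

```python
FAM_FIELDS = ("FID", "IID", "PAT", "MAT", "SEX", "PHENO")


def parse_ped_line(line, variant_ct):
    """Return (sample_info, genotypes); a '0' allele call becomes None."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("this sketch expects all six leading fields")
    sample = dict(zip(FAM_FIELDS, fields[:6]))
    calls = fields[6:]
    if len(calls) == 2 * variant_ct:
        # Standard layout: two separate allele fields per variant.
        pairs = list(zip(calls[0::2], calls[1::2]))
    elif len(calls) == variant_ct and all(len(c) == 2 for c in calls):
        # Compound layout: one two-character string per variant.
        pairs = [(c[0], c[1]) for c in calls]
    else:
        raise ValueError("field count does not match the number of variants")
    genotypes = [tuple(None if a == "0" else a for a in pair) for pair in pairs]
    return sample, genotypes
```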

It is also possible to load .ped files missing some initial fields.

Lines starting with '#' are supposed to be treated as comments, but this was not supported by PLINK 1.9 and 2.0 before Aug 2024.

.pgen, .pgen.pgi (PLINK 2 binary genotype table)

PLINK 2's preferred way to represent genotype calls. Must be accompanied by .pvar/.bim and .psam/.fam files. Loaded with --pfile/--bpfile, and generated with --make-pgen/--make-bpgen and all import commands.

Most .pgen files have an embedded index, and do not have an accompanying .pgen.pgi file. When the index is not embedded, PLINK 2 expects it to be stored in "<.pgen filename>.pgi".

A draft specification of these formats is available. The first version will be finalized around the beginning of PLINK 2.0 beta testing.

.psam (PLINK 2 sample information file)

Sample information file accompanying a .pgen binary genotype table. (--make-just-psam can be used to update just this file.)

A text file which usually has at least one header line, where only the last header line starts with '#FID' or '#IID'. This final header line specifies the columns in the .psam file; the following intermediate column headers are recognized:

IID (individual ID; required)
SID (source ID, when there are multiple samples for the same individual)
PAT (individual ID of father, '0' if unknown)
MAT (individual ID of mother, '0' if unknown)
SEX ('1' = male, '2' = female, 'NA'/'0' = unknown)

(FID must either be the first column, or absent. If it's absent, all FID values are now assumed to be '0'.) Any other value is treated as a phenotype/covariate name; see the phenotype/covariate documentation for column encoding details.

If no header line is present, the columns are assumed to be in .fam file order (FID, IID, PAT, MAT, SEX, PHENO1).
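
The header rules above can be sketched as a small Python reader. It is a simplified illustration rather than PLINK's loader: it assumes whitespace-delimited fields, ignores any '#' line that is not the final '#FID'/'#IID' header, and does not interpret phenotype encodings. The name parse_psam is hypothetical.

```python
FAM_ORDER = ["FID", "IID", "PAT", "MAT", "SEX", "PHENO1"]


def parse_psam(text):
    """Return one dict per sample, keyed by column name."""
    columns = None
    samples = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            if line.startswith("#FID") or line.startswith("#IID"):
                columns = line[1:].split()  # final header line names the columns
            continue  # other header lines are skipped in this sketch
        fields = line.split()
        names = columns if columns is not None else FAM_ORDER
        if len(fields) != len(names):
            raise ValueError("unexpected number of fields")
        row = dict(zip(names, fields))
        row.setdefault("FID", "0")  # absent FID column: FID is assumed to be '0'
        samples.append(row)
    return samples
```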

.phy (relaxed PHYLIP format)

Multiple sequence alignment text file, produced by "--export phylip[-phased]", and recognized by FastTree, IQ-TREE, and several other phylogenetic tools. This format cannot be loaded by PLINK.

The header line contains two numbers, the number of sequences followed by the number of nucleotide codes per sequence.

Each subsequent line contains two fields. The first field contains the sample ID, and is padded by spaces to a fixed width, such that the longest sample ID is followed by exactly 3 spaces. (This imitates the behavior of vcf2phylip.) The second field contains IUPAC nucleotide codes.
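
The padding rule can be illustrated with a short Python sketch that writes sequences in this layout. It assumes a single space between the two header numbers, which the description above does not specify; format_relaxed_phylip is a hypothetical helper, not a PLINK function.

```python
def format_relaxed_phylip(sequences):
    """sequences: dict mapping sample ID -> string of IUPAC nucleotide codes."""
    if not sequences:
        raise ValueError("no sequences given")
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    site_ct = lengths.pop()
    # Fixed ID width: the longest sample ID is followed by exactly 3 spaces.
    width = max(len(sample_id) for sample_id in sequences) + 3
    lines = [f"{len(sequences)} {site_ct}"]
    for sample_id, seq in sequences.items():
        lines.append(sample_id.ljust(width) + seq)
    return "\n".join(lines) + "\n"
```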

.pvar (PLINK 2 variant information file)

Variant information file accompanying a .pgen binary genotype table. (--make-just-pvar can be used to update just this file.)

A text file which usually has at least one header line, where only the last header line starts with '#CHROM'. This final header line specifies the columns in the .pvar file; the following intermediate column headers are recognized:

POS (base-pair coordinate)
ID (variant ID; required)
REF (reference allele)
ALT (alternate alleles, comma-separated; some commands expect uniqueness)
QUAL (phred-scaled quality score for whether the locus is variable at all)
FILTER ('PASS', '.', or semicolon-separated list of failing filter codes)
INFO (semicolon-separated list of flags and key-value pairs, with types declared in header)
FORMAT (terminates header line parsing)
CM (centimorgan position)

In particular, a VCF file, or a trimmed VCF file with all columns past the 5th (or 6th, etc.) removed, is valid input for anything expecting a .pvar-format file.

The following VCF-style header lines are also recognized:

"##INFO=<ID=PR,Number=0,Type=Flag...": Indicates the INFO/PR flag, which marks 'provisional' reference alleles (i.e. imported from a file which does not consistently track which allele is reference and which are alternates), is present. (This information is also present in .pgen files, and the loader reports an error when the .pvar and .pgen flags don't match.)
"##chrSet=...": Explicitly specifies the chromosome set. E.g. --make-pgen + --dog will cause "##chrSet=<ID=1,autosomePairCt=38,X,Y,XY,M>" to be written to the .pvar header, and as a consequence it isn't necessary to include the --dog flag when loading the new fileset.

When no header line is present, the columns are assumed to be in .bim file order (CHROM, ID, CM, POS, ALT, REF; or if only 5 columns are present, CM is assumed to be omitted).

.raw (additive + dominant component file)

Produced by "--export {A,AD}"; suitable for loading from R. This format cannot be loaded by PLINK.

A text file with a header line, and then one line per sample with V+6 (for "--export A") or 2V+6 (for "--export AD") fields, where V is the number of variants. The header line does not contain a preceding '#'. The first six fields are:

FID	Family ID
IID	Individual ID
PAT	Paternal individual ID
MAT	Maternal individual ID
SEX	Sex (1 = male, 2 = female, 0 = unknown)
PHENOTYPE	First active non-categorical phenotype (missing value if none)

This is followed by one or two fields per variant:

<Variant ID>_<counted allele>	Allelic dosage (missing = 'NA', haploid scaled to 0..2)
<Variant ID>_HET	Dominant component (1 = het). Requires "--export AD".

If 'include-alt' was specified, the header line also names alternate allele codes in parentheses, e.g. 'rs5939319_G(/A)'.
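
When post-processing a .raw file outside R, the per-variant header fields can be split back into their parts. The Python sketch below is one hedged way to do this: it assumes that the counted allele contains no underscore and no parentheses, and it returns the parenthesized 'include-alt' text without interpreting it. parse_raw_column is a hypothetical name.

```python
def parse_raw_column(name):
    """Split a per-variant .raw header field into its components."""
    alt_text = None
    if name.endswith(")") and "(" in name:
        # 'include-alt' form, e.g. 'rs5939319_G(/A)'
        name, _, rest = name.rpartition("(")
        alt_text = rest[:-1]
    variant_id, sep, suffix = name.rpartition("_")
    if not sep:
        raise ValueError("no '_' separator found")
    return {
        "variant_id": variant_id,
        "suffix": suffix,                 # counted allele, or 'HET'
        "is_dominant": suffix == "HET",   # dominant-component column
        "alt_text": alt_text,
    }
```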

.rel[.bin] (relationship matrix)

Produced by --make-rel. Accompanied by a .rel[.bin].id file containing sample IDs.

Contents are identical to those of a .grm/.grm.bin file. Possible shapes are essentially the same as for .king files; the only difference is that .king files have an omitted or constant-0.5 diagonal while .rel files do not.

.sample (Oxford sample information file)

Sample information file accompanying a .gen or .bgen genotype dosage file, or a .haps phased reference panel. Loaded with --data/--sample, and produced by --export in several cases.

By default, the .sample space-delimited files emitted by --export have two header lines, and then one line per sample with 4+ fields:

First header line	Second header line	Subsequent contents
ID_1	0	Family ID
ID_2	0	Individual ID
missing	0	Missing call frequency
sex	D	Sex code ('1' = male, '2' = female, '0' = unknown)
<Pheno name>, ...	'B'/'D'/'P'	Binary ('0' = control, '1' = case), discrete (categorical, positive integers), or continuous phenotype; missing values represented by 'NA'

(As of 6 Apr 2021, PLINK 2 accepts 'C' as a synonym for column type 'P' in .sample input files.)

With --export's 'sample-v2' modifier, this is adjusted to:

First header line	Second header line	Subsequent contents
ID	0	Sample ID
missing	0	(unchanged)
father	D	Paternal individual ID
mother	D	Maternal individual ID
sex	D	Unknown sex encoded as 'NA' instead of '0'
<Pheno name>, ...	'B'/'D'/'P'	For type 'D', original category names are saved instead of just integers; otherwise unchanged

Note that older programs are likely to support only the first .sample dialect.

A specification for this format is on the QCTOOL v2 website.

.scount (sample variant-count report)

Produced by --sample-counts.

A text file with a header line, and then one line per sample with the following columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
IID	(required)	Individual ID
*SID*	maybesid, sid	Source ID
SEX	sex	Sex (1 = male, 2 = female, 'NA' = unknown)
HOM_CT	hom	Homozygous genotype count
HOM_REF_CT	homref	Hom-REF genotype count
HOM_ALT_CT	homalt	Hom-ALT genotype count
HOM_ALT_SNP_CT	homaltsnp	Hom-ALT SNP (single-character REF and ALT) count
HET_CT	het	Heterozygous genotype count
HET_REF_ALT_CT	refalt	Het. REF-ALTx genotype count
HET_2ALT_CT	het2alt	Het. ALTx-ALTy genotype count
HET_SNP_CT	hetsnp	Het. SNP genotype count
DIPLOID_TRANSITION_CT	dipts	Diploid SNP transition (A↔G, C↔T) count
TRANSITION_CT	ts	SNP transition count
DIPLOID_TRANSVERSION_CT	diptv	Diploid SNP transversion count
TRANSVERSION_CT	tv	SNP transversion count
DIPLOID_NONSNP_NONSYMBOLIC_CT	dipnonsnpsymb	Diploid non-SNP, non-symbolic variant count
NONSNP_NONSYMBOLIC_CT	nonsnpsymb	Non-SNP, non-symbolic variant count
SYMBOLIC_CT	symbolic	Symbolic (starting with '<') variant count
NONSNP_CT	nonsnp	Non-SNP variant count
DIPLOID_SINGLETON_CT	dipsingle	Number of singletons relative to this dataset, considering just diploid calls³
SINGLETON_CT	single	Number of singletons relative to this dataset
HAP_REF_INCL_FEMALE_Y_CT	haprefwfemaley	Haploid REF count, counting chrY for everyone
HAP_REF_CT	hapref	Haploid REF count, excluding chrY for nonmales
HAP_ALT_INCL_FEMALE_Y_CT	hapaltwfemaley	Haploid ALT count, counting chrY for everyone
HAP_ALT_CT	hapalt	Haploid ALT count, excluding chrY for nonmales
MISSING_INCL_FEMALE_Y_CT	missingwfemaley	Missing call count, counting chrY for everyone
MISSING_CT	missing	Missing call count, excluding chrY for nonmales

The 'hetsnp', 'dipts'/'ts'/'diptv'/'tv', 'dipnonsnpsymb'/'nonsnpsymb', 'symbolic', and 'nonsnp' columns count each ALT allele in a heterozygous ALTx-ALTy genotype separately, since they can be of different subtypes. (I.e. if they are of the same subtype, the corresponding count is incremented by 2.) As a consequence, these columns are unaffected by variant split/join.

3: If the ALT allele in a chrX biallelic variant appears in exactly one female and one male, that counts as a singleton in this column for just the female.

.sdiff (sample-pair discordance report)

Produced by --sample-diff.

A text file with a header line, and then one line per discordance with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
*FID1*	*maybefid*, fid	FID of first sample in current pair
*IID1*	id	IID of first sample in current pair
*SID1*	*maybesid*, sid	SID of first sample in current pair
*FID2*	*maybefid*, fid	FID of second sample in current pair
*IID2*	id	IID of second sample in current pair
*SID2*	*maybesid*, sid	SID of second sample in current pair
'GT1'/'DS1'	geno	Genotype/dosage of first sample
'GT2'/'DS2'	geno	Genotype/dosage of second sample

.sdiff.summary (sample-pair discordance count summary)

Produced by --sample-diff.

A text file with a header line, and then one line per sample pair with the following columns:

Header	Column set	Contents
*FID1*	maybefid, fid	FID of first sample in current pair
IID1	(required)	IID of first sample in current pair
*SID1*	maybesid, sid	SID of first sample in current pair
*FID2*	maybefid, fid	FID of second sample in current pair
IID2	(required)	IID of second sample in current pair
*SID2*	maybesid, sid	SID of second sample in current pair
OBS_CT	nobs	Number of genotype/dosage pairs considered
IBS_OBS_CT	nobsibs	Number of diploid hardcall-pairs
IBS0_CT	ibs0	# of diploid hardcall-pairs with no matching alleles
IBS1_CT	ibs1	# of diploid hardcall-pairs with exactly 1 matching allele
IBS2_CT	ibs2	# of diploid hardcall-pairs with 2 matching alleles
HALFMISS_CT	halfmiss	# of genotype/dosage pairs with exactly 1 missing call
DIFF_CT	diff	# of genotype/dosage discordances

.sexcheck (sex imputation report)

Produced by --check-sex/--impute-sex.

A text file with a header line, and one line per sample with the following columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
IID	(required)	Individual ID
*SID*	maybesid, sid	Source ID
PEDSEX	pedsex	Sex code in input file (1 = male, 2 = female, NA = unknown)
SNPSEX	(required)	Imputed sex code (1/2/NA)
STATUS	status	'OK' on nonmissing PEDSEX and SNPSEX match, 'PROBLEM' otherwise
F	xf	If chrX used, inbreeding coefficient estimated off chrX
YCOUNT	ycount	If chrY used, number of valid chrY genotypes
*YRATE*	yrate	If chrY used, chrY valid genotype rate
YOBS	yobs	If chrY used, number of chrY variants considered

.smiss (sample-based missing data report)

Produced by --missing.

A text file with a header line, and then one line per sample with the following columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
IID	(required)	Individual ID
*SID*	maybesid, sid	Source ID
MISS_PHENO1	misspheno1	First active phenotype missing (Y/N), Y if none
<Pheno name>, ...	missphenos	Y/N column for each loaded phenotype
MISSING_DOSAGE_CT	nmissdosage	Number of missing dosages
MISSING_CT	nmiss	Number of missing hardcalls, not counting het haploids
MISSING_AND_HETHAP_CT	nmisshh	Number of missing hardcalls, counting het haploids
HETHAP_CT	hethap	Number of heterozygous haploid hardcalls
OBS_CT	nobs	Denominator (# variants for males, excludes chrY for females)
F_MISS_DOSAGE	fmissdosage	Missing dosage rate
F_MISS	fmiss	Missing hardcall rate, not counting het haploids
F_MISS_AND_HETHAP	fmisshh	Missing hardcall rate, counting het haploids

When dosages are present, MISSING_DOSAGE_CT will typically be slightly lower than MISSING_CT, since hardcalls normally aren't saved for dosages in (0.1, 0.9) or (1.1, 1.9).
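
The relationship between the two counts can be illustrated with a small Python sketch. It assumes a hardcall threshold of 0.1, which is inferred from the intervals quoted above and is not necessarily the setting in effect for a given dataset; both function names are hypothetical.

```python
def hardcall_from_dosage(dosage, threshold=0.1):
    """Return 0, 1, or 2, or None when no hardcall would be saved."""
    if dosage is None:
        return None  # missing dosage implies missing hardcall
    nearest = round(dosage)
    if abs(dosage - nearest) <= threshold:
        return int(nearest)
    return None  # dosage lies in (0.1, 0.9) or (1.1, 1.9)


def missing_counts(dosages, threshold=0.1):
    """Return (missing dosage count, missing hardcall count) for one sample."""
    missing_dosage_ct = sum(d is None for d in dosages)
    missing_ct = sum(hardcall_from_dosage(d, threshold) is None for d in dosages)
    return missing_dosage_ct, missing_ct
```

Every missing dosage also yields a missing hardcall in this sketch, so the first count can never exceed the second.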

.snp (EIGENSOFT variant information file)

Variant information file accompanying an EIGENSOFT .geno binary genotype table. Loaded with --eigfile/--eigsnp, and produced by --export eig[t].

A text file with no header line, and one line per variant with the following six fields:

Variant ID
Numeric chromosome code (90 = MT, 91 = XY)
Position in centimorgans (safe to use dummy value of '0')
Base-pair coordinate (1-based; limited to 2³¹-2)
REF single-character allele code
ALT single-character allele code ('X' = missing)

.sscore (sample scores)

Produced by --score and --score-list.

A text file with a header line, and then one line per sample with the following columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
IID	(required)	Individual ID
*SID*	maybesid, sid	Source ID
PHENO1	pheno1	All-missing phenotype column, if none loaded
<Pheno name>, ...	pheno1, phenos	Phenotype value(s) (only first if just 'pheno1')
*ALLELE_CT*	nallele	Number of alleles across scored variants (--score only)
DENOM	denom	Denominator used for score average (--score only)
*NAMED_ALLELE_DOSAGE_SUM*	dosagesum	Sum of named allele dosages (--score only)
<Score name>_AVG, ...	scoreavgs	Score averages
<Score name>_SUM, ...	scoresums	Score sums

.ssf.tsv (association statistics in GWAS-SSF format)

Produced by --gwas-ssf postprocessing --glm output.

A text file with a header line, and then one line per variant with the following columns:

Header	Contents
chromosome	Chromosome code (1-25, where X=23, Y=24, MT=25)
base_pair_location	Base-pair coordinate
effect_allele	Counted allele in regression
other_allele	Omitted allele
'beta'/'odds_ratio'	Regression coefficient or odds ratio for effect_allele
standard_error	Standard error of beta
effect_allele_frequency	Frequency of effect_allele in regression
[neg_log_10_]p_value	Asymptotic p-value or -log10(p)
variant_id	<chrom>_<pos>_<ref>_<alt> variant ID
rsid	rsID
ci_upper	Upper end of beta/odds_ratio confidence interval
ci_lower	Lower end of beta/odds_ratio confidence interval
n	Number of samples in regression
ref_allele	Indicates which allele is REF ('EA', 'OA', or '#NA')

(Since the --gwas-ssf command does not have a cols= modifier, boldface is used to denote mandatory GWAS-SSF fields in this table.)
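
As a worked example of the chromosome coding in the first column, the Python sketch below maps a chromosome label to the 1-25 numbering given in the table. It covers only the codes listed there (autosomes, X, Y, MT); any other label raises an error instead of being guessed. to_ssf_chromosome is a hypothetical helper.

```python
def to_ssf_chromosome(code):
    """Map '1'..'25', 'X', 'Y', or 'MT' to the numeric code used in the table."""
    special = {"X": 23, "Y": 24, "MT": 25}
    if code in special:
        return special[code]
    if code.isdigit() and 1 <= int(code) <= 25:
        return int(code)
    raise ValueError(f"no numeric code defined in this sketch for {code!r}")
```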

.svd.pheno (summary phenotypes generated via SVD)

Produced by --pheno-svd.

A text file with a header line, and then one line per sample with the following columns:

Header	Column set	Contents
*FID*	maybefid, fid	Family ID
IID	(required)	Individual ID
*SID*	maybesid, sid	Source ID
SVDPHENO1, ...	(required)	New phenotype values

.svd.pheno_wts (singular values and right-singular vectors from phenotype SVD)

Produced by --pheno-svd.

A text file with a header line, and then one line per new phenotype with the following columns:

Header	Column set	Contents
NEW_PHENO_ID	id	New phenotype ID
SINGULAR_VALUE	sv	Singular value from SVD
<Old pheno name>, ...	(required)	Right-singular vectors from SVD

.tfam (PLINK 1 sample information file)

Sample information file accompanying a .tped file; identical format to .fam files.

.tped (PLINK 1 variant-major text genotype table)

Variant information + genotype call text file. Must be accompanied by a .tfam file. Loaded with --tfile, and produced by "--export tped".

Contains no header line, and one line per variant with 2N+4 fields where N is the number of samples. The first four fields are the same as those in a .map file. The fifth and sixth fields are allele calls for the first sample in the .tfam file ('0' = no call); the 7th and 8th are allele calls for the second sample; and so on. All variants must be biallelic (or monomorphic, or all-missing).

.traw (variant-major additive component file)

Produced by "--export Av"; suitable for loading from R. Loaded with --import-dosage (note that several modifiers must be specified).

A text file with a header line without a leading '#', and then one line per variant with the following N+6 fields (where N is the number of samples):

CHR	Chromosome code
SNP	Variant identifier
(C)M	Position in centimorgans
POS	Base-pair coordinate
COUNTED	Counted allele (now defaults to REF)
ALT	Other allele(s), comma-separated
<FID>_<IID>...	Allelic dosages (missing = 'NA', haploid scaled to 0..2)

.used_sites.tsv (variant information for relaxed-PHYLIP file)

Produced by "--export phylip[-phased] used-sites". Accompanied by a .phy file.

A text file with a header line, and then one line per variant with the following 3 fields:

CHROM	Chromosome code
POS	Base-pair coordinate
NUM_SAMPLES	Number of samples with nonmissing nucleotides

.vcf, .bcf (1000 Genomes Project Variant Call Format)

Variant information + sample ID + genotype call file; text if .vcf, binary if .bcf. Imported with --vcf/--bcf, and produced by "--export {b,v}cf".

Note that, while PLINK 2.0 supports a much larger subset of the VCF standard than PLINK 1.9, it still isn't appropriate for general-purpose VCF handling. Instead, the goal is to provide a very useful complement to bcftools. For example, PLINK 2.0 does not save per-call read depths, so any data management or analysis which requires them to be kept around should be done with bcftools or a similarly general tool; but once you're done with variant calling/imputation and are ready to treat your data as a single matrix of hardcalls or dosages (possibly with missing entries), PLINK 2.0 is much more efficient.

The VCFv4.3 files emitted by "--export vcf" start with the following three header lines:

##fileformat=VCFv4.3
##fileDate=<yyyymmdd date>
##source=PLINKv2.00

This is usually followed by all the VCF header lines (if any) present in the loaded .pvar file, a "##chrSet=" chromosome set description when appropriate, and additional "##contig=", INFO/PR, and FORMAT header lines when necessary to make the file conform to the VCF standard.

Next comes a tab-delimited header line with the following N+9 fields (where N is the number of samples), and one tab-delimited line per variant with the same fields:

#CHROM	Chromosome code
POS	Base-pair coordinate
ID	Variant identifier
REF	Reference allele (missing = 'N')
ALT	All alternate alleles, comma-separated (missing = '.')
QUAL	Phred-scaled quality score for whether the locus is variable at all
FILTER	'PASS', '.', or semicolon-separated list of failing filter codes
INFO	Semicolon-separated list of flags and key-value pairs, with types declared in header
FORMAT	'GT', 'DS', 'HDS', and/or 'GP' can be emitted by PLINK 2
<Sample ID>, ...	Genotype/dosage calls

Allele codes are supposed to either start with '<', only contain characters in the set {A,C,G,T,N,a,c,g,t,n}, be an isolated '*', or represent a breakend. --export issues a warning if an allele code does not satisfy this restriction.
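
A rough pre-check for this restriction can be written in a few lines of Python. The sketch below is an approximation and not the check that --export performs: in particular, it treats any code containing '[' or ']' as a breakend, which is looser than the VCF grammar. is_standard_vcf_allele is a hypothetical name.

```python
import re

_NUCLEOTIDES = re.compile(r"^[ACGTNacgtn]+$")


def is_standard_vcf_allele(code):
    """Approximate test of the allele-code restriction described above."""
    if code == "*":
        return True   # isolated '*'
    if code.startswith("<"):
        return True   # symbolic allele
    if _NUCLEOTIDES.match(code):
        return True   # plain nucleotide string
    # Crude breakend detection (assumption: square brackets mark breakends).
    return "[" in code or "]" in code
```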

The full VCFv4.3 specification is in the hts-specs GitHub repository; this includes details on the BCF binary encoding.

.vcor (LD-statistic report)

Produced by --r[2]-[un]phased when in its default tabular-output mode.

A text file with a header line, and one line per variant-pair passing all filters. The following columns are present:

Header	Column set	Contents
CHROM_A	chrom	Chromosome code for first variant in pair
POS_A	pos	Base-pair coordinate of first variant in pair
ID_A	id	ID of first variant in pair
*REF_A*	*ref*⁴	Reference allele for first variant in pair
ALT1_A	alt1	Alternate allele 1 for first variant in pair
ALT_A	alt	Comma-separated alternate alleles for first variant in pair
*PROVISIONAL_REF_A?*	maybeprovref, provref	Reports whether REF_A allele is provisional
*MAJ_A*	*maj*⁴	Major allele for first variant in pair
NONMAJ_A	nonmaj	Comma-separated nonmajor alleles for first variant in pair
NONMAJ_FREQ_A	freq	(1 - <major-allele frequency>) for first variant in pair
CHROM_B	chrom	Chromosome code for second variant in pair
POS_B	pos	Base-pair coordinate of second variant in pair
ID_B	id	ID of second variant in pair
*REF_B*	*ref*⁴	Reference allele for second variant in pair
ALT1_B	alt1	Alternate allele 1 for second variant in pair
ALT_B	alt	Comma-separated alternate alleles for second variant in pair
*PROVISIONAL_REF_B?*	maybeprovref, provref	Reports whether REF_B allele is provisional
*MAJ_B*	*maj*⁴	Major allele for second variant in pair
NONMAJ_B	nonmaj	Comma-separated nonmajor alleles for second variant in pair
NONMAJ_FREQ_B	freq	(1 - <major-allele frequency>) for second variant in pair
[UN]PHASED_R[2]	(required)	Variant correlation coefficient
D	d	Linkage disequilibrium D (phased only)
DPRIME	dprime	Lewontin's D' (phased only)
ABS_DPRIME	dprimeabs	Absolute value of Lewontin's D' (phased only)

Sign of [UN]PHASED_R, D, and DPRIME is positive when the major (or, with 'ref-based', REF) alleles are positively correlated.

4: The 'maj' (or 'ref' when the 'ref-based' modifier is specified) column-set is included by default in --r-phased and --r-unphased's tabular output, but excluded by default for --r2-phased and --r2-unphased.

.vcor{1|2}[.bin] (variant-correlation matrix)

Produced by --r[2]-[un]phased when in matrix-output mode; the exact file extension distinguishes phased vs. unphased (which appears in the component before '.vcor1' or '.vcor2'), r vs. r², and text vs. binary format. Accompanied by a <matrix filename>.vars file containing variant IDs.

Possible shapes are the same as for .king files, except that triangular files include the diagonal.

.vmiss (variant-based missing data report)

Produced by --missing.

A text file with a header line, and then one line per variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
MISSING_DOSAGE_CT	nmissdosage	Number of missing dosages
MISSING_CT	nmiss	Number of missing hardcalls, not counting het haploids
MISSING_AND_HETHAP_CT	nmisshh	Number of missing hardcalls, counting het haploids
HETHAP_CT	hethap	Number of heterozygous haploid hardcalls
OBS_CT	nobs	Denominator (# samples, females excluded on chrY)
F_MISS_DOSAGE	fmissdosage	Missing dosage rate
F_MISS	fmiss	Missing hardcall rate, not counting het haploids
F_MISS_AND_HETHAP	fmisshh	Missing hardcall rate, counting het haploids
F_HETHAP	fhethap	Heterozygous haploid rate.

When dosages are present, MISSING_DOSAGE_CT will typically be slightly lower than MISSING_CT, since hardcalls normally aren't saved for dosages in (0.1, 0.9) or (1.1, 1.9).

.vscore (text variant score report)

Produced by --variant-score.

A text file with a header line, and then one line per variant with the following columns:

Header	Column set	Contents
CHROM	chrom	Chromosome code
POS	pos	Base-pair coordinate
ID	(required)	Variant ID
REF	ref	Reference allele
ALT1	alt1	Alternate allele 1
ALT	alt	All alternate alleles, comma-separated
*PROVISIONAL_REF?*	maybeprovref, provref	Reports whether REF allele is provisional
ALT_FREQ	altfreq	ALT total-frequency used for mean-imputation
MISSING_CT	nmiss	Number of missing (and thus mean-imputed) dosages
OBS_CT	nobs	Number of (nonmissing) sample observations
<Variant score>, ...	(required)	Variant scores

.vscore.bin (binary variant scores)

Produced by "--variant-score bin". Accompanied by .vscore.cols and .vscore.vars text files containing column (score) and row (variant ID) labels, respectively.

A matrix of double-precision (8-byte) floating point variant scores.
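
A minimal Python sketch for reading this matrix together with its label files follows. Several details are assumptions not stated above: values are taken to be little-endian, the matrix is taken to be stored variant-major (one row of scores per variant), and the label files are taken to hold one label per line. read_vscore_bin and its prefix parameter are hypothetical.

```python
import struct


def read_vscore_bin(prefix):
    """Return (variant_ids, score_names, rows) for <prefix>.vscore.bin."""
    with open(prefix + ".vscore.cols") as f:
        cols = [line.strip() for line in f if line.strip()]
    with open(prefix + ".vscore.vars") as f:
        variant_ids = [line.strip() for line in f if line.strip()]
    with open(prefix + ".vscore.bin", "rb") as f:
        raw = f.read()
    value_ct = len(cols) * len(variant_ids)
    if len(raw) != 8 * value_ct:
        raise ValueError("file size does not match the label files")
    # Assumption: little-endian 8-byte floats, one row of scores per variant.
    values = struct.unpack(f"<{value_ct}d", raw)
    rows = [list(values[i * len(cols):(i + 1) * len(cols)])
            for i in range(len(variant_ids))]
    return variant_ids, cols, rows
```

The size check catches a mismatch between the matrix and its label files, but it cannot detect a wrong guess about byte order or row orientation.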
