D: 2 Jul 2025 Main functions (--make-grm-bin...) Quick index search |
File format reference
This page describes specialized PLINK 2.0 input and output file formats which are identifiable by file extension. (Most extensions not listed here have very simple one-entry-per-line or two-entry-per-line text formats.) Unless otherwise specified, all multicolumn text files generated by PLINK 2.0 are tab-delimited, with one header line starting with '#'. In the column summaries, columns which are present unless removed by the column set descriptor are boldface, and columns which only appear under some data/flag/modifier combination(s) are italicized.

Jump to: .acount | .adjusted | .afreq | .allele.no.snp | .bcf | .bed | .bgen | .bim | .bins | .clumps | .cov | .eigenvec{,.allele|.var} | .fam | .fst.summary | .fst.var | .gcount | .gen | .geno | .glm.firth | .glm.linear | .glm.logistic[.hybrid] | .grm | .grm.N.bin | .grm.bin | .haps | .hardy | .hardy.x | .het | .*.id | .ind | .kin0 | .king[.bin] | .legend | .map | .pdiff | .ped | .pgen{,.pgi} | .phy | .psam | .pvar | .raw | .rel[.bin] | .sample | .scount | .sdiff | .sdiff.summary | .sexcheck | .smiss | .snp | .sscore | .ssf.tsv | .svd.pheno | .svd.pheno_wts | .tfam | .tped | .traw | .vcf | .vcor | .vcor{1|2}[.bin] | .vmiss | .vscore | .vscore.bin

.acount, .afreq (allele count/frequency report)
Produced by --freq. A text file with a header line, and then one line per variant with the following columns:
.adjusted (basic multiple-testing corrections)
Produced by --adjust[-file]. A text file with a header line, and then one line per tested allele with the following columns:
Entries are sorted in increasing p-value order. (Thus, if the QQ field is present, its values just increase linearly.)

.allele.no.snp (allele mismatch report)
Produced by --update-alleles when there are too many mismatches between the loaded alleles for a variant and the old-allele column(s) of the --update-alleles input file. A text file with no header line, and one line per mismatching variant with the following three fields:
.bcf (binary Variant Call Format)
Variant information + sample ID + genotype call binary file. Imported with --bcf, and produced by "--export bcf". Refer to the hts-specs GitHub repository for a detailed description of the format. "--export bcf" uses binary encoding v2.2.

.bed (PLINK 1 binary biallelic genotype table)
PLINK 1's preferred way to represent genotype calls. Must be accompanied by .bim and .fam files. Loaded with --bfile, and generated by --make-bed. Do not confuse this with the UCSC Genome Browser's BED format, which is totally different. (It is safe to change a PLINK 1 .bed file's extension to .pgen and use --bpfile to load it.) See the PLINK 1.9 documentation for a detailed description of the usual variant-major form, along with an example. PLINK 2 can also efficiently export the sample-major form ("--export ind-major-bed"); it has third byte equal to zero instead of one, but is otherwise analogous.

.bgen (Oxford variant info + genomic data binary file)
Native binary file format for Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST. BGEN v1.1 files should always be accompanied by a .sample file. Loaded with --bgen, and produced by "--export bgen-1.{1,2,3}". Refer to https://www.chg.ox.ac.uk/~gav/bgen_format/ for a detailed description of the format.

.bim (PLINK extended MAP file)
Variant information file accompanying a .bed or biallelic .pgen binary genotype table. (--make-just-bim can be used to update just this file.) A text file with no header line, and one line per variant with the following six fields:
A few notes:
.bins (allele count or frequency histogram)
A text file with a header line, followed by one line per [start, end) histogram bin with the following two fields:
The end of the current bin interval is the next line's BIN_START value (or positive infinity if there is no next line).

.clumps (reprocessed LD-clumped reports)
Produced by --clump. A text file with a header line, and one line per index variant (lowest p-values first) with the following fields:
S<bin boundary> columns are in decreasing-p-value order, and the bin-boundary component of the column names no longer omits the leading "0.".

.cov (covariate table)
Produced by --write-covar, --make-[b]pgen/--make-bed, and --export when covariates have been loaded/specified. Valid input for --covar. A text file with a header line, and one line per sample with the following columns:
(Note that --covar can also be used with files lacking a header row.)

.eigenvec, .eigenvec.allele, .eigenvec.var (principal components)
Produced by --pca. Accompanied by an .eigenval file, which contains one eigenvalue per line. The .eigenvec file is a text file with a header line and between 1+V and 3+V columns per sample, where V is the number of requested principal components. The first columns contain the sample ID, and the rest are principal component scores in the same order as the .eigenval values (with column headers 'PC1', 'PC2', ...).

With the 'allele-wts' modifier, an .eigenvec.allele file is also generated. It's a text file with a header line, followed by one line per allele with the following columns:
Alternatively, with the 'biallelic-var-wts' modifier, an old-style .eigenvec.var file is generated. It's a text file with a header line, followed by one line per variant with the following columns:
.fam (PLINK sample information file)
Sample information file accompanying a .bed or biallelic .pgen binary genotype table. (--make-just-fam can be used to update just this file.) A text file with no header line, and one line per sample with the following six fields:
.fst.summary (all-population-pairs Wright's FST report)
Produced by --fst. A text file with a header line, and then one line per population-pair with the following columns:
.fst.var (per-variant Wright's FST report for one population pair)
Produced by --fst when 'report-variants' is specified. A separate file is generated for each population pair. A text file with a header line, and then one line per autosomal variant with the following columns:
.gcount (genotype count report)
Produced by --geno-counts. A text file with a header line, and then one line per variant with the following columns:
.gen (Oxford text genotype file format)
Native text genotype file format for Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST. Should always be accompanied by a .sample file. Imported with --data/--gen, and produced by "--export oxford[-v2]". A text file with no header line, and one line per variant with either 3N+5 or 3N+6 fields where N is the number of samples. Each line stores information for a single SNP. In the 3N+5 case (corresponding to the original specification), the first five fields are:
Unless the chromosome code was declared with --oxford-single-chr (in which case the SNP ID column is ignored), PLINK has no choice but to assume that the "SNP ID" column actually stores chromosome codes. (This is the convention when PLINK exports a 5-leading-column .gen file.) The newer 3N+6 column flavor has a dedicated chromosome column in front. This was not supported by PLINK 1.9 or 2.0 before 16 Apr 2021.

Each subsequent triplet of values then indicates likelihoods of homozygote A1, heterozygote, and homozygote A2 genotypes at this variant, respectively, for one sample. If they add up to less than one, the remainder is a no-call probability weight.

The PLINK 2 binary format can represent allele count expected values, but it does not distinguish between e.g. {P(hom-ref)=0.28, P(het)=0.52, P(hom-alt)=0.2} and {P(hom-ref)=0.08, P(het)=0.92, P(hom-alt)=0}, and it ignores the no-call probability weight (though "0 0 0" will be correctly converted to a missing call). The --import-dosage-certainty flag can be used during import to replace some of the most uncertain genotype calls with missing values.

.geno (EIGENSOFT PACKEDANCESTRYMAP or TGENO binary genotype format)
Native binary genotype file format for EIGENSOFT and ADMIXTOOLS. Should always be accompanied by .ind and .snp files. Imported with --eigfile/--eiggeno. The original variant-major PACKEDANCESTRYMAP form is produced by "--export eig", while the sample-major TGENO form is produced by "--export eigt".

A PACKEDANCESTRYMAP file has V+1 blocks of max(48, ⌈N/4⌉) bytes each, where V is the number of variants and N is the number of samples. The first block is a header, starting with a space/tab-delimited string with the following entries:
and followed by null bytes. In C, the hasharr() function can be defined as follows:
uint32_t hasharr(const char* const* ids, uint32_t n) {
where hashone() is:
uint32_t hashone(const char* id) {
Each remaining block corresponds to one marker in the .snp file. The high-order two bits of a block's first byte store the first sample's genotype code, using the following encoding:
The next three samples are stored in lower-order bits of the first byte, etc. Trailing bits in the block are always zero.

A TGENO file has a 48-byte header, starting with a space/tab-delimited string; the first field in that string is "TGENO", and the other four fields match those in the PACKEDANCESTRYMAP header. This is followed by N blocks of max(48, ⌈V/4⌉) bytes each, each corresponding to one sample in the .ind file. Encoding is otherwise the same as with PACKEDANCESTRYMAP.

.glm.firth, .glm.logistic[.hybrid] (logistic/Firth regression association statistics)
Produced by --glm with a case/control phenotype. A text file with a header line, and then one line per variant with the following columns:
All statistics are computed across just the samples used in the regression.

.glm.linear (linear regression association statistics)
Produced by --glm with a quantitative phenotype. A text file with a header line, and then one line per variant with the following columns:
All statistics are computed across just the samples used in the regression.
1: For multiallelic variants, this column may contain multiple comma-separated alleles when the result doesn't depend on which allele is A1.

.grm (GCTA text relationship matrix)
Produced by --make-grm-list. A text file with no header line, and one line per pair of samples (not necessarily distinct) with the following four fields:
.grm.N.bin, .grm.bin (GCTA 1.1+ triangular binary relationship matrix)
Produced by --make-grm-bin. These files contain single-precision (4-byte) floating point values. Using 1-based matrix indices, the first value in each file is the (1, 1) relationship value (.grm.bin) or observation count (.grm.N.bin); the second and third values are the (2, 1) and (2, 2) relationships/counts; the fourth through sixth values are the (3, 1), (3, 2) and (3, 3) relationships/counts in that order; and so on. Note that .grm.bin files generated by GCTA versions before 1.1 have a different format.

.haps (Oxford phased haplotype file)
Reference panel haplotype file format for IMPUTE2. Must be accompanied by a .legend file when no variant info header columns are present. Imported with --haps, and produced by "--export haps[legend]". A text file with no header line, and either 2N+5 or 2N fields where N is the number of samples. In the former case, the first five columns are:
This is followed by a pair of 0/1-valued haplotype columns for the first sample, then a pair of haplotype columns for the second sample, etc. (For male samples on chrX, the second column may contain dummy '-' entries; otherwise, missing genotype calls are not permitted.)

.hardy (Hardy-Weinberg equilibrium exact test report)
Produced by --hardy when autosomal diploid variants are present. A text file with a header line, and one line per autosomal diploid variant with the following columns:
.hardy.x (Graffelman-Weir extended chrX HWE test report)
Produced by --hardy when chrX variants are present. A text file with a header line, and one line per chrX variant with the following columns:
.het (method-of-moments F coefficient estimates)
Produced by --het. A text file with a header line, and one line per sample with the following columns:
.id (Sample ID list)
When generated by PLINK 2, this is a text file which may or may not have a header line. If there's no header line (default with .grm.id files, can be forced for other .id files with --no-id-header), and there's a single column, they are IIDs; if there are two columns, they are FID/IID. Otherwise, there's one line per sample after the header line with the following columns:
.ind (EIGENSOFT sample information file)
Sample information file accompanying an EIGENSOFT .geno binary genotype table. Loaded with --eigfile/--eigind, and produced by --export eig[t]. A text file with no header line, and one line per sample with the following three fields:
.kin0 (KING-robust kinship coefficient report)
Produced by --make-king-table. A text file with a header line, and one line per sample pair with kinship coefficient no smaller than the --king-table-filter value. When --king-table-filter is not specified, all sample pairs are included. The following columns are present:
.king[.bin] (KING-robust kinship coefficient matrix)
Produced by --make-king. Accompanied by a .king[.bin].id file containing sample IDs. If text, a tab-delimited file that is either lower-triangular (excluding the diagonal) or square. If it's square, the upper-right triangle may be either zeroed out or the mirror-image of the lower-left triangle, depending on whether the 'square0' or 'square' modifier was used. The binary format is semantically identical; it just has nothing but single- (4-byte) or double-precision (8-byte) floating point values, instead of text+delimiters+linebreaks.

.legend (Oxford single-chromosome variant information file)
Single-chromosome variant information file accompanying a bare .haps reference panel haplotype file. Imported with --legend, and produced by "--export hapslegend". A text file with a header line, and one line per variant with the following four columns:
.map (PLINK 1 text fileset variant information file)
Variant information file accompanying a .ped text pedigree + genotype table. A text file with no expected header line, and one line per variant with the following 3-4 fields:
All lines must have the same number of columns (so either no lines contain the centimorgans column, or all of them do). Lines starting with '#' are supposed to be treated as comments, but this was not consistently supported by PLINK 1.9 and 2.0 before Aug 2024.

.pdiff (two-fileset genotype/dosage discordance report)
Produced by --pgen-diff. A text file with a header line, and then one line per discordance with the following columns:
.ped (PLINK 1/MERLIN/Haploview sample-major text genotype table)
Pedigree information + genotype call text file. Must be accompanied by a .map file. Loaded with --pedmap, and produced by "--export ped". This format is simultaneously highly inefficient, even relative to other text formats, and limited in scope (unobserved minor allele codes can't be stored); continued use is strongly discouraged.

Contains no header line, and one line per sample with 2V+6 fields where V is the number of variants. The first six fields are the same as those in a .fam file. The seventh and eighth fields are allele calls for the first variant in the .map file ('0' = no call); the 9th and 10th are allele calls for the second variant; and so on. All variants must be biallelic (or monomorphic, or all-missing).

If all alleles are single-character, PLINK 1.9 and 2.0 will correctly parse the more compact "compound genotype" variant of this format, where each genotype call is represented as a single two-character string. This does not require the use of an additional loading flag. You can produce such a file with "--export compound-genotypes".

It is also possible to load .ped files missing some initial fields. Lines starting with '#' are supposed to be treated as comments, but this was not supported by PLINK 1.9 and 2.0 before Aug 2024.

.pgen, .pgen.pgi (PLINK 2 binary genotype table)
PLINK 2's preferred way to represent genotype calls. Must be accompanied by .pvar/.bim and .psam/.fam files. Loaded with --pfile/--bpfile, and generated with --make-pgen/--make-bpgen and all import commands. Most .pgen files have an embedded index, and do not have an accompanying .pgen.pgi file. When the index is not embedded, PLINK 2 expects it to be stored in "<.pgen filename>.pgi". A draft specification of these formats is available. The first version will be finalized around the beginning of PLINK 2.0 beta testing.
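As an illustration of the .ped line layout described above (2V+6 fields, or V+6 in the compound-genotype form), here is a minimal Python sketch. It is not PLINK's own parser: the function name `parse_ped_line` is hypothetical, whitespace splitting is assumed to cover both space- and tab-delimited files, and treating a call as missing when either allele is '0' is a simplifying assumption. Comment lines and .ped files missing initial fields are not handled.

```python
def parse_ped_line(line, n_variants):
    """Split one .ped line into (six sample fields, per-variant genotype calls).

    Hypothetical helper for illustration only. n_variants is V, the number
    of lines in the accompanying .map file.
    """
    fields = line.split()
    sample_fields, calls = fields[:6], fields[6:]
    if len(sample_fields) < 6:
        raise ValueError("expected at least six leading sample fields")

    if len(calls) == 2 * n_variants:
        # Standard form: two allele-call fields per variant.
        pairs = [(calls[2 * i], calls[2 * i + 1]) for i in range(n_variants)]
    elif len(calls) == n_variants and all(len(c) == 2 for c in calls):
        # Compound-genotype form: one two-character string per variant.
        pairs = [(c[0], c[1]) for c in calls]
    else:
        raise ValueError("expected 2V+6 fields, or V+6 compound genotypes")

    # '0' means no call; assumption: a half-missing call counts as missing.
    genotypes = [None if "0" in pair else pair for pair in pairs]
    return sample_fields, genotypes
```

Both dialects of the same three-variant line yield the same calls, with the second variant reported as missing:

```python
parse_ped_line("F1 I1 0 0 1 2 A G 0 0 C C", 3)
parse_ped_line("F1 I1 0 0 1 2 AG 00 CC", 3)
```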
.psam (PLINK 2 sample information file)
Sample information file accompanying a .pgen binary genotype table. (--make-just-psam can be used to update just this file.) A text file which usually has at least one header line, where only the last header line starts with '#FID' or '#IID'. This final header line specifies the columns in the .psam file; the following intermediate column headers are recognized:
(FID must either be the first column, or absent. If it's absent, all FID values are now assumed to be '0'.) Any other value is treated as a phenotype/covariate name; see the phenotype/covariate documentation for column encoding details. If no header line is present, the columns are assumed to be in .fam file order (FID, IID, PAT, MAT, SEX, PHENO1).

.phy (relaxed PHYLIP format)
Multiple sequence alignment text file, produced by "--export phylip[-phased]", and recognized by FastTree, IQ-TREE, and several other phylogenetic tools. This format cannot be loaded by PLINK. The header line contains two numbers, the number of sequences followed by the number of nucleotide codes per sequence. Each subsequent line contains two fields. The first field contains the sample ID, and is padded by spaces to a fixed width, such that the longest sample ID is followed by exactly 3 spaces. (This imitates the behavior of vcf2phylip.) The second field contains IUPAC nucleotide codes.

.pvar (PLINK 2 variant information file)
Variant information file accompanying a .pgen binary genotype table. (--make-just-pvar can be used to update just this file.) A text file which usually has at least one header line, where only the last header line starts with '#CHROM'. This final header line specifies the columns in the .pvar file; the following intermediate column headers are recognized:
In particular, a VCF file, or a trimmed VCF file with all columns past the 5th (or 6th, etc.) removed, is valid input for anything expecting a .pvar-format file. The following VCF-style header lines are also recognized:
When no header line is present, the columns are assumed to be in .bim file order (CHROM, ID, CM, POS, ALT, REF; or if only 5 columns are present, CM is assumed to be omitted).

.raw (additive + dominant component file)
Produced by "--export {A,AD}"; suitable for loading from R. This format cannot be loaded by PLINK. A text file with a header line, and then one line per sample with V+6 (for "--export A") or 2V+6 (for "--export AD") fields, where V is the number of variants. The header line does not contain a preceding '#'. The first six fields are:
This is followed by one or two fields per variant:
If 'include-alt' was specified, the header line also names alternate allele codes in parentheses, e.g. 'rs5939319_G(/A)'.

.rel[.bin] (relationship matrix)
Produced by --make-rel. Accompanied by a .rel[.bin].id file containing sample IDs. Contents are identical to those of a .grm/.grm.bin file. Possible shapes are essentially the same as for .king files; the only difference is that .king files have an omitted or constant-0.5 diagonal while .rel files do not.

.sample (Oxford sample information file)
Sample information file accompanying a .gen or .bgen genotype dosage file, or a .haps phased reference panel. Loaded with --data/--sample, and produced by --export in several cases. By default, the .sample space-delimited files emitted by --export have two header lines, and then one line per sample with 4+ fields:
(As of 6 Apr 2021, PLINK 2 accepts 'C' as a synonym for column type 'P' in .sample input files.) With --export's 'sample-v2' modifier, this is adjusted to:
Note that older programs are likely to support only the first .sample dialect. A specification for this format is on the QCTOOL v2 website.

.scount (sample variant-count report)
Produced by --sample-counts. A text file with a header line, and then one line per sample with the following columns:
The 'hetsnp', 'dipts'/'ts'/'diptv'/'tv', 'dipnonsnpsymb'/'nonsnpsymb', 'symbolic', and 'nonsnp' columns count each ALT allele in a heterozygous ALTx-ALTy genotype separately, since they can be of different subtypes. (I.e. if they are of the same subtype, the corresponding count is incremented by 2.) As a consequence, these columns are unaffected by variant split/join.
3: If the ALT allele in a chrX biallelic variant appears in exactly one female and one male, that counts as a singleton in this column for just the female.

.sdiff (sample-pair discordance report)
Produced by --sample-diff. A text file with a header line, and then one line per discordance with the following columns:
.sdiff.summary (sample-pair discordance count summary)
Produced by --sample-diff. A text file with a header line, and then one line per sample pair with the following columns:
.sexcheck (sex imputation report)
Produced by --check-sex/--impute-sex. A text file with a header line, and one line per sample with the following columns:
.smiss (sample-based missing data report)
Produced by --missing. A text file with a header line, and then one line per sample with the following columns:
When dosages are present, MISSING_DOSAGE_CT will typically be slightly lower than MISSING_CT, since hardcalls normally aren't saved for dosages in (0.1, 0.9) or (1.1, 1.9).

.snp (EIGENSOFT variant information file)
Variant information file accompanying an EIGENSOFT .geno binary genotype table. Loaded with --eigfile/--eigsnp, and produced by --export eig[t]. A text file with no header line, and one line per variant with the following three fields:
.sscore (sample scores)
Produced by --score and --score-list. A text file with a header line, and then one line per sample with the following columns:
.ssf.tsv (association statistics in GWAS-SSF format)
Produced by --gwas-ssf postprocessing --glm output. A text file with a header line, and then one line per variant with the following columns:
(Since the --gwas-ssf command does not have a cols= modifier, boldface is used to denote mandatory GWAS-SSF fields in this table.)

.svd.pheno (summary phenotypes generated via SVD)
Produced by --pheno-svd. A text file with a header line, and then one line per sample with the following columns:
.svd.pheno_wts (singular values and right-singular vectors from phenotype SVD)
Produced by --pheno-svd. A text file with a header line, and then one line per new phenotype with the following columns:
.tfam (PLINK 1 sample information file)
Sample information file accompanying a .tped file; identical format to .fam files.

.tped (PLINK 1 variant-major text genotype table)
Variant information + genotype call text file. Must be accompanied by a .tfam file. Loaded with --tfile, and produced by "--export tped". Contains no header line, and one line per variant with 2N+4 fields where N is the number of samples. The first four fields are the same as those in a .map file. The fifth and sixth fields are allele calls for the first sample in the .tfam file ('0' = no call); the 7th and 8th are allele calls for the second sample; and so on. All variants must be biallelic (or monomorphic, or all-missing).

.traw (variant-major additive component file)
Produced by "--export Av"; suitable for loading from R. Loaded with --import-dosage (note that several modifiers must be specified). A text file with a header line without a leading '#', and then one line per variant with the following N+6 fields (where N is the number of samples):
.used_sites.tsv (variant information for relaxed-PHYLIP file)
Produced by "--export phylip[-phased] used-sites". Accompanied by a .phy file. A text file with a header line, and then one line per variant with the following 3 fields:
.vcf, .bcf (1000 Genomes Project Variant Call Format)
Variant information + sample ID + genotype call file; text if .vcf, binary if .bcf. Imported with --vcf/--bcf, and produced by "--export {b,v}cf".

Note that, while PLINK 2.0 supports a much larger subset of the VCF standard than PLINK 1.9, it still isn't appropriate for general-purpose VCF handling. Instead, the goal is to provide a very useful complement to bcftools. For example, PLINK 2.0 does not save per-call read depths, so any data management or analysis which requires them to be kept around should be done with bcftools or a similarly general tool; but once you're done with variant calling/imputation and are ready to treat your data as a single matrix of hardcalls or dosages (possibly with missing entries), PLINK 2.0 is much more efficient.

The VCFv4.3 files emitted by "--export vcf" start with the following three header lines:
This is usually followed by all the VCF header lines (if any) present in the loaded .pvar file, a "##chrSet=" chromosome set description when appropriate, and additional "##contig=", INFO/PR, and FORMAT header lines when necessary to make the file conform to the VCF standard. Next comes a tab-delimited header line with the following N+9 fields (where N is the number of samples), and one tab-delimited line per variant with the same fields:
Allele codes are supposed to either start with '<', only contain characters in the set {A,C,G,T,N,a,c,g,t,n}, be an isolated '*', or represent a breakend. --export issues a warning if an allele code does not satisfy this restriction. The full VCFv4.3 specification is in the hts-specs GitHub repository; this includes details on the BCF binary encoding.

.vcor (LD-statistic report)
Produced by --r[2]-[un]phased when in its default tabular-output mode. A text file with a header line, and one line per variant-pair passing all filters. The following columns are present:
Sign of [UN]PHASED_R, D, and DPRIME is positive when the major (or, with 'ref-based', REF) alleles are positively correlated.
4: The 'maj' (or 'ref' when the 'ref-based' modifier is specified) column-set is included by default in --r-phased and --r-unphased's tabular output, but excluded by default for --r2-phased and --r2-unphased.

.vcor{1|2}[.bin] (variant-correlation matrix)
Produced by --r[2]-[un]phased when in matrix-output mode; the exact file extension distinguishes phased vs. unphased (which appears in the component before '.vcor1' or '.vcor2'), r vs. r2, and text vs. binary format. Accompanied by a <matrix filename>.vars file containing variant IDs. Possible shapes are the same as for .king files, except that triangular files include the diagonal.

.vmiss (variant-based missing data report)
Produced by --missing. A text file with a header line, and then one line per variant with the following columns:
When dosages are present, MISSING_DOSAGE_CT will typically be slightly lower than MISSING_CT, since hardcalls normally aren't saved for dosages in (0.1, 0.9) or (1.1, 1.9).

.vscore (text variant score report)
Produced by --variant-score. A text file with a header line, and then one line per variant with the following columns:
.vscore.bin (binary variant scores)
Produced by "--variant-score bin". Accompanied by .vscore.cols and .vscore.vars text files containing column (score) and row (variant ID) labels, respectively. A matrix of double-precision (8-byte) floating point variant scores.
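
The .vscore.bin description above gives only the element type, so reading such a file back takes a few guesses. The Python sketch below assumes little-endian doubles, stored row-major with one row per variant, and no header bytes; none of these is stated above, so check them against real output before relying on it. The function name `read_vscore_bin` and its parameters are hypothetical; the two counts would come from the number of labels in the accompanying .vscore.vars and .vscore.cols files.

```python
import struct


def read_vscore_bin(path, n_variants, n_scores):
    """Read a matrix of 8-byte floats into a list of per-variant rows.

    Assumptions (not confirmed by the format description): little-endian
    values, row-major order with one row per variant, no header bytes.
    """
    n_values = n_variants * n_scores
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != 8 * n_values:
        # A size mismatch suggests wrong counts or a different layout.
        raise ValueError(
            f"expected {8 * n_values} bytes for a {n_variants}x{n_scores} "
            f"matrix, found {len(data)}"
        )
    values = struct.unpack(f"<{n_values}d", data)
    return [
        list(values[row * n_scores:(row + 1) * n_scores])
        for row in range(n_variants)
    ]
```

The file-size check is the main safeguard here: it catches mismatched label counts, though it cannot tell row-major from column-major storage.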