Introduction, downloads
D: 19 May 2022
Resources
This page is under construction. If there's something you consider to be an essential PLINK resource which is not mentioned on this page, contact us and/or comment in the plink2-users Google group.
The linked files are currently hosted by Dropbox. If you are unable to download them, contact us for access to an alternate source; we understand that Dropbox is blocked in some locations.
Genotype data
Callset:
2021-07-12 NYGC (build 38, 3202 samples, contigs unphased) (main source, chrY/chrM/contigs source)
2016-05-05 primary release (build 37, 2504 samples) (source)
Split by chromosome?
Keep singleton variants?
The no-singleton dataset can be a good starting point if you were planning on filtering out low-MAF variants anyway, or you're constrained to ≤ 8 GiB of workspace memory.
INFO annotations?
KING-based pedigree corrections?
The KING-robust algorithm is very effective at identifying 1st-degree relations within a population, and its output can also distinguish parent-child from sibling relationships: all other things being equal, IBS0 is much higher for siblings. One relationship flagged by KING-robust (NA20317-NA20318) was missing from the official pedigree; this omission was formally acknowledged to be a probable clerical error, and the relationship was added to the official pedigree on 2020-07-31. (phase3_orig.psam does not contain this relationship, since it is based on the 2016-05-05 snapshot of the official pedigree.)
The KING-corrected .psam files contain this relationship, along with a few others with similarly strong supporting evidence (see the .kin0 file below).
Build 38 (2021-07-12 NYGC), not split by chromosome:
all_hg38.pgen.zst (3.16 GiB, requires --allow-extra-chr)
all_hg38_ns.pgen.zst (3.10 GiB, requires --allow-extra-chr)
all_hg38.pvar.zst (4.41 GiB, >90% of this is annotations)
all_hg38_noannot.pvar.zst (359 MiB) (rename to "all_hg38.pvar.zst" before use)
all_hg38_ns.pvar.zst (3.89 GiB, >90% of this is annotations)
all_hg38_ns_noannot.pvar.zst (296 MiB) (rename to "all_hg38_ns.pvar.zst" before use)
hg38_corrected.psam or hg38_orig.psam (rename to "all_hg38.psam", or to "all_hg38_ns.psam" for the no-singleton dataset, before use)
Build 37 (2016-05-05 primary release), not split by chromosome:
all_phase3.pgen.zst (2.25 GiB)
all_phase3_ns.pgen.zst (2.13 GiB)
all_phase3.pvar.zst (1.26 GiB)
all_phase3_noannot.pvar.zst (614 MiB) (rename to "all_phase3.pvar.zst" before use)
all_phase3_ns.pvar.zst (812 MiB)
all_phase3_ns_noannot.pvar.zst (362 MiB) (rename to "all_phase3_ns.pvar.zst" before use)
phase3_corrected.psam or phase3_orig.psam (rename to "all_phase3.psam", or to "all_phase3_ns.psam" for the no-singleton dataset, before use)
Common sample information file (not for chrY/chrM): hg38_corrected.psam or hg38_orig.psam for build 38, phase3_corrected.psam or phase3_orig.psam for build 37. Create symlinks from chr1_hg38.psam, chr2_hg38.psam, etc. (or chr1_phase3.psam, chr2_phase3.psam, etc.) to this file, or make a bunch of copies.
Remove "_noannot" from the .pvar.zst filenames before use.
Build 38 (2021-07-12 NYGC), split by chromosome:
chr1_hg38.pgen.zst (236 MiB), chr1_hg38.pvar.zst (347 MiB), chr1_hg38_noannot.pvar.zst (27.1 MiB)
chr2_hg38.pgen.zst (247 MiB), chr2_hg38.pvar.zst (365 MiB), chr2_hg38_noannot.pvar.zst (29.4 MiB)
chr3_hg38.pgen.zst (204 MiB), chr3_hg38.pvar.zst (298 MiB), chr3_hg38_noannot.pvar.zst (23.8 MiB)
chr4_hg38.pgen.zst (196 MiB), chr4_hg38.pvar.zst (290 MiB), chr4_hg38_noannot.pvar.zst (23.3 MiB)
chr5_hg38.pgen.zst (183 MiB), chr5_hg38.pvar.zst (271 MiB), chr5_hg38_noannot.pvar.zst (21.6 MiB)
chr6_hg38.pgen.zst (178 MiB), chr6_hg38.pvar.zst (259 MiB), chr6_hg38_noannot.pvar.zst (20.6 MiB)
chr7_hg38.pgen.zst (176 MiB), chr7_hg38.pvar.zst (252 MiB), chr7_hg38_noannot.pvar.zst (19.8 MiB)
chr8_hg38.pgen.zst (159 MiB), chr8_hg38.pvar.zst (232 MiB), chr8_hg38_noannot.pvar.zst (18.5 MiB)
chr9_hg38.pgen.zst (140 MiB), chr9_hg38.pvar.zst (195 MiB), chr9_hg38_noannot.pvar.zst (15.0 MiB)
chr10_hg38.pgen.zst (150 MiB), chr10_hg38.pvar.zst (213 MiB), chr10_hg38_noannot.pvar.zst (17.5 MiB)
chr11_hg38.pgen.zst (140 MiB), chr11_hg38.pvar.zst (205 MiB), chr11_hg38_noannot.pvar.zst (16.5 MiB)
chr12_hg38.pgen.zst (143 MiB), chr12_hg38.pvar.zst (202 MiB), chr12_hg38_noannot.pvar.zst (16.3 MiB)
chr13_hg38.pgen.zst (106 MiB), chr13_hg38.pvar.zst (152 MiB), chr13_hg38_noannot.pvar.zst (12.3 MiB)
chr14_hg38.pgen.zst (98.4 MiB), chr14_hg38.pvar.zst (139 MiB), chr14_hg38_noannot.pvar.zst (11.5 MiB)
chr15_hg38.pgen.zst (97.0 MiB), chr15_hg38.pvar.zst (131 MiB), chr15_hg38_noannot.pvar.zst (10.6 MiB)
chr16_hg38.pgen.zst (107 MiB), chr16_hg38.pvar.zst (146 MiB), chr16_hg38_noannot.pvar.zst (11.7 MiB)
chr17_hg38.pgen.zst (94.7 MiB), chr17_hg38.pvar.zst (129 MiB), chr17_hg38_noannot.pvar.zst (10.2 MiB)
chr18_hg38.pgen.zst (87.4 MiB), chr18_hg38.pvar.zst (120 MiB), chr18_hg38_noannot.pvar.zst (9.86 MiB)
chr19_hg38.pgen.zst (80.4 MiB), chr19_hg38.pvar.zst (106 MiB), chr19_hg38_noannot.pvar.zst (8.10 MiB)
chr20_hg38.pgen.zst (73.9 MiB), chr20_hg38.pvar.zst (101 MiB), chr20_hg38_noannot.pvar.zst (7.99 MiB)
chr21_hg38.pgen.zst (46.4 MiB), chr21_hg38.pvar.zst (62.4 MiB), chr21_hg38_noannot.pvar.zst (5.01 MiB)
chr22_hg38.pgen.zst (50.4 MiB), chr22_hg38.pvar.zst (67.7 MiB), chr22_hg38_noannot.pvar.zst (5.03 MiB)
chrX_hg38.pgen.zst (95.8 MiB), chrX_hg38.pvar.zst (161 MiB), chrX_hg38_noannot.pvar.zst (9.05 MiB)
chrY_hg38.pgen.zst (7.83 MiB), chrY_hg38.pvar.zst (8.07 MiB), chrY_hg38_noannot.pvar.zst (734 KiB)
chrM_hg38.pgen.zst (69.1 KiB), chrM_hg38.pvar.zst (188 KiB), chrM_hg38_noannot.pvar.zst (18.0 KiB)
contigs_hg38.pgen.zst (63.3 MiB), contigs_hg38.pvar.zst (137 MiB), contigs_hg38_noannot.pvar.zst (5.16 MiB)
Build 37 (2016-05-05 primary release), split by chromosome:
chr1_phase3.pgen.zst (172 MiB), chr1_phase3.pvar.zst (100 MiB), chr1_phase3_noannot.pvar.zst (47.5 MiB)
chr2_phase3.pgen.zst (185 MiB), chr2_phase3.pvar.zst (110 MiB), chr2_phase3_noannot.pvar.zst (52.0 MiB)
chr3_phase3.pgen.zst (153 MiB), chr3_phase3.pvar.zst (90.6 MiB), chr3_phase3_noannot.pvar.zst (42.9 MiB)
chr4_phase3.pgen.zst (150 MiB), chr4_phase3.pvar.zst (89.1 MiB), chr4_phase3_noannot.pvar.zst (42.2 MiB)
chr5_phase3.pgen.zst (136 MiB), chr5_phase3.pvar.zst (81.6 MiB), chr5_phase3_noannot.pvar.zst (38.8 MiB)
chr6_phase3.pgen.zst (136 MiB), chr6_phase3.pvar.zst (78.4 MiB), chr6_phase3_noannot.pvar.zst (36.9 MiB)
chr7_phase3.pgen.zst (131 MiB), chr7_phase3.pvar.zst (73.5 MiB), chr7_phase3_noannot.pvar.zst (34.6 MiB)
chr8_phase3.pgen.zst (121 MiB), chr8_phase3.pvar.zst (71.2 MiB), chr8_phase3_noannot.pvar.zst (33.7 MiB)
chr9_phase3.pgen.zst (103 MiB), chr9_phase3.pvar.zst (55.6 MiB), chr9_phase3_noannot.pvar.zst (26.2 MiB)
chr10_phase3.pgen.zst (111 MiB), chr10_phase3.pvar.zst (62.3 MiB), chr10_phase3_noannot.pvar.zst (29.3 MiB)
chr11_phase3.pgen.zst (107 MiB), chr11_phase3.pvar.zst (62.7 MiB), chr11_phase3_noannot.pvar.zst (29.7 MiB)
chr12_phase3.pgen.zst (106 MiB), chr12_phase3.pvar.zst (59.8 MiB), chr12_phase3_noannot.pvar.zst (28.2 MiB)
chr13_phase3.pgen.zst (78.4 MiB), chr13_phase3.pvar.zst (44.6 MiB), chr13_phase3_noannot.pvar.zst (21.0 MiB)
chr14_phase3.pgen.zst (73.6 MiB), chr14_phase3.pvar.zst (41.4 MiB), chr14_phase3_noannot.pvar.zst (19.5 MiB)
chr15_phase3.pgen.zst (71.6 MiB), chr15_phase3.pvar.zst (38.0 MiB), chr15_phase3_noannot.pvar.zst (17.9 MiB)
chr16_phase3.pgen.zst (79.8 MiB), chr16_phase3.pvar.zst (42.0 MiB), chr16_phase3_noannot.pvar.zst (19.8 MiB)
chr17_phase3.pgen.zst (68.2 MiB), chr17_phase3.pvar.zst (36.4 MiB), chr17_phase3_noannot.pvar.zst (17.1 MiB)
chr18_phase3.pgen.zst (65.2 MiB), chr18_phase3.pvar.zst (35.4 MiB), chr18_phase3_noannot.pvar.zst (16.8 MiB)
chr19_phase3.pgen.zst (57.6 MiB), chr19_phase3.pvar.zst (28.9 MiB), chr19_phase3_noannot.pvar.zst (13.5 MiB)
chr20_phase3.pgen.zst (52.5 MiB), chr20_phase3.pvar.zst (28.2 MiB), chr20_phase3_noannot.pvar.zst (13.3 MiB)
chr21_phase3.pgen.zst (34.6 MiB), chr21_phase3.pvar.zst (17.4 MiB), chr21_phase3_noannot.pvar.zst (8.08 MiB)
chr22_phase3.pgen.zst (35.8 MiB), chr22_phase3.pvar.zst (17.4 MiB), chr22_phase3_noannot.pvar.zst (8.20 MiB)
chrX_phase3.pgen.zst (73.0 MiB), chrX_phase3.pvar.zst (44.7 MiB), chrX_phase3_noannot.pvar.zst (18.3 MiB)
chrY_phase3.pgen.zst (325 KiB), chrY_phase3.pvar.zst (605 KiB), chrY_phase3_noannot.pvar.zst (241 KiB), chrY_phase3.psam (1233 samples)
chrM_phase3.pgen.zst (50.4 KiB), chrM_phase3.pvar.zst (15.7 KiB), chrM_phase3_noannot.pvar.zst (10.4 KiB), chrM_phase3_corrected.psam or chrM_phase3_orig.psam (2534 samples, rename to "chrM_phase3.psam" before use)
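The two preparation steps for the per-chromosome downloads (pointing each chromosome's .psam at the common sample file, and removing "_noannot" from the .pvar.zst filenames) can be scripted. This is a minimal POSIX-shell sketch, assuming everything was downloaded into one directory and that hg38_corrected.psam is the sample file you chose; the helper names link_chr_psams and drop_noannot_suffix are hypothetical.

```shell
# Sketch only: helper names and the chosen .psam are illustrative.

# Symlink chr1_<build>.psam .. chr22_<build>.psam and chrX_<build>.psam to the
# common .psam. chrY/chrM are skipped, since the common file is not for them.
link_chr_psams() {
  build=$1    # "hg38" or "phase3"
  common=$2   # e.g. hg38_corrected.psam
  for chr in $(seq 1 22) X; do
    ln -sf "$common" "chr${chr}_${build}.psam"
  done
}

# Rename every *_noannot.pvar.zst in the current directory,
# e.g. chr1_hg38_noannot.pvar.zst -> chr1_hg38.pvar.zst.
# Note: mv silently overwrites an annotated file of the same name.
drop_noannot_suffix() {
  for f in *_noannot.pvar.zst; do
    [ -e "$f" ] || continue   # pattern matched nothing
    mv -- "$f" "${f%_noannot.pvar.zst}.pvar.zst"
  done
}

link_chr_psams hg38 hg38_corrected.psam
drop_noannot_suffix
```

For the build 37 files, call link_chr_psams phase3 phase3_corrected.psam instead; the chrY and chrM .psam files still need to be handled as listed above. Symlinks keep a single copy of the sample file, so any later edit to it applies to every chromosome.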
Notes:
.pgen.zst file(s) must be decompressed before use. (This isn't necessary for .pvar.zst files: see --pfile's 'vzs' modifier.) If you don't have another .zst decompressor installed, you can use PLINK 2 for this purpose:
plink2 --zst-decompress all_hg38.pgen.zst > all_hg38.pgen
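Once the .pgen is decompressed, the fileset can be loaded directly. A minimal sketch, assuming all_hg38.pgen was produced by the command above and the variant and sample files were renamed to all_hg38.pvar.zst and all_hg38.psam; --freq is just an arbitrary first command, and the output prefix is illustrative.

```shell
# 'vzs' makes --pfile read all_hg38.pvar.zst as-is, so only the .pgen needs
# decompressing. --allow-extra-chr is required for the build 38 files because
# they include extra contigs.
plink2 --pfile all_hg38 vzs \
       --allow-extra-chr \
       --freq \
       --out all_hg38
```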
In addition to the ~600 trios which were intentionally included, the 2021-07-12 NYGC (build 38) dataset contains a few close relations which are not described in the .psam file, e.g. sibships where neither parent was sequenced. Use --remove with one of the following ID lists when you don't want close relations:
These lists were generated from the original dataset with "--king-cutoff 0.177" and "--king-cutoff 0.0884", respectively. If you're curious, here's the --make-king-table + --king-table-filter report listing all 1st/2nd-degree related sample pairs: deg2_hg38.kin0
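The same flags can be rerun to regenerate comparable outputs. This is a sketch, assuming the whole-genome build 38 fileset prefix all_hg38 with the .pvar.zst read via 'vzs'; the output prefixes are illustrative, and since any variant filtering applied before the original run is not stated, results may not match the posted lists exactly. Expect the KING computation on all 3202 samples to take a while.

```shell
# First-degree cutoff: writes hg38_deg1.king.cutoff.in.id (samples kept)
# and hg38_deg1.king.cutoff.out.id (samples to pass to --remove).
plink2 --pfile all_hg38 vzs --allow-extra-chr \
       --king-cutoff 0.177 --out hg38_deg1

# Second-degree cutoff, same output layout.
plink2 --pfile all_hg38 vzs --allow-extra-chr \
       --king-cutoff 0.0884 --out hg38_deg2

# Table of all sample pairs with kinship >= 0.0884 (writes hg38_pairs.kin0).
plink2 --pfile all_hg38 vzs --allow-extra-chr \
       --make-king-table --king-table-filter 0.0884 --out hg38_pairs
```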
The 2016-05-05 primary release (build 37) was intended to contain only unrelated samples; unfortunately, a few parent-child pairs, sibships, and second-degree relationships snuck in. Use --remove with one of the following ID lists when you don't want close relations:
These lists were generated from the original dataset with "--king-cutoff 0.177" and "--king-cutoff 0.0884", respectively. If you're curious, here's the --make-king-table + --king-table-filter report listing all 1st/2nd-degree related sample pairs: deg2_phase3.kin0
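Applying one of the lists is a single --remove. A minimal sketch: phase3_remove.txt is a hypothetical name standing in for whichever downloaded list (or .king.cutoff.out.id file) you picked, and the all_phase3 prefix assumes the renamed whole-genome build 37 fileset.

```shell
# Write a new fileset without the listed samples. The input .pvar.zst is read
# via 'vzs'; without a 'vzs' modifier on --make-pgen, the output .pvar is
# written uncompressed.
plink2 --pfile all_phase3 vzs \
       --remove phase3_remove.txt \
       --make-pgen --out all_phase3_unrelated
```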
The 2021-07-12 NYGC (build 38) dataset fuses results from two different pipelines. The primary chr1..chrX genotypes are phased, contain no missing calls, and only have biallelic left-normalized variants (multiallelic variants were "split"). The chrY/chrM/contigs genotypes are unphased and contain some missing calls; multiallelic variants there are unsplit, and a few variants are not left-normalized.
All relevant information in the original phased chr1..chrX callset is preserved. The chrY/chrM/contigs source material contains per-genotype AD, DP, GQ, and PL fields which cannot be represented by the .pgen file format, and are consequently not preserved.
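If you only want the phased, missing-free portion of the build 38 data, you can restrict to chr1..chrX. A sketch, assuming the all_hg38 fileset prefix; the output prefix is illustrative. --allow-extra-chr is still needed so the contig variants can be read before they are excluded.

```shell
# Keep only chromosomes 1-22 and X (the phased primary callset); chrY, chrM
# and the contigs are left out of the new fileset.
plink2 --pfile all_hg38 vzs --allow-extra-chr \
       --chr 1-22 X \
       --make-pgen --out all_hg38_chr1_to_X
```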
The 2016-05-05 primary release (build 37) contains (unsplit) multiallelic variants, and a few variants which aren't left-normalized.
Refer to the 1000 Genomes website for additional sample information, data usage rules, and citation instructions.
FASTA files
These are the reference genomes that the aforementioned 1000 Genomes samples were aligned against. Note that --fa can directly read these compressed files.
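The few variants which aren't left-normalized can be fixed with --normalize, which uses the --fa reference sequence. A minimal sketch: hg38_reference.fa.zst is a hypothetical placeholder for the downloaded build 38 FASTA, and the output prefix is illustrative; the FASTA's sequence names need to match the chromosome and contig names in the .pvar.

```shell
# Left-normalize variants against the reference and write a new fileset.
# --fa reads the compressed FASTA directly.
plink2 --pfile all_hg38 vzs --allow-extra-chr \
       --fa hg38_reference.fa.zst \
       --normalize \
       --make-pgen --out all_hg38_normalized
```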