S: 18 Aug 2024 (b7.4) D: 18 Aug 2024
Basic statistics

Allele frequency

--freq [{counts | case-control}] ['gz']
--freqx ['gz']

By itself, --freq writes a minor allele frequency report to plink.frq. If you add the 'counts' modifier, an allele count report is written to plink.frq.count instead. Alternatively, you can use --freq with --within/--family to write a cluster-stratified frequency report to plink.frq.strat, or use the 'case-control' modifier to write a case/control phenotype-stratified report to plink.frq.cc.

--freqx writes a more informative genotype count report to plink.frqx.

For both flags, gzipped output can be requested with the 'gz' modifier.

Nonfounders are normally excluded from these counts/frequencies; use --nonfounders to change this.

All of these reports (except for --freq + --within/--family) are valid input for --read-freq; --freqx is the most powerful when used in that capacity, since it preserves deviation from Hardy-Weinberg equilibrium.

Missing data

--missing ['gz']

--missing produces sample-based and variant-based missing data reports. If run with --within/--family, the variant-based report is stratified by cluster. 'gz' causes the output files to be gzipped.

--test-mishap tests whether genotype calls at the two adjacent variants can be used to predict missingness status of the current variant, writing results to plink.missing.hap. This can help one judge the safety of assuming missing calls are randomly distributed. Only autosomal diploid variants with at least 5 missing calls are included, and flanking haplotypes with frequency lower than the --maf threshold are ignored. (Nonfounders are no longer ignored.) The PLINK 1.07 documentation has further discussion of this test.

See also --test-missing, which checks for association between missingness and a case/control phenotype.

Hardy-Weinberg equilibrium

--hardy ['midp'] ['gz']

--hardy writes a list of genotype counts and Hardy-Weinberg equilibrium exact test statistics to plink.hwe.
With the 'midp' modifier, a mid-p adjustment is applied (see --hwe for discussion). 'gz' causes the output file to be gzipped.

When the samples are case/control, three separate sets of Hardy-Weinberg equilibrium statistics are computed: one considering both cases and controls, one considering only cases, and one considering only controls. These are distinguished by 'ALL', 'AFF', and 'UNAFF' in the TEST column, respectively. If the phenotype is quantitative or nonexistent instead, there is just one line per variant, labeled 'ALL(QT)' or 'ALL(NP)' respectively.

By default, only founders are considered when generating this report, so if you are working with e.g. a sibling-only dataset, you won't get any results. Use --nonfounders to include everyone.

Unlike PLINK 1.07, PLINK 1.9 does not automatically filter out variants with H-W p-value less than 0.001 when --hardy is invoked. Combine --hardy with --hwe if you still want that to happen.

Mendel errors

--mendel ['summaries-only']
--mendel-duos

--mendel scans the dataset for Mendel errors, writing a set of reports to plink{.mendel,.imendel,.fmendel,.lmendel}. Haploid and mitochondrial data are ignored. The errors are classified as follows, where '1' refers to the A1 (usually minor) allele and '2' refers to A2:
By default, samples with only one parent in the dataset are not considered, and when parental genotype data is missing, (great-)grandparental data is not checked; this can now be changed with --mendel-duos and --mendel-multigen, respectively. (Note that --mendel-multigen is best used on data which has not yet been subject to --set-me-missing.)

If you only want summary statistics, use the 'summaries-only' modifier; this causes the .mendel file (which can be very large) to be skipped.

When PLINK 1.07 --mendel was used either with --set-me-missing or without --make-bed/--recode, it would set some Mendel errors to missing before all errors were identified, and as a consequence some other errors were not noticed at all if overlapping trios were present. This no longer happens.

Inbreeding

--het ['small-sample'] ['gz']
--ibc

--het computes observed and expected autosomal homozygous genotype counts for each sample, and reports method-of-moments F coefficient estimates (i.e. (<observed hom. count> - <expected count>) / (<total observations> - <expected count>)) to plink.het. (The 'gz' modifier has the usual effect.)

Expected counts are based on loaded (via --read-freq) or imputed MAFs; if there are very few samples in your immediate fileset, --read-freq is practically mandatory since imputed MAFs are wildly inaccurate in that case. Also, due to the use of allele frequencies, if your dataset has a highly imbalanced ancestry distribution (e.g. >90% EUR but a few samples with ancestry primarily from other continents), you may need to process the rare-ancestry samples separately.

By default, the n/(n-1) multiplier in Nei's expected homozygosity formula is now omitted, since n may be unknown when using --read-freq. The 'small-sample' modifier causes the multiplier to be included, while forcing --het to use imputed MAFs (and known ns) from founders in the immediate dataset. (--maf-succ is not applied here.)
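The method-of-moments estimate described for --het can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PLINK's implementation: the function name het_f and its inputs are hypothetical, genotypes are coded as A1 allele counts (0, 1 or 2) with None for a missing call, and the expected homozygosity of a variant with A1 frequency p is taken as 1 - 2p(1 - p), i.e. without the n/(n-1) multiplier, in line with the default behavior described above.

```python
import math
from typing import Optional, Sequence


def het_f(genotypes: Sequence[Optional[int]], a1_freqs: Sequence[float]) -> float:
    """Method-of-moments F for one sample (illustrative sketch only).

    genotypes: A1 allele count per variant (0, 1 or 2); None marks a missing call.
    a1_freqs:  A1 allele frequency per variant, e.g. taken from a frequency report.
    """
    observed_hom = 0     # homozygous calls actually seen in this sample
    expected_hom = 0.0   # sum of per-variant expected homozygosity
    n_obs = 0            # number of nonmissing genotypes

    for g, p in zip(genotypes, a1_freqs):
        if g is None:
            continue  # missing calls contribute to neither count
        n_obs += 1
        if g != 1:
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)

    denominator = n_obs - expected_hom
    if denominator == 0:
        return math.nan  # no informative (polymorphic, nonmissing) variants
    return (observed_hom - expected_hom) / denominator
```

For instance, het_f([0, 2, 1, 0], [0.5] * 4) sees 3 homozygous calls against an expected 2 out of 4 observations, giving F = (3 - 2) / (4 - 2) = 0.5.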
--ibc (ported from GCTA) calculates three inbreeding coefficients for each sample, and writes a report to plink.ibc. Briefly, Fhat1 is the usual variance-standardized relationship minus 1, Fhat2 is similar to the --het estimate, and Fhat3 is based on the correlation between uniting gametes.

These calculations do not take LD into account. It is usually a good idea to perform some form of LD-based pruning before invoking them.

Sex imputation

--check-sex [female max F] [male min F]
--check-sex ycount [female max F] [male min F] [female max Y obs] [male min Y obs]

--check-sex normally compares sex assignments in the input dataset with those imputed from X chromosome inbreeding coefficients, and writes a report to plink.sexcheck.
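The role of the two F parameters can be pictured as a simple threshold rule. The sketch below is hypothetical: impute_sex_from_f is not a PLINK function, the return codes follow the .fam sex convention (1 = male, 2 = female, 0 = unknown), and the default cutoffs of 0.2 and 0.8 are an assumption made for illustration, since this section does not state them.

```python
def impute_sex_from_f(f_estimate: float,
                      female_max_f: float = 0.2,   # assumed default, not stated in this section
                      male_min_f: float = 0.8) -> int:  # assumed default as well
    """Map an X chromosome F estimate to a sex code (illustrative sketch only).

    Returns 2 (female) when F is below female_max_f, 1 (male) when F is
    above male_min_f, and 0 (unknown) for anything in between.
    """
    if f_estimate < female_max_f:
        return 2
    if f_estimate > male_min_f:
        return 1
    return 0
```

A sample whose recorded sex differs from the code returned here is the kind of discrepancy the .sexcheck report is meant to surface.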
There are now two modes which consider Y chromosome data.
--impute-sex changes sex assignments to the imputed values, while generating the .sexcheck report as well. To minimize surprises, we now force it to be used with --make-bed/--recode/--write-covar and no other commands.

In the common case where sexes were known or imputed earlier in the pipeline but didn't make it into the .fam file for whatever reason, all male F estimates should be 1 after --split-x, so something as extreme as "--impute-sex 0.9 0.99" (or "--impute-sex y-only") should work.

Wright's FST

--fst ['case-control']

Given a set of subpopulations defined via --within, --fst writes FST estimates for each autosomal diploid variant (computed using the method introduced in Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure) to plink.fst, and reports raw and weighted global means to the log.
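For orientation, the per-variant estimate can be sketched with the standard single-allele form of the Weir-Cockerham (1984) estimator. This is an illustration under stated assumptions rather than PLINK's code: the function name weir_cockerham_fst and its argument layout are hypothetical, each subpopulation is summarized by its sample size, A1 allele frequency and observed heterozygote proportion, and the variance-component formulas are written out as commonly presented for that paper.

```python
import math
from typing import Sequence


def weir_cockerham_fst(n: Sequence[int],
                       p: Sequence[float],
                       h: Sequence[float]) -> float:
    """Single-variant, single-allele Weir-Cockerham FST (illustrative sketch only).

    n: number of diploid samples in each subpopulation
    p: A1 allele frequency in each subpopulation
    h: observed proportion of heterozygotes in each subpopulation
    """
    r = len(n)
    if r < 2:
        raise ValueError("at least two subpopulations are required")

    n_total = sum(n)
    n_bar = n_total / r                                         # mean sample size
    n_c = (n_total - sum(ni * ni for ni in n) / n_total) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / n_total      # weighted mean frequency
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / n_total      # weighted mean heterozygosity

    pq = p_bar * (1.0 - p_bar)
    shrink = (r - 1) / r * s2
    # Variance components: between populations (a), between individuals
    # within populations (b), between gametes within individuals (c).
    a = (n_bar / n_c) * (s2 - (pq - shrink - h_bar / 4.0) / (n_bar - 1.0))
    b = (n_bar / (n_bar - 1.0)) * (pq - shrink - (2.0 * n_bar - 1.0) / (4.0 * n_bar) * h_bar)
    c = h_bar / 2.0

    denominator = a + b + c
    if denominator == 0:
        return math.nan  # monomorphic across all subpopulations
    return a / denominator
```

Two subpopulations fixed for opposite alleles, e.g. weir_cockerham_fst([10, 10], [1.0, 0.0], [0.0, 0.0]), give an estimate of 1, while subpopulations with identical frequencies give a value slightly below 0 because of the small-sample correction terms.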