S: 18 Aug 2024 (b7.4) D: 18 Aug 2024
## Distance matrices

## Identity-by-state/Hamming

--distance [{square | square0 | triangle}] [{gz | bin | bin4}] ['ibs'] ['1-ibs'] ['allele-ct'] ['flat-missing']

- '**square**', '**square0**', and '**triangle**' affect the shape of the output matrix. 'square' yields a symmetric matrix; 'triangle' (normally the default) yields a lower-triangular matrix where the first row contains only the <genome 1-genome 2> distance, the second row has the <genome 1-genome 3> and <genome 2-genome 3> distances in that order, etc.; and 'square0' yields a square matrix with all cells in the upper right triangle zeroed out.
- '**gz**' causes a gzipped file to be written to plink.dist.gz instead.
- '**bin**' causes the matrix to be written to plink.dist.bin using little-endian IEEE-754 double encoding (suitable for loading from R). When using 'bin', the default output shape is 'square' instead of 'triangle'.
- '**bin4**' uses IEEE-754 single-precision encoding, and is otherwise identical to 'bin'. This saves disk space, but you'll need to specify 4-byte single-precision input for your next analysis step. The following does so in R:

readBin('<filename>', what="numeric", n=<number of entries>, **size=4**)

(Omit "size=4" to load the usual 8-byte encoding.)
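For readers working outside R, the same decoding can be sketched in Python with only the standard library. This is a minimal illustration, not part of PLINK: the function names `decode_square`, `read_square_dist`, and `expand_triangle` are hypothetical, the binary file is assumed to hold a 'square' matrix stored row by row with no header, and the zero diagonal in `expand_triangle` is an assumption that suits distances (pass a different `diagonal` for similarity values).

```python
import struct


def decode_square(raw, n_samples, single_precision=False):
    """Decode n_samples * n_samples little-endian IEEE-754 values into rows.

    Assumes a headerless 'square' matrix stored row by row.
    """
    code, size = ("f", 4) if single_precision else ("d", 8)
    n_entries = n_samples * n_samples
    if len(raw) != n_entries * size:
        raise ValueError(f"expected {n_entries * size} bytes, found {len(raw)}")
    flat = struct.unpack(f"<{n_entries}{code}", raw)
    return [list(flat[i * n_samples:(i + 1) * n_samples]) for i in range(n_samples)]


def read_square_dist(path, n_samples, single_precision=False):
    """Read a 'bin' (8-byte) or 'bin4' (4-byte) matrix file from disk."""
    with open(path, "rb") as handle:
        return decode_square(handle.read(), n_samples, single_precision)


def expand_triangle(rows, diagonal=0.0):
    """Expand 'triangle' rows into a full symmetric matrix.

    rows[0] holds the <1-2> value, rows[1] the <1-3> and <2-3> values, etc.
    """
    n = len(rows) + 1
    full = [[diagonal] * n for _ in range(n)]
    for i, row in enumerate(rows, start=1):
        for j, value in enumerate(row):
            full[i][j] = full[j][i] = value
    return full
```

For example, `read_square_dist("plink.dist.bin", n_samples)` would correspond to the 8-byte case, and `single_precision=True` to a 'bin4' file, with `n_samples` supplied by the caller.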
- 'exp=<x>' causes a weight of (2**q**(1-**q**))^{-x} to be applied to each variant, where **q** is the loaded or inferred MAF.
- If a filename is provided instead, variant IDs are loaded from the first column and weights from the second. The first nonempty line of the file is normally skipped; add the '**noheader**' modifier to keep it.
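The two weighting modes above can be illustrated with a short Python sketch. It is not PLINK's code: `variant_weight` and `parse_weight_lines` are hypothetical names, and whitespace-delimited columns are an assumption about the weight file's layout.

```python
def variant_weight(maf, x):
    """Return (2q(1-q))^(-x) for a variant with minor allele frequency q = maf."""
    if not 0.0 < maf < 1.0:
        raise ValueError("weight is undefined when q is 0 or 1")
    return (2.0 * maf * (1.0 - maf)) ** (-x)


def parse_weight_lines(lines, noheader=False):
    """Map variant ID (first column) to weight (second column).

    The first nonempty line is skipped unless noheader is True.
    Assumes whitespace-delimited columns.
    """
    weights = {}
    header_pending = not noheader
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank lines never count as the header
        if header_pending:
            header_pending = False
            continue
        weights[fields[0]] = float(fields[1])
    return weights
```

With x = 0 every variant gets weight 1, and larger x gives rarer variants proportionally more influence, since 2q(1-q) shrinks as q approaches 0.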
## Backwards compatibility

--distance-matrix

These deprecated flags generate space-delimited text matrices, and are included for backwards compatibility with scripts relying on the corresponding PLINK 1.07 flags. New scripts should migrate to "--distance 1-ibs flat-missing" and "--distance ibs flat-missing". Note that you are no longer required to use these flags in conjunction with --cluster.

## Reloading

--read-dists <distance file> [ID file]

If you've previously generated a distance matrix using "--distance triangle bin", this lets you reload it for --cluster, --neighbour, and the distance-phenotype analyses below. When no ID file is named, it is assumed that the distance matrix was generated with the same samples in the same order as in the current PLINK run.

We are likely to extend this flag to support more --distance output formats in the future.

## Relationship/covariance

--make-rel [{square | square0 | triangle}] [{gz | bin | bin4}] [{cov | ibc2 | ibc3}]

## Exporting to GCTA

--make-grm-gz ['no-gz'] [{cov | ibc2 | ibc3}]
The --make-grm-bin computation was switched from single-precision to double-precision internal arithmetic in Nov 2014; see e.g. this real-world instance of insufficient precision leading to flawed science for motivation. (We don't actually expect any of GCTA's results to be dangerously inaccurate, especially when fewer than ~10 million markers are involved, but we figure a 1.2x-2x speed penalty here is an acceptable price to pay for peace of mind.)

These computations can be subdivided with --parallel.

## Relationship-based pruning

--rel-cutoff [maximum]

If used in conjunction with a later calculation (see the order of operations page for details), PLINK tries to maximize the final sample size, but this maximum independent set problem is NP-hard, so we use a greedy algorithm which does not guarantee an optimal result. In practice, PLINK --rel-cutoff does yield a maximum set whenever there aren't too many intertwined close relations, and it outperforms GCTA --grm-cutoff when there are (we chose our greedy algorithm carefully); but if you want to try to beat both programs, use the --make-rel and --keep/--remove flags and patch your preferred approximation algorithm in between. (We may add one or two levels of backtracking to our --rel-cutoff if its level of imperfection becomes problematic.)

Note that, while it is possible to use --rel-cutoff on a previously calculated relationship matrix by combining it with --grm-gz/--grm-bin (like how GCTA --grm-cutoff is used), we do not expect that to be the typical workflow.

## Distributed computation

--make-rel and --make-grm-gz/--make-grm-bin jobs can be subdivided with the --parallel flag. However, --rel-cutoff cannot run concurrently with parallel relationship matrix evaluation; instead, it must act on the final assembled matrix. This is the primary use case for --grm-gz/--grm-bin.

## Distance-phenotype analysis

## Case/control

--ibs-test [permutation count]
--groupdist [iteration count] [d]
To perform this type of analysis with scalar phenotype data, you may combine --ibs-test/--groupdist with the --tail-pheno flag. However, the distance-phenotype regression described next should be more informative.

If --ibs-test is run with no parameters, 100000 permutations are used. If --groupdist is run with fewer than two parameters, d is set to <number of people>.

When combining these commands with --read-dists, units must match: "--distance triangle bin ibs" goes with --ibs-test, while "--distance triangle bin" goes with --groupdist.

## Distance-QT regression

--regress-distance [iteration count] [d]

These flags perform simple linear regressions and evaluate delete-d jackknife standard error estimates. With fewer than two parameters, d is set to <number of people>.

A previously calculated triangular binary distance matrix can be loaded as input to --regress-distance using --read-dists. There is currently no similar shortcut for --regress-rel.
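To make the delete-d jackknife idea concrete, here is a minimal Python sketch under stated assumptions; it is not PLINK's implementation. Each iteration drops d randomly chosen samples, recomputes an estimate on the remainder, and the spread of those estimates is scaled by (n - d) / d, the textbook delete-d jackknife factor. The names `delete_d_jackknife_se` and `pairwise_slope` are hypothetical, and the regression of absolute phenotype difference on pairwise distance is only an illustrative choice of statistic, since the exact quantities PLINK regresses are not specified here.

```python
import math
import random


def delete_d_jackknife_se(n, d, iterations, estimator, seed=0):
    """Approximate a delete-d jackknife standard error from random subsets.

    estimator receives the sorted list of kept sample indices.
    """
    if not 0 < d < n:
        raise ValueError("d must satisfy 0 < d < n")
    rng = random.Random(seed)
    estimates = []
    for _ in range(iterations):
        dropped = set(rng.sample(range(n), d))
        kept = [i for i in range(n) if i not in dropped]
        estimates.append(estimator(kept))
    mean = sum(estimates) / iterations
    squared_dev = sum((e - mean) ** 2 for e in estimates)
    return math.sqrt((n - d) / (d * iterations) * squared_dev)


def pairwise_slope(kept, dist, pheno):
    """Least-squares slope of |phenotype difference| on pairwise distance.

    Illustrative statistic only; assumes the kept pairs do not all share
    the same distance.
    """
    xs, ys = [], []
    for a, i in enumerate(kept):
        for j in kept[:a]:
            xs.append(dist[i][j])
            ys.append(abs(pheno[i] - pheno[j]))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A call might look like `delete_d_jackknife_se(n, d, iterations, lambda kept: pairwise_slope(kept, dist, pheno))`, where `dist` is a full symmetric matrix and `pheno` a list of quantitative phenotypes in the same sample order.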