Introduction, downloads

D: 28 Oct 2018

Data management

Generate binary fileset

--make-pgen <vzs> <format=[code]> <trim-alts> <erase-phase> <erase-dosage> <pvar-cols=[col. set descriptor]> <psam-cols=[col. set descriptor]>
--make-bpgen <vzs> <format=[code]> <trim-alts> <erase-phase> <erase-dosage>

--make-bed <vzs> <trim-alts>

--make-pgen creates a new PLINK 2 binary fileset, after applying sample/variant filters and other operations below. For example,

plink2 --bgen input.bgen --maf 0.05 --make-pgen --out binary_fileset

does the following:

  1. Autogenerate binary_fileset-temporary.pgen + .pvar + .psam. (The MAF filter has not yet been applied at this stage. The order of operations page is not up yet, but the order is practically identical to PLINK 1.9's.)
  2. Read binary_fileset-temporary.pgen + .pvar + .psam. Calculate MAFs. Remove all variants with MAF < 0.05 from the current analysis.
  3. Generate binary_fileset.pgen + .pvar + .psam. Any samples/variants removed from the current analysis are also not present in this fileset. (This is the --make-pgen step.)
  4. Delete binary_fileset-temporary.pgen + .pvar + .psam.

In contrast, the fileset left behind by --keep-autoconv is just the result of step 1.

--make-bed creates a PLINK 1 binary fileset instead, while --make-bpgen creates a hybrid fileset (main genotype table is in PLINK 2 format, sample and variant files use the PLINK 1 representation) loadable with --bpfile.

Other notes:

  • The 'vzs' modifier causes the variant file to be Zstd-compressed.
  • The 'format=' modifier requests an uncompressed fixed-variant-width .pgen file, which may be easier for some programs to read. (These do not directly support multiallelic variants.) For now, the only supported format code is '2', which is just like PLINK 1 .bed, except with an extended (12-byte instead of 3-byte) header containing variant and sample counts, and rotated genotype codes (00 = hom ref, 01 = het, 10 = hom alt, 11 = missing).
  • The 'erase-phase' and 'erase-dosage' modifiers prevent phase and dosage information from being written to the new .pgen.
  • The first five columns of a .pvar file are always #CHROM/POS/ID/REF/ALT. Supported optional .pvar column sets are:
    • xheader: All ## header lines (yeah, this is technically not a column). Without this, only the #CHROM header line is kept.
    • maybequal: QUAL. Omitted if all loaded values are missing.
    • qual: Force QUAL column to be written even when empty.
    • maybefilter: FILTER. Omitted if all loaded values are missing.
    • filter: Force FILTER column to be written even when empty.
    • maybeinfo: INFO. Omitted if all loaded values are missing, or if INFO:PR is the only subfield.
    • info: Force INFO column to be written.
    • maybecm: Centimorgan coordinate. Omitted if all loaded values are 0.
    • cm: Force CM column to be written even when empty.
    The default is xheader,maybequal,maybefilter,maybeinfo,maybecm.
  • The first two columns of a .psam file are always #FID/IID. Supported optional .psam column sets are:
    • maybefid: Family ID, '0' = missing. Omitted if all loaded values are missing.
    • fid: Force FID column to be written even when empty.
    • maybesid: Source ID (useful when multiple samples are collected from a single organism), '0' = missing. Omitted if all loaded values are missing.
    • sid: Force SID column to be written even when empty.
    • maybeparents: Father and mother IIDs, '0' = missing. Omitted if all loaded values are missing.
    • parents: Force PAT and MAT columns to be written even when empty.
    • sex: '1'/'M'/'m' = male, '2'/'F'/'f' = female, 'NA'/'0' = missing.
    • pheno1: First active phenotype. If no phenotypes are loaded, all entries are set to the --output-missing-phenotype string.
    • phenos: All active phenotypes, if any. (Can be combined with pheno1 to force at least one phenotype column to be written.)
    The default is maybefid,maybesid,maybeparents,sex,phenos.
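
The 'format=2' genotype codes listed in the notes above can be illustrated with a small Python sketch. The helper name is hypothetical, and the bit order (lowest-order two bits hold the first sample, as in PLINK 1 .bed) is an assumption; the 12-byte header is not parsed here.

```python
# Hypothetical helper, not part of PLINK: unpack one byte of a
# fixed-variant-width genotype record into four 2-bit calls.
# Assumption: the lowest-order two bits belong to the first sample.
CODE_MEANING = {
    0b00: "hom ref",
    0b01: "het",
    0b10: "hom alt",
    0b11: "missing",
}

def unpack_genotype_byte(byte):
    """Return the four genotype calls packed into a single byte."""
    return [CODE_MEANING[(byte >> shift) & 0b11] for shift in (0, 2, 4, 6)]

print(unpack_genotype_byte(0b11100100))
```

A reader for real files would also need to handle the header and the final partial byte of each variant record.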

--sort-vars {mode}

By default, --make-{b}pgen/--make-bed do not re-sort the variants, and they'll error out if the input file is not at least sorted by chromosome. (This is a change from PLINK 1.x.) However, if you add --sort-vars, the variants will be re-sorted by chromosome code, then position, then ID. The following string-comparison modes are supported:

  • 'natural'/'n': Natural sort (default).
  • 'ascii'/'a': ASCII.

Regular chromosomes are sorted (in numeric code order; for humans, PAR1 has an effective numeric code of 22.5, PAR2 23.5) before custom contigs.
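
As a rough illustration of this ordering, the following Python sketch builds a sort key for (chromosome, position, ID) tuples. The numeric codes X=23, Y=24, XY=25, MT=26 are assumed human defaults, custom contigs are ordered by plain string comparison, and IDs are compared in ASCII order; it is not PLINK's actual implementation.

```python
# Assumed human chromosome codes; PAR1/PAR2 get the effective codes
# 22.5 and 23.5 mentioned in the text.
HUMAN_CODES = {str(i): float(i) for i in range(1, 23)}
HUMAN_CODES.update({"PAR1": 22.5, "X": 23.0, "PAR2": 23.5,
                    "Y": 24.0, "XY": 25.0, "MT": 26.0})

def sort_key(variant):
    """Key for (chrom, pos, id): regular chromosomes first, then contigs."""
    chrom, pos, var_id = variant
    if chrom in HUMAN_CODES:
        return (0, HUMAN_CODES[chrom], "", pos, var_id)
    # Custom contigs sort after all regular chromosomes (by name here).
    return (1, 0.0, chrom, pos, var_id)

variants = [("contig1", 5, "c"), ("X", 100, "x1"), ("PAR2", 10, "p2"),
            ("PAR1", 10, "p1"), ("2", 50, "b"), ("2", 50, "a")]
for v in sorted(variants, key=sort_key):
    print(v)
```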

--make-just-pvar <zs> <cols=[column set descriptor]>
--make-just-psam <cols=[column set descriptor]>

--make-just-bim <zs>
--make-just-fam

--make-just-pvar is a variant of --make-pgen which only generates a .pvar file, and --make-just-psam plays the same role for .psam files. Similarly, --make-just-bim just generates a .bim file, and --make-just-fam just generates a .fam file. Unlike most other PLINK commands, these do not require genotype data (though you won't have access to many filtering flags when using these in no-genotype mode).

Use these cautiously. It is very easy to desynchronize your binary genotype data and your .pvar/.psam indexes if you use these commands improperly. If you have any doubt, stick with --make-{b}pgen/--make-bed.

Generate text fileset

--export [output format(s)...] <01 | 12> <bgz> <id-delim=[char]> <id-paste=[column set descriptor]> <include-alt> <omit-nonmale-y> <spaces> <vcf-dosage=[field]> <ref-first> <bits=[#]>

--export creates a new text fileset, after sample/variant filters have been applied. The following output formats are currently supported:

  • A: Sample-major additive (0/1/2) coding, suitable for loading from R. Dosages are now supported. Haploid genotypes are coded on a 0-2 scale. If you need uncounted alleles to be named in the header line, add the 'include-alt' modifier.
  • AD: Sample-major additive (0/1/2) + dominant (het=1/hom=0) coding. Also supports dosages and 'include-alt'.
  • A-transpose: Variant-major 0/1/2. Dosages are supported.
  • bgen-1.1: Older Oxford-format .bgen + .sample.
  • bgen-1.2, bgen-1.3: Newer Oxford-format .bgen + .sample.
    Single-part sample IDs are stored in the .bgen. The 'id-paste' modifier controls which .psam columns are used to construct these IDs (choices are maybefid, fid, iid, maybesid, and sid; default is maybefid,iid,maybesid), while the 'id-delim' modifier sets the character between the ID pieces (default '_').
    Two-part IDs are written to the .sample file.
    Default probability precision is 16-bit; use the 'bits=' modifier to change this.
  • haps, hapslegend: Oxford-format .haps + .sample{ + .legend}. All data must be biallelic and phased. Add the 'bgz' modifier to block-gzip the .haps file.
  • ind-major-bed: PLINK 1 sample-major .bed (+ .bim + .fam).
  • oxford: Oldest Oxford-format .gen + .sample. Add the 'bgz' modifier to block-gzip the .gen file.
  • vcf, vcf-4.2: VCF (default version 4.3). If PAR1 and PAR2 are present, they are automatically merged with chrX, with proper handling of chromosome codes and male ploidy. If the 'bgz' modifier is added, the VCF file is block-gzipped.
    The 'id-paste' and 'id-delim' modifiers have the usual effect.
    Dosages are not exported unless the 'vcf-dosage=' modifier is present. The following five dosage export modes are supported:
    • 'GP': genotype posterior probabilities (v4.3 only).
    • 'DS': Minimac3-style dosages, omitted for hardcalls.
    • 'DS-force': Minimac3-style dosages, never omit.
    • 'HDS': Minimac4-style phased dosages, omitted for hardcalls and unphased calls. Also includes 'DS' output.
    • 'HDS-force': Always report DS and HDS.

(Use --make-bed + PLINK 1.9 --recode to export other formats for now.)

For example,

plink2 --pfile binary_fileset --export bgen-1.1 --out new_text_fileset

generates new_text_fileset.bgen and new_text_fileset.sample from the data in binary_fileset.pgen + .pvar + .psam, while

plink2 --pfile binary_fileset --export vcf id-paste=iid --out new_vcf

generates new_vcf.vcf from the same data, removing family IDs in the process.
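
One way to picture the 'id-paste' and 'id-delim' modifiers is the Python sketch below. The function and the dictionary layout are hypothetical; it assumes a 'maybe' column is skipped when the .psam has no such column, and that a forced column falls back to the '0' missing code.

```python
def paste_sample_id(sample, columns=("maybefid", "iid", "maybesid"), delim="_"):
    """Join the selected .psam fields into a single-part sample ID.

    `sample` maps 'fid'/'iid'/'sid' to strings; a key that is absent
    stands for a column that is not present in the .psam file.
    """
    parts = []
    for col in columns:
        optional = col.startswith("maybe")
        name = col[5:] if optional else col
        value = sample.get(name)
        if value is None:
            if optional:
                continue      # optional column absent: leave it out
            value = "0"       # assumed missing code for a forced column
        parts.append(value)
    return delim.join(parts)

print(paste_sample_id({"fid": "fam1", "iid": "s1"}))
print(paste_sample_id({"fid": "fam1", "iid": "s1"}, columns=("iid",)))
```

The second call mirrors 'id-paste=iid' in the example above, which drops the family ID.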

In addition,

  • The '12' modifier causes ALT1 alleles to be coded as '1' and REF alleles as '2', while '01' maps ALT1 → 0 and REF → 1.
  • The 'spaces' modifier makes the output space-delimited instead of tab-delimited, whenever both are permitted.
  • For biallelic formats where it's unspecified whether the REF/major allele should appear first or second, --export defaults to second for compatibility with PLINK 1.9. Use the 'ref-first' modifier to change this.

Irregular output coding

--output-chr [MT code]

PLINK 1.9 and 2.0 support seven chromosome coding schemes in output files. You can select between them by providing the desired human mitochondrial code:

  • 26: Always numeric. (XY, PAR1, and PAR2 are all assigned the XY numeric code, so this isn't quite a one-to-one mapping.) This was the default in PLINK 1.x.
  • M: Autosomes numeric, X/Y/M single-character, XY/PAR1/PAR2 as usual.
  • MT: Autosomes numeric, X/Y single-character, MT two-character, XY/PAR1/PAR2 as usual. This is the default in PLINK 2.
  • 0M: Autosomes numeric, 0X/0Y/0M two-character, XY/PAR1/PAR2 as usual.
  • chr26: PAR1/PAR2 as usual, other chromosomes are 'chr' followed by a numeric code.
  • chrM: Autosomes are 'chr' followed by a numeric code, X/Y/XY/M are preceded by 'chr', PAR1/PAR2 as usual.
  • chrMT: Autosomes are 'chr' followed by a numeric code, X/Y/XY/MT are preceded by 'chr', PAR1/PAR2 as usual.

PLINK correctly interprets all of these encodings in input files.
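
The seven schemes can be summarized in a short Python sketch. This is only an illustration of the list above for human data: the function name is hypothetical, and the numeric codes X=23, Y=24, XY=25, MT=26 are assumed.

```python
NUMERIC = {"X": 23, "Y": 24, "XY": 25, "MT": 26}  # assumed human codes

def output_chr(chrom, scheme="MT"):
    """Render chrom ('1'..'22', 'X', 'Y', 'XY', 'MT', 'PAR1', 'PAR2')
    under one of: 26, M, MT, 0M, chr26, chrM, chrMT."""
    prefix = "chr" if scheme.startswith("chr") else ""
    if chrom in ("PAR1", "PAR2"):
        # Only the all-numeric scheme folds PAR1/PAR2 into the XY code.
        return str(NUMERIC["XY"]) if scheme == "26" else chrom
    if chrom.isdigit():
        return prefix + chrom
    if scheme in ("26", "chr26"):
        return prefix + str(NUMERIC[chrom])
    if chrom == "XY":
        return prefix + "XY"
    if chrom == "MT":
        return prefix + scheme[len(prefix):]  # 'M', 'MT' or '0M'
    # Remaining cases: X and Y.
    return "0" + chrom if scheme == "0M" else prefix + chrom

for scheme in ("26", "M", "MT", "0M", "chr26", "chrM", "chrMT"):
    print(scheme, [output_chr(c, scheme) for c in ("1", "X", "XY", "MT", "PAR1")])
```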

--output-missing-genotype [char]
--output-missing-phenotype [string]

--output-missing-genotype allows you to change the character (default '.') used to represent missing genotypes in PLINK output files, while --output-missing-phenotype changes the string (default 'NA') representing missing phenotypes. Note that both of these defaults are different from PLINK 1.x.

Heterozygous haploid errors

--set-hh-missing <keep-dosage>

--set-hh-missing causes heterozygous haploid hardcalls and all female chrY calls to be erased during --make-{b}pgen/--make-bed.

  • Note that the most common source of heterozygous haploid errors is imported data which doesn't follow PLINK's convention for representing the X chromosome pseudo-autosomal region. This should be addressed with --split-par below, not --set-hh-missing.
  • This can no longer be combined with --export.
  • Unknown-sex chrY genotypes are not erased; this is a change from PLINK 1.x.
  • By default, dosages associated with the erased hardcalls are also erased. To keep all dosages instead, add the 'keep-dosage' modifier.
  • If phased haploid dosages are present, the phase information is cleared.

--set-mixed-mt-missing <keep-dosage>

Mitochondrial DNA is subject to heteroplasmy, so PLINK 2 normally saves MT dosages near 0.5 as 'heterozygous' genotypes, and these are not erased by --set-hh-missing. However, some analytical methods don't use these mixed MT genotype calls, and instead assume that they don't exist. The --set-mixed-mt-missing flag can be used with --make-{b}pgen/--make-bed to generate a dataset with mixed MT hardcalls erased.

X chromosome pseudo-autosomal region

--split-par [last bp position of head] [first bp position of tail]
--split-par [build code]
--merge-par

PLINK 2 prefers to represent the X chromosome's pseudo-autosomal region as 'PAR1' and 'PAR2' regions; this removes the need for special handling of male X heterozygous calls. This has a major advantage over PLINK 1.x's 'XY' convention: splitting and remerging no longer require re-sorting the variants.

Thus, PLINK 1.9's --split-x flag has been retired in favor of --split-par, which takes the base-pair boundaries of the pseudo-autosomal regions, and treats all chrX variants in those regions as if their chromosome codes were PAR1/PAR2 instead. As (typo-resistant) shorthand, you can pass one of the following build codes to --split-par:

  • 'b36'/'hg18': NCBI build 36/UCSC human genome 18, boundaries 2709521 and 154584237
  • 'b37'/'hg19': GRCh37/UCSC human genome 19, boundaries 2699520 and 154931044
  • 'b38'/'hg38': GRCh38/UCSC human genome 38, boundaries 2781479 and 155701383

--split-par errors out if the dataset already contains a PAR1 or PAR2 region.

Conversely, --merge-par treats all variants in PAR1/PAR2 as if their chromosome code was X.
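
The relabeling performed by --split-par and --merge-par can be sketched in Python as follows. The function names are hypothetical, chromosome codes are assumed to be already normalized to 'X', and the boundaries are taken from the build-code list above.

```python
# Boundaries from the build-code list: (last bp of head, first bp of tail).
PAR_BOUNDARIES = {
    "b36": (2709521, 154584237), "hg18": (2709521, 154584237),
    "b37": (2699520, 154931044), "hg19": (2699520, 154931044),
    "b38": (2781479, 155701383), "hg38": (2781479, 155701383),
}

def split_par(chrom, pos, build="b37"):
    """Relabel a chrX position as PAR1/PAR2 when it lies in those regions."""
    if chrom != "X":
        return chrom
    head_end, tail_start = PAR_BOUNDARIES[build]
    if pos <= head_end:
        return "PAR1"
    if pos >= tail_start:
        return "PAR2"
    return "X"

def merge_par(chrom):
    """Inverse relabeling: treat PAR1/PAR2 as chrX again."""
    return "X" if chrom in ("PAR1", "PAR2") else chrom

print(split_par("X", 2699520), split_par("X", 2699521), split_par("X", 154931044))
```

Because only the chromosome label changes and positions are untouched, the variant order stays valid in both directions.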

Note that "--export vcf" has special-case logic for chrX/PAR1/PAR2: chromosome codes are all saved as chrX, but male ploidies are rendered using the PAR1/PAR2 boundaries. It should not be combined with --merge-par.

--merge-x

To import PLINK 1.x-style data with 'XY' codes,

  1. Use --merge-x + --sort-vars + --make-bed, to convert the 'XY' chromosome codes back to 'X' and put the variants back in standard order.
  2. You can then use --split-par to add the new PAR1/PAR2 codes when appropriate.

Update variant information

--set-missing-var-ids [template string]

--set-all-var-ids [template string]
--var-id-multi [template string]
--var-id-multi-nonsnp [template string]

--new-id-max-allele-len [len] <error | missing | truncate>

Whole-exome and whole-genome sequencing results frequently contain variants which have not been assigned standard IDs. If you don't want to throw out all of that data, you'll usually want to assign them chromosome-and-position-based IDs.

--set-missing-var-ids (which just replaces missing IDs) and --set-all-var-ids (which overwrites everything) provide one way to do this. The parameter taken by these flags is a special template string, with a '@' where the chromosome code should go, and a '#' where the base-pair position belongs. (Exactly one @ and one # must be present.) For example, given a .pvar file starting with

#CHROM POS ID REF ALT
chr1 10583 . G A
chr1 886817 . T C
chr1 886817 . C CATTTT

"--set-missing-var-ids @:#[b37]" would name the first variant 'chr1:10583[b37]', the second variant 'chr1:886817[b37]'... and the third variant also gets the name 'chr1:886817[b37]'.

To maintain unique IDs in this situation, you can include '$r'/'$a' in your template string to refer to the REF/first ALT allele. So, if we're using a bash shell, we can try again with

--set-missing-var-ids @:#[b37]\$r,\$a

which would name the first variant 'chr1:10583[b37]G,A', the second variant 'chr1:886817[b37]T,C', and the third variant 'chr1:886817[b37]C,CATTTT'. Note the extra backslashes: they are necessary in bash because '$' is a reserved character there.
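
The template expansion itself is simple enough to mimic in a few lines of Python. This is a sketch with a hypothetical function name, covering only the '@', '#', '$r' and '$a' placeholders; it does not validate that exactly one '@' and one '#' are present.

```python
import re

def expand_template(template, chrom, pos, ref, alt1):
    """Fill '@' (chromosome), '#' (position), '$r' (REF), '$a' (first ALT)."""
    values = {"@": chrom, "#": str(pos), "$r": ref, "$a": alt1}
    # A single regex pass avoids re-expanding characters that appear
    # inside substituted values.
    return re.sub(r"@|#|\$r|\$a", lambda m: values[m.group(0)], template)

rows = [("chr1", 10583, "G", "A"),
        ("chr1", 886817, "T", "C"),
        ("chr1", 886817, "C", "CATTTT")]
for row in rows:
    print(expand_template("@:#[b37]", *row), expand_template("@:#[b37]$r,$a", *row))
```

No backslashes are needed here because the template is a Python string rather than a bash argument.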

(PLINK 1.9's '$1'/'$2' syntax for referring to those two alleles in ASCII-sort order is still supported as well, and it has a place when no reference genome exists. However, we recommend avoiding it most of the time, since it does not distinguish between deletions and insertions in some cases, whereas '$r'/'$a' doesn't have that problem.)

In combination with either flag above, --var-id-multi can be used to specify a special template to use for just multiallelic variants (since it may not make sense to mention the first ALT allele in this case), and --var-id-multi-nonsnp does the same for variants that are both multiallelic and not SNPs (i.e. at least one allele code has length > 1).

Allele names associated with indels are occasionally very, very long, and the synthetic variant ID names which would be generated from such long alleles are very inconvenient to work with. As a result, if any allele codes are longer than 23 characters, PLINK 2 requires you to use --new-id-max-allele-len to explicitly specify how they should be handled. Its first parameter is a length threshold, and its optional second parameter specifies how allele codes longer than the length threshold should be handled (default is now 'error'; 'missing' causes such variants to be assigned the unnamed-variant ID, while 'truncate' does what it sounds like and is a bit dangerous).
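
The three handling modes can be pictured with the Python sketch below. The function name and the fixed 'chrom:pos:ref:alt' naming pattern are hypothetical, and the exact way PLINK 2 truncates alleles is an assumption (here, simply keeping the first `max_len` characters).

```python
def name_variant(chrom, pos, ref, alt1, max_len=23, mode="error", missing_id="."):
    """Build a synthetic ID, applying a long-allele policy first."""
    alleles = [ref, alt1]
    if any(len(a) > max_len for a in alleles):
        if mode == "error":
            raise ValueError("allele code longer than %d characters" % max_len)
        if mode == "missing":
            return missing_id  # variant keeps the unnamed-variant ID
        if mode == "truncate":
            # Assumed behavior; distinct long alleles can collide after this.
            alleles = [a[:max_len] for a in alleles]
        else:
            raise ValueError("unknown mode: " + mode)
    return "%s:%d:%s:%s" % (chrom, pos, alleles[0], alleles[1])

long_alt = "C" + "AT" * 20
print(name_variant("chr1", 886817, "C", long_alt, mode="missing"))
print(name_variant("chr1", 886817, "C", long_alt, max_len=5, mode="truncate"))
```

The collision risk noted in the comment is one reason to treat 'truncate' with care.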

--missing-var-code [missing ID string]

'.' is the default missing-variant-ID code. You can use --missing-var-code to change this; e.g. "--missing-var-code NA" would be appropriate for a .pvar file starting with

#CHROM POS ID REF ALT
chr1 10583 NA G A
chr1 886817 NA T C
chr1 886817 NA C CATTTT

--update-chr [filename] {chr col. number} {variant ID col.} {skip}
--update-cm [filename] {cm col. number} {variant ID col.} {skip}
--update-name [filename] {new ID col. number} {old ID col.} {skip}
--update-map [filename] {bp col. number} {variant ID col.} {skip}
--update-alleles [filename]
--allele1234 <multichar>
--alleleACGT <multichar>

--update-chr, --update-cm, --update-map, and --update-name update variant chromosomes, centimorgan positions, base-pair positions, and IDs, respectively. By default, the new value is read from column 2 and the (old) variant ID from column 1, but you can adjust these positions with the second and third parameters. The optional fourth 'skip' parameter is either a nonnegative integer, in which case it indicates the number of lines to skip at the top of the file, or a single nonnumeric character, which causes each line with that leading character to be skipped. (Note that, if you want to specify '#' as the skip character, you need to surround it with single- or double-quotes in some Unix shells.)
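
To make the column and 'skip' parameters concrete, here is a Python sketch of how such a file could be read. The function name is hypothetical, and whitespace-delimited input is assumed.

```python
def load_updates(lines, new_col=2, id_col=1, skip=0):
    """Map variant ID -> new value, using 1-based column numbers.

    `skip` is either a nonnegative integer (number of leading lines to
    ignore) or a single character (lines starting with it are ignored).
    """
    updates = {}
    for line_idx, line in enumerate(lines):
        if isinstance(skip, int):
            if line_idx < skip:
                continue
        elif line.startswith(skip):
            continue
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        updates[fields[id_col - 1]] = fields[new_col - 1]
    return updates

lines = ["#old new", "rs1 rs1_new", "rs2 rs2_new"]
print(load_updates(lines, skip=1))
print(load_updates(lines, skip="#"))
```

Both calls skip the header line, once by count and once by leading character.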

Strictly speaking, you can use Unix tail, cut, paste, and/or sed to perform the same job (albeit with more time and hassle) as the three optional parameters we have introduced. If you have not used these Unix commands before, we recommend that you familiarize yourself with what they do because they are still likely to come in handy in other scenarios.

You can combine --update-chr, --update-cm, and/or --update-map in the same run. (However, to avoid confusion regarding whether old or new variant IDs apply, we force --update-name to be run separately.)

When invoking --update-chr, you must use --make-bed/--make-{b}pgen in the same run, and no other output commands. Otherwise, we still recommend that you use --make-bed/--make-{b}pgen once instead of --update-... over and over, but it's not absolutely required.

--update-alleles updates variant allele codes. Its input should have the following five fields:

  1. Variant ID
  2. One of the old allele codes
  3. The other old allele code
  4. New code for the first named allele
  5. New code for the second named allele

Note that, if you just want to change REF/ALT allele assignments in the .pvar/.bim files without changing the real genotype data, you must use a flag like --ref-allele instead.

--allele1234 interprets and/or recodes A/C/G/T alleles in the input as 1/2/3/4, while --alleleACGT does the reverse. With the 'multichar' modifier, these will translate multi-character alleles as well, e.g. '--allele1234 multichar' converts 'TT' to '44'.
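
The A/C/G/T to 1/2/3/4 translation can be sketched in Python with translation tables. The names below are hypothetical, and the assumption is that, without 'multichar', alleles longer than one character are left untouched.

```python
TO_1234 = str.maketrans("ACGT", "1234")
TO_ACGT = str.maketrans("1234", "ACGT")

def recode_allele(allele, table=TO_1234, multichar=False):
    """Translate an allele code; multi-character codes need multichar=True."""
    if len(allele) > 1 and not multichar:
        return allele
    return allele.translate(table)

print(recode_allele("T"), recode_allele("TT"), recode_allele("TT", multichar=True))
print(recode_allele("3", table=TO_ACGT))
```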

Update sample information

--update-ids [filename]
--update-parents [filename]

--update-sex [filename] <col-num=[n]> <male0>

These update sample IDs, parental codes, and sexes, respectively. --update-parents also updates founder/nonfounder status in the current run when appropriate.

--update-ids expects input with the following four fields:

  1. Old family ID
  2. Old within-family ID
  3. New family ID
  4. New within-family ID

--update-parents expects the following four fields:

  1. Family ID
  2. Within-family ID
  3. New paternal within-family ID
  4. New maternal within-family ID

--update-sex expects a file with sample IDs in front, and a sex information column.

  • If there is a recognized header line (starting with '#FID' or '#IID'), it defaults to loading sex information from the first column titled 'SEX' (any capitalization); otherwise it assumes the 3rd column. To force a specific column number, use the 'col-num=' modifier.
  • Only the first character in the sex column is processed. By default, '1'/'M'/'m' is interpreted as male, '2'/'F'/'f' is interpreted as female, and '0'/'N' is interpreted as unknown-sex. To change this to '0'/'M'/'m' = male, '1'/'F'/'f' = female, anything else other than '2' = unknown-sex, add the 'male0' modifier.
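
The sex-code rules above can be written out as a small Python sketch. The function name is hypothetical; raising an error for unlisted characters in default mode, and for '2' in 'male0' mode, is one reading of the text rather than confirmed PLINK behavior.

```python
def parse_sex(field, male0=False):
    """Interpret a sex column entry using only its first character."""
    c = field[:1]
    if male0:
        if c in ("0", "M", "m"):
            return "male"
        if c in ("1", "F", "f"):
            return "female"
        if c == "2":
            # Assumed: '2' is rejected rather than read as unknown-sex.
            raise ValueError("'2' is not accepted with male0")
        return "unknown"
    if c in ("1", "M", "m"):
        return "male"
    if c in ("2", "F", "f"):
        return "female"
    if c in ("0", "N"):
        return "unknown"
    raise ValueError("unrecognized sex code: %r" % field)  # assumed behavior

print(parse_sex("Male"), parse_sex("2"), parse_sex("NA"))
print(parse_sex("0", male0=True), parse_sex("1", male0=True))
```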

--update-ids cannot be used in the same run as --update-parents or --update-sex.

Set REF/ALT alleles

--ref-allele <force> [filename] {REF col. number} {variant ID col.} {skip}
--alt1-allele <force> [filename] {ALT1 col. number} {variant ID col.} {skip}

--ref-from-fa <force>

--ref-allele sets all alleles specified in the file to REF, while --alt1-allele does the same for the first ALT allele. Column and skip parameters work the same way as with --update-chr and friends.

In combination with a FASTA file, --ref-from-fa sets REF alleles when it can be done unambiguously. (Note that this is never possible for deletions and some insertions.)

  • These can only be used in runs with --make-bed/--make-{b}pgen/--export and no other commands.
  • "--ref-allele [VCF filename] 4 3 '#'", which scrapes reference allele assignments from a VCF file, is especially useful.
  • By default, these error out when asked to change a 'known' reference allele. Add the 'force' modifier to permit that (when e.g. switching to a new reference genome).
  • When --alt1-allele changes the previous REF allele to ALT1, the previous ALT1 allele is set to REF and marked as provisional. All other REF allele assignments made by these flags are marked as 'known'.

--maj-ref <force>

--maj-ref sets major alleles to REF, like PLINK 1.x automatically did. (This is now opt-in instead of opt-out; --keep-allele-order is no longer necessary to prevent allele-swapping.)

  • This can only be used in runs with --make-bed/--make-{b}pgen/--export and no other commands.
  • By default, this only affects variants marked as having 'provisional' reference alleles. Add 'force' to apply this to all variants.
  • All REF allele assignments made by --maj-ref are marked as provisional.

--real-ref-alleles

When a PLINK 1 fileset is loaded, PLINK 2 normally treats its A2 alleles as provisional-REF. Use --real-ref-alleles to specify that they're from a real reference genome.

Left-normalization

--normalize <list>
  (alias: --norm)

In combination with a FASTA file, --normalize tries to left-normalize all variants, using the algorithm described in Tan A, Abecasis GR, Kang HM (2015) Unified representation of genetic variants. It currently assumes no differences in capitalization between the FASTA and the allele codes, and skips variants with one or more symbolic alleles (starting with '<').

The 'list' modifier causes the IDs of all modified variants to be written to plink2.normalized.

Note that left-normalization has a "blind spot" when it comes to non-tandem-repeat deletions of differing lengths ending at the same position: they won't end up in the same multiallelic variant after split + left-normalize + join. Consider handling this case separately.

Sort by FID/IID

--indiv-sort [mode name] {filename}

This allows you to specify how samples should be sorted when generating new datasets. The four modes are:

  • 'none'/'0': Stick to the order the samples were loaded in. This is the PLINK default for all operations except merges.
  • 'natural'/'n': 'Natural sort' of family and within-family IDs, similar to the logic used in OS X and Windows file selection dialogs; e.g. 'id2' < 'ID3' < 'id10'. This is the PLINK 2 default when merging datasets.
  • 'ascii'/'a': Sort in ASCII order, e.g. 'ID3' < 'id10' < 'id2'. This may be more appropriate than natural sort if you need an ordering that's trivial to regenerate in other software, or if your IDs mix letters and digits in a random and meaningless fashion.
  • 'file'/'f': Use the order in another file (named in the second parameter). The file should be space/tab-delimited, family IDs should be in the first column, and within-family IDs should be in the second column.
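
The difference between the 'natural' and 'ascii' modes can be reproduced with a short Python sketch. The key function is a hypothetical approximation (case-insensitive text chunks, digit runs compared as integers), not PLINK's actual comparison routine.

```python
import re

def natural_key(sample_id):
    """Split an ID into text and digit runs; compare digit runs as numbers."""
    chunks = [c for c in re.split(r"([0-9]+)", sample_id) if c]
    key = []
    for chunk in chunks:
        if chunk[0] in "0123456789":
            key.append((0, int(chunk), ""))
        else:
            key.append((1, 0, chunk.lower()))
    return key

ids = ["id10", "ID3", "id2"]
print(sorted(ids, key=natural_key))  # natural order
print(sorted(ids))                   # ASCII order
```

Tagging each chunk with 0 or 1 keeps integers and strings from being compared directly, which would raise a TypeError in Python 3.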

Covariate files

--write-covar <cols=[column set descriptor]>

If covariates are defined, an updated version (with all filters applied) is automatically written to plink2.cov whenever --make-pgen, --make-just-psam, --export, or a similar command is present. However, if you do not wish to simultaneously generate a new sample file, you can use --write-covar to just produce a pruned covariate file.

The first two columns of a PLINK 2 .cov file are always #FID/IID. Supported optional column sets are:

  • maybefid: FID, if the column was present in the input.
  • fid: Force FID column to be written when absent from input.
  • maybesid: SID, if the column was present in the input.
  • sid: Force SID column to be written when absent from input.
  • maybeparents: Father and mother IIDs, '0' = missing. Omitted if all loaded values are missing.
  • parents: Force PAT and MAT columns to be written even when empty.
  • sex: '1'/'M'/'m' = male, '2'/'F'/'f' = female, 'NA'/'0' = missing.
  • pheno1: First active phenotype. If no phenotypes are loaded, all entries are set to the --output-missing-phenotype string.
  • phenos: All active phenotypes, if any. (Can be combined with pheno1 to force at least one phenotype column to be written.)
  • (Covariates are always present, and positioned here.)

The default is maybefid,maybesid.

Phenotype/covariate transformations

--variance-standardize {phenotype/covariate name(s)...}
--covar-variance-standardize {covariate name(s)...}

--variance-standardize linearly transforms named quantitative phenotypes and covariates to mean-zero, variance 1. If no parameters are provided, all quantitative phenotypes and covariates are affected. --covar-variance-standardize does the same for just quantitative covariates.
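
As a minimal Python sketch of this transformation: the function name is hypothetical, missing values are represented as None and passed through, and the use of the sample (n-1) standard deviation is an assumption about the convention.

```python
from statistics import mean, stdev

def variance_standardize(values):
    """Shift and scale observed values to mean 0 and variance 1."""
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    sd = stdev(observed)  # assumed convention: sample standard deviation
    return [None if v is None else (v - mu) / sd for v in values]

print(variance_standardize([1.0, 2.0, None, 3.0]))
```

A constant column has zero standard deviation, so this sketch would raise ZeroDivisionError on it.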

--quantile-normalize {phenotype/covariate name(s)...}
--pheno-quantile-normalize {phenotype name(s)...}
--covar-quantile-normalize {covariate name(s)...}

--quantile-normalize forces named quantitative phenotypes and covariates to a N(0, 1) distribution, preserving only the original rank orders; if no parameters are provided, all quantitative phenotypes and covariates are affected. --pheno-quantile-normalize does the same for just quantitative phenotypes, while --covar-quantile-normalize does this for just quantitative covariates.
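
A rank-based inverse normal transform of this kind can be sketched in Python as follows. The function name is hypothetical, the (rank - 0.5) / n quantile formula is an assumed convention that may differ from PLINK 2's, and ties and missing values are not handled.

```python
from statistics import NormalDist

def quantile_normalize(values):
    """Replace each value by the standard normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf((rank - 0.5) / n)  # assumed formula
    return out

print(quantile_normalize([3.0, 1.0, 2.0]))
```

Only the rank order of the input affects the result, so any monotone rescaling of the input gives the same output.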

--split-cat-pheno <omit-last> <covar-01> {categorical phenotype/covariate name(s)...}

--split-cat-pheno splits n-category phenotype(s) into n (or n-1, with the 'omit-last' modifier) binary phenotypes, with names of the form '[original phenotype name]=[category name]'. (As a consequence, affected phenotypes and categories are not permitted to contain the '=' character.)

  • This happens after all sample filters.
  • If no phenotype or covariate names are provided, all categorical phenotypes (but not covariates) are processed by --split-cat-pheno.
  • By default, generated covariates are coded as 1=false, 2=true. To code them as 0=false, 1=true instead, add the 'covar-01' modifier.
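
The splitting step can be illustrated with the Python sketch below. The function name is hypothetical, categories are ordered by plain string sort (so 'last' means last in that order, which is an assumption), and the same 1/2 versus 0/1 coding switch is applied to every generated column.

```python
def split_cat_pheno(name, categories, omit_last=False, covar_01=False):
    """Expand one categorical column into '[name]=[category]' binary columns."""
    levels = sorted(set(categories))  # assumed category order
    if omit_last:
        levels = levels[:-1]
    false_code, true_code = (0, 1) if covar_01 else (1, 2)
    return {
        "%s=%s" % (name, level): [true_code if c == level else false_code
                                  for c in categories]
        for level in levels
    }

pops = ["CEU", "YRI", "CEU", "JPT"]
print(split_cat_pheno("POP", pops))
print(split_cat_pheno("POP", pops, omit_last=True, covar_01=True))
```

The generated names show why '=' cannot appear in the original phenotype or category names: it is the separator between them.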

Sample/variant filtering results

--write-samples <noheader>

--write-snplist <zs>

--write-samples writes IDs of all samples which pass the filters and inclusion thresholds you've specified to plink2.id, while --write-snplist does the same for variants (output filename plink2.snplist).

By default, --write-samples includes a header line in the output file; you can remove it with the 'noheader' modifier.

(The next several pages of documentation are under development.)
