genome music smg

VERSION

This document describes genome music smg version 0.04 (2013-05-14 at 16:03:04)

SYNOPSIS

genome music smg --gene-mr-file=? --output-file=? [--max-fdr=?] [--skip-low-mr-genes] [--bmr-modifier-file=?] [--processors=?]

 ... music smg \
      --gene-mr-file output_dir/gene_mrs \
      --output-file output_dir/smgs

(A "gene-mr-file" can be generated using the tool "music bmr calc-bmr".)

REQUIRED ARGUMENTS

gene-mr-file Text

File with per-gene mutation rates (created using "music bmr calc-bmr")

output-file Text

Output file that will list significantly mutated genes and their p-values

OPTIONAL ARGUMENTS

max-fdr Number

The maximum allowed false discovery rate for a gene to be considered an SMG. Default value '0.2' if not specified

skip-low-mr-genes Boolean

Skip testing genes with MRs lower than the background MR. Default value 'true' if not specified

bmr-modifier-file Text

Tab-delimited multipliers per gene that modify BMR before testing [gene_name bmr_modifier]

processors Integer

Number of processors to use (requires 'foreach' and 'doMC' R packages). Default value '1' if not specified

DESCRIPTION

This script runs R-based statistical tools to identify Significantly Mutated Genes (SMGs), given per-gene mutation rates categorized by mutation type and the overall background mutation rates (BMRs) for each of those categories (gene_mr_file, created using "music bmr calc-bmr").

P-values and false discovery rates (FDRs) for each gene in gene_mr_file are calculated using three tests: Fisher's Combined P-value test (FCPT), Likelihood Ratio test (LRT), and the Convolution test (CT). If a gene's FDR is <= max_fdr for at least 2 of these tests, it will be output as an SMG. Another output file with suffix "_detailed" will have p-values and FDRs for all genes.
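The "at least 2 of 3 tests" rule can be illustrated with a minimal Python sketch. This is not the tool's own code: the function name is_smg, the gene names, and the FDR values are all made up for illustration, and the sketch assumes the three FDRs per gene have already been computed.

```python
# Hypothetical helper, not part of MuSiC: applies the "at least 2 of 3" rule
# to FDRs that are assumed to have been computed already.
def is_smg(fdr_fcpt, fdr_lrt, fdr_ct, max_fdr=0.2):
    """Return True if at least two of the three FDRs are <= max_fdr."""
    passing = sum(fdr <= max_fdr for fdr in (fdr_fcpt, fdr_lrt, fdr_ct))
    return passing >= 2

# Made-up FDRs in the order (FCPT, LRT, CT).
genes = {
    "GENE_A": (0.01, 0.15, 0.60),  # two tests pass
    "GENE_B": (0.05, 0.45, 0.90),  # only one test passes
    "GENE_C": (0.20, 0.20, 0.20),  # all three pass, since the cutoff is inclusive
}

smgs = [name for name, fdrs in genes.items() if is_smg(*fdrs)]
print(smgs)  # ['GENE_A', 'GENE_C']
```

Note that the comparison is inclusive, so a gene whose FDR equals max_fdr exactly still counts as passing that test.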

ARGUMENTS

--bmr-modifier-file
The user can provide a BMR modifier for each gene in the ROI file. The modifier is a multiplier applied to the categorized background mutation rates before they are tested against the gene's categorized mutation rates. Such a file can be used to correct for regional or systematic bias in mutation rates across the genome that may be correlated with CpG deamination or DNA repair processes like transcription-coupled repair or mismatch repair. Mutation rates have also been associated with DNA replication timing, where higher mutation rates are seen in late-replicating regions. Note that the same per-gene multiplier is used on each mutation category of BMR. Any genes from the ROI file that are not in the BMR modifier file will be tested against unmodified overall BMRs per mutation category. BMR modifiers of <=0 are not permitted, because that's just silly.
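As a rough illustration of how such a file behaves, the following Python sketch parses a two-column, tab-delimited modifier file and applies a gene's multiplier to every BMR category. It is a sketch under stated assumptions, not the tool's implementation: the function names, gene names, and category names are hypothetical, and the file is assumed to have no header line.

```python
import csv
import io

# Hypothetical helper: parse "gene_name<TAB>bmr_modifier" rows into a dict.
# Assumes the file has no header line.
def read_bmr_modifiers(handle):
    modifiers = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        gene, value = row[0], float(row[1])
        if value <= 0:
            raise ValueError(f"BMR modifier for {gene} must be > 0, got {value}")
        modifiers[gene] = value
    return modifiers

# Hypothetical helper: the same per-gene multiplier is applied to every
# category; genes without an entry keep the unmodified BMRs (factor 1.0).
def modified_bmrs(gene, category_bmrs, modifiers):
    factor = modifiers.get(gene, 1.0)
    return {cat: bmr * factor for cat, bmr in category_bmrs.items()}

# Made-up file contents and made-up category names and rates.
example_file = io.StringIO("GENE_A\t1.5\nGENE_B\t0.5\n")
modifiers = read_bmr_modifiers(example_file)
category_bmrs = {"Transitions": 2e-06, "Indels": 4e-07}

print(modified_bmrs("GENE_A", category_bmrs, modifiers))  # scaled by 1.5
print(modified_bmrs("GENE_Z", category_bmrs, modifiers))  # not listed, unchanged
```

Rejecting non-positive values at parse time mirrors the restriction stated above; a multiplier of zero or less would leave no meaningful background rate to test against.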
--skip-low-mr-genes
Genes with consistently lower MRs than the BMRs across mutation categories may show up in the results as an SMG (by CT or LRT). If such genes are not of interest, they may be assigned a p-value of 1. This should also speed things up. Genes with higher Indel or Truncation rates than the background will not be skipped even if the gene's overall MR is lower than the BMR. If bmr-modifiers are applied, this step uses the modified BMRs instead.
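One possible reading of the skip rule is sketched below in Python. This is only an interpretation of the description above, not the tool's actual logic: the function name, the category labels "Indels" and "Truncations", and all rates are hypothetical, and the overall rates are assumed to be supplied by the caller.

```python
# Hypothetical sketch of the skip decision; category labels are assumptions.
def should_skip(gene_mrs, bmrs, overall_mr, overall_bmr,
                protected=("Indels", "Truncations")):
    """Return True if the gene would be skipped (and assigned a p-value of 1)."""
    if overall_mr >= overall_bmr:
        return False  # not a low-MR gene, so it is tested as usual
    for category in protected:
        # An elevated Indel or Truncation rate keeps the gene in the test set.
        if gene_mrs.get(category, 0.0) > bmrs.get(category, 0.0):
            return False
    return True

# Made-up rates. If BMR modifiers were in use, bmrs and overall_bmr would be
# the modified values.
bmrs = {"Transitions": 2e-06, "Indels": 4e-07, "Truncations": 1e-07}
low_gene = {"Transitions": 1e-06, "Indels": 1e-07, "Truncations": 0.0}
indel_gene = {"Transitions": 1e-07, "Indels": 9e-07, "Truncations": 0.0}

print(should_skip(low_gene, bmrs, overall_mr=1.1e-06, overall_bmr=2.5e-06))   # True
print(should_skip(indel_gene, bmrs, overall_mr=1.0e-06, overall_bmr=2.5e-06)) # False
```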

AUTHORS

Qunyuan Zhang, Ph.D. Cyriac Kandoth, Ph.D. Nathan D. Dees, Ph.D.