DEXUS: Identifying Differential Expression in RNA-Seq Studies with Unknown Conditions

Detection of differential expression in RNA-Seq data is currently limited to studies in which two or more sample conditions are known a priori. However, these biological conditions are typically unknown in cohort, cross-sectional, and non-randomized controlled studies such as the HapMap, the ENCODE, or the 1000 Genomes project. We present DEXUS for detecting differential expression in RNA-Seq data for which the sample conditions are unknown. DEXUS models read counts as a finite mixture of negative binomial distributions in which each mixture component corresponds to a condition. A transcript is considered differentially expressed if modeling of its read counts requires more than one condition. DEXUS decomposes read count variation into variation due to noise and variation due to differential expression. Evidence of differential expression is measured by the informative/non-informative (I/NI) value, which allows differentially expressed transcripts to be extracted at a desired specificity (significance level) or sensitivity (power). DEXUS performed excellently in identifying differentially expressed transcripts in data with unknown conditions. On 2,400 simulated data sets, I/NI value thresholds of 0.025, 0.05, and 0.1 yielded average specificities of 92%, 97%, and 99% at sensitivities of 76%, 61%, and 38%, respectively. On real-world data sets, DEXUS was able to detect differentially expressed transcripts related to sex, species, tissue, structural variants, or eQTLs.
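To make the mixture idea above concrete, the following is a minimal Python sketch of a negative binomial mixture for a single transcript. It is not the DEXUS implementation (DEXUS is distributed as an R package and its fitting procedure is described in the paper). The function names nb_log_pmf and responsibilities, the mean/size parameterization, and all parameter values are assumptions chosen only for illustration. The sketch shows how, given fixed mixture parameters, each read count can be assigned a posterior probability of belonging to each condition.

```python
import math

def nb_log_pmf(k, mu, r):
    """Log-probability of count k under a negative binomial
    with mean mu and size r (variance = mu + mu**2 / r)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu))
            + k * math.log(mu / (r + mu)))

def responsibilities(k, weights, mus, rs):
    """Posterior probability of each mixture component (condition)
    given a single read count k and fixed mixture parameters."""
    logs = [math.log(w) + nb_log_pmf(k, mu, r)
            for w, mu, r in zip(weights, mus, rs)]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-condition transcript: a major condition with low
# expression and a minor condition with high expression.
weights = [0.7, 0.3]   # assumed mixture weights
mus = [10.0, 100.0]    # assumed per-condition mean read counts
rs = [5.0, 5.0]        # assumed size (inverse dispersion) parameters

for count in (8, 40, 120):
    post = responsibilities(count, weights, mus, rs)
    print(count, [round(p, 3) for p in post])
```

In this toy setting, a transcript whose counts are all explained well by one component would not need a second condition, whereas counts that split between the two components are what the abstract describes as differential expression.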

Please cite:

Günter Klambauer, Thomas Unterthiner, and Sepp Hochreiter. "DEXUS: Identifying Differential Expression in RNA-Seq Studies with Unknown Conditions." Nucleic Acids Research 41(21), e198, 2013. doi:10.1093/nar/gkt834.


Supplementary Notes:


Official Link & DOI (online soon):

Download the R-package

Datasets and R scripts:
The benchmarking data sets used in our publication can be downloaded below.