- Title
Hierarchical probabilistic models for multiple gene/variant associations based on next-generation sequencing data.
- Authors
Vavoulis, Dimitrios V.; Taylor, Jenny C.; Schuh, Anna
- Abstract
Motivation: The identification of genetic variants influencing gene expression (known as expression quantitative trait loci or eQTLs) is important in unravelling the genetic basis of complex traits. Detecting multiple eQTLs simultaneously in a population based on paired DNA-seq and RNA-seq assays employs two competing types of models: models which rely on appropriate transformations of RNA-seq data (and are powered by a mature mathematical theory), or count-based models, which represent digital gene expression explicitly, thus rendering such transformations unnecessary. The latter constitutes an immensely popular methodology, which is however plagued by mathematical intractability. Results: We develop tractable count-based models, which are amenable to efficient estimation through the introduction of latent variables and the appropriate application of recent statistical theory in a sparse Bayesian modelling framework. Furthermore, we examine several transformation methods for RNA-seq read counts and we introduce arcsin, logit and Laplace smoothing as preprocessing steps for transformation-based models. Using natural and carefully simulated data from the 1000 Genomes and gEUVADIS projects, we benchmark both approaches under a variety of scenarios, including the presence of noise and violation of basic model assumptions. We demonstrate that an arcsin transformation of Laplace-smoothed data is at least as good as state-of-the-art models, particularly at small samples. Furthermore, we show that an over-dispersed Poisson model is comparable to the celebrated Negative Binomial, but much easier to estimate. These results provide strong support for transformation-based versus count-based (particularly Negative-Binomial-based) models for eQTL mapping.
- Subjects
GENE expression; NUCLEOTIDE sequencing; RNA sequencing; SKEWNESS (Probability theory); DATA transformations (Statistics)
- Publication
Bioinformatics, 2017, Vol 33, Issue 19, p3058
- ISSN
1367-4803
- Publication type
Article
- DOI
10.1093/bioinformatics/btx355