Non-Parametric Bayesian Modelling of Digital Gene Expression Data

Dimitrios V Vavoulis; Julian Gough

Abstract

Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or “reads”, which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these tests and methods developed de novo for RNA-seq data suffer from the same problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model of over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of the model parameters conditional on these counts. The algorithm compensates for the low number of biological replicates by clustering together genes whose tag counts are likely sampled from a common distribution and using this augmented sample to estimate the parameters of that distribution. The number of clusters is not decided a priori, but is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm to a public dataset obtained from cancerous and non-cancerous neural tissues. Source code implementing the methodology presented in this paper is provided as the Python package DGEclust, which is freely available at the following link: https://bitbucket.org/DimitrisVavoulis/dgeclust.
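
As an illustration of how the number of clusters can be left open rather than fixed in advance, the sketch below draws mixture weights from a truncated stick-breaking prior and assigns genes to clusters. The abstract does not state which prior is used; the stick-breaking construction is assumed here only because it is commonly paired with blocked Gibbs sampling. The function names, the concentration parameter `alpha` and the truncation level are hypothetical and are not part of DGEclust. The code samples from a prior only and performs no posterior inference.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking prior (assumed, not from the paper)."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any cluster
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # share of the remaining stick to break off
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # the last cluster takes whatever is left
    return weights


def assign_clusters(n_genes, weights, rng):
    """Assign each gene to a cluster with probability equal to that cluster's weight."""
    return rng.choices(range(len(weights)), weights=weights, k=n_genes)


rng = random.Random(0)
weights = stick_breaking_weights(alpha=2.0, truncation=50, rng=rng)
labels = assign_clusters(n_genes=1000, weights=weights, rng=rng)
n_occupied = len(set(labels))  # clusters that received at least one gene
print(f"{n_occupied} of {len(weights)} candidate clusters are occupied")
```

The truncation level only caps the number of candidate clusters. How many of them are actually occupied depends on the sampled weights, and in a full sampler it would also depend on the observed counts.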


Journal of Computer Science and Systems Biology