Inference and Sample Size Calculations Based on Statistical Tests in a
Negative Binomial Distribution for Differential Gene Expression in RNAseq
Data

Xiaohong Li; Nigel GF Cooper; Yu Shyr; Dongfeng Wu; Eric C Rouchka; Ryan S Gill; Timothy E O’Toole; Guy N Brock; Shesh N Rai

Abstract

High-throughput RNA sequencing (RNA-seq) has become the method of choice for transcriptomics and for detecting differentially expressed genes. Sample size calculation is an important consideration in RNA-seq experimental design for biological research and clinical trials. Sample size formulas derived from the Wald and likelihood ratio tests have already been developed under a Poisson model for RNA-seq data. However, because the variance of the read counts in real RNA-seq data is not equal to the mean, Li et al. (2013) proposed an extended method that calculates sample sizes under a negative binomial distribution using an exact test statistic. In this study, we derive five alternative sample size calculation methods based on the negative binomial distribution, using the Wald test, the log-transformed Wald test and the log-likelihood ratio test statistics. We compared our five methods with the existing method by calculating sample sizes and simulated power in different scenarios. We first calculated the sample sizes for testing a single gene using the six methods at a nominal significance level α of 0.05 and 80% power. We then calculated the sample sizes for testing multiple genes at a false discovery rate (FDR) of 0.05 and 0.10. The empirical power and the true prognostic genes for differential gene expression analysis corresponding to the estimated sample sizes from the six methods were also estimated via simulation studies. With the sample size formulas derived from the log-transformed and Wald-based tests, we observed smaller sample sizes than with the other methods, while the power remained close to or higher than the nominal 80% in all settings. Moreover, the Wald-test-based sample size calculation method is easier and faster to compute in RNA-seq experimental design.
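To make the single-gene calculation concrete, the following is a minimal sketch of a textbook delta-method sample size formula for a two-sided Wald test on the log ratio of two negative binomial means. It is not claimed to be any of the five formulas derived in the paper, which are not given in the abstract. The assumptions are: equal group sizes, a common dispersion φ with Var(Y) = μ + φμ², and no library-size normalization. The function name and all parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def nb_log_wald_sample_size(mu0, fold_change, dispersion, alpha=0.05, power=0.80):
    """Approximate per-group sample size for one gene.

    Illustrative assumption (not taken from the paper): by the delta method,
    Var(log(mean estimate)) is about (1/mu + dispersion) / n for a negative
    binomial mean, so the variance of the estimated log fold change is about
    (1/mu0 + 1/mu1 + 2 * dispersion) / n with n samples per group.
    """
    if mu0 <= 0 or fold_change <= 0 or fold_change == 1:
        raise ValueError("mu0 and fold_change must be positive, fold_change != 1")
    if dispersion < 0:
        raise ValueError("dispersion must be non-negative")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    mu1 = mu0 * fold_change                         # mean count in the second group
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for the target power

    unit_variance = 1 / mu0 + 1 / mu1 + 2 * dispersion
    n = (z_alpha + z_beta) ** 2 * unit_variance / math.log(fold_change) ** 2
    return math.ceil(n)                             # round up to whole samples


if __name__ == "__main__":
    # Hypothetical inputs: mean count 5, two-fold change, dispersion 0.1.
    print(nb_log_wald_sample_size(mu0=5, fold_change=2, dispersion=0.1))
```

Under this approximation, the required sample size grows with the dispersion and shrinks as the mean count or the absolute log fold change increases. A multiple-gene calculation at a given FDR would replace α with a smaller per-gene significance level, which this sketch does not attempt.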

Journal of Biometrics & Biostatistics