Source: Genome Biology  Published: 2020/4/15 10:54:54
Leveraging Biological and Statistical Covariates to Improve Statistical Power in Data Analysis | Genome Biology

Paper title: Leveraging biological and statistical covariates improves the detection power in epigenome-wide association testing

Journal: Genome Biology

Authors: Jinyan Huang, Ling Bai et al.

Publication date: 2020/04/06

DOI: 10.1186/s13059-020-02001-7

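Jinyan Huang of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, and Jun Chen of Mayo Clinic collaborated to evaluate five covariate-adaptive false discovery rate (FDR) control methods on simulated and real epigenome-wide association study (EWAS) datasets, using covariates associated with those datasets. The study developed an omnibus test to assess how informative a covariate is and found that statistical covariates are generally more informative than biological covariates. The paper was recently published in the open-access journal Genome Biology.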

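High-throughput technologies generate massive amounts of data and have greatly advanced disease research, but controlling false discoveries in the resulting statistical tests while preserving statistical power remains a largely unsolved problem. In an epigenome-wide association study (EWAS), for example, phenotypes are compared at the level of the epigenome: methylation is measured at many thousands of specific DNA sites across the genome to identify epigenetic changes associated with a phenotype. This analysis involves many thousands of statistical hypothesis tests. False discovery rate (FDR) control is widely used to correct EWAS p-values for multiple testing. Traditional FDR control methods, however, were not designed for large biomedical datasets and do not use auxiliary covariates, so they may fail to exploit all the information in the data. Using simulated and real EWAS datasets together with their associated covariates, the teams of Jinyan Huang and Jun Chen evaluated five covariate-adaptive FDR control methods: adaptive p value thresholding (AdaPT), Boca and Leek's FDR regression (BL), covariate adaptive multiple testing (CAMT), FDR regression (FDRreg), and independent hypothesis weighting (IHW).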
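To make the multiple-testing step concrete, here is a minimal sketch of classical FDR control without covariates. The article does not name a specific baseline at this point; the Benjamini-Hochberg step-up procedure is used here as an assumption because it is widely used. The function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha.

    Illustrative Benjamini-Hochberg step-up procedure (no covariates).
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Step-up rule: reject the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per CpG site.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(pvals, alpha=0.05)
print(sum(decisions), "of", len(pvals), "hypotheses rejected")
```

Every hypothesis is compared against the same sequence of thresholds here, regardless of any side information about the site; that is the gap covariate-adaptive methods aim to close.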

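The study developed an omnibus test to assess the informativeness of covariates and found that statistical covariates are generally more informative than biological covariates. Covariates based on the methylation mean and variance were informative in almost every dataset analyzed, whereas biological covariates were informative only in certain datasets. The study showed that independent hypothesis weighting (IHW) and covariate adaptive multiple testing (CAMT) are more powerful overall, especially for sparse signals: on real datasets they improved detection power by a median of 25% and 68%, respectively, compared to the ST procedure. These conclusions were further confirmed on larger simulated and real datasets.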
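The idea of letting a covariate inform multiple testing can be sketched with a weighted Benjamini-Hochberg procedure: each hypothesis gets a non-negative weight, the weights are rescaled to average one, and the ordinary step-up rule is applied to p / w. This is only an illustration of the weighting idea, not the IHW or CAMT algorithm, which learn weights or thresholds from the data. The function name, p-values, and weights below are hypothetical; the weights stand in for a covariate such as methylation variance and must be chosen without looking at the p-values.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg: apply the BH step-up rule to p_i / w_i.

    Illustrative only; the weights are assumed to be fixed in advance
    from a covariate, independently of the p-values.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if len(weights) != m:
        raise ValueError("pvalues and weights must have the same length")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    # Rescale so the weights average to one.
    norm = [w * m / total for w in weights]
    # Weighted p-values; a zero weight means the hypothesis is never rejected.
    q = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, norm)]
    order = sorted(range(m), key=lambda i: q[i])
    # Largest 1-based rank k with q_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if q[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical example: the first five sites are flagged by the covariate
# as more likely to carry signal and receive larger weights.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
weights = [1.8] * 5 + [0.2] * 5
print(sum(weighted_bh(pvals, [1.0] * 10)), "rejections with uniform weights")
print(sum(weighted_bh(pvals, weights)), "rejections with covariate-based weights")
```

With uniform weights the function reduces to plain Benjamini-Hochberg. In this toy example the informative weights yield more rejections; an uninformative covariate would bring no such gain and could reduce power, which is why the study assesses covariate informativeness first.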

Abstract:

Background

Epigenome-wide association studies (EWAS), which seek the association between epigenetic marks and an outcome or exposure, involve multiple hypothesis testing. False discovery rate (FDR) control has been widely used for multiple testing correction. However, traditional FDR control methods do not use auxiliary covariates, and they could be less powerful if the covariates could inform the likelihood of the null hypothesis. Recently, many covariate-adaptive FDR control methods have been developed, but application of these methods to EWAS data has not yet been explored. It is not clear whether these methods can significantly improve detection power, and if so, which covariates are more relevant for EWAS data.

Results

In this study, we evaluate the performance of five covariate-adaptive FDR control methods with EWAS-related covariates using simulated as well as real EWAS datasets. We develop an omnibus test to assess the informativeness of the covariates. We find that statistical covariates are generally more informative than biological covariates, and the covariates of methylation mean and variance are almost universally informative. In contrast, the informativeness of biological covariates depends on specific datasets. We show that the independent hypothesis weighting (IHW) and covariate adaptive multiple testing (CAMT) method are overall more powerful, especially for sparse signals, and could improve the detection power by a median of 25% and 68% on real datasets, compared to the ST procedure. We further validate the findings in various biological contexts.

Conclusions

Covariate-adaptive FDR control methods with informative covariates can significantly increase the detection power for EWAS. For sparse signals, IHW and CAMT are recommended.

(Source: ScienceNet)

 
 
 