Scientists develop a supervised enhancer prediction model based on epigenetic pattern recognition and targeted validation
Author: Xiaoke Robot (小柯机器人) Published: 2020/8/1 23:42:22

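Mark Gerstein's research group at Yale University in the United States has developed an enhancer prediction model that combines epigenetic pattern recognition with targeted validation. The work was published on July 29, 2020 in the international journal Nature Methods.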

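The group developed a framework that uses Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. The team integrated these features with supervised machine-learning algorithms to predict enhancers, and further showed that the model can be transferred to predict enhancers in mammals.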
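The shape-matching idea can be illustrated with a minimal matched-filter sketch. It is not the authors' implementation: the function names, the five-bin peak-trough-peak template, and the toy signal track are all hypothetical, chosen only to show how a template derived from a meta-profile might be slid along an epigenetic signal to score each window.

```python
from math import sqrt


def centered_unit_template(template):
    """Subtract the mean and scale to unit length so that flat
    background signal scores close to zero."""
    n = len(template)
    mean = sum(template) / n
    centered = [v - mean for v in template]
    norm = sqrt(sum(v * v for v in centered))
    if norm == 0:
        raise ValueError("template must not be constant")
    return [v / norm for v in centered]


def matched_filter_scores(signal, template):
    """Slide the template along the signal and return one score per
    window: the dot product of the normalized template with the window."""
    kernel = centered_unit_template(template)
    width = len(kernel)
    return [
        sum(k * s for k, s in zip(kernel, signal[start:start + width]))
        for start in range(len(signal) - width + 1)
    ]


# Hypothetical meta-profile: two peaks flanking a central dip.
template = [1.0, 3.0, 1.0, 3.0, 1.0]

# Toy signal track (made-up values) with the same shape embedded
# at bins 5 to 9.
signal = [0.0, 0.1, 0.0, 0.2, 0.1, 2.0, 6.1, 2.1, 5.9, 2.0, 0.1, 0.0, 0.2]

scores = matched_filter_scores(signal, template)
best = max(range(len(scores)), key=scores.__getitem__)
print("best-matching window starts at bin", best)
```

In a pipeline of the kind described above, scores like these, computed for several epigenetic features, would presumably serve as the inputs to a supervised classifier; that step is not shown here.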

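Using a combination of in vivo and in vitro approaches, the researchers comprehensively validated the predictions, including transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that the model can accurately predict enhancers in different species without re-parameterization.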

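Finally, the group examined transcription factor binding patterns at predicted enhancers versus promoters, and showed that these patterns enable the construction of a secondary model that effectively distinguishes enhancers from promoters. Supervised machine-learning models trained on Drosophila epigenetic and STARR-seq data can be transferred to predict mouse and human enhancers.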

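By way of background, enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows large numbers of enhancers to be characterized for the first time.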

Appendix: original English text

Title: Supervised enhancer prediction with epigenetic pattern recognition and targeted validation

Author: Anurag Sethi, Mengting Gu, Emrah Gumusgoz, Landon Chan, Koon-Kiu Yan, Joel Rozowsky, Iros Barozzi, Veena Afzal, Jennifer A. Akiyama, Ingrid Plajzer-Frick, Chengfei Yan, Catherine S. Novak, Momoe Kato, Tyler H. Garvin, Quan Pham, Anne Harrington, Brandon J. Mannion, Elizabeth A. Lee, Yoko Fukuda-Yuzawa, Axel Visel, Diane E. Dickel, Kevin Y. Yip, Richard Sutton, Len A. Pennacchio, Mark Gerstein

Issue&Volume: 2020-07-29

Abstract: Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers and promoters. Supervised machine-learning models trained using Drosophila epigenetic and STARR-seq data can be transferred to predict mouse and human enhancers.

DOI: 10.1038/s41592-020-0907-8

Source: https://www.nature.com/articles/s41592-020-0907-8

Journal information

Nature Methods: launched in 2004 and published by Springer Nature. Latest IF: 28.467
Official website: https://www.nature.com/nmeth/
Submission link: https://mts-nmeth.nature.com/cgi-bin/main.plex