Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction
Author: 小柯机器人  Published: 2024/7/12 14:17:40

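A research team led by Farhad Hormozdiari, Cory Y. McLean and Taedong Yun at Google Research in the United States has recently reported a new result. Their study shows that unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction. The work was published in Nature Genetics on July 8, 2024.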

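According to the authors, although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, using them for genetic discovery remains challenging.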

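The researchers introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants and HDCD. REGLE uses variational autoencoders to compute nonlinear, disentangled embeddings of HDCD, and these embeddings become the inputs to genome-wide association studies (GWAS). REGLE can uncover features not captured by existing expert-defined features, and it enables accurate disease-specific polygenic risk scores (PRSs) to be built in datasets with very few labeled data.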
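
The step in which embedding coordinates become GWAS phenotypes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are simulated, the "embedding coordinate" is a random vector with a planted genetic effect rather than the output of a variational autoencoder, and the function name `gwas_on_embedding` is hypothetical. A real GWAS would also adjust for covariates such as age, sex and ancestry components, and would normally be run with dedicated software.

```python
import math
import numpy as np


def gwas_on_embedding(genotypes, embedding_coord):
    """Regress one embedding coordinate on each variant's allele dosage.

    genotypes: (n_samples, n_variants) array of dosages in {0, 1, 2}
    embedding_coord: (n_samples,) array, one latent coordinate per individual
    Returns (betas, z_scores, p_values), each of shape (n_variants,).
    Assumes every variant is polymorphic (non-zero dosage variance).
    """
    g = genotypes - genotypes.mean(axis=0)        # center each variant
    y = embedding_coord - embedding_coord.mean()  # center the phenotype
    n = g.shape[0]
    sxx = (g ** 2).sum(axis=0)
    betas = g.T @ y / sxx                         # per-variant least-squares slope
    resid_var = ((y[:, None] - g * betas) ** 2).sum(axis=0) / (n - 2)
    se = np.sqrt(resid_var / sxx)
    z = betas / se
    # Two-sided p-value from a normal approximation, reasonable for large n.
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return betas, z, p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_variants = 2000, 50
    genotypes = rng.binomial(2, 0.3, size=(n_samples, n_variants)).astype(float)
    # Simulated latent coordinate: variant 7 has a planted effect of 0.5.
    coord = 0.5 * genotypes[:, 7] + rng.normal(size=n_samples)
    betas, z, p = gwas_on_embedding(genotypes, coord)
    print("strongest association at variant index", int(np.argmin(p)))
```

In a REGLE-style workflow, each latent coordinate would be passed through such a scan in turn, so that a variant can be associated with any of the learned dimensions.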

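The researchers applied REGLE to run GWAS on respiratory and circulatory HDCD: spirograms, which measure lung function, and photoplethysmograms, which measure blood volume changes. REGLE replicates known loci while identifying others not previously detected. REGLE embeddings are predictive of overall survival, and PRSs constructed from REGLE loci improve disease prediction across multiple biobanks. Overall, REGLE embeddings contain clinically relevant information beyond that captured by existing expert-defined features, leading to improved genetic discovery and disease prediction.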
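
For readers unfamiliar with polygenic risk scores, the sketch below shows the textbook additive form: a weighted sum of allele dosages. The article does not describe how the PRSs built from REGLE loci are weighted or tuned, so the function names, dosages and effect sizes here are hypothetical and serve only to illustrate the general idea.

```python
import numpy as np


def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: score_i = sum_j effect_sizes[j] * dosages[i, j].

    dosages: (n_samples, n_variants) allele dosages in [0, 2]
    effect_sizes: (n_variants,) per-variant weights, e.g. GWAS betas
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if dosages.shape[1] != effect_sizes.shape[0]:
        raise ValueError("one effect size is required per variant")
    return dosages @ effect_sizes


def standardize(scores):
    """Rescale scores to zero mean and unit variance within a cohort."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()


if __name__ == "__main__":
    # Hypothetical dosages for three individuals at three variants.
    dosages = [[0, 1, 2],
               [2, 0, 1],
               [1, 1, 1]]
    # Hypothetical effect sizes, not values from the paper.
    betas = [0.2, -0.1, 0.05]
    raw = polygenic_risk_score(dosages, betas)
    print(raw)               # approximately [0.0, 0.45, 0.15]
    print(standardize(raw))
```

Standardizing within a cohort is a common convenience when comparing scores across individuals; it does not change their ranking.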

Appendix: original English text

Title: Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction

Author: Yun, Taedong; Cosentino, Justin; Behsaz, Babak; McCaw, Zachary R.; Hill, Davin; Luben, Robert; Lai, Dongbing; Bates, John; Yang, Howard; Schwantes-An, Tae-Hwi; Zhou, Yuchen; Khawaja, Anthony P.; Carroll, Andrew; Hobbs, Brian D.; Cho, Michael H.; McLean, Cory Y.; Hormozdiari, Farhad

Issue&Volume: 2024-07-08

Abstract: Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants and HDCD. REGLE leverages variational autoencoders to compute nonlinear disentangled embeddings of HDCD, which become the inputs to genome-wide association studies (GWAS). REGLE can uncover features not captured by existing expert-defined features and enables the creation of accurate disease-specific polygenic risk scores (PRSs) in datasets with very few labeled data. We apply REGLE to perform GWAS on respiratory and circulatory HDCD: spirograms measuring lung function and photoplethysmograms measuring blood volume changes. REGLE replicates known loci while identifying others not previously detected. REGLE embeddings are predictive of overall survival, and PRSs constructed from REGLE loci improve disease prediction across multiple biobanks. Overall, REGLE embeddings contain clinically relevant information beyond that captured by existing expert-defined features, leading to improved genetic discovery and disease prediction.

DOI: 10.1038/s41588-024-01831-6

Source: https://www.nature.com/articles/s41588-024-01831-6

Journal information

Nature Genetics: founded in 1992. Published by Springer Nature. Latest IF: 41.307
Official website: https://www.nature.com/ng/
Submission link: https://mts-ng.nature.com/cgi-bin/main.plex