Large-scale foundation model on single-cell transcriptomics
Author: Xiaoke Robot (小柯机器人) | Published: 2024/6/9 22:44:06

Xuegong Zhang's team at Tsinghua University and collaborators recently reported important progress: a large-scale foundation model for single-cell transcriptomics. The work was published online in Nature Methods on June 6, 2024.

According to the report, large pretrained models have become foundation models driving breakthroughs in natural language processing and related fields. Developing foundation models that decipher the "language" of cells and facilitate biomedical research is promising yet challenging.

The researchers developed a large pretrained model, scFoundation, also named "xTrimoscFoundationα", with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the number of trainable parameters, the dimensionality of genes, and the volume of training data. Its asymmetric transformer-like architecture and pretraining task design enable it to effectively capture complex contextual relations among genes across a variety of cell types and states.
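To make the "asymmetric" idea concrete, here is a minimal, hypothetical numpy sketch of one intuition behind such designs: single-cell expression profiles are sparse, so full attention can be restricted to the expressed (nonzero) genes, with results scattered back to the full gene dimension. All names, sizes and weights below are illustrative assumptions, not scFoundation's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_genes, d = 8, 16  # toy sizes; the real model covers ~20,000 genes

# Toy expression profile for one cell; zeros mimic unexpressed/dropout genes.
expr = np.array([0.0, 2.1, 0.0, 0.5, 3.3, 0.0, 1.2, 0.0])

gene_emb = rng.normal(size=(n_genes, d))          # hypothetical gene embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Attend only over expressed genes (a much shorter sequence), then scatter
# the contextual embeddings back to the full gene vocabulary.
expressed = np.flatnonzero(expr)                      # indices of nonzero genes
tokens = gene_emb[expressed] * expr[expressed, None]  # value-scaled embeddings

q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
attn = softmax(q @ k.T / np.sqrt(d))                  # attention over expressed genes
ctx = attn @ v                                        # contextual gene embeddings

out = np.zeros((n_genes, d))
out[expressed] = ctx                                  # unexpressed genes stay zero
```

A pretraining task on top of this would mask some expression values and train the model to reconstruct them from the remaining context; the sketch only shows the sparse-attention step.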

Experiments showed that scFoundation works as a foundation model, achieving state-of-the-art performance across a diverse array of single-cell analysis tasks, including gene expression enhancement, tissue-level drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.

Appendix: original English abstract

Title: Large-scale foundation model on single-cell transcriptomics

Author: Hao, Minsheng, Gong, Jing, Zeng, Xin, Liu, Chiming, Guo, Yucheng, Cheng, Xingyi, Wang, Taifeng, Ma, Jianzhu, Zhang, Xuegong, Song, Le

Issue&Volume: 2024-06-06

Abstract: Large pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the ‘languages’ of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model scFoundation, also named ‘xTrimoscFoundationα’, with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the size of trainable parameters, dimensionality of genes and volume of training data. Its asymmetric transformer-like architecture and pretraining task design empower effectively capturing complex context relations among genes in a variety of cell types and states. Experiments showed its merit as a foundation model that achieved state-of-the-art performances in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.

DOI: 10.1038/s41592-024-02305-7

Source: https://www.nature.com/articles/s41592-024-02305-7

Journal information

Nature Methods: launched in 2004, published by Springer Nature; latest IF: 47.99
Official website: https://www.nature.com/nmeth/
Submission link: https://mts-nmeth.nature.com/cgi-bin/main.plex