Scientists use data-efficient deep learning to optimize protein screening
Author: 小柯机器人 Published: 2021/4/8 16:43:49

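George M. Church's research team at Harvard University in the United States has developed a low-N protein engineering method based on data-efficient deep learning, where "low-N" refers to a small number of experimentally assayed sequences. The study was published in Nature Methods on April 7, 2021.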

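The researchers describe a machine learning-guided approach that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape, and then screen ten million sequences via in silico directed evolution. In tests on two dissimilar proteins, GFP from Aequorea victoria (avGFP) and E. coli TEM-1 β-lactamase, the top candidates from a single round of screening were diverse and as active as engineered mutants obtained in previous high-throughput studies.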

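By distilling information from natural protein sequence landscapes, the model learns a latent representation of "unnaturalness", which helps guide the search away from nonfunctional sequence neighborhoods. Subsequent low-N supervision then identifies improvements to the activity of interest.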
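The two-stage idea (fit a model on a handful of assayed sequences, then search sequence space in silico) can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the authors' implementation: a one-hot encoding stands in for the learned representation, a closed-form ridge regression plays the role of the low-N supervised model, the search is a simple Metropolis-style random walk, and `toy_assay`, `wild_type`, and all parameter values are hypothetical and synthetic.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Flattened one-hot encoding (assumed stand-in for a learned representation)."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), [AA_INDEX[a] for a in seq]] = 1.0
    return x.ravel()


def fit_ridge(seqs, activities, alpha=1.0):
    """Fit ridge regression in closed form on a small labeled set; return a predictor."""
    X = np.stack([one_hot(s) for s in seqs])
    y = np.asarray(activities, dtype=float)
    y_mean = y.mean()
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ (y - y_mean))
    return lambda seq: float(one_hot(seq) @ w + y_mean)


def mutate(seq, rng):
    """Replace one randomly chosen position with a randomly chosen residue."""
    pos = rng.integers(len(seq))
    aa = AMINO_ACIDS[rng.integers(len(AMINO_ACIDS))]
    return seq[:pos] + aa + seq[pos + 1:]


def in_silico_evolution(start, predict, steps=500, temperature=0.05, seed=0):
    """Metropolis-style walk: accept mutations according to predicted fitness."""
    rng = np.random.default_rng(seed)
    current, current_fit = start, predict(start)
    best, best_fit = current, current_fit
    for _ in range(steps):
        proposal = mutate(current, rng)
        fit = predict(proposal)
        accept = fit >= current_fit or rng.random() < np.exp((fit - current_fit) / temperature)
        if accept:
            current, current_fit = proposal, fit
            if fit > best_fit:
                best, best_fit = proposal, fit
    return best, best_fit


def toy_assay(seq):
    """Hypothetical stand-in for a wet-lab measurement (made-up scoring rule)."""
    return sum(a == "W" for a in seq) - 0.5 * sum(a == "P" for a in seq)


# Synthetic demo: 24 random mutants of an arbitrary toy sequence, none from the paper.
data_rng = np.random.default_rng(1)
wild_type = "MSKGEELFTGVV"
train = [mutate(mutate(wild_type, data_rng), data_rng) for _ in range(24)]
predict = fit_ridge(train, [toy_assay(s) for s in train])
best, best_fit = in_silico_evolution(wild_type, predict)
print(best, round(best_fit, 3))
```

With only 24 training points the predictor is reliable only near the sequences it has seen, which is why the article stresses the role of a representation learned from natural sequences in steering the search away from nonfunctional regions. The one-hot features used here carry no such prior.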

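In sum, the approach enables efficient use of resource-intensive, high-fidelity assays without sacrificing throughput, and helps accelerate engineered proteins into the fermenter, field and clinic.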

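Protein engineering has enormous academic and industrial potential. However, it is limited by the lack of experimental assays that are both consistent with the design goal and sufficiently high-throughput to find rare, enhanced variants.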

Appendix: original English text

Title: Low-N protein engineering with data-efficient deep learning

Author: Surojit Biswas, Grigory Khimulya, Ethan C. Alley, Kevin M. Esvelt, George M. Church

Issue&Volume: 2021-04-07

Abstract: Protein engineering has enormous academic and industrial potential. However, it is limited by the lack of experimental assays that are consistent with the design goal and sufficiently high throughput to find rare, enhanced variants. Here we introduce a machine learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape and screen ten million sequences via in silico directed evolution. As demonstrated in two dissimilar proteins, GFP from Aequorea victoria (avGFP) and E. coli strain TEM-1 β-lactamase, top candidates from a single round are diverse and as active as engineered mutants obtained from previous high-throughput efforts. By distilling information from natural protein sequence landscapes, our model learns a latent representation of ‘unnaturalness’, which helps to guide search away from nonfunctional sequence neighborhoods. Subsequent low-N supervision then identifies improvements to the activity of interest. In sum, our approach enables efficient use of resource-intensive high-fidelity assays without sacrificing throughput, and helps to accelerate engineered proteins into the fermenter, field and clinic.

DOI: 10.1038/s41592-021-01100-y

Source: https://www.nature.com/articles/s41592-021-01100-y

Journal information

Nature Methods, founded in 2004, is part of the Springer Nature publishing group. Latest IF: 28.467
Official website: https://www.nature.com/nmeth/
Submission link: https://mts-nmeth.nature.com/cgi-bin/main.plex