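Study uses contrastive learning to reveal the relationship between publication bias and chemical reactivity
Author: 小柯机器人 (Xiaoke Robot)  Published: 2025/3/3 14:37:30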

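Recently, a team led by Professor Connor W. Coley at the Massachusetts Institute of Technology (MIT) used contrastive learning to reveal the relationship between publication bias and chemical reactivity. The results were published in the Journal of the American Chemical Society on March 2, 2025.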

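In this work, the researchers explored the relationship between this publication bias and chemical reactivity in more depth, going beyond a simple analysis of yield distributions, and proposed a new neural network training strategy: substrate scope contrastive learning. By treating reported substrates as positive samples and nonreported substrates as negative samples, the strategy teaches a model to group molecules within a numerical embedding space based on historical trends in published substrate scope tables.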
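The idea of pulling reported substrates together and pushing nonreported ones away can be illustrated with a generic contrastive objective. The sketch below is a minimal illustration only: it assumes an InfoNCE-style loss, cosine similarity, and a temperature of 0.1, none of which are stated in the summary, and the function name `info_nce_loss` is hypothetical. It is not the authors' implementation.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor embedding.

    Assumed setup (not from the paper): `anchor` and `positive` are
    embeddings of two substrates reported in the same scope table, and
    `negatives` are embeddings of substrates that were not reported.

    anchor, positive: arrays of shape (d,)
    negatives: array of shape (n_neg, d)
    """
    def normalize(x):
        # Scale to unit length so dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    p = normalize(np.asarray(positive, dtype=float))
    n = normalize(np.asarray(negatives, dtype=float))

    pos_logit = np.dot(a, p) / temperature      # scalar
    neg_logits = (n @ a) / temperature          # shape (n_neg,)
    logits = np.concatenate(([pos_logit], neg_logits))

    # Numerically stable log-sum-exp over all candidates.
    m = logits.max()
    log_denominator = m + np.log(np.exp(logits - m).sum())

    # Negative log-probability of picking the positive among all candidates.
    return float(log_denominator - pos_logit)
```

Minimizing this quantity over many anchor/positive/negative triples lowers the loss when reported substrates sit close together in embedding space and nonreported ones sit far away, which is the grouping behavior described above.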

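Training on 20,798 aryl halides in the CAS Content Collection™, spanning thousands of publications from 2010 to 2015, the team demonstrated through intuitive visualizations and quantitative regression analyses that the learned embeddings correlate with physical organic reactivity descriptors. In addition, the embeddings are applicable to reaction modeling tasks such as yield prediction and regioselectivity prediction, underscoring the potential of historical reaction data as a pretraining task. The work not only presents a chemistry-specific machine learning training strategy for learning from literature data in a new way, but also offers a distinctive approach to uncovering trends in chemical reactivity that are reflected in substrate selection trends in publications.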

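By way of background, a synthetic method's substrate tolerance and generality are usually showcased in a "substrate scope" table. However, substrate selection exhibits a frequently discussed publication bias: unsuccessful experiments or low-yielding results are rarely reported.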

Appendix: original English abstract

Title: Revealing the Relationship between Publication Bias and Chemical Reactivity with Contrastive Learning

Author: Wenhao Gao, Priyanka Raghavan, Ron Shprints, Connor W. Coley

Issue&Volume: March 2, 2025

Abstract: A synthetic method’s substrate tolerance and generality are often showcased in a “substrate scope” table. However, substrate selection exhibits a frequently discussed publication bias: unsuccessful experiments or low-yielding results are rarely reported. In this work, we explore more deeply the relationship between such a publication bias and chemical reactivity beyond the simple analysis of yield distributions using a novel neural network training strategy, substrate scope contrastive learning. By treating reported substrates as positive samples and nonreported substrates as negative samples, our contrastive learning strategy teaches a model to group molecules within a numerical embedding space, based on historical trends in published substrate scope tables. Training on 20,798 aryl halides in the CAS Content Collection™, spanning thousands of publications from 2010 to 2015, we demonstrate that the learned embeddings exhibit a correlation with physical organic reactivity descriptors through both intuitive visualizations and quantitative regression analyses. Additionally, these embeddings are applicable to various reaction modeling tasks like yield prediction and regioselectivity prediction, underscoring the potential to use historical reaction data as a pretraining task. This work not only presents a chemistry-specific machine learning training strategy to learn from literature data in a new way but also represents a unique approach to uncover trends in chemical reactivity reflected by trends in substrate selection in publications.

DOI: 10.1021/jacs.5c01120

Source: https://pubs.acs.org/doi/abs/10.1021/jacs.5c01120

 

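Journal information

JACS: Journal of the American Chemical Society, founded in 1879 and published by the American Chemical Society. Latest IF: 16.383
Official website: https://pubs.acs.org/journal/jacsat
Submission link: https://acsparagonplus.acs.org/psweb/loginForm?code=1000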