Mesolimbic dopamine regulates the rate of learning from action
Author: Xiaoke Robot (小柯机器人)  Published: 2023/1/28 8:37:28


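Joshua T. Dudman and colleagues at the Howard Hughes Medical Institute in the United States have found that mesolimbic dopamine regulates the rate of learning from action. The paper was published online in the international journal Nature on January 18, 2023.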

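The researchers used a comprehensive dataset of orofacial and body movements to understand how behavioral policies evolved as naive, head-restrained mice learned a trace conditioning paradigm. Individual differences in the animals' initial dopaminergic reward responses correlated with the emergence of the learned behavioral policy, but not with the emergence of putative value encoding for a predictive cue. Likewise, physiologically calibrated manipulations of mesolimbic dopamine produced several effects that were inconsistent with value learning. These effects were, however, predicted by a neural-network-based model in which dopamine signals set an adaptive rate for behavioral policy learning rather than serving as an error signal. This work provides strong evidence that phasic dopamine activity can regulate the direct learning of behavioral policies, and it expands the explanatory power of reinforcement learning models for animal learning.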
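The contrast between dopamine as an error signal and dopamine as an adaptive learning rate can be made concrete with a toy sketch. This is not the authors' neural-network model: the binary "respond/withhold" sigmoid policy, the REINFORCE-style update, the reward rule, and all names (`value_update`, `policy_update`, `train`, `base_rate`, `dopamine`) are hypothetical choices for illustration. In `value_update` the dopamine-like quantity is the reward prediction error itself; in `policy_update` the error term comes from reward versus a baseline, and a dopamine-like scalar only scales the step size, which is one possible reading of the "adaptive rate" role described above.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function: maps a preference to a response probability."""
    return 1.0 / (1.0 + math.exp(-x))


def value_update(value: float, reward: float, alpha: float) -> tuple[float, float]:
    """Value learning: the dopamine-like signal IS the error term.

    delta = reward - value is a reward prediction error (a Rescorla-Wagner /
    single-step TD update); the learning rate alpha is fixed.
    """
    delta = reward - value
    return value + alpha * delta, delta


def policy_update(theta: float, action: int, reward: float, baseline: float,
                  base_rate: float, dopamine: float) -> float:
    """Policy learning (REINFORCE-style update on a Bernoulli policy).

    The error term is (reward - baseline) * d/dtheta log pi(action).
    The dopamine-like scalar does not enter that error; it only scales
    the step size (hypothetical "adaptive rate" role).
    """
    p = sigmoid(theta)             # probability of responding (e.g. an anticipatory lick)
    grad_logp = action - p         # gradient of log pi(action | theta) for a sigmoid policy
    rate = base_rate * dopamine    # assumption: dopamine sets the learning rate
    return theta + rate * (reward - baseline) * grad_logp


def train(dopamine: float, trials: int = 200, seed: int = 0) -> float:
    """Toy task (hypothetical): responding is rewarded, withholding is not."""
    rng = random.Random(seed)
    theta, value = 0.0, 0.0
    for _ in range(trials):
        action = 1 if rng.random() < sigmoid(theta) else 0
        reward = 1.0 if action == 1 else 0.0
        # The value estimate serves only as a baseline for the policy update.
        theta = policy_update(theta, action, reward, value,
                              base_rate=0.1, dopamine=dopamine)
        value, _ = value_update(value, reward, alpha=0.1)
    return sigmoid(theta)  # learned probability of responding
```

In this sketch, `train(0.0)` leaves the response probability at 0.5 because a zero dopamine-like signal sets the step size to zero, even though reward prediction errors are still being computed, while `train(1.0)` pushes the probability of responding above 0.5.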

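According to the researchers, recent successes in training artificial agents and robots derive from combining the direct learning of behavioral policies with indirect learning through value functions. Policy learning and value learning use distinct algorithms, which optimize behavioral performance and reward prediction, respectively. In animals, behavioral learning and the role of mesolimbic dopamine signaling have been extensively evaluated with respect to reward prediction; so far, however, little consideration has been given to how direct policy learning might inform our understanding of them.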

Appendix: original English abstract

Title: Mesolimbic dopamine adapts the rate of learning from action

Author: Coddington, Luke T., Lindo, Sarah E., Dudman, Joshua T.

Issue&Volume: 2023-01-18

Abstract: Recent success in training artificial agents and robots derives from a combination of direct learning of behavioural policies and indirect learning through value functions [1,2,3]. Policy learning and value learning use distinct algorithms that optimize behavioural performance and reward prediction, respectively. In animals, behavioural learning and the role of mesolimbic dopamine signalling have been extensively evaluated with respect to reward prediction [4]; however, so far there has been little consideration of how direct policy learning might inform our understanding [5]. Here we used a comprehensive dataset of orofacial and body movements to understand how behavioural policies evolved as naive, head-restrained mice learned a trace conditioning paradigm. Individual differences in initial dopaminergic reward responses correlated with the emergence of learned behavioural policy, but not the emergence of putative value encoding for a predictive cue. Likewise, physiologically calibrated manipulations of mesolimbic dopamine produced several effects inconsistent with value learning but predicted by a neural-network-based model that used dopamine signals to set an adaptive rate, not an error signal, for behavioural policy learning. This work provides strong evidence that phasic dopamine activity can regulate direct learning of behavioural policies, expanding the explanatory power of reinforcement learning models for animal learning [6].

DOI: 10.1038/s41586-022-05614-z

Source: https://www.nature.com/articles/s41586-022-05614-z

Journal information

Nature (《自然》): founded in 1869; part of the Springer Nature publishing group. Latest impact factor: 69.504
Official website: http://www.nature.com/
Submission link: http://www.nature.com/authors/submit_manuscript.html