Scientists Successfully Explain Neural Scaling Laws
Author: 小柯机器人    Published: 2024/6/29 15:15:34

Recently, Yasaman Bahri and her research team at Google DeepMind reported new progress: they have explained neural scaling laws. The findings were published on June 24, 2024, in the Proceedings of the National Academy of Sciences (PNAS).

The team proposes a theoretical framework that explains the origins of, and the connections between, different scaling laws. Analyzing scaling behavior in both dataset size and model size under variance-limited and resolution-limited conditions, they identify four scaling regimes in total. Variance-limited scaling follows simply from the existence of a well-behaved infinite-data or infinite-width limit, while the resolution-limited regimes arise from models effectively resolving a smooth data manifold. In the large-width limit, this behavior can equivalently be derived from the spectra of certain kernels.
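As a compact summary of the four regimes (a sketch in our own notation, not a formula quoted from the paper; here D is the dataset size, W the model width, and L_\infty the loss in the corresponding infinite limit):

\[
\underbrace{L(D)-L_\infty \propto D^{-1},\quad L(W)-L_\infty \propto W^{-1}}_{\text{variance-limited}}
\qquad
\underbrace{L(D) \propto D^{-\alpha_D},\quad L(W) \propto W^{-\alpha_W}}_{\text{resolution-limited}}
\]

The variance-limited exponents are fixed at 1, whereas the resolution-limited exponents are tied to the intrinsic dimension d of the data manifold (on the order of 4/d for a mean-squared-error loss, under the paper's smoothness assumptions).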

The researchers also present evidence that the resolution-limited scaling exponents at large width and at large dataset size are related by a duality. To validate these mechanisms, they exhibit all four regimes in the controlled setting of large random-feature and pretrained models, and test the predictions empirically on a range of standard architectures and datasets.
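Roughly speaking, the duality above amounts to the two resolution-limited exponents coinciding (again in our notation, not the paper's statement verbatim):

\[ \alpha_W \approx \alpha_D, \]

so that, in this regime, growing the model and growing the dataset improve the loss at matching power-law rates.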

Under modifications of the task and of the architecture's aspect ratio, the researchers further observe several empirical relationships between datasets and scaling exponents. The study thus provides a taxonomy for classifying different scaling regimes, underscores that improvements in loss can be driven by distinct mechanisms, and lends insight into the microscopic origins of, and relationships between, scaling exponents.

By way of background, the population loss of a trained deep neural network often follows a precise power-law scaling relation with either the size of the training dataset or the number of parameters in the network.
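To make this background statement concrete, here is a minimal illustrative sketch (synthetic data and our own variable names, not the authors' code) of how such a scaling exponent is typically read off from loss measurements, via a straight-line fit in log-log coordinates:

import numpy as np

# Illustrative only: synthetic losses following L = c * D**(-alpha)
# with a "true" exponent alpha = 0.5 (all names here are ours).
D = np.array([1e3, 3e3, 1e4, 3e4, 1e5])   # hypothetical dataset sizes
L = 2.0 * D ** -0.5                        # hypothetical test losses

# A power law is a straight line in log-log coordinates:
# log L = log c - alpha * log D, so a degree-1 fit recovers alpha.
slope, _ = np.polyfit(np.log(D), np.log(L), 1)
print(f"estimated exponent: {-slope:.3f}")  # -> 0.500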

Appendix: Original English Text

Title: Explaining neural scaling laws

Author: Bahri, Yasaman, Dyer, Ethan, Kaplan, Jared, Lee, Jaehoon, Sharma, Utkarsh

Issue&Volume: 2024-6-24

Abstract: The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four scaling regimes in the controlled setting of large random feature and pretrained models and test the predictions empirically on a range of standard architectures and datasets. We also observe several empirical relationships between datasets and scaling exponents under modifications of task and architecture aspect ratio. Our work provides a taxonomy for classifying different scaling regimes, underscores that there can be different mechanisms driving improvements in loss, and lends insight into the microscopic origin and relationships between scaling exponents.

DOI: 10.1073/pnas.2311878121

Source: https://www.pnas.org/doi/abs/10.1073/pnas.2311878121

Journal Information
PNAS: Proceedings of the National Academy of Sciences, founded in 1914 and published under the National Academy of Sciences. Latest IF: 12.779
Official website: https://www.pnas.org