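Yatish Turakhia's group at the University of California has developed a compressive approach to pangenomics based on mutation-annotated networks. The study was published in the journal Nature Genetics on January 12, 2026.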
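According to the researchers, pangenomics is an emerging field that works with collections of genomes rather than a single reference, in order to reduce bias and capture diversity within a species. Existing pangenomic data formats, however, struggle to scale to millions of genomes, and they mainly emphasize variation while often neglecting the underlying mutational events and evolutionary relationships. This work introduces the Pangenome Mutation-Annotated Network (PanMAN), a lossless pangenome representation whose files are 3.5 to 1,391 times smaller than those of existing variation-preserving formats, with performance generally improving on larger datasets. Beyond compression, PanMAN increases representational capacity by encoding the detailed mutational and evolutionary histories inferred across genomes, which enables new biological insights. Using PanMAN, the authors constructed a comprehensive SARS-CoV-2 pangenome from 8 million publicly available sequences that requires only 366 MB of disk space.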
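The group also presents "panmanUtils", a toolkit that supports common analyses and ensures interoperability with existing software. The authors expect PanMAN to greatly improve the scale, speed, resolution and scope of pangenomic analysis and data sharing.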
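To put the reported SARS-CoV-2 figure in perspective, the arithmetic can be sketched as follows: 366 MB spread over 8 million sequences comes to roughly 46 bytes per sequence. This is only a back-of-the-envelope illustration, not a calculation from the paper. It assumes 1 MB = 10^6 bytes and a genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference), and the function names are hypothetical. The comparison is against plain uncompressed text, which is not the baseline behind the 3.5 to 1,391 times figure reported against existing variation-preserving formats.

```python
# Back-of-the-envelope check of the reported SARS-CoV-2 PanMAN size.
PANMAN_BYTES = 366 * 10**6   # 366 MB as reported; assumes 1 MB = 10^6 bytes
NUM_SEQUENCES = 8_000_000    # 8 million sequences as reported
GENOME_LENGTH_NT = 29_903    # assumption: length of the Wuhan-Hu-1 reference genome


def bytes_per_sequence(total_bytes: int, num_sequences: int) -> float:
    """Average on-disk footprint per stored sequence."""
    return total_bytes / num_sequences


def raw_text_bytes(num_sequences: int, genome_length: int) -> int:
    """Size of the same sequences as plain text at one byte per nucleotide."""
    return num_sequences * genome_length


per_seq = bytes_per_sequence(PANMAN_BYTES, NUM_SEQUENCES)
raw = raw_text_bytes(NUM_SEQUENCES, GENOME_LENGTH_NT)

print(f"{per_seq:.2f} bytes per sequence")
print(f"{raw / 10**9:.1f} GB as uncompressed text")
print(f"{raw / PANMAN_BYTES:.0f}x smaller than uncompressed text")
```

Storing each genome in full would cost tens of kilobytes per sequence, so a footprint of a few dozen bytes is consistent with a representation that records shared mutational history instead of repeating whole sequences.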
Appendix: original English abstract
Title: Compressive pangenomics using mutation-annotated networks
Author: Walia, Sumit; Motwani, Harsh; Tseng, Yu-Hsiang; Smith, Kyle; Corbett-Detig, Russell; Turakhia, Yatish
Issue&Volume: 2026-01-12
Abstract: Pangenomics is an emerging field that uses collections of genomes, rather than a single reference, to reduce bias and capture intra-species diversity. However, existing pangenomic data formats face challenges in scaling to millions of genomes and primarily emphasize variation, often neglecting the underlying mutational events and evolutionary relationships. This work introduces Pangenome Mutation-Annotated Network (PanMAN), a lossless pangenome representation that achieves compression ratios ranging from 3.5–1,391× in file sizes compared to existing variation-preserving formats, with performance generally improving on larger datasets. In addition to compression, PanMAN increases representational capacity by encoding detailed mutational and evolutionary histories inferred across genomes, thereby enabling new biological insights. Using PanMAN, a comprehensive SARS-CoV-2 pangenome was constructed from 8 million publicly available sequences, requiring only 366 MB of disk space. We also present ‘panmanUtils’, a toolkit that supports common analyses and ensures interoperability with existing software. PanMAN is poised to greatly improve the scale, speed, resolution and scope of pangenomic analysis and data sharing.
DOI: 10.1038/s41588-025-02478-7
Source: https://www.nature.com/articles/s41588-025-02478-7
Nature Genetics: founded in 1992 and published by Springer Nature. Latest impact factor: 41.307
Official website: https://www.nature.com/ng/
Submission link: https://mts-ng.nature.com/cgi-bin/main.plex
