The Platinum Pedigree: a long-read benchmark for genetic variants
Author: 小柯机器人 (Xiaoke Robot)  Published: 2025/8/5 16:06:28

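Michael A. Eberle's group at PacBio (USA) has presented the Platinum Pedigree, a long-read benchmark for genetic variants. The study was published in Nature Methods on August 4, 2025.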

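To create a more comprehensive truth set, the team used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across the PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms. This generated a variant map with over 4.7 million single-nucleotide variants, 767,795 insertions and deletions (indels), 537,486 tandem repeats and 24,315 structural variants, covering 2.77 Gb of the GRCh38 genome. The work adds about 200 Mb of high-confidence regions, including 8% more small variants, and introduces the first tandem repeat and structural variant truth sets for NA12878 and her family. As an example of the value of the improved benchmark, the researchers retrained DeepVariant using these data, reducing genotyping errors by about 34%.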
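The idea of filtering variants by Mendelian inheritance can be illustrated with a minimal sketch for a single parent-child trio. This is not the authors' pipeline: the function names, the genotype encoding (tuples of allele indices) and the toy sites are all assumptions made for illustration, and the sketch ignores de novo mutations, sex chromosomes and the multi-generation structure of a real pedigree.

```python
from itertools import product

def is_mendelian_consistent(child, father, mother):
    """Return True if the child genotype can be built from one paternal
    and one maternal allele.

    Genotypes are hypothetical 2-tuples of allele indices,
    e.g. (0, 1) for a heterozygous call.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((paternal, maternal))) == child_sorted
        for paternal, maternal in product(father, mother)
    )

def filter_sites(sites):
    """Keep only sites whose trio genotypes obey Mendelian inheritance."""
    return [
        s for s in sites
        if is_mendelian_consistent(s["child"], s["father"], s["mother"])
    ]

# Toy data (invented for illustration, not taken from the study).
sites = [
    {"pos": 100, "father": (0, 1), "mother": (0, 0), "child": (0, 1)},
    {"pos": 200, "father": (0, 0), "mother": (0, 0), "child": (0, 1)},
    {"pos": 300, "father": (1, 1), "mother": (0, 1), "child": (0, 0)},
]

kept_positions = [s["pos"] for s in filter_sites(sites)]
print(kept_positions)  # [100]
```

Only the site at position 100 survives: at 200 the child carries an allele that neither parent has, and at 300 the father can only transmit allele 1, so a (0, 0) child is inconsistent. In a large pedigree the same kind of check can be applied across many parent-child relationships, which is what makes the filter more stringent than a single trio.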

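According to the researchers, recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, variant calling performance is difficult to quantify because existing standards often focus on specificity and neglect completeness in difficult-to-analyze regions.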

Appendix: original English text

Title: The Platinum Pedigree: a long-read benchmark for genetic variants

Author: Kronenberg, Zev, Nolan, Cillian, Porubsky, David, Mokveld, Tom, Rowell, William J., Lee, Sangjin, Dolzhenko, Egor, Chang, Pi-Chuan, Holt, James M., Saunders, Christopher T., Olson, Nathan D., Steely, Cody J., McGee, Sean, Guarracino, Andrea, Koundinya, Nidhi, Harvey, William T., Watkins, W. Scott, Munson, Katherine M., Hoekzema, Kendra, Chua, Khi Pin, Chen, Xiao, Fanslow, Cairbre, Lambert, Christine, Dashnow, Harriet, Garrison, Erik, Smith, Joshua D., Lansdorp, Peter M., Zook, Justin M., Carroll, Andrew, Jorde, Lynn B., Neklason, Deborah W., Quinlan, Aaron R., Eichler, Evan E., Eberle, Michael A.

Issue&Volume: 2025-08-04

Abstract: Recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, it is difficult to quantify variant calling performance because existing standards often focus on specificity, neglecting completeness in difficult-to-analyze regions. To create a more comprehensive truth set, we used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms. This generated a variant map with over 4.7 million single-nucleotide variants, 767,795 insertions and deletions (indels), 537,486 tandem repeats and 24,315 structural variants, covering 2.77 Gb of the GRCh38 genome. This work adds ~200 Mb of high-confidence regions, including 8% more small variants, and introduces the first tandem repeat and structural variant truth sets for NA12878 and her family. As an example of the value of this improved benchmark, we retrained DeepVariant using these data to reduce genotyping errors by ~34%.

DOI: 10.1038/s41592-025-02750-y

Source: https://www.nature.com/articles/s41592-025-02750-y

Journal information

Nature Methods: founded in 2004. Published by Springer Nature. Latest IF: 47.99
Official website: https://www.nature.com/nmeth/
Submission link: https://mts-nmeth.nature.com/cgi-bin/main.plex