Author: 小柯机器人 Published: 2024/5/18 16:35:14

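Richa Agarwala's team at the US National Institutes of Health has developed a way to index and search petabase-scale nucleotide resources. The work was published online in Nature Methods on May 16, 2024.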

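At present, searching the vast and rapidly growing nucleotide content held in resources, such as runs in the Sequence Read Archive and assemblies for whole-genome shotgun sequencing projects in GenBank, is impractical for most researchers.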




Title: Indexing and searching petabase-scale nucleotide resources

Author: Shiryev, Sergey A., Agarwala, Richa

Issue&Volume: 2024-05-16

Abstract: Searching vast and rapidly growing nucleotide content in resources, such as runs in the Sequence Read Archive and assemblies for whole-genome shotgun sequencing projects in GenBank, is currently impractical for most researchers. Here we present Pebblescout, a tool that navigates such content by providing indexing and search capabilities. Indexing uses dense sampling of the sequences in the resource. Search finds subjects (runs or assemblies) that have short sequence matches to a user query, with well-defined guarantees and ranks them using informativeness of the matches. We illustrate the functionality of Pebblescout by creating eight databases that index over 3.7 petabases. The web service of Pebblescout can be reached at https://pebblescout.ncbi.nlm.nih.gov. We show that for a wide range of query lengths, Pebblescout provides a data-driven way for finding relevant subsets of large nucleotide resources, reducing the effort for downstream analysis substantially. We also show that Pebblescout results compare favorably to MetaGraph and Sourmash.
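As a rough illustration of the index-then-search idea described in the abstract, the following is a minimal Python sketch. It is not Pebblescout's implementation: the abstract does not specify the sampling scheme or the scoring formula, so this sketch assumes every k-mer is indexed and uses a log-rarity weight as a stand-in for "informativeness". All function names, the subject IDs, and the choice of k are hypothetical.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq (one per start position)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(subjects, k):
    """Build an inverted index: k-mer -> set of subject IDs containing it.

    `subjects` maps a subject ID (e.g. a run or assembly name) to its sequence.
    """
    index = defaultdict(set)
    for subject_id, seq in subjects.items():
        for kmer in kmers(seq, k):
            index[kmer].add(subject_id)
    return index


def search(index, n_subjects, query, k):
    """Rank subjects by the summed rarity weight of the query k-mers they share.

    Assumption: a k-mer found in few subjects is more informative than one
    found in many, so each match is weighted by log(1 + n_subjects / hits).
    """
    scores = defaultdict(float)
    for kmer in set(kmers(query, k)):
        hits = index.get(kmer, ())
        if not hits:
            continue
        weight = math.log(1 + n_subjects / len(hits))
        for subject_id in hits:
            scores[subject_id] += weight
    # Highest score first; ties broken by subject ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_subjects = {
        "run1": "ACGTACGTGG",
        "run2": "TTTTACGTAA",
        "run3": "GGGGGGGG",
    }
    toy_index = build_index(toy_subjects, k=4)
    for subject_id, score in search(toy_index, len(toy_subjects), "ACGTGG", k=4):
        print(subject_id, round(score, 3))
```

In this toy setup a subject that shares more, and rarer, k-mers with the query ranks higher, and subjects with no shared k-mer are not reported. A petabase-scale index would need sampling and a compact on-disk layout, neither of which this sketch attempts.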

DOI: 10.1038/s41592-024-02280-z

Source: https://www.nature.com/articles/s41592-024-02280-z


Nature Methods: founded in 2004 and published by Springer Nature. Latest IF: 47.99