IDseq enables easy, effective tracking of emerging pathogens
Researchers have developed a cloud-based tool called IDseq that may make it possible to rapidly detect, identify, and track emerging pathogens such as SARS-CoV-2.
This tool can identify pathogens before a complete genome sequence is available, so it can be used both for ongoing infectious disease outbreaks and for newly emerging ones. This could substantially aid in preventing future pandemics. The study was published in the journal GigaScience.
The coronavirus pandemic demonstrates the importance of global infectious disease monitoring. Finding the cause of an infectious disease outbreak is challenging, especially if it stems from a previously unknown pathogen. IDseq, an open-source, cloud-based metagenomic analysis platform, identifies both novel and existing disease-causing pathogens in a given sample, whether it comes from a human, animal, or parasite, to provide an actionable report of what is happening on the ground in labs and clinics anywhere in the world.
“IDseq can be thought of as an early warning radar for emerging or novel infectious agents,” said Joe DeRisi, Ph.D., Co-President of the Chan Zuckerberg Biohub, who contributed to the identification of the SARS coronavirus in 2003 and whose research lab at the University of California, San Francisco initiated the IDseq tool. It is designed to enable the global health community to leverage the ever-decreasing cost of sequencing for tracking and identifying infectious diseases in essentially any sample. “At the beginning of the coronavirus pandemic, researchers in Cambodia used IDseq to help confirm and sequence the whole genome of the country’s first case of COVID-19 in a matter of days, and in California, we’re providing critical SARS-CoV-2 genomic data to public health officials to inform contact tracing and intervention strategies.”
In a study published in GigaScience, scientists use various approaches to demonstrate that the IDseq tool is indeed able to reliably identify emerging pathogens. As proof of principle, they analyzed a nasal swab from a COVID-19 patient in Cambodia. A partnership between the Chan Zuckerberg Biohub, the Chan Zuckerberg Initiative (CZI), and the Bill and Melinda Gates Foundation enabled these researchers to sequence and confirm the country’s first case of COVID-19 in a matter of days, not the weeks it could typically take. The results demonstrate that IDseq can detect the presence of an emerging pathogen prior to the existence of a full reference genome. IDseq also now contains a new workflow for building SARS-CoV-2 consensus genomes.
“Metagenomic sequencing (mNGS) is an incredibly useful tool for pathogen detection because of its highly sensitive and hypothesis-free nature,” said Katrina Kalantar, Computational Biologist at CZI. “We’ve seen labs that are using IDseq for existing mNGS studies rapidly pivot their focus to more targeted sequencing of SARS-CoV-2, which has helped researchers better understand coronavirus transmission patterns.”
In Cambodia, researchers uploaded the genome sequence to the pathogen data repository GISAID (Global Initiative on Sharing All Influenza Data) and to Nextstrain, so scientists anywhere can see the full SARS-CoV-2 genome sequence and study it within the broader context of SARS-CoV-2 sequences uploaded globally.
Researchers at the Cambodian National Center for Parasitology, Entomology and Malaria Control (CNM) and the National Institute of Allergy and Infectious Diseases (NIAID) partnered with the Institut Pasteur Cambodia to complete this research. These researchers are one of several teams around the world receiving molecular biology and bioinformatics training from the infectious disease team at the Biohub; free access, training, and compute on the IDseq platform from CZI; and the necessary equipment and supplies to begin work in their own countries through the Grand Challenges Explorations Grants.
Unlike tests that are specific for a known agent, such as the SARS-CoV-2 PCR test, mNGS is a universal method that can detect novel disease-causing pathogens. This can be especially useful in cases where researchers may not know what is causing an infection, or what pathogens are circulating in a particular area. An mNGS experiment starts with mass-amplifying DNA traces of pathogens from a patient’s sample, resulting in millions of small bits of DNA sequences, or reads. This enormous dataset must then be analyzed and interpreted using bioinformatic techniques. The aim is to assign individual DNA fragments from the clinical sample to specific pathogens by leveraging knowledge from sequence databases.
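To make the idea of assigning reads to pathogens concrete, here is a minimal Python sketch. It is not how IDseq works internally: it swaps in a simple shared k-mer vote in place of a real alignment against a sequence database, and the reference sequences, the names `pathogen_A` and `pathogen_B`, and the parameters `k` and `min_hits` are all made up for illustration.

```python
from collections import Counter

# Toy "sequence database": invented sequences, not real genomes.
REFERENCES = {
    "pathogen_A": "ATGCGTACGTTAGCCTAGGCTAACGT",
    "pathogen_B": "TTGACCGGTAACCGGTTTAAACCCGG",
}


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map every k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def assign_read(read, index, k, min_hits=2):
    """Assign a read to the reference sharing the most k-mers with it.

    Returns None when no reference reaches min_hits shared k-mers.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return None
    best, hits = votes.most_common(1)[0]
    return best if hits >= min_hits else None


if __name__ == "__main__":
    k = 8
    index = build_index(REFERENCES, k)
    for read in ["CGTACGTTAGCCTAG", "CCGGTAACCGGTTT", "GGGGGGGGGGGG"]:
        print(read, "->", assign_read(read, index, k))
```

A read that shares no k-mers with any reference stays unassigned. Real metagenomic tools go further and tolerate mismatches, which is part of what allows a novel pathogen to be recognized through its similarity to known relatives.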
Analyzing the massive amount of data from a typical mNGS experiment often requires a battery of specialized bioinformatic tools, along with highly specialized expertise and expensive commercially licensed software, which makes mNGS a hard-to-access method. The new user-friendly IDseq software is open source and freely available to the global health community, lowering the barrier to entry for metagenomics. Researchers can reuse and build upon the code, which works via a cloud-based service and a web application designed for collaboration and data sharing. The pipeline takes raw sequencing data as input and then runs through steps of filtering, quality control, alignment, reporting, and visualization.
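The stages named above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not IDseq's actual code: the `Read` type, the thresholds, the exact-substring "alignment", and the invented reference sequences are all hypothetical stand-ins for the real tools a production pipeline would use.

```python
from collections import Counter
from dataclasses import dataclass

# Invented reference sequences, for illustration only.
REFERENCES = {
    "pathogen_A": "ATGCGTACGTTAGCCTAGGCTAACGT",
    "pathogen_B": "TTGACCGGTAACCGGTTTAAACCCGG",
}


@dataclass(frozen=True)
class Read:
    sequence: str
    qualities: tuple  # one Phred-style quality score per base


def filter_reads(reads, min_length=10):
    """Filtering: drop very short reads and exact duplicate sequences."""
    seen, kept = set(), []
    for read in reads:
        if len(read.sequence) >= min_length and read.sequence not in seen:
            seen.add(read.sequence)
            kept.append(read)
    return kept


def quality_control(reads, min_mean_quality=20):
    """Quality control: keep reads whose mean base quality is high enough."""
    return [
        r for r in reads
        if r.qualities and sum(r.qualities) / len(r.qualities) >= min_mean_quality
    ]


def align(reads, references):
    """Toy alignment: a read 'hits' a reference if it is an exact substring."""
    hits = []
    for read in reads:
        match = next(
            (name for name, ref in references.items() if read.sequence in ref),
            None,
        )
        hits.append((read, match))
    return hits


def report(hits):
    """Reporting: count reads per reference, plus unassigned reads."""
    return dict(Counter(name or "unassigned" for _, name in hits))


def visualize(counts):
    """Visualization: render the report as a plain-text bar chart."""
    return "\n".join(
        f"{name:<12} {'#' * n} {n}" for name, n in sorted(counts.items())
    )


def run_pipeline(reads, references):
    """Run all stages in order and return the per-reference read counts."""
    return report(align(quality_control(filter_reads(reads)), references))


if __name__ == "__main__":
    reads = [
        Read("CGTACGTTAGCC", (30,) * 12),
        Read("CGTACGTTAGCC", (30,) * 12),  # duplicate, removed by filtering
        Read("CCGGTAACCGGT", (30,) * 12),
        Read("GGTAACCGGTTT", (5,) * 12),   # low quality, removed by QC
        Read("AAAAAAAAAAAA", (30,) * 12),  # matches nothing
    ]
    print(visualize(run_pipeline(reads, REFERENCES)))
```

Keeping each stage as a separate function mirrors the staged design described in the article: any single step can be swapped for a more capable tool without touching the others.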