WBT Metatranscriptomic Analysis, Integration of SARS-CoV-2 WBT Results With COVID-19 Community Prevalence Data, and Development and Application of Predictive Models

The gold standard for viral detection in an environmental sample is RNA sequence analysis, which allows accurate binary (presence/absence) detection as well as origin tracing and quantification of viral concentration. We will use the open-source MetaSUB Core Analysis Pipeline (CAP), a set of computational diagnostic processes, to analyze metagenomes and metatranscriptomes for all wastewater samples. The CAP organizes a number of computational analyses into a best-practices pipeline: strain mapping (MetaPhlAn2 and KrakenUniq), characterization of the likely functional biochemical profiles of the organisms (HUMAnN2), covariates from the wastewater background, detection of novel biosynthetic gene clusters (BGCs), variant calling (FreeBayes), strain phylogeny (Nextstrain), and mapping of antibiotic resistance genes (ShortBRED) against all sequences in the Comprehensive Antibiotic Resistance Database (CARD).

We have previously analyzed environmental metatranscriptomic samples as part of the MetaSUB project, which surveys the microbiomes of mass transit systems around the world (11a). More recently, as part of the MetaCOV project within the MetaSUB Consortium, we collected and processed environmental metatranscriptomic samples from 16 cities during the COVID-19 epidemic (Fig. B). In several of these cities, we detected RNA that maps to the SARS-CoV-2 genome as well as (for reference) the influenza genome. These results support the feasibility of effective viral detection through environmental metatranscriptomics and analyses performed with the MetaSUB CAP, which includes strain detection for SARS-CoV-2.

Figure B. SARS-CoV-2 levels in environmental metatranscriptomic samples from MetaSUB cities. Reads from shotgun RNA-seq were mapped to the Wuhan reference strain and summed for each city (x-axis); the y-axis shows the summed read count on a log10 scale.
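
The per-city summary plotted in Figure B can be sketched as follows. This is a minimal illustration and not the CAP's own implementation: the record layout (city, sample ID, mapped read count), the function name, and the example values are all hypothetical, and the pseudocount of 1 is an assumption made here so that cities with zero mapped reads remain defined on the log scale.

```python
import math
from collections import defaultdict


def summed_log10_reads(records, pseudocount=1):
    """Sum SARS-CoV-2-mapped reads per city and return log10 of each total.

    records: iterable of (city, sample_id, mapped_reads) tuples (hypothetical layout).
    pseudocount: added before the log so that a total of 0 is still defined (assumption).
    """
    totals = defaultdict(int)
    for city, _sample_id, mapped_reads in records:
        totals[city] += mapped_reads
    return {city: math.log10(total + pseudocount) for city, total in totals.items()}


# Made-up example values, for illustration only.
example_records = [
    ("city_A", "sample_1", 90),
    ("city_A", "sample_2", 9),
    ("city_B", "sample_3", 0),
]
city_levels = summed_log10_reads(example_records)
```

Summing before taking the logarithm matches the caption's description (reads summed per city, then shown on a log10 axis); whether the totals should also be normalized by sequencing depth is a separate design choice not addressed here.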

While detection of RNA viruses in wastewater requires additional validation, the ongoing MetaSUB project has also detected over 10,000 novel DNA viruses in fewer than 5,000 samples using metagenomic sequencing. We obtained orthogonal evidence for these viruses by identifying CRISPR sequences in the same samples: a large fraction (30%) of the CRISPR spacers identified in the MetaSUB data mapped to MetaSUB viruses. This is roughly similar to the fraction of CRISPR spacers that mapped to known viruses in NCBI RefSeq, even though RefSeq contains orders of magnitude more viruses.
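
The spacer-mapping fraction described above could be computed along the following lines. This is a sketch under stated assumptions: the input format (a list of spacer IDs plus a list of (spacer ID, virus ID) alignment hits), the function name, and the example data are hypothetical, and the actual alignment and filtering steps used in the MetaSUB analysis are not reproduced here.

```python
def spacer_hit_fraction(spacer_ids, hits):
    """Return the fraction of distinct spacers with at least one viral hit.

    spacer_ids: iterable of spacer identifiers (hypothetical input).
    hits: iterable of (spacer_id, virus_id) pairs from some prior alignment step
          (hypothetical input); a spacer hitting several viruses is counted once.
    """
    spacers = set(spacer_ids)
    if not spacers:
        return 0.0
    matched = {spacer for spacer, _virus in hits if spacer in spacers}
    return len(matched) / len(spacers)


# Made-up example: 10 spacers, 3 of which have at least one hit.
example_spacers = [f"spacer_{i}" for i in range(10)]
example_hits = [
    ("spacer_0", "virus_X"),
    ("spacer_0", "virus_Y"),  # second hit for the same spacer, not double-counted
    ("spacer_3", "virus_X"),
    ("spacer_7", "virus_Z"),
]
fraction = spacer_hit_fraction(example_spacers, example_hits)
```

Running the same function once with hits against the MetaSUB viruses and once with hits against RefSeq viruses would give the two fractions being compared.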