Scraping the bottom of the barrel: are rare high throughput sequences artifacts?

Title: Scraping the bottom of the barrel: are rare high throughput sequences artifacts?
Publication Type: Journal Article
Year of Publication: 2015
Authors: Brown, SP, Veach, AM, Rigdon-Huss, AR, Grond, K, Lickteig, SK, Lothamer, K, Oliver, AK, Jumpponen, A
Journal: Fungal Ecology
Pagination: 221-225
Accession Number: KNZ001636
Keywords: fungi, High-throughput sequencing, Rare biosphere, Singleton

Metabarcoding data generated using next-generation sequencing (NGS) technologies are overwhelmed with rare taxa and skewed in Operational Taxonomic Unit (OTU) frequencies comprised of few dominant taxa. Low frequency OTUs comprise a rare biosphere of singleton and doubleton OTUs, which may include many artifacts. We present an in-depth analysis of global singletons across sixteen NGS libraries representing different ribosomal RNA gene regions, NGS technologies and chemistries. Our data indicate that many singletons (average of 38 % across gene regions) are likely artifacts or potential artifacts, but a large fraction can be assigned to lower taxonomic levels with very high bootstrap support (∼32 % of sequences to genus with ≥90 % bootstrap cutoff). Further, many singletons clustered into rare OTUs from other datasets highlighting their overlap across datasets or the poor performance of clustering algorithms. These data emphasize a need for caution when discarding rare sequence data en masse: such practices may result in throwing the baby out with the bathwater, and underestimating the biodiversity. Yet, the rare sequences are unlikely to greatly affect ecological metrics. As a result, it may be prudent to err on the side of caution and omit rare OTUs prior to downstream analyses.