TY - JOUR
T1 - Limits to robustness and reproducibility in the demarcation of operational taxonomic units
AU - Schmidt, Thomas S.B.
AU - Matias Rodrigues, João F.
AU - von Mering, Christian
N1 - Publisher Copyright: © 2014 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.
PY - 2015/5/1
Y1 - 2015/5/1
N2 - The demarcation of operational taxonomic units (OTUs) from complex sequence data sets is a key step in contemporary studies of microbial ecology. However, as biologically motivated 'optimal' OTU-binning algorithms remain elusive, many conceptually distinct approaches continue to be used. Using a global data set of 887870 bacterial 16S rRNA gene sequences, we objectively quantified biases introduced by several widely employed sequence clustering algorithms. We found that OTU-binning methods often provided surprisingly non-equivalent partitions of identical data sets, notably when clustering to the same nominal similarity thresholds; and we quantified the resulting impact on ecological data description for a well-defined human skin microbiome data set. We observed that some methods were very robust to varying clustering thresholds, while others were found to be highly susceptible even to slight threshold variations. Moreover, we comprehensively quantified the impact of the choice of 16S rRNA gene subregion, as well as of data set scope and context on algorithm performance. Our findings may contribute to an enhanced comparability of results across sequence-processing pipelines, and we arrive at recommendations towards higher levels of standardization in established workflows.
UR - https://www.scopus.com/pages/publications/84928253083
U2 - 10.1111/1462-2920.12610
DO - 10.1111/1462-2920.12610
M3 - Article
C2 - 25156547
AN - SCOPUS:84928253083
SN - 1462-2912
VL - 17
SP - 1689
EP - 1706
JO - Environmental Microbiology
JF - Environmental Microbiology
IS - 5
ER -