Publication
Scientific article (article).
Unifying the known and unknown microbial coding sequence space
Digital.CSIC. Repositorio Institucional del CSIC
oai:digital.csic.es:10261/267496
- Vanni, Chiara
- Schechter, Matthew S.
- Acinas, Silvia G.
- Barberán, Albert
- Buttigieg, Pier Luigi
- Casamayor, Emilio O.
- Delmont, Tom O.
- Duarte, Carlos M.
- Eren, A. Murat
- Finn, Robert D.
- Kottmann, Renzo
- Mitchell, Alex
5 figures, 13 appendixes.

Data availability: We used public data as described in the Methods section and Appendix 1-table 5. The code used for the analyses in the manuscript is available at https://github.com/functional-dark-side/functional-dark-side.github.io/tree/master/scripts. A list of the program versions can be found at https://github.com/functional-dark-side/functional-dark-side.github.io/blob/master/programs_and_versions.txt. The code to create the figures is available at https://github.com/functional-dark-side/vanni_et_al-figures, and the data for the figures can be downloaded from https://doi.org/10.6084/m9.figshare.12738476.v2. A reproducible version of the workflow is available at https://github.com/functional-dark-side/agnostos-wf. The data are publicly available at https://doi.org/10.6084/m9.figshare.12459056.

Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40%-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction in analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS, and a demonstration of how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1,749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction, and predominantly taxonomically restricted at the species level. From the 71M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.

The authors gratefully acknowledge the computer resources at MareNostrum and the technical support provided by Barcelona Supercomputing Center (RES-AECT-2014-2-0085), the BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) (031A537B, 031A533A, 031A538A, 031A533B, 031A535A, 031A537C, 031A534A, 031A532B), the University of Oxford Advanced Research Computing (http://dx.doi.org/10.5281/zenodo.22558) and the MARBITS bioinformatics core at ICM-CSIC. CV was supported by the Max Planck Society. AFG received funding from the European Union’s Horizon 2020 research and innovation program Blue Growth: Unlocking the potential of Seas and Oceans under grant agreement no. 634486 (project acronym INMARE). AM was supported by the Biotechnology and Biological Sciences Research Council [BB/M011755/1, BB/R015228/1] and RDF by the European Molecular Biology Laboratory core funds. EOC was supported by project INTERACTOMA RTI2018-101205-B-I00 from the Spanish Agency of Science MICIU/AEI. SGA and PS received additional funding from the project MAGGY (CTM2017-87736-R) of the Spanish Ministry of Economy and Competitiveness. The Malaspina 2010 Expedition was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) through the Consolider-Ingenio program (ref. CSD2008-00077). The authors thank Johannes Söding and Alex Bateman for helpful discussions.

Peer reviewed.

With the institutional support of the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2019-000928-S).
HANDLE: http://hdl.handle.net/10261/267496
View at: http://hdl.handle.net/10261/267496