Dataset.

The most exposed regions of SARS-CoV-2 structural proteins are subject to strong positive selection and gene overlap may locally modify this behavior [Dataset]

Digital.CSIC. Repositorio Institucional del CSIC
oai:digital.csic.es:10261/360231
  • Rubio, Alejandro
  • Toro, María de
  • Pérez-Pulido, Antonio J.
Suppl. Fig S1. Comparison of length and number of substitutions versus p-value in the calculation of the Ka/Ks ratio. Genes have been colored according to the group to which they belong. A regression line has been added, together with its correlation coefficient and associated p-value.

Suppl. Fig S2. Distribution of Ka/Ks along the length of genes S, M, N and E (black line). The normalized Shannon entropy obtained from the Nextstrain database is shown for comparison (https://nextstrain.org/ncov/gisaid/global/6m). Pfam domains have been included (below): S → bCovS1N (PF16451, Betacoronavirus-like spike glycoprotein S1, N-terminal), bCoV_S1_RBD (PF09408, Betacoronavirus spike glycoprotein S1, receptor binding), CoV_S1_C (PF19209, Coronavirus spike glycoprotein S1, C-terminal), CoV_S2 (PF01601, Coronavirus spike glycoprotein S2); M → CoVM (PF01635, Coronavirus M matrix/glycoprotein); N → bCoV_lipid_BD (PF09399, Betacoronavirus lipid binding protein), bCoV_Orf14 (PF17635, Betacoronavirus uncharacterised protein 14), CoV_nucleocap (PF00937, Coronavirus nucleocapsid); E → CoVE (PF02723, Coronavirus small envelope protein E). The blue line marks the Ka/Ks value of 1.

Suppl. Table S1. Genomes used in this work.

Suppl. Table S2. Ka/Ks ratio obtained for each SARS-CoV-2 gene, together with the associated p-value. Blue color highlights structural genes, red color highlights non-structural genes, and gray color highlights accessory factors.

The SARS-CoV-2 virus pandemic that emerged in 2019 has been an unprecedented event in international science, as it has been possible to sequence millions of genomes, tracking their evolution very closely. This has enabled various types of secondary analyses of these genomes, including the measurement of their sequence selection pressure. In this work, we have been able to measure the selective pressure of all the described SARS-CoV-2 genes, even analyzed by sequence regions, and we show how this type of analysis allows us to separate the genes between those subject to positive selection (usually those that code for surface proteins or those exposed to the host immune system) and those subject to negative selection because they require greater conservation of their structure and function. We have also seen that when another gene with an overlapping reading frame appears within a gene sequence, the overlapping sequence between the two genes evolves under a stronger purifying selection than the average of the non-overlapping regions of the main gene. We propose this type of analysis as a useful tool for locating and analyzing all the genes of a viral genome when an adequate number of sequences are available.

Peer reviewed
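
The normalized Shannon entropy named in the caption of Suppl. Fig S2 can be illustrated with a minimal Python sketch. This is not the authors' or Nextstrain's code: the function names, the choice to ignore gaps and ambiguous bases, and the normalization by log2 of a four-letter nucleotide alphabet are all assumptions made here for illustration.

```python
import math
from collections import Counter


def normalized_shannon_entropy(column, alphabet_size=4):
    """Shannon entropy of one alignment column, scaled to the range [0, 1].

    Assumption of this sketch: only A, C, G and T are counted, and the
    entropy is divided by log2(alphabet_size), its maximum possible value.
    """
    symbols = [s for s in column.upper() if s in "ACGT"]
    if not symbols:
        return 0.0
    total = len(symbols)
    counts = Counter(symbols)
    # Sum of p * log2(1 / p) over the observed symbols.
    entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
    return entropy / math.log2(alphabet_size)


def entropy_profile(aligned_sequences):
    """Normalized entropy at every position of equal-length aligned sequences."""
    return [
        normalized_shannon_entropy("".join(column))
        for column in zip(*aligned_sequences)
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the study.
    toy_alignment = ["ACGT", "ACGA", "ACCA", "AGCA"]
    print(entropy_profile(toy_alignment))
```

A fully conserved position scores 0 and a position where all four bases are equally frequent scores 1, so the profile can be plotted on the same axis as a per-position Ka/Ks curve for a qualitative comparison.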
 

HANDLE: http://hdl.handle.net/10261/360231

Digital.CSIC. Repositorio Institucional del CSIC
oai:digital.csic.es:10261/360225
Scientific article. 2024

THE MOST EXPOSED REGIONS OF SARS-COV-2 STRUCTURAL PROTEINS ARE SUBJECT TO STRONG POSITIVE SELECTION AND GENE OVERLAP MAY LOCALLY MODIFY THIS BEHAVIOR

  • Rubio, Alejandro
  • Toro, María de
  • Pérez-Pulido, Antonio J.
The SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) pandemic that emerged in 2019 has been an unprecedented event in international science, as it has been possible to sequence millions of genomes, tracking their evolution very closely. This has enabled various types of secondary analyses of these genomes, including the measurement of their sequence selection pressure. In this work, we have been able to measure the selective pressure of all the described SARS-CoV-2 genes, even analyzed by sequence regions, and we show how this type of analysis allows us to separate the genes between those subject to positive selection (usually those that code for surface proteins or those exposed to the host immune system) and those subject to negative selection because they require greater conservation of their structure and function. We have also seen that when another gene with an overlapping reading frame appears within a gene sequence, the overlapping sequence between the two genes evolves under a stronger purifying selection than the average of the non-overlapping regions of the main gene. We propose this type of analysis as a useful tool for locating and analyzing all the genes of a viral genome when an adequate number of sequences are available.

IMPORTANCE
We have analyzed the selection pressure of all severe acute respiratory syndrome coronavirus 2 genes by means of the nonsynonymous (Ka) to synonymous (Ks) substitution rate. We found that protein-coding genes are exposed to strong positive selection, especially in the regions of interaction with other molecules (host receptor and genome of the virus itself). However, overlapping coding regions are more protected and show negative selection. This suggests that this measure could be used to study viral gene function as well as overlapping genes.

We would like to thank C3UPO for the HPC support. We also thank the Laboratorio de Microbiología (Hospital Universitario San Pedro, Logroño, Spain), Maria Pilar Bea Escudero (CIBIR, La Rioja, Spain), and the SeqCOVID consortium for their support in collecting, sequencing, and analyzing the SARS-CoV-2 genomes included in this paper. We would like to thank Alex Bateman for helpful comments on the manuscript. The methodology developed for this research has been funded in part by PID2020-114861GB-I00/AEI/10.13039/501100011033 (Agencia Estatal de Investigación/Ministry of Science and Innovation of the Spanish Government).

Peer reviewed
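
The record above interprets selection through the ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates, with a value of 1 as the dividing line. A minimal Python sketch of that interpretation follows. It assumes Ka and Ks have already been estimated by some external tool; the function names, the optional tolerance band, and the example numbers are hypothetical and do not come from the study, which additionally reports a p-value for each gene that this sketch ignores.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def classify_selection(ka, ks, tolerance=0.0):
    """Label a gene or region by comparing its Ka/Ks ratio with 1.

    The tolerance band around 1 is an illustrative assumption, not a
    threshold taken from the source.
    """
    ratio = ka_ks_ratio(ka, ks)
    if ratio is None:
        return "undefined"
    if ratio > 1 + tolerance:
        return "positive"
    if ratio < 1 - tolerance:
        return "negative (purifying)"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical (Ka, Ks) pairs for three regions of one gene.
    regions = {
        "region_1": (0.020, 0.010),
        "region_2": (0.005, 0.020),
        "region_3": (0.010, 0.000),
    }
    for name, (ka, ks) in regions.items():
        print(name, classify_selection(ka, ks))
```

Applying the same comparison window by window along a gene is one way to contrast an overlapping region with the non-overlapping remainder of that gene.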




Digital.CSIC. Repositorio Institucional del CSIC
oai:digital.csic.es:10261/360231
Dataset. 2023

THE MOST EXPOSED REGIONS OF SARS-COV-2 STRUCTURAL PROTEINS ARE SUBJECT TO STRONG POSITIVE SELECTION AND GENE OVERLAP MAY LOCALLY MODIFY THIS BEHAVIOR [DATASET]

  • Rubio, Alejandro
  • Toro, María de
  • Pérez-Pulido, Antonio J.



