NEW SYNTHETIC AND CHEMINFORMATIC TOOLS FOR THE CONSTRUCTION AND DIVERSIFICATION OF "DRUG-LIKE" HETEROCYCLES. C-H ACTIVATION AND MACHINE LEARNING
PID2019-104148GB-I00
Funding agency name: Agencia Estatal de Investigación (State Research Agency)
Funding agency acronym: AEI
Programme: Programa Estatal de Generación de Conocimiento y Fortalecimiento Científico y Tecnológico del Sistema de I+D+i (State Programme for Knowledge Generation and Scientific and Technological Strengthening of the R&D&I System)
Subprogramme: Subprograma Estatal de Generación de Conocimiento (State Subprogramme for Knowledge Generation)
Call: Proyectos I+D (R&D Projects)
Call year: 2019
Management unit: Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020 (State Plan for Scientific and Technical Research and Innovation 2017-2020)
Beneficiary institution: UNIVERSIDAD DEL PAIS VASCO EUSKAL HERRIKO UNIBERTSITATEA
Persistent identifier: http://dx.doi.org/10.13039/501100011033
Publications
Drug Release Nanoparticle System Design: Data Set Compilation and Machine Learning Modeling
RUC. Repositorio da Universidade da Coruña
- He, Shan
- Barón, Ander
- Munteanu, Cristian-Robert
- de Bilbao, Begoña
- Casañola-Martín, Gerardo M.
- Chelu, Mariana
- Musuc, Adina Magdalena
- Bediaga, Harbil
- Ascencio, Estefanía
- Pazos, A.
This document is the Accepted Manuscript version of a Published Work that appeared in final form in ACS Applied Materials & Interfaces, copyright © 2025 American Chemical Society, after peer review and technical editing by the publisher. To access the final edited and published work, see 10.1021/acsami.4c16800.
[Abstract]: Magnetic nanoparticles (NPs) are gaining significant interest in the field of biomedical functional nanomaterials because of their distinctive chemical and physical characteristics, particularly in drug delivery and magnetic hyperthermia applications. In this paper, we experimentally synthesized and characterized new Fe3O4-based NPs, functionalizing their surface with a 5-TAMRA cadaverine-modified copolymer consisting of PMAO and PEG. Despite these advancements, many combinations of NP cores and coatings remain unexplored. To address this, we created a new data set of NP systems from public sources. Herein, 11 different AI/ML algorithms were used to develop predictive AI/ML models. The linear discriminant analysis (LDA) and random forest (RF) models showed high values of sensitivity and specificity (>0.9) in training/validation series and 3-fold cross validation, respectively. The AI/ML models are able to predict 14 output properties (CC50 (μM), EC50 (μM), inhibition (%), etc.) for all combinations of 54 different NP core classes vs. 25 different coatings vs. 41 different cell lines, allowing the best results to be shortlisted for experimental assays. The results of this work may help to reduce the cost of traditional trial-and-error procedures.
The authors acknowledge financial support from Grants ELKARTEK (KK-2022/00032), 2022-2023, (KK-2023/00041), 2023-24 and IT1558-22, and IT1546-22, 2022-2025, funded by Basque Government/Eusko Jaurlaritza, Grants PID2019-104148GB-I00 and PID2022-136993OB-I00 funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union”, and also Grant IKERDATA 2022/IKER/000040 funded by NextGenerationEU funds of the European Commission. This work was also supported in part by the National Science Foundation NSF MRI award OAC-2019077. The authors are grateful for financial and administrative support provided by the Department of Coatings and Polymer Materials at North Dakota State University (USA). The authors also acknowledge the Spanish Ministry of Science and Innovation for financial support under grant No. PID2022-136993OB-I00 (AEI/FEDER, UE), funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union”. CITIC is funded by the Xunta de Galicia through the collaboration agreement between the Department of Culture, Education, Vocational Training and Universities and the Galician universities to strengthen the research centers of the Galician University System (CIGUS). The authors thank the Competitive Reference Groups (GRC) grant ED431C 2022/46, funded by the EU and Xunta de Galicia (Spain). The authors also acknowledge the financial support provided by the Basque Government under research projects IT1500-22, IT1546-22, and MMASINT (KK-2023/00041, Elkartek Program).
Xunta de Galicia; ED431C 2022/46, Eusko Jaurlaritza; ELKARTEK KK-2022/00032, Eusko Jaurlaritza; ELKARTEK KK-2023/00041, Eusko Jaurlaritza; IT1558-22, Eusko Jaurlaritza; IT1546-22, Eusko Jaurlaritza; IT1500-22, Eusko Jaurlaritza; IT1546-2, Eusko Jaurlaritza; IKERDATA 2022/IKER/000040, United States of America. National Science Foundation; OAC-2019077
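The screening space described in this abstract is the Cartesian product of NP core classes, coatings and cell lines (54 × 25 × 41 = 55,350 combinations). The Python sketch below shows how such a grid could be enumerated and shortlisted; the placeholder labels and the `toy_score` function are hypothetical stand-ins for the real category names and the trained LDA/RF models, which are not reproduced here.

```python
from itertools import product

def enumerate_np_systems(cores, coats, cell_lines):
    """Return every (core, coat, cell_line) triple of the screening grid."""
    return list(product(cores, coats, cell_lines))

def shortlist(systems, score, top_n):
    """Rank systems by a model score (higher is better) and keep the top_n."""
    return sorted(systems, key=score, reverse=True)[:top_n]

def toy_score(system):
    """Hypothetical stand-in for a trained model's prediction: sums the label indices."""
    return sum(int(label.rsplit("_", 1)[1]) for label in system)

# Placeholder labels; the study uses named core classes, coatings and cell lines.
cores = [f"core_{i}" for i in range(54)]
coats = [f"coat_{i}" for i in range(25)]
cell_lines = [f"cell_{i}" for i in range(41)]

grid = enumerate_np_systems(cores, coats, cell_lines)
print(len(grid))  # 55350
best = shortlist(grid, toy_score, top_n=3)
print(best[0])  # ('core_53', 'coat_24', 'cell_40')
```

Swapping `toy_score` for a real classifier's predicted probability is the only change needed to turn the sketch into a ranking of candidates for experimental assays.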
A Multi-Objective Approach for Anti-Osteosarcoma Cancer Agents Discovery through Drug Repurposing
RUC. Repositorio da Universidade da Coruña
- Cabrera-Andrade, Alejandro
- López-Cortés, Andrés
- Jaramillo-Koupermann, Gabriela
- González-Díaz, Humberto
- Pazos, A.
- Munteanu, Cristian-Robert
- Pérez-Castillo, Yunierkis
- Tejera, Eduardo
[Abstract]
Osteosarcoma is the most common type of primary malignant bone tumor. Although 5-year survival rates can now reach 60–70%, acute complications and late effects of osteosarcoma therapy are two of the limiting factors in treatment. We developed a multi-objective algorithm for repurposing drugs as new anti-osteosarcoma agents, based on modeling molecules with described activity against the HOS, MG63, SAOS2, and U2OS cell lines in the ChEMBL database. Several predictive models were obtained for each cell line, and those with accuracy greater than 0.8 were integrated into a desirability function for the final multi-objective model. An exhaustive exploration of model combinations was carried out to obtain the best multi-objective model for virtual screening. For the top 1% of the screened list, the final model showed BEDROC = 0.562, EF = 27.6, and AUC = 0.653. The repositioning was performed on 2218 molecules described in DrugBank. Among the top-ranked drugs, we found temsirolimus, paclitaxel, sirolimus, everolimus, and cabazitaxel, which are antineoplastic drugs described in clinical trials for cancer in general. Interestingly, we also found several broad-spectrum antibiotics and antiretroviral agents. The model predicts several drugs that should be studied in depth to find new chemotherapy regimens and to propose new strategies for osteosarcoma treatment.
Universidad de Las Américas (Quito, Ecuador); ENF.RCA.18.01, Gobierno Vasco; IT1045-16 (2016–2021)
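The abstract combines per-cell-line models through a desirability function and reports the enrichment factor (EF) for the top 1% of the ranked list. A minimal Python sketch of both quantities follows. It assumes a Derringer-style geometric mean for the overall desirability (the exact functional form used by the authors is not stated here) and the usual definition of EF as the hit rate in the top fraction divided by the hit rate of the whole list; the function names and toy values are hypothetical.

```python
import math

def overall_desirability(ds):
    """Geometric mean of per-model desirabilities d_i in [0, 1].

    Assumption: Derringer-style aggregation, so any d_i == 0 forces D == 0.
    """
    if any(d <= 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))

def enrichment_factor(scores, is_active, fraction):
    """EF = (hit rate in the top `fraction` of the ranked list) / (overall hit rate)."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(fraction * len(ranked)))
    hits_top = sum(1 for _, active in ranked[:n_top] if active)
    total_hits = sum(1 for active in is_active if active)
    if total_hits == 0:
        raise ValueError("EF is undefined when the list contains no actives")
    return (hits_top / n_top) / (total_hits / len(ranked))

# Hypothetical desirabilities from the HOS, MG63, SAOS2 and U2OS models for one compound.
d_overall = overall_desirability([0.9, 0.8, 0.5, 0.7])

# Toy ranked list: 100 compounds, and the 10 highest-scoring ones are the actives.
scores = list(range(100))
actives = [s >= 90 for s in scores]
ef_10 = enrichment_factor(scores, actives, fraction=0.10)  # 10.0 (ideal ranking)
```

With 2218 DrugBank molecules, `fraction=0.01` would correspond to the top 22 entries under this rounding convention.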
MATEO: intermolecular α-amidoalkylation theoretical enantioselectivity optimization. Online tool for selection and design of chiral catalysts and products
RUC. Repositorio da Universidade da Coruña
- Carracedo-Reboredo, Paula
- Aranzamendi, Eider
- He, Shan
- Arrasate, Sonia
- Munteanu, Cristian-Robert
- Fernández-Lozano, Carlos
- Sotomayor, Nuria
- Lete, Esther
- González-Díaz, Humberto
[Abstract]: The enantioselective Brønsted acid-catalyzed α-amidoalkylation reaction is a useful procedure for the production of new drugs and natural products. In this context, Chiral Phosphoric Acid (CPA) catalysts are versatile catalysts for this type of reaction. The selection and design of new CPA catalysts for different enantioselective reactions is of dual interest, because both new CPA catalysts (tools) and chiral drugs or materials (products) can be obtained. However, this process is difficult and time-consuming if approached from an experimental trial-and-error perspective. In this work, a Heuristic Perturbation-Theory and Machine Learning (HPTML) algorithm was used to seek a predictive model for CPA catalyst performance in terms of enantioselectivity in α-amidoalkylation reactions, with R² = 0.96 overall for the training and validation series. It involved Monte Carlo sampling of >100,000 pairs of query and reference reactions. In addition, the computational and experimental investigation of a new set of intermolecular α-amidoalkylation reactions using BINOL-derived N-triflylphosphoramides as CPA catalysts is reported as a case study. The model was implemented in a web server called MATEO: InterMolecular Amidoalkylation Theoretical Enantioselectivity Optimization, available online at: https://cptmltool.rnasa-imedir.com/CPTMLTools-Web/mateo. This new user-friendly online computational tool could enable sustainable optimization of reaction conditions, leading to the design of new CPA catalysts along with new organic synthesis products.
The authors acknowledge financial support from Grants PID2019-104148GB-I00 and PID2022-137365NB-I00 funded by MCIN/AEI/10.13039/501100011033 and Grant IT1558-22 funded by Basque Government/Eusko Jaurlaritza, 2022–2025. CITIC is funded by the Xunta de Galicia through the collaboration agreement between the Department of Culture, Education, Vocational Training and Universities and the Galician universities to strengthen the research centers of the Galician University System (CIGUS).
Eusko Jaurlaritza; IT1558-22
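The HPTML model is trained on Monte Carlo samples of query/reference reaction pairs. A minimal Python sketch of that sampling step follows. It assumes each reaction is a dict with a known enantioselectivity value (`ee`) and a descriptor vector, and that the model input is the reference outcome plus the descriptor differences ΔD_k = D_k(query) − D_k(reference); this is a common perturbation-theory form and an assumption here, not MATEO's published equation. The names `sample_pairs` and `perturbation_features` are hypothetical.

```python
import random

def sample_pairs(n_reactions, n_pairs, seed=0):
    """Monte Carlo sampling of (query, reference) index pairs with query != reference."""
    if n_reactions < 2:
        raise ValueError("need at least two reactions to form a pair")
    rng = random.Random(seed)  # seeded for reproducibility
    pairs = []
    while len(pairs) < n_pairs:
        q = rng.randrange(n_reactions)
        r = rng.randrange(n_reactions)
        if q != r:
            pairs.append((q, r))
    return pairs

def perturbation_features(reactions, pair):
    """Return [ee of reference] + [D_k(query) - D_k(reference) for each descriptor k]."""
    q, r = pair
    ref = reactions[r]
    deltas = [dq - dr for dq, dr in zip(reactions[q]["descriptors"], ref["descriptors"])]
    return [ref["ee"]] + deltas

# Hypothetical toy data: two descriptors per reaction, ee in %.
reactions = [
    {"ee": 90.0, "descriptors": [1.0, 2.0]},
    {"ee": 40.0, "descriptors": [0.5, 3.0]},
    {"ee": 75.0, "descriptors": [0.8, 2.5]},
]
pairs = sample_pairs(len(reactions), n_pairs=1000)
X = [perturbation_features(reactions, p) for p in pairs]
```

A regression model fitted to `X` against each query's observed ee would then play the role of the predictive model; the abstract reports R² = 0.96 over >100,000 such pairs.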
IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds
RUC. Repositorio da Universidade da Coruña
- Quevedo‐Tumailli, Viviana F.
- Ortega-Tenezaca, Bernabé
- Díaz, Humberto G.
[Abstract] Parasite species of the genus Plasmodium cause malaria, which remains a major global health problem due to parasite resistance to available antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect the success of a pre-clinical assay to depend on the assay conditions per se, the chemical structure of the drug, and the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as gene (deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, there are no reports of computational models that consider all these factors simultaneously. Some of the difficulties for this kind of analysis are the dispersion of data across different datasets, the high heterogeneity of the data, etc. In this work, we analyzed three databases, ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information Genome Data Viewer), to achieve this goal. The ChEMBL dataset contains outcomes for 17,758 unique assays of potential antimalarial compounds, including numeric descriptors (variables) for the structure of compounds as well as a large amount of information about assay conditions. The NCBI-GDV and UniProt datasets include the sequences of genes and proteins and their functions. In addition, we created two partitions (c_j^assay = ca_j and c_j^data = cd_j) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about the experimental conditions of preclinical assays (ca_j) or about the nature and quality of the data (cd_j).
These categorical variables include information about 22 parameters of biological activity (ca_0), 28 target proteins (ca_1), 9 assay organisms (ca_2), etc. We also created another partition (c_j^prot = cp_j) of categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (cp_0), 10 chromosomes (cp_1), gene orientation (cp_2), and 31 protein functions (cp_3). We used an Information Fusion Perturbation-Theory Machine Learning (IFPTML) algorithm to map all this information (from three databases) and train a predictive model. Shannon's entropy measures Sh_k (numerical variables) were used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes on the same information scale. Perturbation Theory Operators (PTOs) in the form of Moving Average (MA) operators were used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC model showed the best performance, with sensitivity Sn(%) = 83.6/85.1 and specificity Sp(%) = 89.8/89.7 for the training/validation sets, respectively. This model could become a useful tool for optimizing preclinical assays of new antimalarial compounds vs. different proteins in the proteome of Plasmodium. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
H.G.-D. personally acknowledges financial support from the Ministry of Science and Innovation (PID2019-104148GB-I00) and a grant (IT1045-16, 2016–2021) from the Basque Government. V.Q.T. acknowledges a Universidad Estatal Amazónica (UEA) scholarship for postgraduate studies; Ecuador Sciences PhD Program (UEA.Res.26.2019.06.13).
Eusko Jaurlaritza = Gobierno Vasco; IT1045-16, Ecuador. Gobierno; UEA.Res.26.2019.06.13
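The abstract relies on two ingredients: Shannon entropy values Sh_k that place drugs, sequences and chromosomes on one information scale, and moving-average PTOs that measure how far a value deviates from its expected value within a category c_j. The Python sketch below illustrates both under simplifying assumptions: entropy is computed from plain symbol frequencies (the published Sh_k indices may be defined differently), and the expected value is taken as the mean over cases sharing the same category. Function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(seq):
    """Symbol-frequency Shannon entropy in bits (a simplified stand-in for Sh_k)."""
    n = len(seq)
    probs = [c / n for c in Counter(seq).values()]
    return sum(-p * math.log2(p) for p in probs)

def moving_average_pto(values, categories):
    """Deviation of each value from the mean of all values sharing its category c_j."""
    groups = defaultdict(list)
    for value, cat in zip(values, categories):
        groups[cat].append(value)
    means = {cat: sum(vs) / len(vs) for cat, vs in groups.items()}
    return [value - means[cat] for value, cat in zip(values, categories)]

# Hypothetical gene fragments and the assay organism (a c_j category) of each case.
seqs = ["ACGT", "AAAA", "ACAC"]
organisms = ["P. falciparum", "P. falciparum", "P. berghei"]
sh = [shannon_entropy(s) for s in seqs]          # [2.0, 0.0, 1.0]
delta_sh = moving_average_pto(sh, organisms)     # [1.0, -1.0, 0.0]
```

The deviations `delta_sh`, rather than the raw entropies, are what a PTO-based model would take as inputs.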
Machine Learning Study of Metabolic Networks vs ChEMBL Data of Antibacterial Compounds
Digital.CSIC. Repositorio Institucional del CSIC
- Diéguez, Karel
- Casañola, Gerardo
- Torres, Roldán
- Rasulev, Bakhtiyor
- Green, James R.
- González-Díaz, Humberto
Antibacterial drugs (ADs) change the metabolic status of bacteria, contributing to bacterial death. However, antibiotic resistance and the emergence of multidrug-resistant bacteria increase interest in understanding metabolic network (MN) mutations and the interaction of ADs vs. MNs. In this study, we employed the IFPTML = Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML) algorithm on a huge dataset from the ChEMBL database, which contains >155,000 AD assays vs. >40 MNs of multiple bacterial species. We built a linear discriminant analysis (LDA) model and 17 ML models, centered on the linear index and based on atoms, to predict antibacterial compounds. The IFPTML-LDA model gave the following results for the training subset: specificity (Sp) = 76% out of 70,000 cases, sensitivity (Sn) = 70%, and accuracy (Acc) = 73%. The same model gave the following results for the validation subsets: Sp = 76%, Sn = 70%, and Acc = 73.1%. Among the IFPTML nonlinear models, k-nearest neighbors (KNN) showed the best results in the training sets, with Sn = 99.2%, Sp = 95.5%, Acc = 97.4%, and area under the receiver operating characteristic curve (AUROC) = 0.998. In the validation series, random forest had the best results: Sn = 93.96% and Sp = 87.02% (AUROC = 0.945). The IFPTML linear and nonlinear models of ADs vs. MNs have good statistical parameters and could help to find new metabolic mutations in antibiotic resistance and to reduce time/costs in antibacterial drug research.
G.D.H. acknowledges financial support from grants from the Ministry of Science and Innovation (PID2019-104148GB-I00) and grant no. IT1045-16 (2016–2021) from the Basque Government.
Peer reviewed
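The Sp, Sn and Acc values reported here, and in the other IFPTML abstracts in this list, are standard confusion-matrix ratios. A minimal Python sketch, assuming binary labels where 1 marks an active compound; the function name and toy labels are hypothetical:

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "Sn": tp / (tp + fn),           # fraction of actives recovered
        "Sp": tn / (tn + fp),           # fraction of inactives recovered
        "Acc": (tp + tn) / len(pairs),  # fraction of all cases classified correctly
    }

metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(metrics)  # {'Sn': 0.75, 'Sp': 0.75, 'Acc': 0.75}
```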
Prediction of acute toxicity of pesticides for Americamysis bahia using linear and nonlinear QSTR modelling approaches
Digital.CSIC. Repositorio Institucional del CSIC
- Diéguez, Karel
- Nachimba-Mayanchi, Manuel Mesias
- Puris, Amilkar
- Torres, Roldán
- González-Díaz, Humberto
Pesticides are toxic substances with wide applications worldwide. However, their widespread use has received increasing attention from regulatory agencies due to their various acute and chronic effects on multiple organisms. In this study, Quantitative Structure-Toxicity Relationship (QSTR) models were established using Multiple Linear Regression (MLR) and five Machine Learning (ML) algorithms to predict pesticide toxicity in Americamysis bahia. The most influential descriptors included in the MLR model are RBF, JGI2, nCbH, nRCOOR, nRSR, nPO4, and 'Cl-090', with positive contributions to the dependent variable (the negative decimal logarithm of the 96-h median lethal concentration). The Random Forest (RF) regression model was superior among the five ML models, with higher R² (0.812) and lower RMSE (0.595) and MAE (0.462) values in the cross-validation training set and the external validation set. Compared with models presented in similar studies, the models in this study showed a high level of fit and were internally robust and externally predictive. The results suggest that the developed QSTR models are suitable for reliably predicting the aquatic toxicity of structurally diverse pesticides and can be used for screening and prioritising new pesticides, filling data gaps, and overcoming the limitations of in vivo and in vitro tests.
G.D.H. acknowledges financial support from grants from the Ministry of Science and Innovation, Spain (PID2019-104148GB-I00) and grant IT1045-16 (2016–2021) from the Basque Government, Spain.
Peer reviewed
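The dependent variable is the negative decimal logarithm of the 96-h LC50, and model quality is summarized by R², RMSE and MAE. A minimal Python sketch of these quantities follows, assuming LC50 is expressed in mol/L and R² is the coefficient of determination 1 − SS_res/SS_tot (some QSTR studies report other R² variants); function names and toy numbers are hypothetical.

```python
import math

def p_lc50(lc50_mol_per_l):
    """Negative decimal logarithm of the median lethal concentration (assumed in mol/L)."""
    return -math.log10(lc50_mol_per_l)

def regression_metrics(y_true, y_pred):
    """R2 (coefficient of determination), RMSE and MAE."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }

y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 5.0]
m = regression_metrics(y_obs, y_hat)  # R2 = 0.8, RMSE = 0.5, MAE = 0.25
```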
Towards rational nanomaterial design by predicting drug–nanoparticle system interaction vs. bacterial metabolic networks
Digital.CSIC. Repositorio Institucional del CSIC
- Diéguez, Karel
- Rasulev, Bakhtiyor
- González-Díaz, Humberto
The emergence of multidrug-resistant (MDR) strains with perturbed metabolic networks (MNs) pushes researchers to improve antibacterial drugs (ADs). Certain nanoparticles (NPs) may present antibacterial activity while also acting as delivery systems. Thus, developing dual antibacterial drug–nanoparticle (DADNP) systems becomes an option. However, testing DADNPs vs. strains with different MNs is a hard and costly task. Artificial intelligence (AI) or machine learning (ML) could accelerate this by predicting bacterial sensitivity. In this work, we used an information fusion perturbation-theory machine learning (IFPTML) analysis and mapping of DADNP (AD + NP) systems vs. MNs of pathogenic bacterial species as a new application of AI/ML methods. Furthermore, most existing AI/ML models do not use the experimental conditions of assays c_j (i.e., bacterial species, strain, NP shape, etc.) as input vectors. A working solution may be the use of an AI/ML method with an additive information fusion (IF) approach. Additive IF uses the sets of vectors Ddk, Dnk, Dmk and cdk, cnk, csk as inputs, carrying information about AD, NP, and MN structure and assays separately. Accordingly, the IFPTML algorithm was selected to seek predictive models based on a ChEMBL dataset of >160 000 AD assays enriched with 300 NP assays and >25 MNs of different bacterial species. IFPTML uses the IF process to join the three datasets, PT operators (PTOs) to encode the Ddk, Dnk, Dsk and cdk, cnk, csk vector information, and ML algorithms to train the model. The IFPTML linear discriminant analysis (LDA) model, with Sp ≈ 90% and Sn ≈ 80%, and the best artificial neural network (ANN) model found, with Sp ≈ Sn ≈ 95% in the training/validation series, presented good results. This kind of model could be useful for DADNP system discovery. We also ran a simulation with >140 000 points of putative DADNP systems vs. wild-type and computationally generated knockout (KO) bacterial strains.
The linear and additive IFPTML model was able to predict 102 experimental cases of complex DADNPs with a high degree of structural and biological variety. This led us to introduce the concept of MDR computational surveillance, which could help to detect new strains of MDR bacteria.
G. D. H. acknowledges financial support from grants from the Ministry of Science and Innovation (PID2019-104148GB-I00) and grant IT1045-16 (2016–2021) of the Basque Government.
Peer reviewed
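In the additive information-fusion approach, the drug, nanoparticle and metabolic-network information enter as separate blocks of PTO inputs, so a linear (LDA-style) score can be written as a sum of per-block contributions. The Python sketch below illustrates that additive structure only; the block weights, feature values and the zero decision threshold are hypothetical and do not reproduce the published model.

```python
def additive_score(intercept, blocks):
    """Linear score = intercept + sum over blocks of dot(weights, features).

    Each block holds the (weights, features) of one component, e.g. drug, NP or metabolic network.
    """
    return intercept + sum(
        sum(w * x for w, x in zip(weights, features)) for weights, features in blocks
    )

def classify(score, threshold=0.0):
    """Hypothetical decision rule: predict 'active' when the score exceeds the threshold."""
    return score > threshold

blocks = [
    ([0.5, 1.0], [2.0, 0.5]),  # drug (AD) PTO terms
    ([2.0], [0.25]),           # nanoparticle (NP) PTO term
    ([1.0], [-0.5]),           # metabolic network (MN) PTO term
]
s = additive_score(-1.0, blocks)  # 0.5
print(classify(s))  # True
```

Because the blocks are summed, each component's contribution can be computed from its own information and then combined, which is the sense in which the model is additive.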
Prediction of Antileishmanial Compounds: General Model, Preparation, and Evaluation of 2-Acylpyrrole Derivatives
Digital.CSIC. Repositorio Institucional del CSIC
- Santiago, Carlos
- Ortega Tenezaca, Bernabé
- Barbolla, Iratxe
- Fundora, Brenda
- Arrasate, Sonia
- Dea-Ayuela, M. Auxiliadora
- González-Díaz, Humberto
- Sotomayor, Nuria
- Lete, Esther
In this work, the SOFT.PTML tool was used to pre-process a ChEMBL dataset of pre-clinical assays of antileishmanial compound candidates. A comparative study of different ML algorithms, such as logistic regression (LOGR), support vector machine (SVM), and random forests (RF), showed that the IFPTML-LOGR model presents excellent values of specificity and sensitivity (81–98%) in the training and validation series. The use of this software is illustrated with a practical case study focused on a series of 28 derivatives of 2-acylpyrroles 5a,b, obtained through a Pd(II)-catalyzed C-H radical acylation of pyrroles. Their in vitro leishmanicidal activity against visceral (L. donovani) and cutaneous (L. amazonensis) leishmaniasis was evaluated, finding that compounds 5bc (IC50 = 30.87 μM, SI > 10.17) and 5bd (IC50 = 16.87 μM, SI > 10.67) were approximately 6-fold more selective than the reference drug (miltefosine) in in vitro assays against L. amazonensis promastigotes. In addition, most of the compounds showed low cytotoxicity (CC50 > 100 μg/mL in J774 cells). Interestingly, the IFPTML-LOGR model correctly predicts the relative biological activity of this series of acylpyrroles. A computational high-throughput screening (cHTS) study of 2-acylpyrroles 5a,b was performed, calculating >20,700 activity scores vs. a large space of 647 assays involving multiple Leishmania species, cell lines, and potential target proteins. Overall, the study demonstrates that the SOFT.PTML all-in-one strategy is useful for obtaining IFPTML models through a friendly interface, making the work easier and faster than before. The present work also points to 2-acylpyrroles as new lead compounds worthy of further optimization as antileishmanial hits.
Ministerio de Ciencia e Innovación (PID2019-104148GB-I00) and Gobierno Vasco (IT1558-22) are gratefully acknowledged for their financial support. I.B. wishes to thank Fundación Biofísica Bizkaia/Biofisika Bizkaia Fundazioa (FBB) for a postdoctoral grant funded by the BERC Basque Government program.
IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds
Digital.CSIC. Repositorio Institucional del CSIC
- Quevedo-Tumailli, Viviana
- Ortega Tenezaca, Bernabé
- González-Díaz, Humberto
Parasite species of the genus Plasmodium cause malaria, which remains a major global health problem due to parasite resistance to available antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect the success of a pre-clinical assay to depend on the assay conditions per se, the chemical structure of the drug, and the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as gene (deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, there are no reports of computational models that consider all these factors simultaneously. Some of the difficulties for this kind of analysis are the dispersion of data across different datasets, the high heterogeneity of the data, etc. In this work, we analyzed three databases, ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information Genome Data Viewer), to achieve this goal. The ChEMBL dataset contains outcomes for 17,758 unique assays of potential antimalarial compounds, including numeric descriptors (variables) for the structure of compounds as well as a large amount of information about assay conditions. The NCBI-GDV and UniProt datasets include the sequences of genes and proteins and their functions. In addition, we created two partitions (c_j^assay = ca_j and c_j^data = cd_j) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about the experimental conditions of preclinical assays (ca_j) or about the nature and quality of the data (cd_j).
These categorical variables include information about 22 parameters of biological activity (ca_0), 28 target proteins (ca_1), 9 assay organisms (ca_2), etc. We also created another partition (c_j^prot = cp_j) of categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (cp_0), 10 chromosomes (cp_1), gene orientation (cp_2), and 31 protein functions (cp_3). We used an Information Fusion Perturbation-Theory Machine Learning (IFPTML) algorithm to map all this information (from three databases) and train a predictive model. Shannon's entropy measures Sh_k (numerical variables) were used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes on the same information scale. Perturbation Theory Operators (PTOs) in the form of Moving Average (MA) operators were used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC model showed the best performance, with sensitivity Sn(%) = 83.6/85.1 and specificity Sp(%) = 89.8/89.7 for the training/validation sets, respectively. This model could become a useful tool for optimizing preclinical assays of new antimalarial compounds vs. different proteins in the proteome of Plasmodium.
H.G.-D. personally acknowledges financial support from the Ministry of Science and Innovation (PID2019-104148GB-I00) and a grant (IT1045-16, 2016–2021) from the Basque Government. V.Q.T. acknowledges a Universidad Estatal Amazónica (UEA) scholarship for postgraduate studies; Ecuador Sciences PhD Program (UEA.Res.26.2019.06.13).
Peer reviewed
Handle: http://hdl.handle.net/10261/311084; Scopus: https://api.elsevier.com/content/abstract/scopus_id/85120361941
IFPTML mapping of nanoparticle antibacterial activity vs. pathogen metabolic networks
Digital.CSIC. Repositorio Institucional del CSIC
- Ortega Tenezaca, Bernabé
- González-Díaz, Humberto
Nanoparticles are useful antimicrobial drug-release systems, but some nanoparticles also exhibit antibacterial activity themselves. However, investigating their antibacterial activity is a difficult and slow process due to the numerous combinations of nanoparticle size, shape, and composition vs. biological tests, assay organisms, and multiple activity parameters to be measured. Additionally, the overuse of antibiotics has led to the emergence of resistant bacterial strains with different metabolic networks. Computational models may speed up this process, but the models reported to date do not consider all of these factors, and the data sources are dispersed and not curated. Thus, herein, we used an information fusion, perturbation-theory machine learning (IFPTML) approach, introduced here by us for the first time, to fit a model for the discovery of antibacterial nanoparticles. The dataset studied had 15 classes of nanoparticles (1–100 nm), with most cases in the 1–50 nm range, vs. >20 pathogenic bacterial species with different metabolic networks. The nanoparticles studied included metal nanoparticles of Au, Ag, and Cu; oxide nanoparticles of Zn, Cu, La, Al, Fe, Sn, Ti, Cd, and Si; and metal salt nanoparticles of CuI and CdS. We used the SOFT.PTML software (our own application), which has a user-friendly interface, for the IFPTML calculations and a control statistics package. Using SOFT.PTML, we found a linear logistic regression equation that could model 4 biological activity parameters using only 8 variables, with χ² = 2265.75, p-level < 0.05, sensitivity Sn = 79.4%, and specificity Sp = 99.3% for 3213 cases (nanoparticle–bacteria pairs) in the training series. The model had Sn = 80.8% and Sp = 99.3% for 2114 cases in the external validation series. We also developed a random forest non-linear model with higher values of Sn and Sp (98–99%) in the training/validation series, although it was more complicated to use. SOFT.PTML has proven to be a useful tool for the analysis of complex data in nanotechnology. We also introduced a new anabolism–catabolism unbalance index of metabolic networks to reveal the biological meaning of the IFPTML predictions for antibacterial nanoparticles. These new models open the way to the discovery of NPs against new bacterial species and strains with different topological structures of their metabolic networks.
G. D. H. personally acknowledges financial support from grants from the Ministry of Science and Innovation (PID2019-104148GB-I00) and grant IT1045-16 (2016–2021) of the Basque Government.
Peer reviewed
Handle: http://hdl.handle.net/10261/311280; Scopus: https://api.elsevier.com/content/abstract/scopus_id/85099789786
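The linear logistic regression model in the preceding abstract maps a linear combination of its 8 input variables to a probability of activity. A minimal Python sketch of that mapping follows, with hypothetical coefficients (the fitted SOFT.PTML equation is not reproduced here); the branch on the sign of `z` only avoids floating-point overflow for large negative scores.

```python
import math

def logistic_probability(intercept, coefficients, features):
    """P(active) = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))), computed in a numerically stable way."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)

# Hypothetical model with 8 variables, matching the size reported in the abstract.
coefs = [0.0] * 8
coefs[0] = math.log(3.0)  # only the first variable carries weight in this toy example
x = [1.0] + [0.0] * 7
p = logistic_probability(0.0, coefs, x)  # ≈ 0.75
```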
Towards machine learning discovery of dual antibacterial drug-nanoparticle systems
Digital.CSIC. Repositorio Institucional del CSIC
- Diéguez, Karel
- González-Díaz, Humberto
Artificial Intelligence/Machine Learning (AI/ML) algorithms may speed up the design of DADNP systems formed by Antibacterial Drugs (AD) and Nanoparticles (NP). In this work, we used the IFPTML = Information Fusion (IF) + Perturbation-Theory (PT) + Machine Learning (ML) algorithm for the first time to study a large dataset of putative DADNP systems composed of >165 000 ChEMBL AD assays and 300 NP assays vs. multiple bacterial species. We trained alternative models with Linear Discriminant Analysis (LDA), Artificial Neural Networks (ANN), Bayesian Networks (BNN), K-Nearest Neighbour (KNN), and other algorithms. The IFPTML-LDA model was simpler, with Sp ≈ 90% and Sn ≈ 74% in both the training (>124 K cases) and validation (>41 K cases) series. The IFPTML-ANN and KNN models are notably more complicated, although they are more balanced, with Sn ≈ Sp ≈ 88.5–99.0% and AUROC ≈ 0.94–0.99 in both series. We also carried out a simulation (>1900 calculations) of the expected behavior of putative DADNPs in 72 different biological assays. The putative DADNPs studied are formed by 27 different drugs with multiple classes of NP and types of coating. In addition, we tested the validity of our additive model with 80 DADNP complexes experimentally synthesized and biologically tested (reported in >45 papers). All these DADNPs show MIC values < 50 μg mL⁻¹ (the cutoff used), better than the MIC of the AD and NP alone (synergistic or additive effect). The assays involve DADNP complexes with 10 types of NP, 6 coating materials, and NP sizes of 5–100 nm vs. 15 different antibiotics and 12 bacterial species. The IFPTML-LDA model correctly classified 100% (80 out of 80) of the DADNP complexes as biologically active. The IFPTML additive strategy may become a useful tool to assist the design of DADNP systems for antibacterial therapy, taking into consideration only information about the AD and NP components separately.
G. D. H. acknowledges financial support from grants from the Ministry of Science and Innovation (PID2019-104148GB-I00) and grant IT1045-16 (2016–2021) of the Basque Government.
Peer reviewed
DOI: http://hdl.handle.net/10261/329945, https://api.elsevier.com/content/abstract/scopus_id/85119373578
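The IFPTML idea in the abstract above, expressing each input relative to its average under a given assay condition before it reaches a classifier, and then reporting Sn and Sp, can be illustrated with a minimal Python sketch. It assumes the perturbation operator takes the form of a moving-average deviation, Δv = v - <v>_c, where <v>_c is the mean of a descriptor over all cases sharing condition c (for example, the same bacterial species). The function names and toy values are hypothetical, the LDA step itself is omitted, and this is not the authors' code.

```python
from collections import defaultdict
from statistics import mean


def moving_average_deviation(values, conditions):
    """Return v_i - <v>_c for every case i (assumed PT operator).

    <v>_c is the mean of the descriptor over all cases that share the same
    condition label c (assumed here to be, e.g., the bacterial species tested).
    """
    groups = defaultdict(list)
    for value, condition in zip(values, conditions):
        groups[condition].append(value)
    averages = {condition: mean(vals) for condition, vals in groups.items()}
    return [value - averages[condition] for value, condition in zip(values, conditions)]


def sensitivity_specificity(y_true, y_pred):
    """Sn = TP / (TP + FN) and Sp = TN / (TN + FP) for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy descriptor values for four hypothetical assays on two species.
    deltas = moving_average_deviation([1.0, 3.0, 2.0, 6.0],
                                      ["E. coli", "E. coli", "S. aureus", "S. aureus"])
    print(deltas)  # [-1.0, 1.0, -2.0, 2.0]
    sn, sp = sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                                     [1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
    print(round(sn, 2), round(sp, 2))  # 0.75 0.83
```

Centering each descriptor within its condition group is what lets a single linear classifier compare cases measured under different assay conditions; the deviations, not the raw values, would be the LDA inputs in this sketch.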
Machine learning in antibacterial discovery and development: A bibliometric and network analysis of research hotspots and trends
Digital.CSIC. Repositorio Institucional del CSIC
- Diéguez, Karel
- González-Díaz, Humberto
Machine learning (ML) methods are used in cheminformatics to predict the activity of unknown compounds and thus discover new potential antibacterial drugs. This article presents a bibliometric study that analyses the contributions of leading authors, universities/organisations, and countries in terms of productivity, citations, and bibliographic linkage. The study is based on a sample of 1596 Scopus documents for the period 2006-2022. The analysis used the bibliometrix R-Tool and VOSviewer software. We identified essential topics related to the application of ML in antibacterial development (computer models in antibacterial drug design, and learning algorithms and systems for forecasting). We also identified obsolete and saturated areas of research and proposed emerging topics based on the various analyses carried out on the corpus of published scientific literature (title, abstract, and keywords). Finally, the methodology applied contributed to building a broader and more specific "big picture" of ML research in antibacterial studies to guide future projects., G.D.H. acknowledges financial support from grants from the Basque Government IT1558-22 (2022-2025), SPRI ELKARTEK KK-2022/00032 (2022-2024), and MICIIN PID2019-104148GB-I00 (2020-2022)., Peer reviewed
DOI: http://hdl.handle.net/10261/339188, https://api.elsevier.com/content/abstract/scopus_id/85147605887
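Keyword maps in bibliometric studies such as this one are typically built from co-occurrence counts: two keywords are linked when they appear in the same document, and the link weight is the number of such documents. The minimal Python sketch below illustrates only that counting step, assuming each record has already been reduced to a list of keywords; it is not the bibliometrix or VOSviewer implementation, which add further steps such as normalization, clustering, and layout.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how many records contain each unordered pair of keywords.

    Keywords are lower-cased and de-duplicated per record, so a pair is
    counted at most once per document.
    """
    counts = Counter()
    for keywords in records:
        unique = sorted({k.strip().lower() for k in keywords})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical keyword lists for three documents.
    docs = [
        ["Machine learning", "Antibacterial"],
        ["machine learning", "QSAR", "antibacterial"],
        ["QSAR"],
    ]
    for (a, b), n in keyword_cooccurrence(docs).most_common():
        print(f"{a} -- {b}: {n}")
```

Sorting each record's keywords before pairing keeps every pair in a canonical order, so ("antibacterial", "machine learning") and its reverse are counted as the same link.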
Trends in Nanoparticles for Leishmania Treatment: A Bibliometric and Network Analysis
Digital.CSIC. Repositorio Institucional del CSIC
- Mazón-Ortiz, Gabriel
- Cerda-Mejía, Galo
- Gutiérrez Morales, Eberto
- Diéguez, Karel
- Ruso, Juan M.
- González-Díaz, Humberto
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/)., Leishmaniasis is a neglected tropical illness with a wide variety of clinical signs ranging from visceral to cutaneous symptoms, resulting in millions of new cases and thousands of fatalities reported annually. This article provides a bibliometric analysis of the contributions of the main authors, institutions, and nations, in terms of productivity, citations, and bibliographic linkages, to the application of nanoparticles (NPs) for the treatment of leishmaniasis. The study is based on a sample of 524 Scopus documents from 1991 to 2022. The analysis was carried out with the Bibliometrix R-Tool (version 4.0) and VOSviewer (version 1.6.17). We identified crucial subjects associated with the application of NPs in antileishmanial development (NPs and drug formulation for leishmaniasis treatment, animal models, and experiments). We also singled out research topics that were outdated or oversaturated. At the same time, we proposed emerging subjects based on multiple analyses of the corpus of published scientific literature (title, abstract, and keywords). Finally, the technique used contributed to the development of a broader and more specific “big picture” of nanomedicine research in antileishmanial studies for future projects., H.G.-D. acknowledges financial support from grants from the Basque Government IT1558-22 (2022–2025), SPRI ELKARTEK KK-2022/00032 (2022–2024), and MICIIN PID2019-104148GB-I00 (2020–2022)., Peer reviewed
Identification of Riluzole derivatives as novel calmodulin inhibitors with neuroprotective activity by a joint synthesis, biosensor, and computational guided strategy
Digital.CSIC. Repositorio Institucional del CSIC
- Baltasar-Marchueta, Maider
- Llona, Leire
- M-Alicante, Sara
- Barbolla, Iratxe
- García Ibarluzea, Markel
- Ramis, Rafael
- Salomon, Ane Miren
- Fundora, Brenda
- Araujo, Ariane
- Muguruza-Montero, Arantza
- Núñez, Eider
- Pérez-Olea, Scarlett
- Villanueva, Christian
- Leonardo, Aritz
- Arrasate, Sonia
- Sotomayor, Nuria
- Villarroel, Álvaro
- Bergara, Aitor
- Lete, Esther
- González-Díaz, Humberto
© 2024 The Authors. Published by Elsevier Masson SAS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)., The development of new molecules for the treatment of calmodulin-related cardiovascular or neurodegenerative diseases is an interesting goal. In this work, we introduce a novel strategy with four main steps: (1) chemical synthesis of target molecules, (2) development of a Förster Resonance Energy Transfer (FRET) biosensor and in vitro biological assay of new derivatives, (3) development of cheminformatics models and prediction of in vivo activity, and (4) docking studies. This strategy is illustrated with a case study. First, a series of 4-substituted Riluzole derivatives 1–3 were synthesized through a strategy that involves the construction of the 4-bromoriluzole framework and its further functionalization via palladium catalysis or organolithium chemistry. Next, a FRET biosensor for monitoring Ca2+-dependent CaM–ligand interactions was developed and used for the in vitro assay of the Riluzole derivatives. The best inhibition (80%) was observed for 4-methoxyphenylriluzole 2b. We then trained and validated a new Networks Invariant, Information Fusion, Perturbation Theory, and Machine Learning (NIFPTML) model for predicting probability profiles of in vivo biological activity parameters in different regions of the brain, and used this model to predict the in vivo activity of the compounds studied experimentally in vitro. Finally, a docking study of Riluzole and its derivatives provided valuable insights into their binding conformations with the target protein, involving calmodulin and the SK4 channel.
This new combined strategy may be useful to reduce assay costs (animals, materials, time, and human resources) in the discovery of calmodulin inhibitors., Basque Government / Eusko Jaurlaritza (IT1558-22, IT1707-22) and SPRI ELKARTEK grant (CardiCaM KK-2020/00110) are acknowledged for financial support. We also acknowledge the Ministry of Science and Innovation (PID2019-104148GB-I00, PID2021-128286NB-I00, PID2022-137365NB-I00 funded by MCIN/AEI/10.13039/501100011033/FEDER, UE, including FEDER funds)., Peer reviewed
MATEO: intermolecular α-amidoalkylation theoretical enantioselectivity optimization. Online tool for selection and design of chiral catalysts and products
Digital.CSIC. Repositorio Institucional del CSIC
- Carracedo-Reboredo, Paula
- Aranzamendi, Eider
- He, Shan
- Arrasate, Sonia
- Munteanu, Cristian Robert
- Fernández-Lozano, Carlos
- Sotomayor, Nuria
- Lete, Esther
- González-Díaz, Humberto
The MATEO web server was implemented for public use by experimental organic chemists, see link: https://cptmltool.rnasa-imedir.com/CPTMLTools-Web/mateo. The code of the software was uploaded to a GitHub repository and is available free for use by cheminformatics researchers under the MIT license. The links are the following. For the MATEO server code: https://github.com/glezdiazh/MATEO. For the libraries used to calculate the molecular descriptors: https://github.com/muntisa/RMarkovTI. All data files (SI00, SI01, and SI02) have been uploaded to a public data repository and are available free of charge under the Creative Commons universal (CC0) license. The links are: SI00.pdf file: https://doi.org/10.6084/m9.figshare.21981740.v2, Additional file 2: https://doi.org/10.6084/m9.figshare.21971690.v2, and Additional file 3: https://doi.org/10.6084/m9.figshare.21971696.v2., The enantioselective Brønsted acid-catalyzed α-amidoalkylation reaction is a useful procedure for the production of new drugs and natural products. In this context, Chiral Phosphoric Acid (CPA) catalysts are versatile catalysts for this type of reaction. The selection and design of new CPA catalysts for different enantioselective reactions is of dual interest, because both new CPA catalysts (tools) and chiral drugs or materials (products) can be obtained. However, this process is difficult and time-consuming if approached by experimental trial and error. In this work, a Heuristic Perturbation-Theory and Machine Learning (HPTML) algorithm was used to seek a predictive model for CPA catalyst performance in terms of enantioselectivity in α-amidoalkylation reactions, with R² = 0.96 overall for the training and validation series. It involved a Monte Carlo sampling of >100,000 pairs of query and reference reactions.
In addition, the computational and experimental investigation of a new set of intermolecular α-amidoalkylation reactions using BINOL-derived N-triflylphosphoramides as CPA catalysts is reported as a case study. The model was implemented in a web server called MATEO: InterMolecular Amidoalkylation Theoretical Enantioselectivity Optimization, available online at: https://cptmltool.rnasa-imedir.com/CPTMLTools-Web/mateo. This new user-friendly online computational tool would enable sustainable optimization of reaction conditions that could lead to the design of new CPA catalysts along with new organic synthesis products., The authors acknowledge financial support from Grant PID2019-104148GB-I00 and PID2022-137365NB-I00 funded by MCIN/AEI/10.13039/501100011033 and Grant IT1558-22 funded by Basque Government/Eusko Jaurlaritza, 2022–2025. CITIC is funded by the Xunta de Galicia through the collaboration agreement between the Department of Culture, Education, Vocational Training and Universities and the Galician universities to strengthen the research centers of the Galician University System (CIGUS)., Peer reviewed
DOI: http://hdl.handle.net/10261/372925, https://api.elsevier.com/content/abstract/scopus_id/85182865570
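The abstract describes an HPTML model trained on a Monte Carlo sample of >100,000 pairs of query and reference reactions and evaluated with R². A minimal Python sketch of those two ingredients follows. The feature layout (the reference reaction's observed ee followed by descriptor differences between query and reference) is an assumption for illustration, as are all function names; the model-fitting step and the actual descriptors used by MATEO are not shown.

```python
import random


def sample_reaction_pairs(n_reactions, n_pairs, seed=0):
    """Monte Carlo sample of (query, reference) index pairs with query != reference."""
    if n_reactions < 2:
        raise ValueError("need at least two reactions to form a pair")
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        q = rng.randrange(n_reactions)
        r = rng.randrange(n_reactions)
        if q != r:
            pairs.append((q, r))
    return pairs


def perturbation_features(descriptors, ee_obs, query, reference):
    """Hypothetical PT input row: reference ee followed by query - reference descriptor deltas."""
    deltas = [dq - dr for dq, dr in zip(descriptors[query], descriptors[reference])]
    return [ee_obs[reference]] + deltas


def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    descriptors = [[1.0, 2.0], [0.5, 5.0], [2.0, 1.0]]  # toy catalyst/solvent descriptors
    ee_obs = [90.0, 40.0, 75.0]                          # toy observed ee (%)
    for q, r in sample_reaction_pairs(len(ee_obs), n_pairs=4, seed=1):
        print(q, r, perturbation_features(descriptors, ee_obs, q, r))
    print(r_squared([90.0, 40.0, 75.0], [85.0, 45.0, 75.0]))
```

Pairing reactions multiplies the number of training rows (n reactions give up to n(n - 1) ordered pairs), which is one reason a random sample rather than the full set of pairs is drawn.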
SI00 Experimental Section
Digital.CSIC. Repositorio Institucional del CSIC
- González-Díaz, Humberto
The enantioselective Brønsted acid-catalyzed α-amidoalkylation reaction is a useful procedure for the production of new drugs and natural products (products) or chiral catalysts (tools). The enantioselectivity is sensitive to many factors, from the nature of the nucleophile and the catalyst to the experimental conditions (solvent, temperature, etc.). Although computational chemistry has been used to rationalize experimental results, it is still difficult to understand the influence of different parameters (solvent, temperature, etc.) on the quantitative reaction outcome (such as yield or regio- and stereoselectivity). Both experimental and computational (quantum chemistry) studies of a large number of reactions may become costly in terms of resources and time. Thus, fast-track public computational tools to predict the enantioselectivity [enantiomeric excess ee(%)obs] would be very useful. Furthermore, making such a tool available online could save time and experimental resources in many labs worldwide. Using a Heuristic Perturbation-Theory and Machine Learning (HPTML) algorithm, we developed a predictive model with R² = 0.91 in the training and validation series. It involves a Monte Carlo sampling of >100,000 pairs of query and reference reactions. In addition, the computational and experimental investigation of a new set of intermolecular α-amidoalkylation reactions using BINOL-derived N-triflylphosphoramides as chiral catalysts is reported as a case study. After validation, the model was implemented in a web server called MATEO: InterMolecular Amidoalkylation Theoretical Enantioselectivity Optimization. This tool is available online at: https://cptmltool.rnasa-imedir.com/CPTMLTools-Web/mateo. This new user-friendly online computational tool may become useful to explore a large number of combinations of reactants, catalysts, and experimental conditions.
This public tool would enable sustainable optimization of reaction conditions that could lead to the design of new catalysts, substrates, nucleophiles, and/or products., Ministry of Science and Innovation (PID2019-104148GB-I00); Basque Government / Eusko Jaurlaritza (IT1558-22), Peer reviewed
Drug Release Nanoparticle System Design: Data Set Compilation and Machine Learning Modeling
Digital.CSIC. Repositorio Institucional del CSIC
- He, Shan
- Barón, Ander
- Munteanu, Cristian Robert
- Bilbao, Begoña de
- Casañola, Gerardo
- Chelu, Mariana
- Musuc, Adina Magdalena
- Bediaga, Harbil
- Ascencio, Estefania
- Castellanos-Rubio, Idoia
- Arrasate, Sonia
- Pazos, Alejandro
- Insausti, Maite
- Rasulev, Bakhtiyor
- González-Díaz, Humberto
The code developed in this study is available in the GitHub repository: https://github.com/muntisa/ML-for-HCC-DADNP., Magnetic nanoparticles (NPs) are gaining significant interest in the field of biomedical functional nanomaterials because of their distinctive chemical and physical characteristics, particularly in drug delivery and magnetic hyperthermia applications. In this paper, we experimentally synthesized and characterized new Fe3O4-based NPs, functionalizing their surface with a 5-TAMRA cadaverine modified copolymer consisting of PMAO and PEG. Despite these advancements, many combinations of NP cores and coatings remain unexplored. To address this, we created a new data set of NP systems from public sources. Herein, 11 different AI/ML algorithms were used to develop the predictive AI/ML models. The linear discriminant analysis (LDA) and random forest (RF) models showed high values of sensitivity and specificity (>0.9) in training/validation series and 3-fold cross validation, respectively. The AI/ML models are able to predict 14 output properties (CC50 (μM), EC50 (μM), inhibition (%), etc.) for all combinations of 54 different NP core classes vs. 25 different coats and vs. 41 different cell lines, allowing the best results to be shortlisted for experimental assays. The results of this work may help to reduce the cost of traditional trial-and-error procedures., The authors acknowledge financial support from Grants ELKARTEK (KK-2022/00032), 2022-2023, (KK-2023/00041), 2023-24 and IT1558-22, and IT1546-22, 2022-2025, funded by Basque Government/Eusko Jaurlaritza, Grant PID2019-104148GB-I00 and PID2022-136993OB-I00 funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” and also Grant IKERDATA 2022/IKER/000040 funded by NextGenerationEU funds of the European Commission. This work was also supported in part by the National Science Foundation NSF MRI award OAC-2019077.
The authors are grateful for financial and administrative support provided by the Department of Coatings and Polymer Materials at North Dakota State University (USA). The authors would like to acknowledge as well the Spanish Ministry of Science and Innovation for financial support under grant No. PID2022-136993OB-I00 (AEI/FEDER, UE), funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union”. CITIC is funded by the Xunta de Galicia through the collaboration agreement between the Department of Culture, Education, Vocational Training and Universities and the Galician universities to strengthen the research centers of the Galician University System (CIGUS). The authors acknowledge grant ED431C 2022/46 – Competitive Reference Groups (GRC) – funded by the EU and Xunta de Galicia (Spain). The authors would also like to acknowledge the financial support provided by the Basque Government under research projects IT1500-22, IT1546-22, and MMASINT (KK-2023/00041, Elkartek Program)., Peer reviewed
DOI: http://hdl.handle.net/10261/379556, https://api.elsevier.com/content/abstract/scopus_id/85214934314
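The abstract mentions 3-fold cross-validation and predictions for every combination of 54 NP core classes, 25 coats, and 41 cell lines, that is, 54 × 25 × 41 = 55,350 core/coat/cell-line systems per output property. The following Python sketch enumerates such a grid and produces a shuffled k-fold split. The class labels are hypothetical placeholders, and the trained LDA/RF models that would score each combination are not reproduced here.

```python
import random
from itertools import product


def candidate_grid(cores, coats, cell_lines):
    """Every (core, coat, cell line) combination to be scored by a trained model."""
    return list(product(cores, coats, cell_lines))


def k_fold_indices(n_samples, k=3, seed=0):
    """Shuffle indices once and deal them into k folds; yield (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    cores = [f"core_{i}" for i in range(54)]       # placeholder NP core classes
    coats = [f"coat_{i}" for i in range(25)]       # placeholder coating materials
    cells = [f"cell_line_{i}" for i in range(41)]  # placeholder cell lines
    print(len(candidate_grid(cores, coats, cells)))  # 55350
    for train, test in k_fold_indices(10, k=3):
        print(len(train), len(test))
```

Each sample lands in exactly one test fold, so every case is predicted once by a model that did not see it during training.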
Drug Release Nanoparticle Systems Design: Dataset Compilation and Machine Learning Modeling
Digital.CSIC. Repositorio Institucional del CSIC
- He, Shan
- Barón, Ander
- Munteanu, Cristian Robert
- Bilbao, Begoña de
- Bediaga, Harbil
- Casañola, Gerardo
- Ascencio, Estefania
- Chelu, Mariana
- Musuc, Adina Magdalena
- Arrasate, Sonia
- Pazos, Alejandro
- Rasulev, Bakhtiyor
- González-Díaz, Humberto
- Castellanos-Rubio, Idoia
- Insausti, Maite
Magnetic Nanoparticles (MNPs) are gaining significant interest in the field of biomedical functional nanomaterials because of their distinctive chemical and physical characteristics, particularly in drug delivery and magnetic hyperthermia applications. In this paper, we experimentally synthesized and characterized new Fe3O4-based MNPs, functionalizing their surface with a 5-TAMRA cadaverine modified copolymer consisting of PMAO and PEG. Despite these advancements, many combinations of NP cores and coatings remain unexplored. To address this, we created a new dataset of MNP systems from public sources. Herein, 11 different AI/ML algorithms were used to develop the predictive AI/ML models. The Linear Discriminant Analysis (LDA) and Random Forest (RF) models showed high values of sensitivity and specificity (>0.9) in training/validation series and 3-fold cross validation, respectively. The AI/ML models are able to predict 14 output properties (CC50 (µM), EC50 (µM), Inhibition (%), etc.) for all combinations of 54 different NP core classes vs. 15 different coats and vs. 41 different cell lines, allowing the best results to be shortlisted for experimental assays. The results of this work may help to reduce the cost of traditional trial-and-error procedures., ELKARTEK (KK-2022/00032); IT1558-22, 2022-2025; PID2019-104148GB-I00; 2022/IKER/000040; CITIC is funded by the Xunta de Galicia; ED431C 2022/46 – Competitive Reference Groups (GRC), Peer reviewed
AQUA Tox: A web tool for predicting aquatic toxicity in rotifer species using intrinsic explainable models
Digital.CSIC. Repositorio Institucional del CSIC
- Diéguez, Karel
- Casañola, Gerardo
- Torres, Roldán
- Rasulev, Bakhtiyor
- González-Díaz, Humberto
The widespread use of chemicals in various industries, including agriculture, cosmetics, pharmaceuticals, and textiles, poses significant environmental risks, particularly to aquatic ecosystems. This study focuses on the toxicity of organic compounds to two rotifer species, Brachionus calyciflorus and Brachionus plicatilis, which are widely used as bioindicators in ecotoxicology. A database of toxicity data (LC50) was compiled, and QSAR/QSTR models were developed to predict chemical toxicity in both freshwater (FW) and saltwater (SW) environments. Using molecular descriptors, the study identified critical factors influencing toxicity, such as hydrophobicity and the presence of chlorine atoms. The models demonstrated strong predictive performance, with R² values exceeding 70% for both FW and SW conditions. A user-friendly web application was developed, enabling the scientific community to assess the aquatic toxicity of chemicals. This tool aids in the design of safer, more sustainable substances, facilitating regulatory compliance and minimizing environmental impacts. The findings highlight the importance of combining computational methods with technological applications for environmental protection., G.D.H. acknowledges financial support from Grant PID2019-104148GB-I00 and PID2022-137365NB-I00 funded by MCIN/AEI/10.13039/501100011033 and Grant IT1558-22 funded by Basque Government/Eusko Jaurlaritza, 2022–2025., Peer reviewed
DOI: http://hdl.handle.net/10261/388751, https://api.elsevier.com/content/abstract/scopus_id/105001343662
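QSTR models for aquatic toxicity are usually fitted to a log-transformed endpoint rather than to raw LC50 values. A minimal Python sketch of that preprocessing step follows, assuming LC50 is reported in mg/L and converted to pLC50 = -log10(LC50 in mol/L); the unit convention and the function name are assumptions for illustration, not taken from the AQUA Tox paper.

```python
import math


def plc50_from_mg_per_l(lc50_mg_per_l, molar_mass_g_per_mol):
    """Convert an LC50 given in mg/L into pLC50 = -log10(LC50 in mol/L)."""
    if lc50_mg_per_l <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("LC50 and molar mass must be positive")
    lc50_mol_per_l = lc50_mg_per_l / 1000.0 / molar_mass_g_per_mol  # mg/L -> g/L -> mol/L
    return -math.log10(lc50_mol_per_l)


if __name__ == "__main__":
    # Hypothetical compound: LC50 = 12.5 mg/L, molar mass = 147.0 g/mol.
    print(round(plc50_from_mg_per_l(12.5, 147.0), 3))
```

A higher pLC50 means the compound is lethal at a lower molar concentration, i.e. it is more toxic, so descriptors with positive coefficients in such a model point to structural features that increase toxicity.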
First report on Quantitative Structure-Toxicity Relationship modeling approaches for the prediction of acute toxicity of various organic chemicals against rotifer species
Digital.CSIC. Repositorio Institucional del CSIC
- Diéguez, Karel
- Casañola, Gerardo
- Torres, Roldán
- Rasulev, Bakhtiyor
- González-Díaz, Humberto
Nowadays, organic chemicals are crucial components in virtually every aspect of daily life, serving as indispensable elements for modern society. The ongoing synthesis of new chemicals and their potential harmful effects on living organisms are prompting regulatory bodies to view computational approaches as vital supplements and alternatives to traditional animal testing in assessing chemical risks. In this study, we have developed, for the first time, Quantitative Structure-Toxicity Relationship (QSTR) models based on Multiple Linear Regression (MLR) and five Machine Learning (ML) algorithms to predict organic chemical toxicity against a rotifer species (Brachionus calyciflorus). The most influential descriptors in the MLR model (SM6_B(p), B07[ClCl], B05[ClCl], MaxssCH2, F09[NO], B04[ClCl], and minssO) contribute positively to the dependent variable (the negative decimal logarithm of the median lethal concentration at 24 h). The interpretation of the molecular descriptors of the MLR model suggested that high molecular polarizability and lipophilicity (presence of chlorine atoms) increase the toxic potency of substances. The analysis of the applicability domain, conducted using the leverage approach and the standardized residual method, showed the broad applicability of each model. In cross-validation, Support Vector Regression (SV_R) gave the best values, Q²LOO = 0.754 and RMSEcv = 0.652, slightly better than those of the other linear and nonlinear techniques used. Furthermore, the models exhibited a high degree of goodness of fit, internal robustness, and external predictive power. These findings suggest that the developed QSTR models are well-suited for the reliable prediction of aquatic toxicity for a wide range of structurally diverse organic chemicals.
These models can be valuable for tasks such as screening, prioritizing new compounds, filling data gaps, and mitigating the limitations of in vivo and in vitro tests, ultimately contributing to reducing the use of dangerous chemicals in the environment., G.D.H. acknowledges financial support from Grant PID2019-104148GB-I00 and PID2022-137365NB-I00 funded by MCIN/AEI/10.13039/501100011033, and Grant IT1558-22 funded by Basque Government/Eusko Jaurlaritza (2022–2025). This work used resources of the Center for Computationally Assisted Science and Technology (CCAST) at North Dakota State University, supported by NSF MRI Award No. 2019077., Peer reviewed
DOI: http://hdl.handle.net/10261/388761, https://api.elsevier.com/content/abstract/scopus_id/105002148558
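The applicability-domain and cross-validation statistics named in this abstract are commonly defined as follows in the QSAR literature. The abstract does not state the exact conventions used (for example, whether the descriptor matrix includes an intercept column), so treat these as the standard textbook forms rather than the authors' own equations. Here X is the n × p training-set descriptor matrix, x_i the descriptor vector of compound i, p the number of descriptors, n the number of training compounds, and ŷ_{i/i} the prediction for compound i from a model refitted without it.

```latex
h_i = \mathbf{x}_i^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_i,
\qquad h^{*} = \frac{3(p+1)}{n}

Q^2_{\mathrm{LOO}} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{i/i}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},
\qquad
\mathrm{RMSE}_{\mathrm{cv}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_{i/i}\right)^2}
```

In the usual Williams-plot reading, a compound with h_i > h* or a standardized residual outside ±3 is flagged as lying outside the model's applicability domain.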