NEW PROPOSALS FOR THE ESTIMATION, PREDICTION AND VALIDATION OF SEMIPARAMETRIC MODELS FOR THE ANALYSIS OF COMPLEX DATA WITH APPLICATIONS IN HEALTH AND CLIMATE CHANGE

PID2020-115882RB-I00

Funding agency name: Agencia Estatal de Investigación
Funding agency acronym: AEI
Programme: Programa Estatal de I+D+i Orientada a los Retos de la Sociedad
Subprogramme: Programa Estatal de I+D+i Orientada a los Retos de la Sociedad
Call: Proyectos I+D
Call year: 2020
Managing unit: Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020
Beneficiary centre: ASOC BCAM - BASQUE CENTER FOR APPLIED MATHEMATICS
Persistent identifier: http://dx.doi.org/10.13039/501100011033

Publications

Total results (including duplicates): 19

Automatic cross-validation in structured models: is it time to leave out leave-one-out?

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Adin Urtasun, Aritz
  • Krainski, Elias Teixeira
  • Lenzi, Amanda
  • Liu, Zhedong
  • Martínez-Minaya, Joaquín
  • Rue, Håvard
Standard techniques such as leave-one-out cross-validation (LOOCV) might not be suitable for evaluating the predictive performance of models incorporating structured random effects. In such cases, the correlation between the training and test sets could have a notable impact on the model's prediction error. To overcome this issue, an automatic group construction procedure for leave-group-out cross-validation (LGOCV) has recently emerged as a valuable tool for enhancing predictive performance measurement in structured models. The purpose of this paper is (i) to compare LOOCV and LGOCV within structured models, emphasizing model selection and predictive performance, and (ii) to provide real data applications in spatial statistics using complex structured models fitted with INLA, showcasing the utility of the automatic LGOCV method. First, we briefly review the key aspects of the recently proposed LGOCV method for automatic group construction in latent Gaussian models. We also demonstrate the effectiveness of this method for selecting the model with the highest predictive performance by simulating extrapolation tasks in both temporal and spatial data analyses. Finally, we provide insights into the effectiveness of the LGOCV method in modeling complex structured data, encompassing spatio-temporal multivariate count data, spatial compositional data, and spatio-temporal geospatial data. Open access funding provided by Universidad Pública de Navarra. This research has been supported by project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033 for Adin, A., and by project PID2020-115882RB-I00 for Martínez-Minaya, J.
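
The contrast between LOOCV and leave-group-out validation described in this abstract can be illustrated with a toy sketch. This is not the automatic group construction procedure of the paper, which the abstract defines for latent Gaussian models fitted with INLA. Here the left-out group is simply assumed to be a fixed window of neighbouring time points, and the predictor is a hypothetical nearest-neighbour average. The names `knn_predict`, `cv_rmse` and `radius` are illustrative only.

```python
import math
import random


def knn_predict(times, values, t0, k=2):
    """Predict the value at t0 as the mean of the k training points closest in time."""
    nearest = sorted(zip(times, values), key=lambda tv: abs(tv[0] - t0))[:k]
    return sum(v for _, v in nearest) / len(nearest)


def cv_rmse(times, values, radius):
    """Leave-group-out CV with hand-fixed groups (an assumption of this sketch).

    For each point i, every point within `radius` index steps of i is removed
    from the training set. radius=0 reduces to ordinary leave-one-out.
    """
    n = len(times)
    squared_errors = []
    for i in range(n):
        keep = [j for j in range(n) if abs(j - i) > radius]
        pred = knn_predict([times[j] for j in keep],
                           [values[j] for j in keep],
                           times[i])
        squared_errors.append((values[i] - pred) ** 2)
    return math.sqrt(sum(squared_errors) / n)


# Simulated, strongly autocorrelated series: a smooth signal plus small noise.
random.seed(1)
times = [0.1 * i for i in range(120)]
values = [math.sin(t) + random.gauss(0.0, 0.05) for t in times]

loo_rmse = cv_rmse(times, values, radius=0)  # neighbours of i stay in training
lgo_rmse = cv_rmse(times, values, radius=5)  # neighbours of i are left out too
print(f"LOOCV RMSE: {loo_rmse:.3f}")
print(f"LGOCV RMSE: {lgo_rmse:.3f}")
```

On a series like this one, the leave-one-out error comes out smaller than the leave-group-out error, because the immediate neighbours of each left-out point remain in the training set. That is the optimism the abstract attributes to LOOCV in structured models.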




Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Zumeta-Olaskoaga, Lore
  • Weigert, Maximilian
  • Larruskain, Jon
  • Bikandi Latxaga, Eder
  • Setuain Chourraut, Igor
  • Lekue, Josean
  • Küchenhoff, Helmut
  • Lee, Dae-Jin
Data-based methods and statistical models are receiving special attention in the study of sports injuries, with the aim of gaining an in-depth understanding of their risk factors and mechanisms. The objective of this work is to evaluate the use of shared frailty Cox models for the prediction of sports injuries, and to compare their performance with different sets of variables selected by several regularized variable selection approaches. The study is motivated by specific characteristics commonly found in sports injury data, which usually include a reduced sample size and an even smaller number of injuries, coupled with a large number of potentially influential variables. Hence, we conduct a simulation study to address these statistical challenges and to explore regularized Cox model strategies together with shared frailty models in different controlled situations. We show that predictive performance greatly improves as more player observations are available. Methods that result in sparse models and favour interpretability, e.g. best subset selection and boosting, are preferred when the sample size is small. We include a real case study of injuries of female football players of a Spanish football club. This research was supported by the Basque Government through the BERC Programme 2018–2021, by the Spanish Ministry of Science, Innovation and Universities MICINN and FEDER: BCAM Severo Ochoa excellence accreditation SEV-2017-0718, and project PID2020-115882RB-I00 funded by AEI/FEDER, UE and acronym ‘S3M1P4R’, and by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A.




Estimation of cut-off points under complex-sampling design data

Dipòsit Digital de Documents de la UAB
  • Iparragirre, Amaia (ORCID: 0000-0002-0660-6535)
  • Barrio Beraza, Irantzu (ORCID: 0000-0003-0648-5769)
  • Aramendi, Jorge
  • Arostegui, Inmaculada (ORCID: 0000-0002-6848-2240)
In the context of logistic regression models, a cut-off point is usually selected to dichotomize the estimated predicted probabilities based on the model. The techniques proposed to estimate optimal cut-off points in the literature, are commonly developed to be applied in simple random samples and their applicability to complex sampling designs could be limited. Therefore, in this work we propose a methodology to incorporate sampling weights in the estimation process of the optimal cut-off points, and we evaluate its performance using a real data-based simulation study. The results suggest the convenience of considering sampling weights for estimating optimal cut-off points.
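
The idea of letting sampling weights enter the cut-off estimation can be sketched as follows. The abstract does not state which optimality criterion is used, so this sketch assumes the Youden index (sensitivity + specificity - 1), computed with weighted counts. The function name `weighted_youden_cutoff` and the toy data are hypothetical.

```python
def weighted_youden_cutoff(probs, labels, weights):
    """Return (cutoff, J) maximising the weighted Youden index.

    A unit is classified as positive when its predicted probability is >= cutoff.
    Sensitivity and specificity are computed from sums of sampling weights
    rather than raw counts (an assumption of this sketch).
    """
    pos_total = sum(w for y, w in zip(labels, weights) if y == 1)
    neg_total = sum(w for y, w in zip(labels, weights) if y == 0)
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(probs)):
        true_pos = sum(w for p, y, w in zip(probs, labels, weights)
                       if y == 1 and p >= c)
        true_neg = sum(w for p, y, w in zip(probs, labels, weights)
                       if y == 0 and p < c)
        j = true_pos / pos_total + true_neg / neg_total - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Toy sample: predicted probabilities, observed outcomes, sampling weights.
probs = [0.1, 0.3, 0.5, 0.7, 0.9]
labels = [0, 1, 0, 1, 1]
weights = [1, 10, 1, 1, 1]  # the second unit represents many population units

unweighted_cutoff, _ = weighted_youden_cutoff(probs, labels, [1] * len(probs))
weighted_cutoff, _ = weighted_youden_cutoff(probs, labels, weights)
print("unweighted cut-off:", unweighted_cutoff)
print("weighted cut-off:", weighted_cutoff)
```

In this toy sample the heavily weighted positive unit at 0.3 pulls the weighted cut-off down from 0.7 to 0.3, so ignoring the design weights would change the resulting classification rule.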




Exploring green gentrification in 28 global North cities: the role of urban parks and other types of greenspaces

Dipòsit Digital de Documents de la UAB
  • Triguero-Mas, Margarita (ORCID: 0000-0002-1580-2693)
  • Anguelovski, Isabelle (ORCID: 0000-0002-6409-5155)
  • Connolly, James J. T. (ORCID: 0000-0002-7363-8414)
  • Martin, Nick (ORCID: 0000-0001-9023-9696)
  • Matheney, Austin (ORCID: 0000-0003-0739-3129)
  • Cole, Helen (ORCID: 0000-0003-0936-6810)
  • Pérez-Del-Pulgar, Carmen
  • García-Lamarca, Melissa (ORCID: 0000-0002-4813-3633)
  • Shokry, Galia (ORCID: 0000-0002-2959-3677)
  • Argüelles, Lucía (ORCID: 0000-0003-1024-0289)
  • Conesa, David (ORCID: 0000-0002-5442-5691)
  • Gallez, Elsa (ORCID: 0000-0002-2198-2504)
  • Sarzo, Blanca (ORCID: 0000-0001-7305-6564)
  • Beltrán, Miguel Angel
  • López Máñez, Jesúa
  • Martínez-Minaya, Joaquín (ORCID: 0000-0002-1016-8734)
  • Oscilowicz, Emilia (ORCID: 0000-0003-3153-4366)
  • Arcaya, Mariana
  • Baró Porras, Francesc (ORCID: 0000-0002-0145-6320)
Although cities globally are increasingly mobilizing re-naturing projects to address diverse urban socio-environmental and health challenges, there is mounting evidence that these interventions may also be linked to the phenomenon known as green gentrification. However, to date the empirical evidence on the relationship between greenspaces and gentrification regarding associations with different greenspace types remains scarce. This study focused on 28 mid-sized cities in North America and Western Europe. We assessed improved access to different types of greenspace (i.e. total area of parks, gardens, nature preserves, recreational areas or greenways [i] added before the 2000s or [ii] added before the 2010s) and gentrification processes (including [i] gentrification for the 2000s; [ii] gentrification for the 2010s; [iii] gentrification throughout the decades of the 2000s and 2010s) in each small geographical unit of each city. To estimate the associations, we developed a Bayesian hierarchical spatial model for each city and gentrification time period (i.e. a maximum of three models per city). More than half of our models showed that parks, together with other factors such as proximity to the city center, are positively associated with gentrification processes, particularly in the US context, except in historically Black disinvested postindustrial cities with lots of vacant land. We also find that in half of our models newly designated nature preserves are negatively associated with gentrification processes, particularly when considering gentrification throughout the 2000s and the 2010s and in the US. Meanwhile, for new gardens, recreational spaces and greenways, our research shows mixed results (some positive, some negative and some no-effect associations). Considering the environmental and health benefits of urban re-naturing projects, cities should keep investing in improving park access while simultaneously implementing anti-displacement and inclusive green policies.




Automatic cross-validation in structured models: Is it time to leave out leave-one-out?

RiuNet. Repositorio Institucional de la Universitat Politècnica de València
  • Adin, Aritz
  • Teixeira Krainski, Elias
  • Lenzi, Amanda
  • Liu, Zhedong
  • Rue, Haavard
  • Martínez Minaya, Joaquín
[EN] Standard techniques such as leave-one-out cross-validation (LOOCV) might not be suitable for evaluating the predictive performance of models incorporating structured random effects. In such cases, the correlation between the training and test sets could have a notable impact on the model's prediction error. To overcome this issue, an automatic group construction procedure for leave-group-out cross-validation (LGOCV) has recently emerged as a valuable tool for enhancing predictive performance measurement in structured models. The purpose of this paper is (i) to compare LOOCV and LGOCV within structured models, emphasizing model selection and predictive performance, and (ii) to provide real data applications in spatial statistics using complex structured models fitted with INLA, showcasing the utility of the automatic LGOCV method. First, we briefly review the key aspects of the recently proposed LGOCV method for automatic group construction in latent Gaussian models. We also demonstrate the effectiveness of this method for selecting the model with the highest predictive performance by simulating extrapolation tasks in both temporal and spatial data analyses. Finally, we provide insights into the effectiveness of the LGOCV method in modeling complex structured data, encompassing spatio-temporal multivariate count data, spatial compositional data, and spatio-temporal geospatial data. Open access funding provided by Universidad Pública de Navarra. This research has been supported by project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033 for Adin, A., and by project PID2020-115882RB-I00 for Martínez-Minaya, J. We would like to thank two anonymous reviewers for their valuable comments, which helped to clarify some aspects of this paper.




Impact of outdoor air pollution on severity and mortality in COVID-19 pneumonia

RiuNet. Repositorio Institucional de la Universitat Politècnica de València
  • García-García, Fernando
  • Bronte, Olaia
  • Lee, Dae-Jin
  • Urrutia Landa, Isabel
  • Uranga, Ane
  • Nieves Ermecheo, Mónica
  • Quintana, José María
  • Arostegui, Inmaculada
  • Zalacain, Rafael
  • Ruiz-Iturriaga, Ainhize
  • Serrano, Laura
  • Menéndez, Rosario
  • Mendez, R
  • Torres, Antoni
  • Martínez Minaya, Joaquín
[EN] The relationship between exposure to air pollution and the severity of coronavirus disease 2019 (COVID-19) pneumonia and other outcomes is poorly understood. Beyond age and comorbidity, risk factors for adverse outcomes including death have been poorly studied. The main objective of our study was to examine the relationship between exposure to outdoor air pollution and the risk of death in patients with COVID-19 pneumonia using individual-level data. The secondary objective was to investigate the impact of air pollutants on gas exchange and systemic inflammation in this disease. This cohort study included 1548 patients hospitalised for COVID-19 pneumonia between February and May 2020 in one of four hospitals. Local agencies supplied daily data on environmental air pollutants (PM10, PM2.5, O3, NO2, NO and NOX) and meteorological conditions (temperature and humidity) in the year before hospital admission (from January 2019 to December 2019). Daily exposure to pollution and meteorological conditions by individual postcode of residence was estimated using geospatial Bayesian generalised additive models. The influence of air pollution on pneumonia severity was studied using generalised additive models which included: age, sex, Charlson comorbidity index, hospital, average income, air temperature and humidity, and exposure to each pollutant. Additionally, generalised additive models were generated for exploring the effect of air pollution on C-reactive protein (CRP) level and SpO2/FiO2 at admission. According to our results, both risk of COVID-19 death and CRP level increased significantly with median exposure to PM10, NO2, NO and NOX, while higher exposure to NO2, NO and NOX was associated with lower SpO2/FiO2 ratios. In conclusion, after controlling for socioeconomic, demographic and health-related variables, we found evidence of a significant positive relationship between air pollution and mortality in patients hospitalised for COVID-19 pneumonia. Additionally, inflammation (CRP) and gas exchange (SpO2/FiO2) in these patients were significantly related to exposure to air pollution. This research work was partially funded by the Spanish Respiratory and Thoracic Surgery Association (SEPAR) [grant number 004-2021]. This research was also partially funded by the Department of Education of the Basque Government through an Artificial Intelligence in BCAM grant [grant number 00432-2019], the 'Mathematical Modelling Applied to Health' strategy, the BERC 2018-2021 & 2022-2025 programmes and the Consolidated Research Group MATHMODE [IT1456-22]; and by the Spanish Ministry of Science, Innovation and Universities under BCAM Severo Ochoa accreditation SEV-2017-0718, as well as by the Spanish State Research Agency (AEI) through project S3M1P4R [PID2020-115882RB-I00].




Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques

RiuNet. Repositorio Institucional de la Universitat Politècnica de València
  • Hayet-Otero, Miren
  • García-García, Fernando
  • Lee, Dae-Jin
  • España Yandiola, Pedro Pablo
  • Urrutia Landa, Isabel
  • Nieves Ermecheo, Mónica
  • Quintana, José María
  • Menéndez, Rosario
  • Torres, Antoni
  • Zalacain Jorge, Rafael
  • Arostegui, Inmaculada
  • Martínez Minaya, Joaquín
[EN] With the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and the factors which determine diverse clinical evolutions. Here we focused on a fully data-driven exploration regarding which factors (clinical or otherwise) were most informative for SARS-CoV-2 pneumonia severity prediction via machine learning (ML). In particular, feature selection techniques (FS), designed to reduce the dimensionality of data, allowed us to characterize which of our variables were the most useful for ML prognosis. We conducted a multi-centre clinical study, enrolling n = 1548 patients hospitalized due to SARS-CoV-2 pneumonia, of whom 712, 238, and 598 experienced low, medium and high-severity evolutions, respectively. Up to 106 patient-specific clinical variables were collected at admission, although 14 of them had to be discarded for containing > 60% missing values. Alongside 7 socioeconomic attributes and 32 exposures to air pollution (chronic and acute), these became d = 148 features after variable encoding. We addressed this ordinal classification problem both as an ML classification and regression task. Two imputation techniques for missing data were explored, along with a total of 166 unique FS algorithm configurations: 46 filters, 100 wrappers and 20 embedded methods. Of these, 21 setups achieved satisfactory bootstrap stability (> 0.70) with reasonable computation times: 16 filters, 2 wrappers, and 3 embedded methods. The subsets of features selected by each technique showed modest Jaccard similarities across them. However, they consistently pointed out the importance of certain explanatory variables. Namely: patient's C-reactive protein (CRP), pneumonia severity index (PSI), respiratory rate (RR) and oxygen levels (saturation SpO2, quotients SpO2/RR and arterial SatO2/FiO2), the neutrophil-to-lymphocyte ratio (NLR) (to a certain extent, also neutrophil and lymphocyte counts separately), lactate dehydrogenase (LDH), and procalcitonin (PCT) levels in blood. A remarkable agreement has been found a posteriori between our strategy and independent clinical research works investigating risk factors for COVID-19 severity. Hence, these findings stress the suitability of this type of fully data-driven approach for knowledge extraction, as a complement to clinical perspectives. This research is supported by the Spanish State Research Agency AEI under the project S3M1P4R PID2020-115882RB-I00, as well as by the Basque Government EJ-GV under the grant 'Artificial Intelligence in BCAM' 2019/00432, under the strategy 'Mathematical Modelling Applied to Health', and under the BERC 2018-2021 and 2022-2025 programmes, and also by the Spanish Ministry of Science and Innovation: BCAM Severo Ochoa accreditation CEX2021-001142-S/MICIN/AEI/10.13039/501100011033. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
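
The Jaccard similarity used in this abstract to compare feature subsets is the size of the intersection divided by the size of the union. The sketch below shows the computation on made-up selections. The method labels (`filter_A`, `wrapper_B`, `embedded_C`) and the subsets are hypothetical and only borrow variable abbreviations from the abstract; they are not results from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical subsets selected by three feature selection setups.
selections = {
    "filter_A": {"CRP", "PSI", "RR", "LDH"},
    "wrapper_B": {"CRP", "PSI", "NLR", "PCT"},
    "embedded_C": {"CRP", "RR", "NLR", "LDH"},
}

# Pairwise similarity between every pair of setups.
pairwise = {
    (m1, m2): jaccard(selections[m1], selections[m2])
    for m1, m2 in combinations(selections, 2)
}
# Features chosen by every setup.
consensus = set.intersection(*selections.values())

for pair, score in pairwise.items():
    print(pair, round(score, 3))
print("selected by all setups:", sorted(consensus))
```

Even when pairwise similarities are modest, as in this toy case, a small core of features can still be shared by all setups, which is the pattern the abstract reports.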




Cost-Sensitive Ordinal Classification Methods to Predict SARS-CoV-2 Pneumonia Severity

RiuNet. Repositorio Institucional de la Universitat Politècnica de València
  • García-García, Fernando
  • Lee, Dae-Jin
  • España Yandiola, Pedro Pablo
  • Urrutia Landa, Isabel
  • Hayet-Otero, Miren
  • Ermecheo, Monica Nieves
  • Quintana, José María
  • Menéndez, Rosario
  • Torres, Antoni
  • Jorge, Rafael Zalacain
  • Martínez Minaya, Joaquín
[EN] Objective: To study the suitability of cost-sensitive ordinal artificial intelligence-machine learning (AI-ML) strategies in the prognosis of SARS-CoV-2 pneumonia severity. Materials & methods: Observational, retrospective, longitudinal, cohort study in 4 hospitals in Spain. Information regarding demographic and clinical status was supplemented by socioeconomic data and air pollution exposures. We proposed AI-ML algorithms for ordinal classification via ordinal decomposition and for cost-sensitive learning via resampling techniques. For performance-based model selection, we defined a custom score including per-class sensitivities and asymmetric misprognosis costs. 260 distinct AI-ML models were evaluated via 10 repetitions of 5 × 5 nested cross-validation with hyperparameter tuning. Model selection was followed by the calibration of predicted probabilities. Final overall performance was compared against five well-established clinical severity scores and against a 'standard' (non-cost-sensitive, non-ordinal) AI-ML baseline. In our best model, we also evaluated its explainability with respect to each of the input variables. Results: The study enrolled n = 1548 patients: 712 experienced low, 238 medium, and 598 high clinical severity. d = 131 variables were collected, becoming d' = 148 features after categorical encoding. Model selection resulted in our best-performing AI-ML pipeline having: 1) no imputation of missing data, 2) no feature selection (i.e. using the full set of d' features), 3) 'Ordered Partitions' ordinal decomposition, 4) cost-based reimbalance, and 5) a Histogram-based Gradient Boosting classifier. This best model (calibrated) obtained a median accuracy of 68.1% [67.3%, 68.8%] (95% confidence interval), a balanced accuracy of 57.0% [55.6%, 57.9%], and an overall area under the curve (AUC) of 0.802 [0.795, 0.808]. In our dataset, it outperformed all five clinical severity scores and the 'standard' AI-ML baseline. Discussion & conclusion: We conducted an exhaustive exploration of AI-ML methods designed for both ordinal and cost-sensitive classification, motivated by a real-world application domain (clinical severity prognosis) in which these topics arise naturally. Our model with the best classification performance successfully exploited the ordering information of ground truth classes, coping with imbalance and asymmetric costs. However, these ordinal and cost-sensitive aspects are seldom explored in the literature. This research is supported by the Spanish State Research Agency (AEI) under the project S3M1P4R (PID2020-115882RB-I00), as well as by the Basque Government (EJGV) under the grant Artificial Intelligence in BCAM 2019/00432, under the strategy Mathematical Modelling Applied to Health, and under the BERC 2022-2025 programme, and also by the Spanish Ministry of Science and Innovation: BCAM Severo Ochoa accreditation CEX2021-001142-S / MICIN / AEI / 10.13039/501100011033.
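
The two ingredients named in this abstract, ordinal decomposition and asymmetric costs, can be sketched under explicit assumptions. The sketch takes an 'Ordered Partitions' style decomposition to mean one binary model per threshold, each estimating P(y > k), which is an assumption about what the abstract refers to. It also handles costs through a minimum-expected-cost decision rule, which is a simpler technique swapped in for illustration; the abstract says the study used resampling instead. The probabilities and the cost matrix are made up.

```python
def ordinal_class_probs(p_greater):
    """Turn threshold probabilities [P(y > 0), P(y > 1), ...] into class probabilities."""
    bounds = [1.0] + list(p_greater) + [0.0]
    # Clip at zero: independently trained binary models need not be monotone.
    return [max(bounds[k] - bounds[k + 1], 0.0) for k in range(len(bounds) - 1)]


def min_expected_cost_class(probs, cost):
    """Pick the class with the lowest expected cost.

    cost[i][j] is the cost of predicting class j when the true class is i.
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__), expected


# Hypothetical outputs of two binary models for one patient
# (classes: 0 = low, 1 = medium, 2 = high severity).
p_greater = [0.6, 0.3]  # P(y > low), P(y > medium)
probs = ordinal_class_probs(p_greater)

# Made-up asymmetric costs: underestimating severity is penalised more.
cost = [
    [0, 1, 2],
    [2, 0, 1],
    [4, 2, 0],
]
most_likely = max(range(len(probs)), key=probs.__getitem__)
cost_sensitive, expected = min_expected_cost_class(probs, cost)
print("class probabilities:", [round(p, 3) for p in probs])
print("most likely class:", most_likely)
print("minimum expected cost class:", cost_sensitive)
```

In this example the most probable class is "low", but the asymmetric costs shift the decision to "medium", which shows why accuracy alone can be a misleading selection score in this setting.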




A flexible Bayesian tool for CoDa mixed models: logistic-normal distribution with Dirichlet covariance

RiuNet. Repositorio Institucional de la Universitat Politècnica de València
  • Martínez Minaya, Joaquín
  • Rue, Haavard
[EN] Compositional Data Analysis (CoDa) has gained popularity in recent years. This type of data consists of values from disjoint categories that sum up to a constant. Both Dirichlet regression and logistic-normal regression have become popular as CoDa analysis methods. However, fitting this kind of multivariate model presents challenges, especially when structured random effects are included in the model, such as temporal or spatial effects. To overcome these challenges, we propose the logistic-normal Dirichlet Model (LNDM). We seamlessly incorporate this approach into the R-INLA package, facilitating model fitting and model prediction within the framework of Latent Gaussian Models. Moreover, we explore metrics like the deviance information criterion (DIC), the Watanabe-Akaike information criterion (WAIC), and the cross-validation measure conditional predictive ordinate (CPO) for model selection in R-INLA for CoDa. Illustrating LNDM through two simulated examples and with an ecological case study on Arabidopsis thaliana in the Iberian Peninsula, we underscore its potential as an effective tool for managing CoDa and large CoDa databases. Joaquín Martínez-Minaya gratefully acknowledges the Ministry of Science, Innovation and Universities (Spain) for research project PID2020-115882RB-I00. Joaquín Martínez-Minaya also acknowledges funding for the open access charge from CRUE-Universitat Politècnica de València.
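
A logistic-normal model treats log-ratio coordinates of a composition as Gaussian. The abstract does not say which log-ratio transform the LNDM uses, so the sketch below assumes the additive log-ratio (alr) with the last category as reference. The function names and the example composition are illustrative and unrelated to the R-INLA implementation.

```python
import math


def alr(composition):
    """Additive log-ratio: log of each part relative to the last (reference) part."""
    reference = composition[-1]
    return [math.log(part / reference) for part in composition[:-1]]


def alr_inverse(coords):
    """Map unconstrained alr coordinates back to a composition summing to 1."""
    exponentials = [math.exp(z) for z in coords] + [1.0]
    total = sum(exponentials)
    return [e / total for e in exponentials]


composition = [0.2, 0.3, 0.5]     # three disjoint categories summing to 1
coords = alr(composition)         # two unconstrained real coordinates
recovered = alr_inverse(coords)   # back on the simplex

print("alr coordinates:", [round(z, 4) for z in coords])
print("recovered composition:", [round(p, 4) for p in recovered])
```

Because the coordinates are unconstrained real numbers, Gaussian machinery such as structured random effects can be applied to them, and predictions are mapped back to proportions with the inverse transform.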




A mesoscale analysis of relations between fish species richness and environmental and anthropogenic pressures in the Mediterranean Sea

RiuNet. Repositorio Institucional de la Universitat Politècnica de València
  • Carmezim, Joao
  • Pennino, Maria Grazia
  • Conesa, David
  • Coll, Marta
  • Martínez Minaya, Joaquín
[EN] Although a great deal is known about individual anthropogenic threats to different fish species in the Mediterranean Sea, little is known about how these threats accumulate and interact to affect fish species richness in conjunction with environmental dynamics. This study assesses the role of these threats in the fish richness component and identifies the main areas where the interaction between fish species richness and threats is highest. Our results show that fish richness seems to be higher in saltier and colder areas where the chlorophyll-a and phosphate concentrations are lower. Among the anthropogenic threats analyzed, coastal impact and fishing effort seem to be the most relevant ones. Overall, areas with high fish richness are mainly located along the western and northern shores, with lower values in the south-eastern regions. Areas of potential high cumulative threats are widespread in both the western and eastern basins, with fewer areas located in the south-eastern region. By describing the spatial patterns of fish richness and which drivers explain these patterns, we can also identify which anthropogenic activities can be managed more effectively to maintain and restore marine fish biodiversity in the basin. MGP acknowledges the project IMPRESS (RTI2018-099868-B-I00), ERDF, Ministry of Science, Innovation and Universities - State Research Agency. MC acknowledges partial funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 869300 (FutureMares project) and the Spanish project ProOceans (Ministerio de Ciencia e Innovación, Proyectos de I+D+I (RETOS-PID2020-118097RB-I00)). MC acknowledges the institutional support of the 'Severo Ochoa Centre of Excellence' accreditation (CEX2019-000928-S). D.C. would like to thank the Spanish Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación for grant PID2019-106341GB-I00 (jointly financed by the European Regional Development Fund, FEDER). J.M.-M. would like to thank the Ministry of Science, Innovation and Universities for the PID2020-115882RB-I00 research project.




Exploring green gentrification in 28 global North cities: the role of urban parks and other types of greenspaces

RiuNet. Repositorio Institucional de la Universitat Politècnica de València
  • Triguero-Mas, Margarita
  • Anguelovski, Isabelle
  • Connolly, James J T
  • Martin, Nick
  • Matheney, Austin
  • Cole, Helen V S
  • Pérez-Del-Pulgar, Carmen
  • García-Lamarca, Melissa
  • Shokry, Galia
  • Argüelles, Lucía
  • Conesa, David
  • Gallez, Elsa
  • Sarzo, Blanca
  • Beltran Sevilla, Miguel Angel
  • López Máñez, Jesúa
  • Oscilowicz, Emilia
  • Arcaya, Mariana C
  • Baró, Francesc
  • Martínez Minaya, Joaquín
[EN] Although cities globally are increasingly mobilizing re-naturing projects to address diverse urban socio-environmental and health challenges, there is mounting evidence that these interventions may also be linked to the phenomenon known as green gentrification. However, to date the empirical evidence on the relationship between greenspaces and gentrification regarding associations with different greenspace types remains scarce. This study focused on 28 mid-sized cities in North America and Western Europe. We assessed improved access to different types of greenspace (i.e. total area of parks, gardens, nature preserves, recreational areas or greenways [i] added before the 2000s or [ii] added before the 2010s) and gentrification processes (including [i] gentrification for the 2000s; [ii] gentrification for the 2010s; [iii] gentrification throughout the decades of the 2000s and 2010s) in each small geographical unit of each city. To estimate the associations, we developed a Bayesian hierarchical spatial model for each city and gentrification time period (i.e. a maximum of three models per city). More than half of our models showed that parks, together with other factors such as proximity to the city center, are positively associated with gentrification processes, particularly in the US context, except in historically Black disinvested postindustrial cities with lots of vacant land. We also find that in half of our models newly designated nature preserves are negatively associated with gentrification processes, particularly when considering gentrification throughout the 2000s and the 2010s and in the US. Meanwhile, for new gardens, recreational spaces and greenways, our research shows mixed results (some positive, some negative and some no-effect associations). Considering the environmental and health benefits of urban re-naturing projects, cities should keep investing in improving park access while simultaneously implementing anti-displacement and inclusive green policies. The research presented in this paper received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program [grant agreement No. 678034], the Spanish Ministry of Science and Innovation [Maria de Maeztu, CEX2019-000940-M] and the Research Council VUB [SRP 16 Demographic challenges of the 21st century], but the sponsors had no role in the design or analysis of this study. MGL and LA are funded by Juan de la Cierva fellowships [IJC2020-046064-I, and IJC2020-045101-I] awarded by the Spanish Ministry of Economy and Competitiveness. HVSC is supported by the Banco Santander-UAB fellowship program. DC would like to thank the Spanish Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación for grant PID2019-106341GB-I00 (jointly financed by the European Regional Development Fund, FEDER). BS was supported by a Margarita Salas fellowship from the Spanish Ministry of Universities-University of Valencia (MS21-013). JM-M would like to thank the Ministry of Science, Innovation and Universities for the PID2020-115882RB-I00 grant.




Automated location of orofacial landmarks to characterize airway morphology in anaesthesia via deep convolutional neural networks

BIRD. BCAM's Institutional Repository Data
  • García, F.
  • Lee, D.J.
  • Mendoza-Garcés, F. J.
  • Irigoyen-Miró, S.
  • Legarreta-Olabarrieta, M. J.
  • García-Gutiérrez, S.
  • Arostegui, I.
Background: A reliable anticipation of a difficult airway may notably enhance safety during anaesthesia. In current practice, clinicians use bedside screenings by manual measurements of patients’ morphology.

Objective: To develop and evaluate algorithms for the automated extraction of orofacial landmarks, which characterize airway morphology.

Methods: We defined 27 frontal + 13 lateral landmarks. We collected n=317 pairs of pre-surgery photos from patients undergoing general anaesthesia (140 females, 177 males). As ground truth reference for supervised learning, landmarks were independently annotated by two anaesthesiologists.

We trained two ad-hoc deep convolutional neural network architectures based on InceptionResNetV2 (IRNet) and MobileNetV2 (MNet), to predict simultaneously: (a) whether each landmark is visible or not (occluded, out of frame), (b) its 2D-coordinates (x, y). We implemented successive stages of transfer learning, combined with data augmentation. We added custom top layers on top of these networks, whose weights were fully tuned for our application. Performance in landmark extraction was evaluated by 10-fold cross-validation (CV) and compared against 5 state-of-the-art deformable models.

Results: With annotators’ consensus as the ‘gold standard’, our IRNet-based network performed comparably to humans in the frontal view: median CV loss L=1.277·10⁻³, inter-quartile range (IQR) [1.001, 1.660]; versus median 1.360, IQR [1.172, 1.651], and median 1.352, IQR [1.172, 1.619], for each annotator against consensus, respectively. MNet yielded slightly worse results: median 1.471, IQR [1.139, 1.982].

In the lateral view, both networks attained performance statistically poorer than that of humans: median CV loss L=2.141·10⁻³, IQR [1.676, 2.915], and median 2.611, IQR [1.898, 3.535], respectively; versus median 1.507, IQR [1.188, 1.988], and median 1.442, IQR [1.147, 2.010] for both annotators. However, standardized effect sizes in CV loss were small: 0.0322 and 0.0235 (non-significant) for IRNet, 0.1431 and 0.1518 (p<0.05) for MNet; therefore quantitatively similar to humans.

The best performing state-of-the-art model (a deformable regularized Supervised Descent Method, SDM) behaved comparably to our DCNNs in the frontal scenario, but notably worse in the lateral view.

Conclusions: We successfully trained two DCNN models for the recognition of 27 + 13 orofacial landmarks pertaining to the airway. Using transfer learning and data augmentation, they were able to generalize without overfitting, reaching expert-like performance in CV. Our IRNet-based methodology achieved a satisfactory identification and location of landmarks: particularly in the frontal view, at the level of anaesthesiologists. In the lateral view, its performance decayed, although with a non-significant effect size. Independent authors have also reported lower lateral performance, as certain landmarks may not be clear salient points, even for a trained human eye.
BERC.2022-2025
BCAM Severo Ochoa accreditation CEX2021-001142-S / MICIN / AEI / 10.13039/501100011033




Estimation of cut-off points under complex-sampling design data

BIRD. BCAM's Institutional Repository Data
  • Iparragirre, A.
  • Barrio, I.
  • Aramendi, J.
  • Arostegui, I.
In the context of logistic regression models, a cut-off point is usually selected to dichotomize the predicted probabilities estimated by the model. The techniques proposed in the literature to estimate optimal cut-off points are commonly developed for simple random samples, and their applicability to complex sampling designs could be limited. Therefore, in this work we propose a methodology to incorporate sampling weights in the estimation process of the optimal cut-off points, and we evaluate its performance using a real data-based simulation study. The results suggest the convenience of considering sampling weights for estimating optimal cut-off points., IT1294-19
BERC 2018-2021
KK-2020/00049
PIF18/213
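As an illustration of the idea summarized in the abstract above, the following is a minimal sketch of how sampling weights could enter a cut-off search. It assumes the Youden index as the optimality criterion, which the abstract does not specify; the function name `weighted_youden_cutoff` and the toy data are hypothetical and are not taken from the article.

```python
def weighted_youden_cutoff(probs, labels, weights):
    """Return (cutoff, J) maximizing a weighted Youden index.

    Assumption (not from the article): sensitivity and specificity are
    estimated as weighted proportions, and a unit is classified as
    positive when its predicted probability is >= the cutoff.
    """
    total_pos = sum(w for y, w in zip(labels, weights) if y == 1)
    total_neg = sum(w for y, w in zip(labels, weights) if y == 0)
    if total_pos <= 0 or total_neg <= 0:
        raise ValueError("both classes need positive total weight")

    best_cutoff, best_j = None, float("-inf")
    for c in sorted(set(probs)):
        # Weighted true positives and true negatives at candidate cutoff c.
        tp = sum(w for p, y, w in zip(probs, labels, weights) if y == 1 and p >= c)
        tn = sum(w for p, y, w in zip(probs, labels, weights) if y == 0 and p < c)
        j = tp / total_pos + tn / total_neg - 1.0
        if j > best_j:  # ties keep the smallest cutoff
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Toy data: the same predictions, with and without unequal weights.
probs = [0.2, 0.4, 0.6, 0.8]
labels = [0, 1, 0, 1]
unweighted = weighted_youden_cutoff(probs, labels, [1, 1, 1, 1])
weighted = weighted_youden_cutoff(probs, labels, [1, 1, 5, 5])
```

In this toy example the selected cut-off moves from 0.4 to 0.8 once the heavily weighted units are taken into account, which is the kind of sensitivity to the design that the abstract refers to.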




Variable selection with LASSO regression for complex survey data

BIRD. BCAM's Institutional Repository Data
  • Iparragirre, A.
  • Lumley, T.
  • Barrio, I.
  • Arostegui, I.
Variable selection is an important step to end up with good prediction models. LASSO regression models are one of the most commonly used methods for this purpose, for which cross-validation is the most widely applied validation technique to choose the tuning parameter (λ). Validation techniques in a complex survey framework are closely related to “replicate weights”. However, to our knowledge, they have never been used in a LASSO regression context. Applying LASSO regression models to complex survey data could be challenging. The goal of this paper is two-fold. On the one hand, we analyze the performance of replicate weights methods to select the tuning parameter for fitting LASSO regression models to complex survey data. On the other hand, we propose new replicate weights methods for the same purpose. In particular, we propose a new design-based cross-validation method as a combination of the traditional cross-validation and replicate weights. The performance of all these methods has been analyzed by means of an extensive simulation study and compared to the traditional cross-validation technique for selecting the tuning parameter of LASSO regression models. The results suggest a considerable improvement when the newly proposed design-based cross-validation is used instead of the traditional cross-validation., IT1456-22
PIF18/213
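To make the combination of cross-validation and survey design more concrete, here is a rough sketch of one way folds could respect a stratified cluster design. It is an assumption-laden illustration, not the authors' design-based cross-validation algorithm: the function names `design_based_folds` and `training_weights`, the round-robin assignment of primary sampling units (PSUs) within strata, and the rescaling rule are all hypothetical choices.

```python
import random
from collections import defaultdict


def design_based_folds(strata, psus, k, seed=0):
    """Assign each unit to one of k folds, keeping every PSU in one fold.

    Assumption: PSUs are shuffled and dealt round-robin within each stratum,
    so each fold draws PSUs from every stratum where possible.
    """
    rng = random.Random(seed)
    psu_by_stratum = defaultdict(set)
    for s, p in zip(strata, psus):
        psu_by_stratum[s].add(p)

    fold_of = {}
    offset = 0
    for s in sorted(psu_by_stratum):
        ids = sorted(psu_by_stratum[s])
        rng.shuffle(ids)
        for i, p in enumerate(ids):
            fold_of[(s, p)] = (i + offset) % k
        offset += len(ids)
    return [fold_of[(s, p)] for s, p in zip(strata, psus)]


def training_weights(weights, folds, held_out):
    """Zero the held-out fold and rescale the rest to the full-sample total.

    Assumption: a single global rescaling factor, chosen only for illustration.
    """
    total = sum(weights)
    kept = sum(w for w, f in zip(weights, folds) if f != held_out)
    if kept == 0:
        raise ValueError("training set has zero total weight")
    scale = total / kept
    return [0.0 if f == held_out else w * scale for w, f in zip(weights, folds)]


# Toy design: two strata, two PSUs per stratum, two units per PSU.
strata = ["A"] * 4 + ["B"] * 4
psus = [1, 1, 2, 2, 3, 3, 4, 4]
folds = design_based_folds(strata, psus, k=2)
train_w = training_weights([1.0] * 8, folds, held_out=0)
```

The resulting `train_w` could then be passed as observation weights to any weighted LASSO solver while λ is tuned on the held-out fold; that step is omitted here.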




Age or lifestyle-induced accumulation of genotoxicity is associated with a length-dependent decrease in gene expression

BIRD. BCAM's Institutional Repository Data
  • Ibañez-Solé, O.
  • Barrio, I.
  • Izeta, A.
DNA damage has long been advocated as a molecular driver of aging. DNA damage occurs in a stochastic manner, and is therefore more likely to accumulate in longer genes. The length-dependent accumulation of transcription-blocking damage, unlike that of somatic mutations, should be reflected in gene expression datasets of aging. We analyzed gene expression as a function of gene length in several single-cell RNA sequencing datasets of mouse and human aging. We found a pervasive age-associated length-dependent underexpression of genes across species, tissues, and cell types. Furthermore, we observed length-dependent underexpression associated with UV-radiation and smoke exposure, and in progeroid diseases, Cockayne syndrome, and trichothiodystrophy. Finally, we studied published gene sets showing global age-related changes. Genes underexpressed with aging were significantly longer than overexpressed genes. These data highlight a previously undetected hallmark of aging and show that accumulation of genotoxicity in long genes could lead to reduced RNA polymerase II processivity., This work was supported by grants from Instituto de Salud Carlos III (PI22/01247 and PI19/01621), co-funded by the European Union, and Diputación Foral de Gipuzkoa. OI-S received the support of a fellowship from “Programa Investigo” of Lanbide-Servicio Vasco de Empleo, co-funded by the European Union (NextGenerationEU), la Caixa Foundation (ID 100010434; code LCF/BQ/IN18/11660065), and from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 713673. The work of IB was financially supported in part by grants from the Departamento de Educación, Política Lingüística y Cultura del Gobierno Vasco [IT1456-22] and by the Ministry of Science and Innovation through BCAM Severo Ochoa accreditation [CEX2021-001142-S/MICIN/AEI/10.13039/501100011033] and through project [PID2020-115882RB-I00/AEI/10.13039/501100011033] funded by Agencia Estatal de Investigación and acronym “S3M1P4R” and also by the Basque Government through the BERC 2022–2025 program.




Five-year follow-up mortality prognostic index for colorectal patients

BIRD. BCAM's Institutional Repository Data
  • Orive, M.
  • Barrio, I.
  • Lazaro, S.
  • Gonzalez, N.
  • Bare, M.
  • Fernandez-de-Larrea, N.
  • Cortajarena, S.
  • Bilbao, A.
  • Aguirre, U.
  • Quintana, J.M.
Purpose: To identify 5-year survival prognostic variables in patients with colorectal cancer (CRC) and to propose a survival prognostic score that also takes into account changes over time in the patient's health-related quality of life (HRQoL) status.

Methods: Prospective observational cohort study of CRC patients. We collected data at diagnosis, at the intervention, and at 1, 2, 3, and 5 years following the index intervention, also collecting HRQoL data using the EuroQol-5D-5L (EQ-5D-5L), European Organization for Research and Treatment of Cancer's Quality of Life Questionnaire-Core 30 (EORTC-QLQ-C30), and Hospital Anxiety and Depression Scale (HADS) questionnaires. Multivariate Cox proportional hazards models were used.

Results: We found the following predictors of mortality over the 5-year follow-up: being older; being male; having a higher TNM stage; having a higher lymph node ratio; having a result of CRC surgery classified as R1 or R2; invasion of neighboring organs; having a higher score on the Charlson comorbidity index; having an ASA class of IV; and having worse scores (that is, worse quality of life) on the EORTC and EQ-5D questionnaires, as compared to those with higher scores on each of those questionnaires.

Conclusions: These results allow preventive and controlling measures to be established on long-term follow-up of these patients, based on a few easily measurable variables.

Implications for cancer survivors: Patients with colorectal cancer should be monitored more closely depending on the severity of their disease and comorbidities as well as the perceived health-related quality of life, and preventive measures should be established to prevent adverse outcomes and therefore to ensure that better treatment is received.

Trial registration: ClinicalTrials.gov identifier: NCT02488161, Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported in part by grants from the Instituto de Salud Carlos III and the European Regional Development Fund (PS09/00314, PS09/00910, PS09/00746, PS09/00805, PI09/90460, PI09/90490, PI09/90453, PI09/90441, PI09/90397); the Spanish Ministry of the Economy (PID2020-115738 GB-I00); the Departments of Health (2010111098) and Education, Language Policy and Culture (IT1456-22; IT1598-22; IT-1187–19) of the Basque Government; the Research Committee of Galdakao Hospital; the REDISSEC (Red de Investigación en Servicios de Salud en Enfermedades Crónicas) thematic network of the Instituto de Salud Carlos III; and the Department of Education of the Basque Government through the Consolidated Research Group MATHMODE (IT1456-22) and the Basque Government through BMTF “Mathematical Modeling Applied to Health” Project.




Clinical prediction rules for adverse evolution in patients with COVID-19 by the Omicron variant

BIRD. BCAM's Institutional Repository Data
  • Barrio, I.
  • España, P.P.
  • Villanueva, A.
  • Gascon, M.
  • Larrea, N.
  • García-Gutiérrez, S.
  • Quintana, J.M.
  • Portuondo-Jimenez, J.
Objective: We identify factors related to SARS-CoV-2 infection linked to hospitalization, ICU admission, and mortality and develop clinical prediction rules.

Methods: Retrospective cohort study of 380,081 patients with SARS-CoV-2 infection from March 1, 2020 to January 9, 2022, including a subsample of 46,402 patients who attended Emergency Departments (EDs) having data on vital signs. For derivation and external validation of the prediction rule, two different periods were considered: before and after emergence of the Omicron variant, respectively. Data collected included sociodemographic data, COVID-19 vaccination status, baseline comorbidities and treatments, other background data and vital signs at triage at EDs. The predictive models for the EDs and the whole samples were developed using multivariate logistic regression models using Lasso penalization.

Results: In the multivariable models, common predictive factors of death among ED patients were greater age; being male; having no vaccination; dementia; heart failure; liver and kidney disease; hemiplegia or paraplegia; coagulopathy; interstitial pulmonary disease; malignant tumors; chronic systemic use of steroids; higher temperature; low O2 saturation; and altered blood pressure-heart rate. The predictors of an adverse evolution were the same, with the exception of liver disease and the inclusion of cystic fibrosis. Similar predictors were found to be related to hospital admission, including liver disease, arterial hypertension, and basal prescription of immunosuppressants. Similarly, models for the whole sample, without vital signs, are presented.

Conclusions: We propose easily calculable, highly predictive risk scales, based on basic information, that also function with the current Omicron variant and may help manage such patients in primary, emergency, and hospital care.

Keywords: COVID-19; Clinical decision rules; Health care; Outcome assessment; SARS-CoV-2.




Derivative curve estimation in longitudinal studies using P-splines

BIRD. BCAM's Institutional Repository Data
  • Hernández, M.A.
  • Lee, D.J.
  • Rodríguez-Álvarez, M.X.
  • Durbán, M.
The estimation of curve derivatives is of interest in many disciplines. It allows the extraction of important characteristics to gain insight about the underlying process. In the context of longitudinal data, the derivative allows the description of biological features of the individuals or finding change regions of interest. Although there are several approaches to estimate subject-specific curves and their derivatives, there are still open problems due to the complicated nature of these time course processes. In this article, we illustrate the use of P-spline models to estimate derivatives in the context of longitudinal data. We also propose a new penalty acting at the population and the subject-specific levels to address under-smoothing and boundary problems in derivative estimation. The practical performance of the proposal is evaluated through simulations, and comparisons with an alternative method are reported.
Finally, an application to longitudinal height measurements of 125 football players in a youth professional academy is presented, where the goal is to analyse their growth and maturity patterns over time., RYC2019-027534-I
The Medical Services of Athletic Club
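For readers unfamiliar with the technique named in the abstract above, the standard P-spline formulation and its derivative can be written as follows. This is the textbook form under the assumption of equally spaced knots with spacing h and one common indexing convention for the B-spline basis; it does not reproduce the new population-level and subject-specific penalty proposed in the article, and the symbols (B, D_d, λ, θ, q, h) are notation chosen here.

```latex
% Penalized least-squares fit with a B-spline basis B of degree q
% and a d-th order difference penalty matrix D_d
\hat{\theta} = \arg\min_{\theta} \; \lVert y - B\theta \rVert^2
             + \lambda \lVert D_d \theta \rVert^2,
\qquad
\hat{f}(x) = \sum_{j} \hat{\theta}_j \, B_j^{(q)}(x)

% First derivative: differences of adjacent coefficients
% on a basis of degree q - 1
\hat{f}'(x) = \frac{1}{h} \sum_{j}
              \left( \hat{\theta}_j - \hat{\theta}_{j-1} \right) B_j^{(q-1)}(x)
```

Because the derivative depends on differences of adjacent coefficients, a penalty that yields a smooth curve does not necessarily yield a smooth derivative, which is consistent with the under-smoothing problem mentioned in the abstract.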




Statistical Modelling for Recurrent Events in Sports Injury Research with Applications to Football Injury Data

BIRD. BCAM's Institutional Repository Data
  • Zumeta, L.
Sports injuries stand as undesirable side effects of athletic participation, carrying serious consequences for athletes' health, their professional careers, and overall team performance. With the growing availability of data, there has been an increasing reliance on statistical models to monitor athletes' health and mitigate injury risks. In this dissertation, our focus is on the statistical analysis of sports injury data, with an emphasis on the time-varying and recurrent nature of injury occurrences. We develop and assess suitable statistical modelling approaches to address specific research questions that arise in sports injury prevention research. We pursue three primary objectives: (a) identifying biomechanical risk factors using variable selection methods and shared frailty Cox models, (b) developing a flexible recurrent time-to-event approach to model the effects of training load on subsequent injuries, and (c) creating dedicated statistical tools through the open-source R software. These objectives are driven by interdisciplinary research, conducted in close collaboration with the Medical Services of Athletic Club, and are motivated by real-world applications. Namely, the work is based on three distinct data sets: the functional screening tests data, the external training load data, and the web-scraped football injury data. The statistical advancements developed contribute to ongoing efforts in sports injury prevention, providing insights, methodologies, and accessible software implementations for sports medicine practitioners., This research was supported by the Spanish Ministry of Science and Innovation (MICINN) through the Severo Ochoa SEV-2017-0718 PRE2018-084007 funding and the BCAM Severo Ochoa accreditation CEX2021-001142-S/MICIN/AEI/10.13039/501100011033; by the Basque Government through the BERC 2018-2021 and BERC 2022-2025 programs, and the PRE_2021_2_0029 funding; and by the AEI/FEDER, UE through the “S3M1P4R” PID2020-115882RB-I00 project.