SPATIO-TEMPORAL STATISTICS FOR SOLVING PROBLEMS IN PUBLIC HEALTH AND REMOTE SENSING

PID2020-113125RB-I00

Funding agency name: Agencia Estatal de Investigación
Funding agency acronym: AEI
Programme: Programa Estatal de I+D+i Orientada a los Retos de la Sociedad
Subprogramme: Programa Estatal de I+D+i Orientada a los Retos de la Sociedad
Call: Proyectos I+D
Call year: 2020
Management unit: Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020
Beneficiary institution: UNIVERSIDAD PUBLICA DE NAVARRA
Persistent identifier: http://dx.doi.org/10.13039/501100011033

Publications

18 results found

Estimating LOCP cancer mortality rates in small domains in Spain using its relationship with lung cancer

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Retegui Goñi, Garazi
  • Etxeberria Andueza, Jaione
  • Ugarte Martínez, María Dolores
The distribution of lip, oral cavity, and pharynx (LOCP) cancer mortality rates in small domains (defined as the combination of province, age group, and gender) remains unknown in Spain. As many of the LOCP risk factors are preventable, specific prevention programmes could be implemented, but this requires a clear specification of the target population. This paper provides an in-depth description of LOCP mortality rates by province, age group and gender, giving a complete overview of the disease. This study also presents a methodological challenge. As the number of LOCP cancer cases in small domains (province, age group and gender) is scarce, univariate spatial models do not provide reliable results or are even impossible to fit. In view of the close link between LOCP and lung cancer, we consider analyzing them jointly by using shared component models. These models allow information-borrowing among diseases, ultimately enabling the analysis of cancer sites with few cases at a very disaggregated level. Results show that males have higher mortality rates than females and these rates increase with age. Regions located in the north of Spain show the highest LOCP cancer mortality rates.
The work was supported by Project MTM2017-82553-R (AEI, UE), Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033 and Proyecto Jóvenes Investigadores PJUPNA2018-11.




Machine learning procedures for daily interpolation of rainfall in Navarre (Spain)

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Militino, Ana F.
  • Ugarte Martínez, María Dolores
  • Pérez Goya, Unai
Kriging is by far the best known and most widely used statistical method for interpolating data in spatial random fields. The main reason is that it provides the best linear unbiased predictor and it is an exact interpolator when normality is assumed. The robustness of this method allows small departures from normality; however, many meteorological, pollutant and environmental variables have extremely asymmetrical distributions and Kriging cannot be used. Machine learning techniques such as neural networks, random forest, and k-nearest neighbor can be used instead, because they do not require specific distributional assumptions. The drawback is that they do not take account of the spatial dependence, and for an optimal performance in spatial random fields more complex machine learning techniques could be considered. These techniques also require a relatively large amount of training data and they are computationally challenging to implement. For a reduced number of observations, we illustrate the performance of the aforementioned procedures using daily rainfall data of manual meteorological gauge stations in Navarre, where the only auxiliary variables available are the spatial coordinates and the altitude. The quality of the predictions is carefully checked through three versions of the relative root mean squared error (RRMSE). The conclusion is that when we cannot use Kriging, random forest and neural networks outperform the k-nearest neighbor technique, and provide reliable predictions of daily rainfall data with scarce auxiliary information.
This research was supported by the Spanish Research Agency (PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033 project). It has also received funding from la Caixa Foundation (ID1000010434), Caja Navarra Foundation, and UNED Pamplona, under agreement LCF/PR/PR15/51100007.
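The abstract above mentions three versions of the RRMSE without defining them. As a rough illustration only, the sketch below computes one common form, the RMSE divided by the mean of the observed values. Which three variants the paper actually uses is not stated here, so the function name `rrmse` and this particular normalisation are assumptions.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE: RMSE divided by the mean observed value.

    Assumption: this normalisation is one common choice; the paper's
    three RRMSE versions are not specified in the abstract.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared prediction error over all stations/days
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_obs = sum(observed) / len(observed)
    if mean_obs == 0:
        raise ValueError("mean of observed values is zero; RRMSE undefined")
    return math.sqrt(mse) / mean_obs
```

For daily rainfall, the zero-mean guard matters in practice, since a dry day has all observations equal to zero and a relative error cannot be computed for it.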




Unpaired spatio-temporal fusion of image patches (USTFIP) from cloud covered images

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Goyena Baroja, Harkaitz
  • Pérez Goya, Unai
  • Montesino San Martín, Manuel
  • Militino, Ana F.
  • Wang, Qunming
  • Atkinson, Peter M.
  • Ugarte Martínez, María Dolores
Spatio-temporal image fusion aims to increase the frequency and resolution of multispectral satellite sensor images in a cost-effective manner. However, practical constraints on input data requirements and computational cost prevent a wider adoption of these methods in real case-studies. We propose an ensemble of strategies to eliminate the need for cloud-free matching pairs of satellite sensor images. The new methodology called Unpaired Spatio-Temporal Fusion of Image Patches (USTFIP) is tested in situations where classical requirements are progressively difficult to meet. Overall, the study shows that USTFIP reduces the root mean square error by 2 to 13% relative to the state-of-the-art Fit-FC fusion method, due to an efficient use of the available information. Implementation of USTFIP through parallel computing saves up to 40% of the computational time required for Fit-FC.
This research was supported by the Spanish Research Agency and Next Generation EU (PDC2021-120796-I00 project) and by the Spanish Research Agency (PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033 project). This work was supported by the National Natural Science Foundation of China under Grant 42222108.




Logistic regression versus XGBoost for detecting burned areas using satellite images

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Militino, Ana F.
  • Goyena Baroja, Harkaitz
  • Pérez Goya, Unai
  • Ugarte Martínez, María Dolores
Classical statistical methods prove advantageous for small datasets, whereas machine learning algorithms can excel with larger datasets. Our paper challenges this conventional wisdom by addressing a highly significant problem: the identification of burned areas through satellite imagery, which is a clear example of imbalanced data. The methods are illustrated in North-Central Portugal and North-West Spain in October 2017 within a multi-temporal setting of satellite imagery. Daily satellite images are taken from Moderate Resolution Imaging Spectroradiometer (MODIS) products. Our analysis shows that a classical logistic regression (LR) model performs on par with, if not better than, a widely employed machine learning algorithm called extreme gradient boosting (XGBoost) within this particular domain.
Open Access funding provided by Universidad Pública de Navarra. This work has been funded by the project PID2020-113125RB-I00 of the Spanish Research Agency (MCIN/AEI/10.13039/501100011033) and Ayudas predoctorales UPNA 2022-2023.




Identifying extreme COVID-19 mortality risks in English small areas: a disease cluster approach

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Adin Urtasun, Aritz
  • Congdon, P.
  • Santafé Rodrigo, Guzmán
  • Ugarte Martínez, María Dolores
The COVID-19 pandemic is having a huge impact worldwide and has highlighted the extent of health inequalities between countries but also in small areas within a country. Identifying areas with high mortality is important both for public health mitigation in COVID-19 outbreaks and for longer-term efforts to tackle social inequalities in health. In this paper we consider different statistical models and an extension of a recent method to analyze COVID-19 related mortality in English small areas during the first wave of the epidemic in the first half of 2020. We seek to identify hotspots, and where they are most geographically concentrated, taking account of observed area factors as well as spatial correlation and clustering in regression residuals, while also allowing for spatial discontinuities. Results show an excess of COVID-19 mortality cases in small areas surrounding London and in other small areas in the North-East and North-West of England. Models alleviating spatial confounding show ethnic isolation, air quality and area morbidity covariates having a significant and broadly similar impact on COVID-19 mortality, whereas nursing home location seems to be slightly less important.
This work has been supported by Project MTM2017-82553-R (AEI/FEDER, UE) and Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.




A scalable approach for short-term disease forecasting in high spatial resolution areal data

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Orozco Acosta, Erick
  • Riebler, Andrea
  • Adin Urtasun, Aritz
  • Ugarte Martínez, María Dolores
Short-term disease forecasting at specific discrete spatial resolutions has become a high-impact decision-support tool in health planning. However, when the number of areas is very large, obtaining predictions can be computationally intensive or even unfeasible using standard spatiotemporal models. The purpose of this paper is to provide a method for short-term predictions in high-dimensional areal data based on a newly proposed “divide-and-conquer” approach. We assess the predictive performance of this method and other classical spatiotemporal models in a validation study that uses cancer mortality data for the 7907 municipalities of continental Spain. The new proposal outperforms traditional models in terms of mean absolute error, root mean square error, and interval score when forecasting cancer mortality 1, 2, and 3 years ahead. Models are implemented in a fully Bayesian framework using the well-known integrated nested Laplace estimation technique.
This research has been supported by the project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033 (principal investigator: M. Dolores Ugarte). Open access funding provided by Universidad Pública de Navarra.
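The three forecast evaluation measures named in the abstract above can be sketched as follows. The mean absolute error and root mean square error follow their textbook definitions. The interval score is written in its usual form for a central (1 - alpha) prediction interval, which is assumed (not confirmed by the abstract) to match the paper's definition. All function names are illustrative.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Square root of the average squared prediction error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def interval_score(observed, lower, upper, alpha=0.05):
    """Interval score for a central (1 - alpha) prediction interval.

    The score is the interval width plus a penalty of 2/alpha times the
    distance by which the observation falls outside the interval.
    Lower values are better.
    """
    width = upper - lower
    below = (2.0 / alpha) * max(lower - observed, 0.0)
    above = (2.0 / alpha) * max(observed - upper, 0.0)
    return width + below + above
```

The interval score rewards narrow intervals but penalises misses heavily, so it assesses both sharpness and calibration of the predictive intervals, whereas the first two measures look only at point forecasts.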




Big problems in spatio-temporal disease mapping: methods and software

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Orozco Acosta, Erick
  • Adin Urtasun, Aritz
  • Ugarte Martínez, María Dolores
Background and objective: Fitting spatio-temporal models for areal data is crucial in many fields such as cancer epidemiology. However, when data sets are very large, many issues arise. The main objective of this paper is to propose a general procedure to analyze high-dimensional spatio-temporal areal data, with special emphasis on mortality/incidence relative risk estimation.
Methods: We present a pragmatic and simple idea that permits hierarchical spatio-temporal models to be fitted when the number of small areas is very large. Model fitting is carried out using integrated nested Laplace approximations over a partition of the spatial domain. We also use parallel and distributed strategies to speed up computations in a setting where Bayesian model fitting is generally prohibitively time-consuming or even unfeasible.
Results: Using simulated and real data, we show that our method outperforms classical global models. We implement the methods and algorithms that we develop in the open-source R package bigDM, where specific vignettes have been included to facilitate the use of the methodology for non-expert users.
Conclusions: Our scalable methodology provides reliable risk estimates when fitting Bayesian hierarchical spatio-temporal models for high-dimensional data.
This research has been supported by the project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033. It has also been partially funded by the Public University of Navarra (project PJUPNA20001). Open access funding provided by Universidad Pública de Navarra.




Large-scale unsupervised spatio-temporal semantic analysis of vast regions from satellite images sequences

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Echegoyen Arruti, Carlos
  • Pérez, Aritz
  • Santafé Rodrigo, Guzmán
  • Pérez Goya, Unai
  • Ugarte Martínez, María Dolores
Temporal sequences of satellite images constitute a highly valuable and abundant resource for analyzing regions of interest. However, the automatic acquisition of knowledge on a large scale is a challenging task due to different factors such as the lack of precise labeled data, the definition and variability of the terrain entities, or the inherent complexity of the images and their fusion. In this context, we present a fully unsupervised and general methodology to conduct spatio-temporal taxonomies of large regions from sequences of satellite images. Our approach relies on a combination of deep embeddings and time series clustering to capture the semantic properties of the ground and its evolution over time, providing a comprehensive understanding of the region of interest. The proposed method is enhanced by a novel procedure specifically devised to refine the embedding and exploit the underlying spatio-temporal patterns. We use this methodology to conduct an in-depth analysis of a 220 km region in northern Spain in different settings. The results provide a broad and intuitive perspective of the land where large areas are connected in a compact and well-structured manner, mainly based on climatic, phytological, and hydrological factors.
This work has been supported by Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033. Aritz Pérez has been supported by Basque Government through the Elkartek program and the BERC 2022-2025 program, and by the Ministry of Science and Innovation: BCAM Severo Ochoa accreditation CEX2021-001142-S/MICIN/AEI/10.13039/501100011033. Open Access funding provided by Universidad Pública de Navarra.




Evaluating recent methods to overcome spatial confounding

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Urdangarin Iztueta, Arantxa
  • Goicoa Mangado, Tomás
  • Ugarte Martínez, María Dolores
The concept of spatial confounding is closely connected to spatial regression, although no general definition has been established. A generally accepted idea of spatial confounding in spatial regression models is the change in fixed effects estimates that may occur when spatially correlated random effects collinear with the covariate are included in the model. Different methods have been proposed to alleviate spatial confounding in spatial linear regression models, but it is not clear if they provide correct fixed effects estimates. In this article, we consider some of those proposals to alleviate spatial confounding such as restricted regression, the spatial+ model, and transformed Gaussian Markov random fields. The objective is to determine which one provides the best estimates of the fixed effects. Dowry death data in Uttar Pradesh in 2001, stomach cancer incidence data in Slovenia in the period 1995–2001 and lip cancer incidence data in Scotland between the years 1975–1980 are analyzed. Several simulation studies are conducted to evaluate the performance of the methods in different scenarios of spatial confounding. Results show that the spatial+ method seems to provide fixed effects estimates closest to the true value, although standard errors could be inflated.
Open Access funding provided by Universidad Pública de Navarra. This work has been supported by Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033.




Advances in the estimation of fixed effects in spatial models with random effects

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Urdangarin Iztueta, Arantxa
Disease mapping focuses on estimating health indicators specific to geographical areas within a region of study. Though incidence or mortality risks/rates of diseases, such as cancer, have been the main target, other applications exist, for example the analysis of crimes against women in India. Much of the research in disease mapping is based on Bayesian hierarchical Poisson mixed models that borrow strength from space or time to smooth the risks and reduce the variability of classical risk estimators such as standardized incidence/mortality ratios (SIRs/SMRs). However, disease mapping models are not free from drawbacks. Here we focus on two such limitations. First, the models are in general not identifiable, and constraints are required in the estimation process to obtain meaningful and interpretable results.
The second problem is the so-called spatial confounding, which is related to the inclusion of covariates in the models. If the covariates are spatially structured, their relationship with the response may not be well estimated due to unacceptably large bias and variance inflation.
The main purpose of this thesis is two-fold. On the one hand, we will address the challenge of incorporating sum-to-zero constraints to mitigate identifiability problems when fitting well-known spatio-temporal disease mapping models using NIMBLE (de Valpine et al., 2017), a system for building statistical models from R that permits fitting Bayesian hierarchical models using a configurable system of MCMC algorithms. On the other hand, we will focus on alleviating spatial confounding, with the aim of proposing a method that guarantees unbiased fixed effects estimates. The thesis is structured in four chapters. The first chapter gives a general introduction to the issues addressed in this thesis and the remaining chapters delve deeper into these problems. The thesis closes with a final section summarizing the main results and introducing some ideas for future research.
This thesis has been supported by Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033.
Programa de Doctorado en Matemáticas y Estadística (RD 99/2011)
Matematikako eta Estatistikako Doktoretza Programa (ED 99/2011)
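To make concrete the classical risk estimator that the abstract above contrasts with model-based smoothing, here is a minimal sketch of a crude standardized mortality ratio (SMR): observed deaths divided by expected deaths. For simplicity, expected counts are obtained from the overall rate of the whole study region (internal standardization without age groups). This simplification, and the function name, are assumptions for illustration and are not taken from the thesis.

```python
def standardized_mortality_ratios(observed, population):
    """Crude SMR per area: observed deaths / expected deaths.

    Assumption: expected deaths are the area population times the
    overall rate of the whole region, with no age standardization.
    """
    if len(observed) != len(population) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    overall_rate = sum(observed) / sum(population)
    # Expected deaths if every area shared the overall regional rate
    expected = [n * overall_rate for n in population]
    return [o / e for o, e in zip(observed, expected)]
```

In areas with small populations the expected count is small, so one or two additional deaths change the ratio substantially. This instability is the variability that the hierarchical models described in the abstract aim to reduce by borrowing strength from neighbouring areas or time periods.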




Locally adaptive change-point detection (LACPD) with applications to environmental changes

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Moradi, Mohammad Mehdi
  • Montesino San Martín, Manuel
  • Ugarte Martínez, María Dolores
  • Militino, Ana F.
We propose an adaptive-sliding-window approach (LACPD) for the problem of change-point detection in a set of time-ordered observations. The proposed method is combined with sub-sampling techniques to compensate for the lack of sufficient data near the time series’ tails. Through a simulation study, we analyse its behaviour in the presence of an early/middle/late change-point in the mean, and compare its performance with some of the frequently used and recently developed change-point detection methods in terms of power, type I error probability, area under the ROC curves (AUC), absolute bias, variance, and root-mean-square error (RMSE). We conclude that LACPD outperforms other methods by maintaining a low type I error probability. Unlike some other methods, the performance of LACPD does not depend on the time index of change-points, and it generally has lower bias than the alternative methods. Moreover, in terms of variance and RMSE, it outperforms other methods when change-points are close to the time series’ tails, whereas its performance is similar to (sometimes slightly poorer than) that of other methods when change-points are close to the middle of the time series. Finally, we apply our proposal to two sets of real data: the well-known example of annual flow of the Nile river in Aswan, Egypt, from 1871 to 1970, and a novel remote sensing data application consisting of a 34-year time-series of satellite images of the Normalised Difference Vegetation Index in Wadi As-Sirham valley, Saudi Arabia, from 1986 to 2019. We conclude that LACPD shows a good performance in detecting the presence of a change as well as the time and magnitude of change in real conditions.
This work has been supported by Project MTM2017-82553-R (AEI/FEDER, UE), Project PID2020-113125RB-I00 (AEI) and la Caixa Foundation (ID1000010434), Caja Navarra Foundation, and UNED Pamplona, under Agreement LCF/PR/PR15/51100007.




A simplified spatial+ approach to mitigate spatial confounding in multivariate spatial areal models

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Urdangarin Iztueta, Arantxa
  • Goicoa Mangado, Tomás
  • Kneib, Thomas
  • Ugarte Martínez, María Dolores
Spatial areal models encounter the well-known and challenging problem of spatial confounding. This issue makes it difficult to distinguish between the impacts of observed covariates and spatial random effects. Despite previous research and various proposed methods to tackle this problem, a definitive solution remains elusive. In this paper, we propose a simplified version of the spatial+ approach that involves dividing the covariate into two components. One component captures large-scale spatial dependence, while the other accounts for short-scale dependence. This approach eliminates the need to separately fit spatial models for the covariates. We apply this method to analyse two forms of crimes against women, namely rapes and dowry deaths, in Uttar Pradesh, India, exploring their relationship with socio-demographic covariates. To evaluate the performance of the new approach, we conduct extensive simulation studies under different spatial confounding scenarios. The results demonstrate that the proposed method provides reliable estimates of fixed effects and posterior correlations between different responses.
This work has been supported by Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033.




Predicting cancer incidence in regions without population-based cancer registries using mortality

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Retegui Goñi, Garazi
  • Etxeberria Andueza, Jaione
  • Riebler, Andrea
  • Ugarte Martínez, María Dolores
Cancer incidence numbers are routinely recorded by national or regional population-based cancer registries (PBCRs). However, in most southern European countries, the local PBCRs cover only a fraction of the country. Therefore, national cancer incidence can only be obtained through estimation methods. In this paper, we predict incidence rates in areas without a cancer registry using multivariate spatial models that jointly model cancer incidence and mortality. To evaluate the proposal, we use cancer incidence and mortality data from all the German states. We also conduct a simulation study by mimicking the real case of Spain considering different scenarios depending on the similarity of spatial patterns between incidence and mortality, the levels of lethality, and varying the amount of incidence data available. The new proposal provides good interval estimates in regions without PBCRs and reduces the relative error in estimating national incidence compared to one of the most widely used methodologies.
The work was supported by Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033, Proyecto Jóvenes Investigadores PJUPNA2018-11 and Ayudas Predoctorales Santander UPNA 2021-2022.




Exploring disease mapping models in big data contexts: some new proposals

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Orozco Acosta, Erick
Disease mapping is a highly relevant and significant research area within the field
of spatial statistics (areal data), as it offers invaluable support for public health
decision-making. Due to the high variability of classical risk estimators, such as
the standardized mortality ratio (SMR), the use of statistical models becomes
essential to obtain a more consistent representation of the underlying disease risk.
During the last decades, several statistical models have been proposed in the disease
mapping literature for smoothing risks in space and time, most of them extending the
seminal work of Besag et al. (1991) based on conditional autoregressive (CAR) priors.
However, the scalability of these models, specifically their utility in scenarios where
the number of small areas increases significantly, has not been extensively studied.
Thus, the main purpose of this dissertation is to propose new scalable Bayesian
modelling methods to smooth incidence/mortality risks (or rates) in high-dimensional
spatial and spatio-temporal areal data based on the “divide-and-conquer” approach.
The current dissertation is developed with the following main objectives. The first objective is to review the literature about the main contributions of
spatial and spatio-temporal disease mapping that are relevant to the research goals.
Chapter 1 provides a general overview of model fitting and inference focusing on the
widely used integrated nested Laplace approximation (INLA) technique for latent
Gaussian models within the Bayesian paradigm (Rue et al., 2009). The chapter
also covers the description of how to compute approximations of model selection
criteria based on the deviance and the predictive distribution under our scalable
model proposals. A brief description of the R package bigDM is also included, which
implements all the algorithms and models proposed in this dissertation. The second objective of this dissertation is to propose a scalable Bayesian modelling
method for handling high-dimensional spatial count data. In Chapter 2, we
provide a comprehensive description of our novel risk smoothing method. We also conduct a multi-scenario simulation study involving nearly 8000 Spanish municipalities
to compare our proposed method with the well-known CAR models in
terms of goodness of fit and risk estimation accuracy. Additionally, we illustrate the
behaviour of the scalable models by analysing male colorectal cancer mortality data
from Spanish municipalities using two different partition strategies of the spatial
domain. The third objective is to extend our scalable Bayesian modelling approach for
smoothing mortality or incidence risks to analyze high-dimensional spatio-temporal
count data. In Chapter 3, we present a comprehensive description of the spatio-temporal
CAR models originally proposed by Knorr-Held (2000), which are the
basis of our new modelling proposal for analyzing spatio-temporal areal data. The
chapter also explains the parallel and distributed strategies implemented in the
bigDM package to speed up computations by using the R package future (Bengtsson,
2021). A simulation study is conducted to compare our new scalable proposal with
two different merging strategies against traditional spatio-temporal CAR models
using the map of the Spanish municipalities as a template. Additionally, we evaluate
our proposal in terms of computational time. Finally, we illustrate and compare all
the approaches described in this chapter by analyzing the spatio-temporal evolution
of male lung cancer mortality in continental Spanish municipalities during the
period 1991-2015. The fourth objective is to assess the suitability of the method developed in
Chapter 3 for short-term forecasting in high spatial resolution data. In Chapter 4, we
present the spatio-temporal CAR model, which incorporates missing observations in
the response variable for the time periods to be forecast. Additionally, a validation
study is conducted to assess the predictive ability of the models for one-, two- and
three-period-ahead forecasting using real lung cancer mortality data in Spanish
municipalities. In this chapter, we also compare the predictive performance of the
models using scoring rules based on leave-one-out and leave-group-out cross-validation
strategies (Liu and Rue, 2022). The fifth objective cuts across all chapters: to develop an
open-source R package named bigDM (Adin et al., 2023b) that consolidates
all the methods proposed in this dissertation, making them readily available for use
by the scientific community. The dissertation ends with the main conclusions and future research lines., This dissertation has been supported by Project MTM2017-82553-R (AEI/FEDER,
UE) and Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033. It has
also been partially funded by the Public University of Navarra (project PJUPNA2001),
and by la Caixa Foundation (ID 1000010434), Caja Navarra Foundation and UNED
Pamplona, under agreement LCF/PR/PR15/51100007 (project REF P/13/20)., Programa de Doctorado en Matemáticas y Estadística (RD 99/2011), Matematikako eta Estatistikako Doktoretza Programa (ED 99/2011)
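
The abstract above notes that classical risk estimators such as the standardized mortality ratio (SMR) are highly variable, which is what motivates model-based smoothing. The following is a minimal sketch of that point, not code from the dissertation or from bigDM. It assumes the usual Poisson model for the observed count, under which the standard error of the SMR is approximately sqrt(O) / E; the function names and the two example areas are hypothetical.

```python
import math


def smr(observed: int, expected: float) -> float:
    """Standardized mortality ratio: observed cases over expected cases."""
    return observed / expected


def smr_standard_error(observed: int, expected: float) -> float:
    """Approximate standard error of the SMR, assuming O ~ Poisson(E * r)."""
    return math.sqrt(observed) / expected


# Two hypothetical areas with the same SMR but very different sizes.
small_area = (3, 2.0)      # (observed, expected)
large_area = (300, 200.0)

for label, (obs, exp) in (("small", small_area), ("large", large_area)):
    print(label, round(smr(obs, exp), 3), round(smr_standard_error(obs, exp), 3))
```

Both areas have the same point estimate, but the estimate for the small area is far less precise, so a raw SMR map of many small areas tends to be dominated by noise.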




Small area variations in non-affective first-episode psychosis: the role of socioeconomic and environmental factors

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Gutiérrez, Gerardo
  • Goicoa Mangado, Tomás
  • Ugarte Martínez, María Dolores
  • Aranguren Conde, Lidia
  • Corrales, Asier
  • Gil Berrozpe, Gustavo José
  • Librero, Julián
  • Sánchez Torres, Ana María
  • Peralta Martín, Víctor
  • García de Jalón, Elena
  • Cuesta, Manuel J.
  • Martínez, Matilde
  • Otero, María
  • Azcárate, Leire
  • Pereda, Nahia
  • Monclús, Fernando
  • Moreno, Laura
  • Fernández, Alba
  • Ariz, Mari Cruz
  • Sabaté, Alba
  • Aquerreta, Ainhoa
  • Aguirre, Izaskun
  • Lizarbe, Tadea
  • Begué, María José
Background: There is strong evidence supporting the association between environmental factors and increased risk of non-affective psychotic disorders. However, the use of sound statistical methods to account for spatial variations associated with environmental risk factors, such as urbanicity, migration, or deprivation, is scarce in the literature. Methods: We studied the geographical distribution of non-affective first-episode psychosis (NA-FEP) in a northern region of Spain (Navarra) during a 54-month period considering area-level socioeconomic indicators as putative explanatory variables. We used several Bayesian hierarchical Poisson models to smooth the standardized incidence ratios (SIR). We included neighborhood-level variables in the spatial models as covariates. Results: We identified 430 NA-FEP cases over a 54-month period for a population at risk of 365,213 inhabitants per year. NA-FEP incidence risks showed spatial patterning and a significant ecological association with the migrant population, unemployment, and consumption of anxiolytics and antidepressants. The high-risk areas corresponded mostly to peripheral urban regions; very few basic health sectors of rural areas emerged as high-risk areas in the spatial models with covariates. Discussion: Increased rates of unemployment, the migrant population, and consumption of anxiolytics and antidepressants showed significant associations linked to the spatial-geographic incidence of NA-FEP. These results may allow targeting geographical areas to provide preventive interventions that potentially address modifiable environmental risk factors for NA-FEP. Further investigation is needed to understand the mechanisms underlying the associations between environmental risk factors and the incidence of NA-FEP., This study was funded by a grant from the Carlos III Health Institute of the Ministry of Science and Innovation of the Government of Spain (PI19/01698). 
It was also funded by project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033.




Space-time interactions in Bayesian disease mapping with recent tools: making things easier for practitioners

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Urdangarin Iztueta, Arantxa
  • Ugarte Martínez, María Dolores
  • Goicoa Mangado, Tomás
Spatio-temporal disease mapping studies the distribution of mortality or incidence risks in space and its evolution in time, and it usually relies on fitting hierarchical Poisson mixed models. These models are complex for practitioners as they generally require adding constraints to correctly identify and interpret the different model terms. However, including constraints may not be straightforward in some recent software packages. This paper focuses on NIMBLE, a library of algorithms that includes, among others, a configurable system for Markov chain Monte Carlo (MCMC) algorithms. In particular, we show how to fit different spatio-temporal disease mapping models with NIMBLE, with emphasis on how to include sum-to-zero constraints to solve identifiability issues when including spatio-temporal interactions. Breast cancer mortality data in Spain during the period 1990-2010 are used for illustration purposes. A simulation study is also conducted to compare NIMBLE with R-INLA in terms of parameter estimates and relative risk estimation. The results are very similar, but differences are observed in terms of computing time., This work has been supported by Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033.
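
The identifiability issue mentioned in the abstract above can be illustrated with a small sketch. This is not the paper's NIMBLE code: it is a plain Python illustration of the simplest case, an intercept plus a spatial effect, where adding a constant to every spatial effect and subtracting it from the intercept leaves the linear predictor unchanged. A sum-to-zero constraint removes that ambiguity. All names and values are hypothetical.

```python
import math


def linear_predictor(intercept, spatial):
    """Log-risk for each area: intercept plus the area's spatial effect."""
    return [intercept + s for s in spatial]


def center(intercept, spatial):
    """Impose a sum-to-zero constraint by moving the mean into the intercept."""
    mean = sum(spatial) / len(spatial)
    return intercept + mean, [s - mean for s in spatial]


# Two parameter sets that differ only by a constant shift of 1.0.
intercept, spatial = 0.2, [0.5, -0.1, 0.3]
shifted_intercept = intercept - 1.0
shifted_spatial = [s + 1.0 for s in spatial]

# Both give the same linear predictor, so the data cannot tell them apart.
eta = linear_predictor(intercept, spatial)
eta_shifted = linear_predictor(shifted_intercept, shifted_spatial)
same_fit = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(eta, eta_shifted))

# After centering, both parameter sets collapse to the same representation.
centered = center(intercept, spatial)
centered_shifted = center(shifted_intercept, shifted_spatial)
print(same_fit, centered, centered_shifted)
```

Space-time interaction terms raise the same problem along more than one dimension, which is why the constraints need to be stated explicitly when fitting such models.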




Multivariate Bayesian models with flexible shared interactions for analyzing spatio-temporal patterns of rare cancers

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Retegui Goñi, Garazi
  • Etxeberria Andueza, Jaione
  • Ugarte Martínez, María Dolores
Rare cancers affect millions of people worldwide each year. However, estimating
incidence or mortality rates associated with rare cancers presents important
difficulties and poses new statistical methodological challenges. In this paper,
we expand the collection of multivariate spatio-temporal models by introducing
adaptable shared spatio-temporal components to enable a comprehensive analysis
of both incidence and cancer mortality in rare cancer cases. These models allow the
modulation of spatio-temporal effects between incidence and mortality, allowing for
changes in their relationship over time. The new models have been implemented in
INLA using r-generic constructions. We conduct a simulation study to evaluate the
performance of the new spatio-temporal models. Our results show that multivariate
spatio-temporal models incorporating a flexible shared spatio-temporal term
outperform conventional multivariate spatio-temporal models that include specific
spatio-temporal effects for each health outcome. We use these models to analyze
incidence and mortality data for pancreatic cancer and leukaemia among males
across 142 administrative health care districts of Great Britain over a span of nine
biennial periods (2002-2019)., The work was supported by Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033, Project UNEDPAM/PI/PR24/05A and Ayudas Predoctorales Santander UPNA 2021-2022. Open Access funding provided by Universidad Pública de Navarra.




Multivariate Bayesian spatio-temporal P-spline models to analyze crimes against women

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Vicente Fuenzalida, Gonzalo
  • Goicoa Mangado, Tomás
  • Ugarte Martínez, María Dolores
Univariate spatio-temporal models for areal count data have received great attention in recent years for estimating risks. However, models for studying multivariate responses are less commonly used, mainly due to the computational burden. In this article, multivariate spatio-temporal P-spline models are proposed to study different forms of violence against women. Modeling distinct crimes jointly improves the precision of estimates over univariate models and allows correlations among them to be computed. The correlation between the spatial and the temporal patterns may suggest connections among the different crimes that will certainly benefit a thorough comprehension of this problem that affects millions of women around the world. The models are fitted using integrated nested Laplace approximations and are used to analyze four distinct crimes against women at district level in the Indian state of Maharashtra during the period 2001-2013., Project MTM2017-82553-R (AEI/FEDER, UE) and Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033.