NUEVOS RETOS EN LA MODELIZACION ESPACIO-TEMPORAL CON APLICACIONES EN SALUD, VIOLENCIA DE GENERO Y TELEDETECCION

MTM2017-82553-R

Funding agency name Agencia Estatal de Investigación
Funding agency acronym AEI
Programme Programa Estatal de I+D+i Orientada a los Retos de la Sociedad
Subprogramme Programa Estatal de I+D+i Orientada a los Retos de la Sociedad
Call Retos Investigación: Proyectos I+D+i
Call year 2017
Management unit Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016
Beneficiary institution UNIVERSIDAD PUBLICA DE NAVARRA
Persistent identifier http://dx.doi.org/10.13039/501100011033

Publications


Comments on: Modular regression - a Lego system for building structured additive distributional regression models with tensor product interactions

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Goicoa Mangado, Tomás
This paper comments on the article 'Modular regression - a Lego system for building structured additive distributional regression models with tensor product interactions', in which the authors address the important topic of building very general models with interaction terms while facing the relevant issue of identifiability. This work has been supported by Project MTM2017-82553-R (AEI, FEDER/UE).




Estimating LOCP cancer mortality rates in small domains in Spain using its relationship with lung cancer

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Retegui Goñi, Garazi
  • Etxeberria Andueza, Jaione
  • Ugarte Martínez, María Dolores
The distribution of lip, oral cavity, and pharynx (LOCP) cancer mortality rates in small domains (defined as the combination of province, age group, and gender) remains unknown in Spain. As many of the LOCP risk factors are preventable, specific prevention programmes could be implemented, but this requires a clear specification of the target population. This paper provides an in-depth description of LOCP mortality rates by province, age group and gender, giving a complete overview of the disease. This study also presents a methodological challenge. As LOCP cancer cases in small domains (province, age group and gender) are scarce, univariate spatial models do not provide reliable results or are even impossible to fit. In view of the close link between LOCP and lung cancer, we consider analyzing them jointly by using shared component models. These models allow information-borrowing among diseases, ultimately enabling the analysis of cancer sites with few cases at a very disaggregated level. Results show that males have higher mortality rates than females and that these rates increase with age. Regions located in the north of Spain show the highest LOCP cancer mortality rates. The work was supported by Project MTM2017-82553-R (AEI, UE), Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033 and Proyecto Jóvenes Investigadores PJUPNA2018-11.




Bayesian inference in multivariate spatio-temporal areal models using INLA: analysis of gender-based violence in small areas

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Vicente Fuenzalida, Gonzalo
  • Goicoa Mangado, Tomás
  • Ugarte Martínez, María Dolores
Multivariate models for spatial count data are currently receiving attention in disease mapping to model two or more diseases jointly. They have been thoroughly studied from a theoretical point of view, but their use in practice is still limited because they are computationally expensive and, in general, are not implemented in standard software for routine use. Here, a new multivariate proposal, based on the recently derived M models for spatial data, is developed for spatio-temporal areal data. The model takes account of the correlation between the spatial and temporal patterns of the phenomena being studied, and it also includes spatio-temporal interactions. Though multivariate models have traditionally been fitted using Markov chain Monte Carlo techniques, here we propose to adopt integrated nested Laplace approximations to speed up computations, as results obtained using both fitting techniques were nearly identical. The techniques are used to analyse two forms of crimes against women in India. In particular, we focus on the joint analysis of rapes and dowry deaths in Uttar Pradesh, the most populated Indian state, during the years 2001-2014. This work has been supported by Project MTM2017-82553-R (AEI/FEDER, UE). It has also been partially funded by la Caixa Foundation (ID 1000010434), Caja Navarra Foundation, and UNED Pamplona, under agreement LCF/PR/PR15/51100007.




Univariate and multivariate spatio-temporal areal models to study crimes against women

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Vicente Fuenzalida, Gonzalo
Spatio-temporal models for areal data have been extensively applied in epidemiology and public health to study geographical and temporal patterns of incidence or mortality of several diseases, mainly cancer. The utility of these models has become crucial in public health, and methodological research has evolved in line with the need to analyze increasingly complex data registers. However, these techniques have not been used to study crimes against women, a complex and intricate problem where risk factors are not clearly identified. This dissertation is aimed at improving and developing methodology to disentangle the phenomenon of crimes against women in general and in India in particular. Spanish Ministry of Economy, Industry and Competitiveness (project MTM2017-82553-R, AEI/FEDER grants), the Government of Navarre (projects PI015-2016 and PI043-2017), and La Caixa Foundation (ID 1000010434), Caja Navarra Foundation and UNED Pamplona, under agreement LCF/PR/PR15/51100007. Universidad Nacional de Cuyo (Argentina). Programa de Doctorado en Matemáticas y Estadística (RD 99/2011), Matematikako eta Estatistikako Doktoretza Programa (ED 99/2011)




Identifying extreme COVID-19 mortality risks in English small areas: a disease cluster approach

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Adin Urtasun, Aritz
  • Congdon, P.
  • Santafé Rodrigo, Guzmán
  • Ugarte Martínez, María Dolores
The COVID-19 pandemic is having a huge impact worldwide and has highlighted the extent of health inequalities between countries but also in small areas within a country. Identifying areas with high mortality is important both for public health mitigation in COVID-19 outbreaks and for longer term efforts to tackle social inequalities in health. In this paper we consider different statistical models and an extension of a recent method to analyze COVID-19 related mortality in English small areas during the first wave of the epidemic in the first half of 2020. We seek to identify hotspots, and where they are most geographically concentrated, taking account of observed area factors as well as spatial correlation and clustering in regression residuals, while also allowing for spatial discontinuities. Results show an excess of COVID-19 mortality cases in small areas surrounding London and in other small areas in the North-East and North-West of England. Models alleviating spatial confounding show ethnic isolation, air quality and area morbidity covariates having a significant and broadly similar impact on COVID-19 mortality, whereas nursing home location seems to be slightly less important. This work has been supported by Project MTM2017-82553-R (AEI/FEDER, UE) and Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.




Relative handgrip strength diminishes the negative effects of excess adiposity on dependence in older adults: a moderation analysis

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Ramírez Vélez, Robinson
  • Pérez Sousa, Miguel A.
  • García Hermoso, Antonio
  • Zambom Ferraresi, Fabrício
  • Martínez Velilla, Nicolás
  • López Sáez de Asteasu, Mikel
  • Cano Gutiérrez, Carlos Alberto
  • Rincón Pabón, David
  • Izquierdo Redín, Mikel
The adverse effects of fat mass on functional dependence might be attenuated or worsened, depending on the level of muscular strength. The aim of this study was to determine (i) the detrimental effect of excess adiposity on dependence in activities of daily living (ADL), and (ii) whether relative handgrip strength (HGS) moderates the adverse effect of excess adiposity on dependence, and to provide the threshold of relative HGS from which the adverse effect could be improved or worsened. A total of 4169 participants (69.3 +/- 7.0 years old) from 244 municipalities were selected following a multistage area probability sampling design. Measurements included anthropometric/adiposity markers (weight, height, body mass index, waist circumference, and waist-to-height ratio (WHtR)), HGS, sarcopenia 'proxy' (calf circumference), and ADL (Barthel Index scale). Moderation analyses were performed to identify associations between the independent variable (WHtR) and the outcome (dependence), as well as to determine whether relative HGS moderates the relationship between excess adiposity and dependence. The present study demonstrated that (i) the adverse effect of having a higher WHtR level on dependence in ADL was moderated by relative HGS, and (ii) two moderation thresholds of relative HGS were estimated: 0.35, below which the adverse effect of WHtR levels on dependency is aggravated, and 0.62, above which the adverse effect of fat on dependency could be improved. Because muscular strength represents a critically important and modifiable predictor of ADL, and the increase in adiposity is inherent in aging, our results underscore the importance of an optimal level of relative HGS in the older adult population. This study is part of a larger project that has been funded by Colciencias y Ministerio de Salud y la Proteccion Social de Colombia (The SABE Study ID 2013, no. 764). M.I. is also funded in part by a research grant PI17/01814 of the Ministerio de Economia, Industria y Competitividad (ISCIII, FEDER). A.G.-H. is a Miguel Servet Fellow (Instituto de Salud Carlos III-FSE-CP18/0150). R.R.-V. was funded in part by a Postdoctoral Fellowship Resolution ID 420/2019 of the Universidad Publica de Navarra.




Alleviating confounding in spatio-temporal areal models with an application on crimes against women in India

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Adin Urtasun, Aritz
  • Goicoa Mangado, Tomás
  • Hodges, James S.
  • Schnell, Patrick M.
  • Ugarte Martínez, María Dolores
Assessing associations between a response of interest and a set of covariates in spatial areal models is the leitmotiv of ecological regression. However, the presence of spatially correlated random effects can mask or even bias estimates of such associations due to confounding effects if they are not carefully handled. Though potentially harmful, confounding issues have often been ignored in practice, leading to wrong conclusions about the underlying associations between the response and the covariates. In spatio-temporal areal models, the temporal dimension may emerge as a new source of confounding, and the problem may be even worse. In this work, we propose two approaches to deal with confounding of fixed effects by spatial and temporal random effects, while obtaining good model predictions. In particular, restricted regression and an apparently (though in fact not) equivalent procedure using constraints are proposed within both fully Bayes and empirical Bayes approaches. The methods are compared in terms of fixed-effect estimates and model selection criteria. The techniques are used to assess the association between dowry deaths and certain socio-demographic covariates in the districts of Uttar Pradesh, India. This work has been supported by Project MTM2017-82553-R (AEI/FEDER, UE). It has also been partially funded by ‘la Caixa’ Foundation (ID 1000010434), Caja Navarra Foundation, and UNED Pamplona, under agreement LCF/PR/PR15/51100007.




Locally adaptive change-point detection (LACPD) with applications to environmental changes

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Moradi, Mohammad Mehdi
  • Montesino San Martín, Manuel
  • Ugarte Martínez, María Dolores
  • Militino, Ana F.
We propose an adaptive-sliding-window approach (LACPD) for the problem of change-point detection in a set of time-ordered observations. The proposed method is combined with sub-sampling techniques to compensate for the lack of enough data near the time series’ tails. Through a simulation study, we analyse its behaviour in the presence of an early/middle/late change-point in the mean, and compare its performance with some of the frequently used and recently developed change-point detection methods in terms of power, type I error probability, area under the ROC curves (AUC), absolute bias, variance, and root-mean-square error (RMSE). We conclude that LACPD outperforms other methods by maintaining a low type I error probability. Unlike some other methods, the performance of LACPD does not depend on the time index of change-points, and it generally has lower bias than the alternative methods. Moreover, in terms of variance and RMSE, it outperforms other methods when change-points are close to the time series’ tails, whereas its performance is similar to (sometimes slightly poorer than) that of other methods when change-points are close to the middle of the time series. Finally, we apply our proposal to two sets of real data: the well-known example of the annual flow of the Nile river at Aswan, Egypt, from 1871 to 1970, and a novel remote sensing data application consisting of a 34-year time series of satellite images of the Normalised Difference Vegetation Index in Wadi As-Sirham valley, Saudi Arabia, from 1986 to 2019. We conclude that LACPD performs well in detecting the presence of a change as well as the time and magnitude of change in real conditions. This work has been supported by Project MTM2017-82553-R (AEI/FEDER, UE), Project PID2020-113125RB-I00 (AEI) and the Caixa Foundation (ID1000010434), Caja Navarra Foundation, and UNED Pamplona, under Agreement LCF/PR/PR15/51100007.




Challenges in disease mapping: predicting cancer incidence and analyzing models’ smoothing

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Retegui Goñi, Garazi
Disease mapping aims to study geographic patterns and temporal trends of incidence and mortality of different diseases, mainly non-communicable ones such as cancer. Spatio-temporal models for areal data play a crucial role in describing the impact of cancer in different populations, helping governments, policy makers, health professionals, and researchers to formulate cost-effective prevention, diagnosis and treatment strategies. However, the analysis of cancer data presents several challenges.
On the one hand, the lack of cancer incidence registries in certain geographical areas makes the spatial or temporal analysis of cancer incidence patterns difficult. On the other hand, some cancer types, such as rare cancers, remain understudied due to the limited availability of comprehensive data. This thesis is dedicated to enhancing and developing methodologies to address the challenges associated with both cancer incidence data and the study of rare cancers. It aims to achieve the following primary objectives.
The first objective is to focus on challenges associated with cancer data collection and to review the statistical methods used in the literature to deal with these challenges. Chapter 1 provides a general introduction on cancer data to understand the relevance of the problem. This thesis’s second objective is to develop new models that can predict cancer incidence rates in geographic areas lacking cancer registries. This will subsequently allow for national-level cancer incidence estimates. In Chapter 2, we leverage multivariate spatial models commonly employed in the disease mapping literature to predict cancer incidence. The third objective aims to extend the collection of multivariate spatio-temporal models by incorporating adaptable shared interaction terms. This will facilitate a more comprehensive analysis of both incidence and mortality for rare cancer cases. In Chapter 3, a detailed description of the proposed models is provided. These models allow the modulation of spatio-temporal interactions between incidence and mortality, allowing for changes in their relationship over time. The fourth objective is to assess the effectiveness of the models developed in Chapter 3 for short-term forecasting of cancer incidence rates, while handling missing data within the time series given the lack of cancer registries in certain geographical areas. In Chapter 4, a validation study is conducted to assess the predictive ability of the models for both forecasting and predicting missing data, using lung cancer mortality data from England’s administrative healthcare districts for a period covering 2001 to 2019. The fifth objective is to provide a comprehensive overview of the smoothness induced by the spatial univariate models. Implicit in these models is some degree of smoothing, wherein, for any particular unit, empirical risk or incidence estimates are adjusted towards a suitable mean or incorporate neighbour-based smoothing. 
Hence, while model explanation may be the primary objective, it is crucial to scrutinize the smoothing effect of the models. Further, a particular smoother has parameters, and there has been no study of how varying these parameters affects the induced smoothing. Chapter 5 investigates, both theoretically and empirically, the extent of smoothing achieved by a given model. The sixth objective is transversal to all chapters. We have a strong commitment to reproducibility, and the code developed in this thesis is publicly available at the GitHub of our research group (https://github.com/spatialstatisticsupna). The thesis ends with the main conclusions and future research lines. Ayudas Predoctorales Santander UPNA 2021-2022, Project MTM2017-82553-R (AEI, UE) and Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033. Partially funded by the Public University of Navarre under Proyecto Jóvenes Investigadores PJUPNA2018-11. Programa de Doctorado en Matemáticas y Estadística (RD 99/2011), Matematikako eta Estatistikako Doktoretza Programa (ED 99/2011)




Improving the quality of satellite imagery based on ground-truth data from rain gauge stations

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Militino, Ana F.
  • Ugarte Martínez, María Dolores
  • Pérez Goya, Unai
Multitemporal imagery is by and large geometrically and radiometrically accurate, but the residual noise arising from cloud removal and other atmospheric and electronic effects can produce outliers that must be mitigated to properly exploit the remote sensing information. In this study, we show how ground-truth data from rain gauge stations can improve the quality of satellite imagery. To this end, a simulation study is conducted wherein outlier outbreaks of different sizes are randomly introduced in the normalized difference vegetation index (NDVI) and the day and night land surface temperature (LST) of composite images from Navarre (Spain) between 2011 and 2015. To remove outliers, a new method called thin-plate splines with covariates (TpsWc) is proposed. This method consists of smoothing the median anomalies with a thin-plate spline model, whereby transformed ground-truth data are the external covariates of the model. The performance of the proposed method is measured with the root mean square error (RMSE), calculated as the square root of the mean of the pixel-by-pixel squared differences between the original data and the data predicted with the TpsWc model and with a state-space model with and without covariates. The study shows that the use of ground-truth data reduces the RMSE in both the TpsWc model and the state-space model used for comparison purposes. The new method successfully removes the abnormal data while preserving the phenology of the raw data. The RMSE reduction percentage varies according to the derived variables (NDVI or LST), but reductions of up to 20% are achieved with the new proposal. This research was supported by the Spanish Ministry of Economy, Industry and Competitiveness (project MTM2017-82553-R) jointly financed with the European Regional Development Fund (FEDER), the Government of Navarre (PI015-2016 and PI043-2017 projects) and the Fundación CAN-Obra Social Caixa-UNED Pamplona 2016 and 2017.




Space-time analysis of ovarian cancer mortality rates by age groups in Spanish provinces (1989-2015)

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Trandafir, Paula Camelia
  • Adin Urtasun, Aritz
  • Ugarte Martínez, María Dolores
Background: Ovarian cancer is a silent and largely asymptomatic cancer, leading to late diagnosis and worse prognosis. The late-stage detection and low survival rates make the study of the space-time evolution of ovarian cancer particularly relevant. In addition, research on this cancer in small areas (like provinces or counties) is still scarce. Methods: The study presented here covers all ovarian cancer deaths for women over 50 years of age in the provinces of Spain during the period 1989-2015. Spatio-temporal models have been fitted to smooth ovarian cancer mortality rates in age groups [50,60), [60,70), [70,80), and [80,+), borrowing information from spatial and temporal neighbours. Model fitting and inference have been carried out using the Integrated Nested Laplace Approximation (INLA) technique. Results: Large differences in ovarian cancer mortality among the age groups have been found, with higher mortality rates in the older age groups. Striking differences are observed between northern and southern Spain. The global temporal trends (by age group) reveal that the evolution of ovarian cancer over the whole of Spain has remained nearly constant since the early 2000s. Conclusion: Differences in ovarian cancer mortality exist among the Spanish provinces, years, and age groups. As the exact causes of ovarian cancer remain unknown, spatio-temporal analyses by age groups are essential to discover inequalities in ovarian cancer mortality. Women over 60 years of age should be the focus of follow-up studies, as their mortality rates have remained constant since 2002. High-mortality provinces should also be monitored to look for specific risk factors. This research has been supported by the Spanish Ministry of Science and Innovation (project MTM2017-82553-R, AEI/FEDER, UE).




Detecting change-points in the time series of surfaces occupied by pre-defined NDVI categories in continental Spain from 1981 to 2015

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Militino, Ana F.
  • Ugarte Martínez, María Dolores
  • Pérez Goya, Unai
Free access to satellite images for more than 40 years has led to a rapid increase in multitemporal information derived from remote sensing data, which should be summarized and analyzed for future inferences. In particular, the study of trends and trend changes is of crucial interest in many studies of phenology, climatology, agriculture, hydrology, geology and many other environmental disciplines. Overall, the normalized difference vegetation index (NDVI), as a satellite-derived variable, plays a crucial role because of its usefulness for vegetation and landscape characterization, land use and land cover mapping, environmental monitoring, climate change or crop prediction models. Since the eighties, it can be retrieved all over the world from different satellites. In this work we propose to analyze its temporal evolution, looking for breakpoints or change-points in trends of the surfaces occupied by four NDVI classifications made in Spain from 1981 to 2015. The results show a decrease of bare soils and semi-bare soils starting in the mid-nineties or before, and a slight increase of middle-vegetation and high-vegetation soils starting in 1990 and 2000 respectively. This research was supported by the Spanish Ministry of Economy, Industry, and Competitiveness (project MTM2017-82553-R) jointly financed with the European Regional Development Fund (FEDER), the Government of Navarre (PI015-2016 and PI043-2017 projects) and the Fundación CAN-Obra Social Caixa 2016.




Exploring disease mapping models in big data contexts: some new proposals

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Orozco Acosta, Erick
Disease mapping is a highly relevant and significant research area within spatial statistics (areal data), as it offers very important support for public health decision-making. Because of the high variability of classical risk estimators, such as the standardized mortality ratio (SMR), the use of complex statistical models is essential to obtain a more coherent representation of the underlying disease risk. In recent decades, several statistical models have been proposed in the literature to smooth spatio-temporal risks, most of them incorporating random effects with conditional autoregressive (CAR) prior distributions, building on the seminal work of Besag et al. (1991). However, the scalability of these models, specifically their feasibility in scenarios where the number of small areas increases significantly, has not been sufficiently studied. Therefore, the main objective of this thesis is to propose new scalable Bayesian modelling methods to smooth incidence/mortality risks (or rates) in high-dimensional spatial and spatio-temporal areal data. The methodology is based on the “divide and conquer” principle. Specifically, this thesis addresses the objectives described below.
The first objective is to review the most recent literature on the main spatial and spatio-temporal contributions that are relevant to the objectives of this research. Chapter 1 gives an overview of model fitting and inference, focusing on the INLA technique, based on nested Laplace approximations and numerical integration, which is widely used for latent Gaussian models within the Bayesian paradigm (Rue et al., 2009). This chapter also provides approximations to model selection criteria based on the Bayesian deviance and the predictive distribution under the new scalable model proposals. A brief description of the R package bigDM, which implements all the algorithms and models proposed in this dissertation, is also included.
The second objective of this thesis is to propose a scalable Bayesian modelling method for high-dimensional spatial areal data. Chapter 2 gives a thorough description of a new risk smoothing methodology. A multi-scenario simulation study including almost 8,000 Spanish municipalities is also carried out to compare the proposed method with a global CAR-type model in terms of goodness of fit and accuracy in estimating the risk surface. In addition, the behaviour of the scalable models is illustrated by analysing male colorectal cancer mortality data for Spanish municipalities using two different strategies for partitioning the spatial domain.
The third objective is to extend the scalable Bayesian modelling approach to smooth high-dimensional spatio-temporal mortality or incidence risks. Chapter 3 presents a thorough description of the spatio-temporal CAR models originally proposed by Knorr-Held (2000), which form the basis of the new modelling proposal for analysing spatio-temporal areal data. The chapter also explains the parallelization and distributed computing strategies implemented in the bigDM package to speed up computations using the R package future (Bengtsson, 2021). A simulation study is conducted to compare the new scalable proposal, with two different merging strategies, against traditional spatio-temporal CAR models using the map of Spanish municipalities as a template. The new proposal is also evaluated in terms of computational time. Finally, all the approaches described in this chapter are illustrated and compared by analysing the spatio-temporal evolution of male lung cancer mortality in Spanish municipalities during the period 1991-2015.
The fourth objective is to assess the suitability of the method developed in Chapter 3 for short-term forecasting of data with high spatial resolution. Chapter 4 presents the spatio-temporal CAR model that incorporates missing observations in the response variable for the time periods to be forecast. Additionally, a validation study is carried out to assess the predictive ability
de los modelos para predicciones a uno, dos y tres periodos utilizando datos reales
de mortalidad por cáncer de pulmón en municipios españoles. En este capítulo,
también se compara la capacidad predictiva de los modelos utilizando medidas de
validación cruzada (denominadas en inglés leave-one-out y leave-group-out) (Liu and
Rue, 2022). El quinto objetivo es transversal a todos los capítulos. El objetivo es desarrollar
un paquete en lenguaje R de código abierto llamado bigDM (Adin et al., 2023b) que consolida todos los métodos propuestos en esta disertación haciéndolos fácilmente
disponibles para su uso por la comunidad científica. La tesis finaliza con las principales conclusiones de este trabajo y detalla futuras
líneas de investigación., Disease mapping is a highly relevant and significant research area within the field
of spatial statistics (areal data), as it offers invaluable support for public health
decision-making. Due to the high variability of classical risk estimators, such as
the standardized mortality ratio (SMR), the use of statistical models becomes
essential to obtain a more consistent representation of the underlying disease risk.
During the last decades, several statistical models have been proposed in the disease
mapping literature for smoothing risks in space and time, most of them extending the
seminal work of Besag et al. (1991) based on conditional autoregressive (CAR) priors.
However, the scalability of these models, specifically their utility in scenarios where
the number of small areas increases significantly, has not been extensively studied.
Thus, the main purpose of this dissertation is to propose new scalable Bayesian
modelling methods to smooth incidence/mortality risks (or rates) in high-dimensional
spatial and spatio-temporal areal data based on the “divide-and-conquer” approach.
The current dissertation is developed with the following main objectives. The first objective is to review the literature about the main contributions of
spatial and spatio-temporal disease mapping that are relevant to the research goals.
Chapter 1 provides a general overview of model fitting and inference focusing on the
widely used integrated nested Laplace approximation (INLA) technique for latent
Gaussian models within the Bayesian paradigm (Rue et al., 2009). The chapter
also covers the description of how to compute approximations of model selection
criteria based on the deviance and the predictive distribution under our scalable
model proposals. A brief description of the R package bigDM is also included, which
implements all the algorithms and models proposed in this dissertation. The second objective of this dissertation is to propose a scalable Bayesian modelling
method for handling high-dimensional spatial count data. In Chapter 2, we
provide a comprehensive description of our novel risk smoothing method. We also conduct a multi-scenario simulation study involving nearly 8000 Spanish municipalities
to compare our proposed method with the well-known CAR models in
terms of goodness of fit and risk estimation accuracy. Additionally, we illustrate the
behaviour of the scalable models by analysing male colorectal cancer mortality data
from Spanish municipalities using two different partition strategies of the spatial
domain. The third objective is to extend our scalable Bayesian modelling approach for
smoothing mortality or incidence risks to analyze high-dimensional spatio-temporal
count data. In Chapter 3, we present a comprehensive description of the spatio-temporal
CAR models originally proposed by Knorr-Held (2000), which are the
basis of our new modelling proposal for analyzing spatio-temporal areal data. The
chapter also explains the parallel and distributed strategies implemented in the
bigDM package to speed up computations by using the R package future (Bengtsson,
2021). A simulation study is conducted to compare our new scalable proposal with
two different merging strategies against traditional spatio-temporal CAR models
using the map of the Spanish municipalities as a template. Additionally, we evaluate
our proposal in terms of computational time. Finally, we illustrate and compare all
the approaches described in this chapter by analyzing the spatio-temporal evolution
of male lung cancer mortality in Spanish continental municipalities during the
period 1991-2015. The fourth objective is to assess the suitability of the method developed in
Chapter 3 for short-term forecasting in high spatial resolution data. In Chapter 4, we
present the spatio-temporal CAR model, which incorporates missing observations in
the response variable for the time periods to be forecasted. Additionally, a validation
study is conducted to assess the predictive ability of the models for one-, two- and
three-period-ahead forecasts using real lung cancer mortality data in Spanish
municipalities. In this chapter, we also compare the predictive performance of the
models using scoring rules based on leave-one-out and leave-group-out cross-validation
strategies (Liu and Rue, 2022). The fifth objective is transversal to all chapters. The aim is to develop an
open-source R language package named bigDM (Adin et al., 2023b) that consolidates
all the methods proposed in this dissertation making them readily available for use
by the scientific community. The dissertation ends with the main conclusions and future research lines., This dissertation has been supported by Project MTM2017-82553-R (AEI/FEDER,
UE) and Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033. It has
also been partially funded by the Public University of Navarra (project PJUPNA2001),
and by la Caixa Foundation (ID 1000010434), Caja Navarra Foundation and UNED
Pamplona, under agreement LCF/PR/PR15/51100007 (project REF P/13/20)., Programa de Doctorado en Matemáticas y Estadística (RD 99/2011), Matematikako eta Estatistikako Doktoretza Programa (ED 99/2011)




Two-level resolution of relative risk of dengue disease in a hyperendemic city of Colombia

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Adin Urtasun, Aritz
  • Martínez Bello, Daniel Adyro
  • López Quílez, Antonio
  • Ugarte Martínez, María Dolores
Risk maps of dengue disease offer public health officers a tool to model disease risk in space and time. We analyzed the geographical distribution of the relative incidence risk of dengue disease in a high-incidence city in Colombia, and its evolution over time from January 2009 to December 2015, identifying regional effects at different levels of spatial aggregation. Cases of dengue disease were geocoded and spatially allocated to census sectors, and temporally aggregated by epidemiological periods. The census sectors are nested in administrative divisions defined as communes, configuring two levels of spatial aggregation for the dengue cases. Spatio-temporal models including census sector and commune-level spatially structured random effects were fitted to estimate dengue incidence relative risks using the integrated nested Laplace approximation (INLA) technique. The final selected model included two-level spatial random effects, a global structured temporal random effect, and a census sector-level interaction term. Risk maps by epidemiological period and risk profiles by census sector were generated from the modeling process, showing the transmission dynamics of the disease. All the census sectors in the city displayed high risk in some epidemiological period during the outbreak periods. Relative risk estimation of dengue disease using INLA offered a quick and powerful method for parameter estimation and inference., This work was supported by grants from the Spanish Ministry of Economy and Competitiveness (projects MTM2014-51992-R (MDU) and MTM2016-77501-P (ALQ), jointly financed with the European Regional Development Fund), the Spanish Ministry of Economy, Industry, and Competitiveness (project MTM2017-82553-R, jointly financed with the European Regional Development Fund (FEDER); MDU, AA), and the Colombian Administrative Department of Science and Technology (grant 646-2014 for doctoral studies abroad; DAMB).




Multivariate Bayesian spatio-temporal P-spline models to analyze crimes against women

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Vicente Fuenzalida, Gonzalo
  • Goicoa Mangado, Tomás
  • Ugarte Martínez, María Dolores
Univariate spatio-temporal models for areal count data have received great attention in recent years for estimating risks. However, models for studying multivariate responses are less commonly used, mainly due to the computational burden. In this article, multivariate spatio-temporal P-spline models are proposed to study different forms of violence against women. Modeling distinct crimes jointly improves the precision of estimates over univariate models and allows correlations among them to be computed. The correlations between the spatial and the temporal patterns may suggest connections among the different crimes, which would benefit a thorough comprehension of this problem that affects millions of women around the world. The models are fitted using integrated nested Laplace approximations and are used to analyze four distinct crimes against women at district level in the Indian state of Maharashtra during the period 2001-2013., Project MTM2017-82553-R (AEI/FEDER, UE) and Project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033.




Scalable Bayesian modeling for smoothing disease mapping risks in large spatial data sets using INLA

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Orozco Acosta, Erick
  • Adin Urtasun, Aritz
  • Ugarte Martínez, María Dolores
Several methods have been proposed in the spatial statistics literature to analyse big data sets in continuous domains. However, new methods for analysing high-dimensional areal data are still scarce. Here, we propose a scalable Bayesian modelling approach for smoothing mortality (or incidence) risks in high-dimensional data, that is, when the number of small areas is very large. The method is implemented in the R add-on package bigDM and it is based on the idea of “divide and conquer”. Although this proposal could possibly be implemented using any Bayesian fitting technique, we use INLA (integrated nested Laplace approximations) here, as it is now a well-known technique, computationally efficient, and easy for practitioners to handle. We analyse the proposal’s empirical performance in a comprehensive simulation study that considers two model-free settings. Finally, the methodology is applied to analyse male colorectal cancer mortality in Spanish municipalities showing its benefits with regard to the standard approach in terms of goodness of fit and computational time., This research has been supported by the Spanish Ministry of Science and Innovation (project MTM2017-82553-R (AEI/FEDER, UE)). It has also been partially funded by la Caixa Foundation, Spain (ID 1000010434), Caja Navarra Foundation, Spain, and UNED Pamplona, Spain, under agreement LCF/PR/PR15/51100007 (project REF P/13/20).




An introduction to the spatio-temporal analysis of satellite remote sensing data for geostatisticians

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Militino, Ana F.
  • Ugarte Martínez, María Dolores
  • Pérez Goya, Unai
Satellite remote sensing data have been available in meteorology, agriculture, forestry, geology, regional planning, hydrology and natural environment sciences for several decades, because satellites routinely provide high-quality images with different temporal and spatial resolutions. Joining, combining or smoothing these images to improve the quality of the information is a challenge that is not always properly solved. In this regard, geostatistics, understood as the spatio-temporal stochastic techniques for georeferenced data, is a helpful and powerful tool that has not yet been sufficiently explored in this area. Here, we analyze the current use of some geostatistical tools in satellite image analysis, and provide an introduction to this subject for potential researchers., This research was supported by the Spanish Ministry of Economy, Industry and Competitiveness (Project MTM2017-82553-R), the Government of Navarra (Projects PI015-2016 and PI043-2017), and by the Fundación Caja Navarra-UNED Pamplona (2016 and 2017).




Bayesian modeling approach in Big Data contexts: an application in spatial epidemiology

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Orozco Acosta, Erick
  • Adin Urtasun, Aritz
  • Ugarte Martínez, María Dolores
In this work we propose a novel scalable Bayesian modeling approach to smooth mortality risks by borrowing information from neighbouring regions in high-dimensional spatial disease mapping contexts. The method is based on the well-known divide-and-conquer approach, so that the spatial domain is divided into D subregions where local spatial models can be fitted simultaneously. Model fitting and inference have been carried out using the integrated nested Laplace approximation (INLA) technique. Male colorectal cancer mortality data in the municipalities of continental Spain have been analyzed using the new model proposals. Results show that the new modeling approach is very competitive in terms of model fitting criteria when compared with a global spatial model, and it is computationally much more efficient., This work has been supported by Project MTM2017-82553-R (AEI/FEDER, UE).
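
The divide-and-conquer idea in the abstract above (split the areas into D subregions, fit a local model in each, merge the local estimates) can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the local model used here is a simple Poisson-Gamma shrinkage toward the pooled SMR of each subregion, not the CAR models fitted with INLA in the paper, and the names `fit_local_model`, `smooth_by_subregion` and the `strength` parameter are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fit_local_model(observed, expected, strength=2.0):
    """Stand-in local model (assumption, not the paper's CAR model).

    Shrinks each area's SMR toward the pooled SMR of its subregion using
    the posterior mean of a Poisson-Gamma model.
    """
    pooled = sum(observed) / sum(expected)  # pooled SMR of the subregion
    # Posterior mean under a Gamma(strength * pooled, strength) prior
    return [(o + strength * pooled) / (e + strength)
            for o, e in zip(observed, expected)]


def smooth_by_subregion(observed, expected, subregion, strength=2.0):
    """Split areas by subregion label, fit each block independently, merge."""
    blocks = defaultdict(list)
    for idx, label in enumerate(subregion):
        blocks[label].append(idx)

    def fit_block(indices):
        obs = [observed[i] for i in indices]
        exp = [expected[i] for i in indices]
        return indices, fit_local_model(obs, exp, strength)

    risks = [None] * len(observed)
    # Local fits do not depend on each other, so they can run concurrently
    with ThreadPoolExecutor() as pool:
        for indices, local in pool.map(fit_block, blocks.values()):
            for i, risk in zip(indices, local):
                risks[i] = risk
    return risks


# Toy data: four areas, two subregions
observed = [2, 8, 0, 10]
expected = [4.0, 4.0, 5.0, 5.0]
subregion = ["A", "A", "B", "B"]

raw_smr = [o / e for o, e in zip(observed, expected)]
smoothed = smooth_by_subregion(observed, expected, subregion)
```

In this toy example the raw SMRs (0.5, 2.0, 0.0, 2.0) are pulled toward the pooled value of their own subregion, which mimics, in a very crude way, the local borrowing of information described in the abstract.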




Crime against women in India: unveiling spatial patterns and temporal trends of dowry deaths in the districts of Uttar Pradesh

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Vicente Fuenzalida, Gonzalo
  • Goicoa Mangado, Tomás
  • Fernández Rasines, Paloma
  • Ugarte Martínez, María Dolores
Crimes against women in India have been continuously increasing lately as reported by the National Crime Records Bureau. Gender-based violence has become a serious issue to such an extent that it has been catalogued as a high impact health problem by the World Health Organization. However, there is a lack of spatiotemporal analyses to reveal a complete picture of the geographical and temporal patterns of crimes against women. We focus on analysing how the geographical pattern of 'dowry deaths' changes over time in the districts of Uttar Pradesh during the period 2001–2014. The study of the geographical distribution of dowry death incidence and its evolution over time aims to identify specific regions that exhibit high risks and to hypothesize on potential risk factors. We also look into different spatial priors and their effects on final risk estimates. Various priors for the hyperparameters are also reviewed. The risk estimates seem to be robust in terms of the spatial prior and hyperprior choices and final results highlight several districts with extreme risks of dowry death incidence. Statistically significant associations are also found between dowry deaths, sex ratio and some forms of overall crime., This work has been supported by project MTM2017-82553-R (Agencia Estatal de Investigación–Fondos de Desarrollo Regional, European Union). It has also been partially funded by la Caixa Foundation (grant 1000010434), Caja Navarra Foundation and National University of Distance Learning, Pamplona, under agreement LCF/PR/PR15/51100007.




Software tools and statistical methods for downloading, processing, and analysing satellite images

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Pérez Goya, Unai
The main objective of this thesis is to introduce and develop statistical methods for satellite images in order to improve the processing, smoothing, prediction, and inference of remote sensing data. This main objective can be divided into the following sub-objectives. The first covers the acquisition, management, and automation of the processes for downloading remote sensing data from multiple platforms in a standardised way. The second is to provide a brief description of the main geostatistical tools used in remote sensing, emphasising the importance of spatio-temporal stochastic methods. The third sub-objective is to explore some techniques for detecting trend changes by analysing the natural evolution of some indices. The fourth sub-objective is to develop new methods for predicting missing data and smoothing errors in satellite images using spatial and temporal dependence. The final objective is the development of a new R package called 'RGISTools'. It allows downloading, pre-processing, and managing satellite images from Landsat, MODIS, and Sentinel-2. It also contains the new missing-data prediction and smoothing methods derived from this thesis., The main objective of this thesis is the introduction and development of statistical methods in satellite imagery to improve the processing, smoothing, prediction, and inference of remote sensing data. This objective can be split into the following sub-objectives. The first one is acquiring, managing, and automating processes to download remote sensing data from different platforms in a standardised way. The second one is to provide a brief review of the main geostatistical tools used in satellite imagery, emphasizing the importance of considering stochastic spatiotemporal methods. The third sub-objective consists of exploring some techniques to detect trend changes when analysing the natural evolution of certain indices. The fourth goal is to develop new methods for filling gaps and smoothing errors in satellite images using spatial and temporal dependence. As a final goal, a new R package called 'RGISTools' was created. It allows downloading, pre-processing, and managing Landsat, MODIS, and Sentinel-2 satellite images. It also contains the new gap filling and smoothing methods derived in this thesis., Financial support of three institutions: a) the Spanish Ministry of Economy and Competitiveness (project MTM2017-82553-R AEI/FEDER grants, MTM2014-51992-R, and MTM2011-22664), b) the Government of Navarre (projects PI015-2016 and PI043-2017), and c) 'La Caixa' Foundation (ID 1000010434), Caja Navarra Foundation and UNED Pamplona, under agreement LCF/PR/PR15/51100007., Programa de Doctorado en Ciencias y Tecnologías Industriales (RD 99/2011), Industria Zientzietako eta Teknologietako Doktoretza Programa (ED 99/2011)




On the performances of trend and change-point detection methods for remote sensing data

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Militino, Ana F.
  • Moradi, Mohammad Mehdi
  • Ugarte Martínez, María Dolores
Detecting change-points and trends are common tasks in the analysis of remote sensing data. Over the years, many different methods have been proposed for those purposes, including (modified) Mann-Kendall and Cox-Stuart tests for detecting trends; and Pettitt, Buishand range, Buishand U, standard normal homogeneity (Snh), Meanvar, structure change (Strucchange), breaks for additive season and trend (BFAST), and hierarchical divisive (E. divisive) for detecting change-points. In this paper, we describe a simulation study in which different artificial, abrupt changes are introduced at different time-periods of image time series to assess the performance of such methods. The power of the test, type I error probability, and mean absolute error (MAE) were used as performance criteria, although MAE was only calculated for change-point detection methods. The study reveals that if the magnitude of change (or trend slope) is high, and/or the change does not occur in the first or last time-periods, the methods generally have a high power and a low MAE. However, in the presence of temporal autocorrelation, MAE rises, and the probability of introducing false positives increases noticeably. The modified versions of the Mann-Kendall method for autocorrelated data reduce its type I error probability, but this reduction comes with a substantial loss of power. Weighing the power of the test against the type I error probability, we conclude that the original Mann-Kendall test is generally the preferable choice. Although Mann-Kendall is not able to identify the time-period of abrupt changes, it is more reliable than other methods when detecting the existence of such changes. Finally, we look for trend/change-points in land surface temperature (LST), day and night, via monthly MODIS images in Navarre, Spain, from January 2001 to December 2018., This work has been supported by Project MTM2017-82553-R (AEI/FEDER, UE). It has also received funding from la Caixa Foundation (ID 1000010434), Caja Navarra Foundation, and UNED Pamplona, under agreement LCF/PR/PR15/51100007.
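
As an illustration of the original Mann-Kendall trend test discussed in the abstract above, the following Python sketch computes the S statistic, the normal approximation Z (with the usual tie correction and continuity correction) and a two-sided p-value. It is a minimal textbook version, assuming independent observations; it does not implement the modified versions for autocorrelated data, and the function name `mann_kendall` is only illustrative.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the original Mann-Kendall test."""
    n = len(series)
    # S: number of increasing pairs minus number of decreasing pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis, corrected for tied values
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    # Continuity-corrected standard normal statistic
    if s == 0 or var_s == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(var_s)
    else:
        z = (s + 1) / math.sqrt(var_s)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) written with erfc
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


# Toy series: a strictly increasing sequence and a constant one
s_up, z_up, p_up = mann_kendall(list(range(1, 11)))
s_flat, z_flat, p_flat = mann_kendall([3, 3, 3, 3])
```

The test only says whether a monotonic trend is present; as the abstract notes, it does not locate the time-period of an abrupt change, and its false positive rate can increase when the series is temporally autocorrelated.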




Filling missing data and smoothing altered data in satellite imagery with a spatial functional procedure

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Militino, Ana F.
  • Ugarte Martínez, María Dolores
  • Montesino San Martín, Manuel
Outliers and missing data are commonly found in satellite imagery. These are usually caused by atmospheric or electronic failures, hampering the correct monitoring of remote-sensing data. To avoid distorted data, we propose a procedure called 'spatial functional prediction' (SFP). The SFP procedure consists of the following: (1) aggregating remote-sensing data to reduce the number of missing data and/or outliers; (2) additively decomposing the time series of images into a trend, a seasonal, and an error component; (3) defining the spatial functional data and predicting the trend component using ordinary kriging; and (4) adding back the seasonal and error components to the predicted trend. The benefits of the SFP procedure are illustrated in the following scenarios: introducing random outliers, random missing data, mixtures of both, and artificial clouds in an extensive simulation study of composite images, and using daily images with real clouds. The following two derived variables are considered: land surface temperature (LST day) and normalized difference vegetation index (NDVI), which are obtained as remote-sensing data in a region in northern Spain during 2003–2016. The performance of SFP was checked using the root mean squared error (RMSE). A comparison with a procedure based on predicting with thin-plate splines (TpsP) is also made. We conclude that SFP is simpler and faster than TpsP, and provides smaller values of RMSE., This research was supported by Project MTM2017-82553-R (AEI/FEDER, UE), and by 'La Caixa' Foundation (ID 1000010434), Caja Navarra Foundation, and UNED Pamplona, under agreement LCF/PR/PR15/51100007.
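
Steps (2) and (4) of the SFP procedure listed above can be illustrated for a single pixel with a short Python sketch. It is not the authors' implementation: the trend is estimated with a simple moving average truncated at the series edges (an assumption), the aggregation and ordinary kriging steps are omitted, and the names `decompose_additive` and `recompose` are hypothetical. In the example the estimated trend itself stands in for the kriged trend.

```python
def decompose_additive(series, period):
    """Split a series into trend + seasonal + error (additive model)."""
    n = len(series)
    half = period // 2
    # Trend: moving average over a window truncated at the series edges
    trend = []
    for t in range(n):
        window = series[max(0, t - half):min(n, t + half + 1)]
        trend.append(sum(window) / len(window))
    detrended = [x - m for x, m in zip(series, trend)]
    # Seasonal: mean detrended value per phase, centred to sum to zero
    phase_means = []
    for k in range(period):
        values = detrended[k::period]
        phase_means.append(sum(values) / len(values))
    centre = sum(phase_means) / period
    seasonal = [phase_means[t % period] - centre for t in range(n)]
    # Error: whatever is left after removing trend and seasonal parts
    error = [x - m - s for x, m, s in zip(series, trend, seasonal)]
    return trend, seasonal, error


def recompose(predicted_trend, seasonal, error):
    """Step (4): add the seasonal and error components back to a trend."""
    return [m + s + e for m, s, e in zip(predicted_trend, seasonal, error)]


# Toy pixel series: linear trend plus a period-4 seasonal pattern
pattern = [2.0, -1.0, 0.0, -1.0]
series = [10.0 + 0.5 * t + pattern[t % 4] for t in range(12)]

trend, seasonal, error = decompose_additive(series, period=4)
# Placeholder: the estimated trend is used where SFP would use the kriged trend
rebuilt = recompose(trend, seasonal, error)
```

Because the decomposition is additive, recomposing with the unmodified trend returns the original series; in the SFP procedure the trend passed to the last step would instead be the spatially predicted one.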