TALENT-HIPSTER: HIGH PERFORMANCE SYSTEMS AND TECHNOLOGIES FOR E-HEALTH AND FISH FARMING
PID2020-116417RB-C41
Funding agency: Agencia Estatal de Investigación
Funding agency acronym: AEI
Programme: Programa Estatal de I+D+i Orientada a los Retos de la Sociedad
Subprogramme: Programa Estatal de I+D+i Orientada a los Retos de la Sociedad
Call: Proyectos I+D
Call year: 2020
Management unit: Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020
Beneficiary institution: Universidad Politécnica de Madrid
Persistent identifier: http://dx.doi.org/10.13039/501100011033
Publications
SLIMBRAIN Database: A Multimodal Image Database of In-vivo Human Brains for Tumor Detection.
e-cienciaDatos, Repositorio de Datos del Consorcio Madroño
- Martín-Pérez, Alberto
- Villa Romero, Manuel
- Rosa Olmeda, Gonzalo
- Sancho Aragón, Jaime
- Vázquez Valle, Guillermo
- Urbanos García, Gemma
- Martínez de Ternero Ruíz, Alejandro
- Chavarrias Lapastora, Miguel
- Jimenez-Roldan, Luis
- Perez-Nuñez, Angel
- Lagares, Alfonso
- Juárez Martínez, Eduardo
- Sanz Álvaro, César
Project Description

Hyperspectral imaging (HSI) and machine learning (ML) have been employed in the medical field to classify highly infiltrative brain tumors. Although existing HSI databases of in-vivo human brains are available, they present two main deficiencies: first, the amount of labeled data is scarce, and second, 3D tissue information is unavailable. To address both issues, we present the SLIMBRAIN database, a multimodal image database of in-vivo human brains which provides hyperspectral (HS) brain tissue data within the 400-1000 nm spectrum, as well as RGB, depth and multi-view images. Two HS cameras, two depth cameras and different RGB sensors were used to capture images and videos from 193 patients. All data in the SLIMBRAIN database can be used in a variety of ways, for example to train ML models with more than 1 million HS pixels labeled by neurosurgeons, to reconstruct 3D scenes, or to visualize RGB brain images with different pathologies, offering unprecedented flexibility for both the medical and engineering communities.

--------------------------------
 Data Description
--------------------------------

The SLIMBRAIN database contains anonymized hyperspectral, depth and RGB image data from in-vivo (and also ex-vivo) human brains from 193 patients. The available data are:
- CalibrationFiles: 5 .zip files to calibrate hyperspectral data for the different SLIMBRAIN prototypes and 1 .zip file containing the intrinsic and extrinsic parameters for some cameras.

- Datasets: 2 .zip files containing the patients' datasets for the snapshot and linescan hyperspectral cameras.

- GroundTruthMaps: 2 .zip files containing the patients' ground-truth folders for the snapshot and linescan hyperspectral cameras.

- PaperExperiments: 1 .zip file containing several files that store the patient IDs used for the results shown in the paper.

- preProcessedImages: Several .zip files containing the hyperspectral pre-processed cubes for the snapshot and linescan hyperspectral cameras.

- RawFiles: 193 .zip files containing the raw files acquired in the operating room for each of the 193 patients. These files contain the raw images from different cameras, videos and depth images.
---------------------
 Notes
---------------------

To access the SLIMBRAIN database, you need to complete, accept and sign the Data Usage Agreement. Then, send it to us at the email addresses included at the end of the document. We will evaluate your application and, if it is accepted, you will receive a confirmation email with the steps needed to access the data.

You can find the Data Usage Agreement either on this page or at https://slimbrain.citsem.upm.es. You can then access https://slimbrain.citsem.upm.es/search to filter the patients using the online service provided by the Research Center on Software Technologies and Multimedia Systems for Sustainability (CITSEM) and the Fundación para la Investigación Biomédica del Hospital Universitario 12 de Octubre (FIBH12O).

You can also use https://slimbrain.citsem.upm.es/files to browse the raw data online without downloading it.

For further information, you can visit the official SLIMBRAIN database website at https://slimbrain.citsem.upm.es, where you can find Python software to manage the hyperspectral data provided.

---------------------
 Files
---------------------
- CalibrationFiles:

These files store the calibration data required for the hyperspectral and depth cameras. Specifically, folders starting with a number contain a hyperspectral calibration library with dark and white references at different working distances and tilt angles:

- 1_Tripod_popoman: For the Ximea snapshot camera. Illumination provided by the Dolan Jenner lamp with the ambient fluorescent lamps turned on. Acquired in the operating room while empty.

- 2_Prototype_laser: For the Ximea snapshot camera. Illumination provided by the Dolan Jenner lamp with the ambient fluorescent lamps turned on. Acquired in the operating room while empty.

- 3_Protoype_lidar: For the Ximea snapshot and Headwall linescan cameras. Illumination provided by the Dolan Jenner lamp with the ambient fluorescent lamps turned on. Acquired in the operating room while empty.

- 4_Prototype_lidar: For the Ximea snapshot and Headwall linescan cameras. Illumination provided by the Osram lamp with the ambient fluorescent lamps turned on. Acquired in the operating room while empty.

- 5_Prototype_Kinect: For the Ximea snapshot and Headwall linescan cameras. Illumination provided by the International Light lamp with the ambient fluorescent lamps turned off. Acquired in the laboratory.

Furthermore, the depth, RGB and HS sensor calibration files, including intrinsic, extrinsic and distortion parameters, are provided as .json files in DepthCameraCalibrationFiles.
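As an illustration of how the dark and white references in these calibration libraries are typically applied, the following minimal Python sketch converts a raw hyperspectral frame to relative reflectance using the standard flat-field correction. The file names, the .npy format and the array shapes are assumptions for the example, not the actual layout of the CalibrationFiles archives.

import numpy as np

# Hypothetical file names; the real calibration libraries may use a different format.
raw = np.load("raw_frame.npy").astype(np.float64)          # (H, W, bands) raw capture
dark = np.load("dark_reference.npy").astype(np.float64)    # sensor response with the lens capped
white = np.load("white_reference.npy").astype(np.float64)  # response over a white reference tile

# Standard flat-field correction: approximate relative reflectance in [0, 1].
eps = 1e-9                                                  # avoid division by zero on dead pixels
reflectance = (raw - dark) / (white - dark + eps)
reflectance = np.clip(reflectance, 0.0, 1.0)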
- Datasets:

These files store each patient's dataset with the spectral information of every labelled pixel. These pixels are extracted from the pre-processed cube at the coordinates given by the corresponding ground-truth map, which has been labeled by the neurosurgeons using a labelling tool based on the Spectral Angle Map (SAM) metric. Patient datasets are available for the Ximea snapshot and Headwall linescan hyperspectral cameras.
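As a minimal sketch of how such a per-patient dataset can be assembled from a pre-processed cube and its ground-truth map, the following Python snippet extracts the labeled spectra; the file names, the .npy format and the label convention (0 = unlabeled) are assumptions for the example, not the database's actual storage format.

import numpy as np

cube = np.load("patient_preprocessed_cube.npy")   # hypothetical, shape (H, W, bands)
gt_map = np.load("patient_ground_truth.npy")      # hypothetical, shape (H, W), integer class labels

rows, cols = np.nonzero(gt_map)                   # coordinates of every labeled pixel (assuming 0 = unlabeled)
spectra = cube[rows, cols, :]                     # (n_labeled, bands) spectral signatures
labels = gt_map[rows, cols]                       # (n_labeled,) class of each signature

# spectra and labels can now feed any ML pipeline, e.g. a scikit-learn classifier.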
- GroundTruthMaps:

These files store each patient's ground-truth map labeled by the neurosurgeons. The labelling tool is based on the Spectral Angle Map (SAM) metric, as already used in existing hyperspectral in-vivo human brain databases. Patient ground-truth maps are available for the Ximea snapshot and Headwall linescan hyperspectral cameras.
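For reference, the Spectral Angle Map (SAM) metric measures the angle between a pixel spectrum and a reference spectrum. The Python sketch below implements the generic definition of SAM; it is not the labelling tool distributed with the database.

import numpy as np

def spectral_angle(pixel: np.ndarray, reference: np.ndarray) -> float:
    """Angle (in radians) between two spectra; 0 means identical spectral direction."""
    cos_theta = np.dot(pixel, reference) / (np.linalg.norm(pixel) * np.linalg.norm(reference))
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example use: pixels whose angle to a neurosurgeon-selected reference spectrum falls
# below a chosen threshold would be assigned that reference's class.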
- PaperExperiments.zip:

Contains 2 .txt files with the patient IDs used for the experiments shown in the paper.
- preProcessedImages:

These files store each patient's hyperspectral pre-processed cube. These cubes are obtained by applying the described pre-processing chain to the raw data included in the RawFiles folder. Patient pre-processed cubes are available for the Ximea snapshot and Headwall linescan hyperspectral cameras.
- RawFiles:

These files store the raw files obtained in each of the operations. They can include hyperspectral data, RGB data and depth information for each patient ID. All data are anonymized to preserve the privacy of each patient.
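As an example of how the depth images in RawFiles can be combined with the intrinsic parameters stored in DepthCameraCalibrationFiles, the following Python sketch back-projects a depth map into a 3D point cloud with the standard pinhole model. The JSON field names and file names are hypothetical; the actual files may use a different schema.

import json
import numpy as np

with open("depth_camera_intrinsics.json") as f:         # hypothetical file name
    intr = json.load(f)
fx, fy = intr["fx"], intr["fy"]                         # hypothetical field names
cx, cy = intr["cx"], intr["cy"]

depth = np.load("depth_frame.npy").astype(np.float64)   # (H, W), depth in meters
h, w = depth.shape
u, v = np.meshgrid(np.arange(w), np.arange(h))

# Pinhole back-projection: (u, v, depth) -> (X, Y, Z) in the camera frame.
x = (u - cx) * depth / fx
y = (v - cy) * depth / fy
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
points = points[depth.reshape(-1) > 0]                  # drop pixels with no depth reading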
Bedtime Monitoring for Fall Detection and Prevention in Older Adults
RUA. Repositorio Institucional de la Universidad de Alicante
- Fernández-Bermejo Ruiz, Jesús
- Dorado Chaparro, Javier
- Santofimia Romero, Maria José
- Villanueva Molina, Félix Jesús
- Toro García, Xavier del
- Bolaños Peño, Cristina
- Llumiguano Solano, Henry
- Colantonio, Sara
- Flórez-Revuelta, Francisco
- López, Juan Carlos
Life expectancy has increased, so the number of people in need of intensive care and attention is also growing. Falls are a major problem for older adult health, mainly because of the consequences they entail; falls are indeed the second leading cause of unintentional death in the world. The impact on privacy, the cost, low performance, and the need to wear uncomfortable devices are the main reasons for the lack of widespread solutions for fall detection and prevention. This work presents a solution focused on bedtime that addresses all of these causes. Bed exit is one of the most critical moments, especially when the person suffers from cognitive impairment or has mobility problems. For this reason, this work proposes a system that monitors the position in bed in order to identify risk situations as soon as possible. This system is also combined with an automatic fall detection system. Both systems work together, in real time, offering a comprehensive solution for automatic fall detection and prevention that is low cost and guarantees user privacy. The proposed system was experimentally validated with young adults. Results show that falls can be detected, in real time, with an accuracy of 93.51%, a sensitivity of 92.04% and a specificity of 95.45%. Furthermore, risk situations, such as transitioning from lying on the bed to sitting on the bedside, are recognized with 96.60% accuracy, and those where the user exits the bed are recognized with 100% accuracy. This research was funded by the H2020 European Union programme under grant agreement No. 857159 (SHAPES project), by MCIN/AEI/10.13039/501100011033 grant TALENT-BELIEF (PID2020-116417RB-C44) and by the GoodBrother COST Action 19121.
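To make the reported figures concrete, the short Python sketch below shows how accuracy, sensitivity and specificity are derived from a binary fall/no-fall confusion matrix; the counts used are illustrative and are not taken from the paper.

# Illustrative confusion-matrix counts for a binary fall detector (hypothetical values).
tp, fn = 81, 7     # real falls correctly detected / missed
tn, fp = 84, 4     # non-falls correctly rejected / false alarms

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # fraction of real falls that are detected
specificity = tn / (tn + fp)   # fraction of non-falls correctly ignored

print(f"accuracy={accuracy:.2%} sensitivity={sensitivity:.2%} specificity={specificity:.2%}")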
GPU-Based Real-Time Depth Generation for Immersive Video Applications
Archivo Digital UPM
- Sancho Aragón, Jaime
ABSTRACT
In recent years, augmented reality and immersive video techniques have emerged as a solution to improve computer-based visualization. These techniques aim to construct accurate 3D representations of reality that can be freely navigated and to which computer-generated information can be added. However, the requirements of augmented reality and immersive video exceed the current technology level. They require capturing objects' positions rather than only their color intensity, which is usually expressed as depth maps, i.e., distance-to-the-camera images. Nowadays, although possible, depth map generation presents a key limitation: the quality of the depth maps is contingent on their generation time, which keeps high-quality depth map generation away from real-time applications.
This Ph.D. dissertation explores the current depth generation techniques, compares them, and proposes new methods to obtain high-quality, real-time depth maps for augmented reality and immersive video applications. To do so, it focuses on three of the most relevant techniques for obtaining depth information in the current State of the Art (SotA): (i) RGB multiview depth estimation, (ii) Time of Flight (ToF) depth sensing, and (iii) Light Field (LF) capture through plenoptic 2.0 cameras.
RGB multiview depth estimation is based on the use of several camera sensors located at different positions. Although intensively explored, no algorithm is able to yield high quality at high frame rates. For example, the high-quality depth estimation algorithms in the current SotA feature processing times several orders of magnitude away from real time.
ToF depth sensing employs an active and a passive sensor to measure the time of flight of a signal. This process can be performed at 30 Frames Per Second (FPS); however, the generated depth maps feature low spatial resolution and characteristic artifacts. In addition, the depth capture needs to be aligned with an RGB sensor. These factors cause a significant quality loss compared to the aforementioned depth estimation algorithms.
LF capture through plenoptic 2.0 cameras also allows depth generation at 30 FPS. These cameras do not have the problem of aligning the RGB capture with the depth, but they present a fundamental issue: actual depth information can only be generated at color edges, making depth extension algorithms necessary to generate a complete depth map. The resulting depth quality is therefore contingent on these algorithms, which may slow down the frame rate. In addition, the level of inter-frame depth noise in the plenoptic cameras tested is high compared to the other techniques.
From these findings, this Ph.D. explores two research lines to improve the current SotA in either depth quality or frame rate, leveraging Graphics Processing Unit (GPU) accelerators: the acceleration of multiview depth estimation based on passive RGB cameras, and the accelerated refinement of ToF depth maps.
For RGB multiview arrays, the main problem is the processing time needed to generate a depth map with the SotA high-quality depth estimation algorithms. For example, the Depth Estimation Reference Software (DERS) needs on the order of hundreds to thousands of seconds to generate a depth frame on a high-end workstation, while Immersive Video Depth Estimation (IVDE) needs between 50 and 100 seconds on the same platform. For this reason, this Ph.D. dissertation introduces Graph cuts Reference depth estimation in GPU (GoRG), a GPU-accelerated depth estimation algorithm based on a novel GPU acceleration of the graph cuts optimization method. The GoRG depth quality results are 0.12 dB Immersive Video Peak Signal-to-Noise Ratio (IV-PSNR) worse than those of the best high-quality depth estimation algorithm tested, with the advantage of processing times two orders of magnitude lower. Although significantly closer to real time, the processing time achieved by GoRG lies between 1 and 10 s per frame on a high-end computer and GPU, which is still insufficient for real-time applications. Following this line, this Ph.D. dissertation also investigates the use of Hyperspectral (HS) cameras in multiview arrays to generate depth information. These cameras differ from usual RGB cameras in the number of spectral bands they capture, which can range from tens to hundreds, allowing the elements of the captured scene to be characterized spectrally. In this context, HS–GoRG is presented, an extension of GoRG for HS multiview arrays. Results show that HS–GoRG produces a Root Mean Squared Error (RMSE) of 6.68 cm (11.3% of the total depth range tested), although the error is mainly concentrated around 2-4 cm (3.3%-6.6% of the total depth range tested), at 2.1 s per frame on average. This result shows the difficulty of using the developed algorithm in real-time environments.
Regarding ToF depth refinement, this Ph.D. dissertation proposes two new depth refinement algorithms for ToF cameras: GoRG–Prior and the Kinect Refinement Tool (KiRT). GoRG–Prior is a graph-cuts-based depth refinement method that improves the raw Intel L515 LiDAR capture by 0.37 dB IV-PSNR at a frame rate of 10 FPS on average, compared to the 0.18 dB IV-PSNR at 250 FPS achieved by the second-best quality algorithm tested. The high processing time of GoRG–Prior motivated the development of KiRT, which reduces the algorithm complexity by replacing graph cuts with a frontier-based algorithm. KiRT is a GPU-accelerated depth refinement algorithm for multi-ToF camera setups that achieves frame rates near 55 FPS while obtaining slightly better quality results for the Azure Kinect DK camera than the second-best quality algorithm tested: 3.07 dB IV-PSNR versus 2.97 dB IV-PSNR. The main subjective difference between the two is KiRT's ability to generate sharp depth edges and its better performance in large empty depth regions.
These contributions have been tested in two real case studies framed in the research projects clasificacióN intraopEratoria de tuMores cErebraleS mediante modelos InmerSivos 3D (NEMESIS-3D-CM) and Holographic Vision for Immersive Tele-Robotic OperatioN (HoviTron). NEMESIS-3D-CM pursues the improvement of medical visualization tools for brain tumor resection operations. Results show the feasibility of using the Intel L515 LiDAR plus GoRG–Prior in a real scenario to generate a real-time Augmented Reality (AR) view that can help neurosurgeons during brain tumor resection operations. HoviTron pursues a high-quality, real-time representation of scenes in telerobotic operation applications. These scenes need to be presented on an LF Head Mounted Display (HMD), which requires depth information generated and processed in real time. Within this project, this Ph.D. work focuses on the real-time depth refinement of 4 or 8 Microsoft Azure Kinect DK ToF cameras using KiRT. Results show that 20 FPS are achieved for the 4-camera setup and 12 FPS for the 8-camera setup, with better subjective results than the second-best quality algorithm tested.
In conclusion, this Ph.D. dissertation shows that the depth generation analysis performed and the techniques proposed contribute to the development of real-time interactive AR systems. Although real-time depth generation remains a challenge, devices such as ToF cameras combined with depth refinement algorithms prove to be good candidates for further research.
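As background for the two key mechanisms discussed above, the following generic relations (illustrative, not the dissertation's exact formulations) summarize ToF depth sensing and graph-cuts-based depth estimation: a ToF sensor derives depth from the round-trip time of an emitted signal, while graph-cuts methods minimize an energy combining a per-pixel matching cost with a smoothness term.

% Generic ToF depth relation: depth from the round-trip time \Delta t of an emitted signal.
\[
  d = \frac{c \, \Delta t}{2}
\]
% Standard pairwise energy minimized by graph-cuts depth methods, where C_p is the
% matching (data) cost of assigning depth d_p to pixel p, V penalizes depth differences
% between neighboring pixels (p, q), and \lambda balances the two terms.
\[
  E(D) = \sum_{p} C_p(d_p) + \lambda \sum_{(p,q) \in \mathcal{N}} V(d_p, d_q)
\]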
SLIMBRAIN: Augmented Reality Real-Time Acquisition and Processing System For Hyperspectral Classification Mapping with Depth Information for In-Vivo Surgical Procedures
Archivo Digital UPM
- Sancho Aragón, Jaime
- Villa Romero, Manuel
- Chavarrías Lapastora, Miguel
- Juárez Martínez, Eduardo
- Lagares Gómez Abascal, Alfonso
- Sanz Alvaro, César
Over the last two decades, augmented reality (AR) has led to the rapid development of new interfaces in various social and technological application domains. One such domain is medicine, and more specifically surgery, where these visualization techniques help to improve the effectiveness of preoperative and intraoperative procedures. Following this trend, this paper presents SLIMBRAIN, a real-time acquisition and processing AR system suitable for classifying and displaying brain tumor tissue from hyperspectral (HS) information. This system captures and processes HS images at 14 frames per second (FPS) during the course of a tumor resection operation to detect and delimit cancerous tissue while the neurosurgeon operates. The result is presented in an AR visualization where the classification results are overlaid on the RGB point cloud captured by a LiDAR camera. This representation allows natural navigation of the scene at the same time it is captured and processed, improving the visualization and hence the effectiveness of the HS technology to delimit tumors. The whole system has been verified in real brain tumor resection operations.
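As a rough illustration of the final visualization step described above (not the actual SLIMBRAIN implementation), the following Python sketch blends per-point tumor-classification colors into the RGB colors of a registered point cloud; all array names, shapes and the color palette are assumptions.

import numpy as np

# Hypothetical inputs: a point cloud already registered with the HS classification.
points = np.random.rand(10_000, 3)                  # XYZ coordinates from the LiDAR camera
rgb = np.random.rand(10_000, 3)                     # per-point RGB color in [0, 1]
class_id = np.random.randint(0, 3, size=10_000)     # per-point class from the HS classifier

# One display color per class, e.g. healthy, tumor, background (illustrative palette).
palette = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])
alpha = 0.5                                         # blending factor of the class overlay
blended = (1.0 - alpha) * rgb + alpha * palette[class_id]   # colors shown in the AR view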
GPU-based parallelisation of a versatile video coding adaptive loop filter in resource-constrained heterogeneous embedded platform
Archivo Digital UPM
- Saha, Anup
- Roma, Nuno
- Chavarrías Lapastora, Miguel
- Dias, Tiago
- Pescador del Oso, Fernando
- Aranda López, Víctor
This paper presents a GPU-based parallelisation of an optimised Versatile Video Coding (VVC) decoder adaptive loop filter (ALF) on a resource-constrained heterogeneous platform. The GPU has been comprehensively exploited to maximise the degree of parallelism. The proposed approach accelerates the ALF computation by an average factor of two when compared to an already fully optimised version of the software decoder implementation on an embedded platform. Finally, this work presents an analysis of energy consumption, showing that the proposed methodology has a negligible impact on this key parameter.
Project: MINECO//PID2020-116417RB-C41
Implementation of a Grey Prediction System Based on Fuzzy Inference for Transmission Power Control in IoT Edge Sensor Nodes
Archivo Digital UPM
- Moreno, Guillermo
- Mujica Rojas, Gabriel Noe
- Portilla Berrueco, Jorge
- Lee, Jin-Shyan
The rapid growth of the Internet of Things (IoT) has expanded the research and implementation of wireless sensor networks (WSNs) in various application domains. However, the challenges associated with resource-constrained sensor nodes and the need for ultralow power consumption pose significant problems. One fundamental strategy to address these challenges is transmission power control (TPC), which adjusts the transmission power of nodes to optimize network performance and lifetime. While traditional methods have focused on static scenarios, this work presents a novel approach for mobile WSNs based on a grey-fuzzy-logic TPC (Grey-FTPC). The proposed system integrates grey prediction techniques with fuzzy inference to dynamically adapt transmission power levels. Unlike previous simulation-based studies, this work focuses on real implementations, considering practical aspects of WSN deployments and the characteristics of embedded sensor platforms. The objectives of this work are twofold: 1) to implement Grey-FTPC on an IoT embedded hardware platform, ensuring compatibility with IEEE 802.15.4 networks, and 2) to propose a runtime adaptive link recovery mechanism to enhance the robustness of the adaptive TPC in mobile and unstable contexts. Additionally, a multihop mobile Grey-FTPC strategy is introduced, enabling collaborative transmission power adaptation among sensor nodes. Experimental tests demonstrate the high prediction accuracy of the proposal even in multihop scenarios, confirming the system's scalability. Results also show that the proposed system outperforms other strategies in terms of energy consumption, achieving gains of up to 43% depending on the scenario.
Project: MINECO//PID2020-116417RB-C41
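As background on the grey-prediction component, the following Python sketch implements a generic GM(1,1) grey model that forecasts the next value of a short series such as recent link-quality samples; it is not the authors' Grey-FTPC implementation, and the sample values are hypothetical.

import numpy as np

def gm11_forecast(x0: np.ndarray) -> float:
    """One-step-ahead forecast with the classical GM(1,1) grey model."""
    n = len(x0)
    x1 = np.cumsum(x0)                               # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])                    # mean sequence of consecutive x1 values
    B = np.column_stack([-z1, np.ones(n - 1)])
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)   # developing coefficient a, grey input b
    # Whitened response: x1_hat(k+1) = (x0(0) - b/a) * exp(-a*k) + b/a
    x1_next = (x0[0] - b / a) * np.exp(-a * n) + b / a
    x1_last = (x0[0] - b / a) * np.exp(-a * (n - 1)) + b / a
    return float(x1_next - x1_last)                  # back to the original (non-accumulated) series

samples = np.array([71.0, 73.0, 72.0, 75.0, 77.0])   # hypothetical |RSSI| magnitudes (dBm)
print(gm11_forecast(samples))                        # predicted next sample, to drive the TPC decision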
The Application of Evolutionary, Swarm, and Iterative-Based Task-Offloading Optimization for Battery Life Extension in Wireless Sensor Networks
Archivo Digital UPM
- González de Dueñas, Paula
- Mujica Rojas, Gabriel Noe
- Portilla Berrueco, Jorge
The proliferation of Internet-of-Things (IoT) devices has exponentially increased data generation, placing substantial computational demands on resource-constrained sensor nodes at the extreme edge. Task offloading presents a promising solution to tackle these challenges, enabling energy-aware and resource-efficient computing in wireless sensor networks (WSNs). Despite its recognized benefits, the exploration of task offloading in extreme-edge environments remains limited in current research. This study aims to bridge the existing research gap by investigating the application of computational offloading in WSNs to reduce energy consumption. Our key contribution lies in the introduction of optimization algorithms explicitly designed for WSNs. Our proposal, focusing on bandwidth allocation, employs metaheuristic and iterative algorithms adapted to WSN characteristics, enhancing energy efficiency and network lifespan. Through extensive experimental analysis, our findings highlight the significant impact of task offloading on improving energy efficiency and overall system performance in extreme-edge IoT environments. Notably, we demonstrate a reduction in network consumption of up to 135% when employing task offloading, compared to a network without offloading. Furthermore, our distinctive multiobjective approach, based on particle swarm algorithms, distinguishes itself from the other proposed algorithms: it effectively balances individual node consumption, resulting in an extended network lifetime while successfully achieving both specified objectives.
Project: MINECO//PID2020-116417RB-C41
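The particle swarm component mentioned above can be illustrated with a minimal single-objective PSO loop; this is a generic textbook version, not the multiobjective algorithm used in the paper, and the toy cost function merely stands in for the network-energy model.

import numpy as np

def pso_minimize(cost, dim, n_particles=30, iters=100, bounds=(0.0, 1.0), seed=0):
    """Plain particle swarm optimization of `cost` over a box-constrained search space."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))        # positions (e.g. bandwidth shares)
    v = np.zeros_like(x)
    pbest, pbest_cost = x.copy(), np.array([cost(p) for p in x])
    gbest = pbest[np.argmin(pbest_cost)].copy()

    w, c1, c2 = 0.7, 1.5, 1.5                               # inertia and acceleration coefficients
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        costs = np.array([cost(p) for p in x])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], costs[improved]
        gbest = pbest[np.argmin(pbest_cost)].copy()
    return gbest, float(pbest_cost.min())

# Toy stand-in for a per-node energy model over bandwidth allocations (hypothetical).
energy = lambda alloc: float(np.sum((alloc - 0.3) ** 2))
best_alloc, best_energy = pso_minimize(energy, dim=5)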
A Deep Learning Approach for Fear Recognition on the Edge Based on Two-Dimensional Feature Maps
Archivo Digital UPM
- Sun, Junjiao
- Portilla Berrueco, Jorge
- Otero Marnotes, Andres
Applying affective computing techniques to recognize fear and combining them with portable signal monitors makes it possible to create real-time detection systems that could act as bodyguards when users are in danger. With this aim, this paper presents a fear recognition method based on physiological signals obtained from wearable devices. The procedure involves creating two-dimensional feature maps from the raw signals, using data augmentation and feature selection algorithms, followed by deep-learning-based classification models inspired by those used in image processing. This proposal has been validated with two different datasets, achieving F1-scores of 78.13%, 88.07%, and 99.60% and accuracies of 79.90%, 89.12%, and 99.60% on WEMAC, WESAD 3-class, and WESAD 2-class, respectively. Furthermore, the paper demonstrates the feasibility of implementing the proposed method on the Coral Edge TPU device, enabling inference at the edge.
Project: MINECO//PID2020-116417RB-C41
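As a sketch of the kind of classifier described above (an illustrative model, not the authors' architecture), the following PyTorch snippet defines a small CNN that takes 2D feature maps built from physiological-signal features and outputs class scores; the map size and the number of classes are assumptions.

import torch
import torch.nn as nn

# Hypothetical input: a batch of 16x16 single-channel feature maps built from
# wearable physiological-signal features (e.g. per-window statistics).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                                # 16x16 -> 8x8
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                        # global average pooling
    nn.Flatten(),
    nn.Linear(32, 3),                               # e.g. 3 classes, as in the WESAD 3-class setup
)

feature_maps = torch.randn(8, 1, 16, 16)            # dummy batch of 8 feature maps
logits = model(feature_maps)                        # (8, 3) class scores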