Publications

8 results found (1 page)

A compression-based method for detecting anomalies in textual data

Biblos-e Archivo. Repositorio Institucional de la UAM
  • de la Torre-Abaitua, Gonzalo
  • Lago Fernández, Luis Fernando
  • Arroyo, David
Nowadays, information and communications technology systems are fundamental assets of our social and economic model, and thus they should be properly protected against the malicious activity of cybercriminals. Defence mechanisms are generally articulated around tools that trace and store information in several ways, the simplest one being the generation of plain text files known as security logs. Such log files are usually inspected, in a semi-automatic way, by security analysts to detect events that may affect system integrity, confidentiality and availability. On this basis, we propose a parameter-free method to detect security incidents from structured text regardless of its nature. We use the Normalized Compression Distance to obtain a set of features that can be used by a Support Vector Machine to classify events from a heterogeneous cybersecurity environment. In particular, we explore and validate the application of our method in four different cybersecurity domains: HTTP anomaly identification, spam detection, Domain Generation Algorithms tracking and sentiment analysis. The results obtained show the validity and flexibility of our approach in different security scenarios with a low configuration burden.
This research has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 872855 (TRESCA project), from the Comunidad de Madrid (Spain) under the projects CYNAMON (P2018/TCS-4566) and S2017/BMD-3688, co-financed with FSE and FEDER EU funds, by the Consejo Superior de Investigaciones Científicas (CSIC) under the project LINKA20216 (“Advancing in cybersecurity technologies”, i-LINK+ program), and by Spanish project MINECO/FEDER TIN2017-84452-R.
Project: EC/H2020/872855
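
The abstract above describes using the Normalized Compression Distance (NCD) to derive features for a classifier. A minimal sketch of that feature step follows. It is not the authors' implementation: the choice of zlib as the compressor, the function names, and the reference strings are all assumptions made for illustration, and the Support Vector Machine stage is left out.

```python
import zlib
from typing import List


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_features(sample: str, references: List[str]) -> List[float]:
    """Feature vector: the NCD between the sample and each reference string."""
    s = sample.encode("utf-8")
    return [ncd(s, r.encode("utf-8")) for r in references]


# Hypothetical reference strings, invented for this example only.
references = [
    "GET /index.html HTTP/1.1",
    "GET /search?q=' OR 1=1 -- HTTP/1.1",
]
features = ncd_features("GET /about.html HTTP/1.1", references)
```

Each sample becomes a vector of distances to the reference strings, and a vector of this kind is what a classifier such as a Support Vector Machine could take as input. The NCD formula itself has no tunable parameters, but real compressors add a fixed overhead, so values for very short strings are only approximate.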




Fake news detection: When complex problems demand complex solutions

Digital.CSIC. Repositorio Institucional del CSIC
  • Oliva, Christian
  • Palacio Marín, Ignacio
  • Lago-Fernández, Luis F.
  • Arroyo Guardeño, David
Fake news detection is one of the most challenging problems in today's information and communication systems. In this article we address the challenge of detecting the generation and spreading of misleading information in the specific scenario of rumour propagation and clickbait. We find that the construction of the dataset used to study this kind of problem dramatically affects the performance of the model and, thus, its selection. Hence, we conduct experiments with two datasets of different complexity. In experiment A, by using a simple dataset with rumour propagation data from Twitter, we demonstrate that good performance scores can be obtained without relying on the high computational cost of hyper-parameter tuning. In experiment B, an approach with fewer parameters and computational layers is not suitable to study clickbait with a larger dataset featuring more complex dynamics. Information deluge clearly demands the automation of the procedures for information treatment and the adequate combination of natural language processing and machine learning techniques. As the underlying problem is very complex, there is a tendency to think that the solution must be a complex model, i.e. a model with a large number of parameters and hyper-parameters. Our results confirm this idea, and underline the importance of identifying the most appropriate model assumptions based on the type of dataset available in order to select and configure the machine learning algorithm.
Project: EC/H2020/872855




Trustworthy humans and machines: Vulnerable trustors and the need for trustee competence, integrity, and benevolence in digital systems

Digital.CSIC. Repositorio Institucional del CSIC
  • Degli Esposti, Sara
  • Arroyo, David
In this chapter we argue that transparency is worthless in guaranteeing the trustworthiness of the trustees (those in charge of designing and managing digital systems) when trustors are vulnerable and depend on those same technologies and systems. Specifically, we focus on the trustworthiness of security engineers, those responsible for the dependability of systems and devices. We propose abandoning the rationalistic instrumental paradigm of trust to embrace a vision of trust as care in order to ensure the trustworthiness (that is, the competence, integrity, and benevolence) of operators of digitally mediated sociotechnical systems.
This work was partially funded by the “TRESCA – Trustworthy, Reliable, and Engaging Scientific Communication Approaches” project, funded by the European Union’s Horizon 2020 Research and Innovation Program under grant agreement no. 872855, and by the project “CYNAMON – Cybersecurity, Network Analysis, and Monitoring for the Next Generation Internet,” sponsored by “Programas de Actividades de I+D entre grupos de investigación de la Comunidad de Madrid en tecnologías 2018” (P2018/TCS4566), cofinanced with FSE and FEDER EU funds.
Peer reviewed
Project: EC/H2020/872855




A compression-based method for detecting anomalies in textual data

Digital.CSIC. Repositorio Institucional del CSIC
  • Torre-Abaitua, Gonzalo de la
  • Lago-Fernández, Luis Fernando
  • Arroyo Guardeño, David
14 pages; 11 tables; 1 figure
Nowadays, information and communications technology systems are fundamental assets of our social and economic model, and thus they should be properly protected against the malicious activity of cybercriminals. Defence mechanisms are generally articulated around tools that trace and store information in several ways, the simplest one being the generation of plain text files known as security logs. Such log files are usually inspected, in a semi-automatic way, by security analysts to detect events that may affect system integrity, confidentiality and availability. On this basis, we propose a parameter-free method to detect security incidents from structured text regardless of its nature. We use the Normalized Compression Distance to obtain a set of features that can be used by a Support Vector Machine to classify events from a heterogeneous cybersecurity environment. In particular, we explore and validate the application of our method in four different cybersecurity domains: HTTP anomaly identification, spam detection, Domain Generation Algorithms tracking and sentiment analysis. The results obtained show the validity and flexibility of our approach in different security scenarios with a low configuration burden.
This research has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 872855 (TRESCA project), from the Comunidad de Madrid (Spain) under the projects CYNAMON (P2018/TCS-4566) and S2017/BMD-3688, co-financed with FSE and FEDER EU funds, by the Consejo Superior de Investigaciones Científicas (CSIC) under the project LINKA20216 (“Advancing in cybersecurity technologies”, i-LINK+ program), and by Spanish project MINECO/FEDER TIN2017-84452-R.




TRESCA 2021 Dataset: Survey experiments in 7 EU countries

Digital.CSIC. Repositorio Institucional del CSIC
  • Degli Esposti, Sara
The CSIC team worked in cooperation with the Spanish subsidiary of Dynata to create the electronic versions of the questionnaire available in report D3.1. Dynata Global Spain SL was appointed by CSIC to carry out the data collection in seven EU countries (France, Germany, Hungary, Italy, the Netherlands, Poland, and Spain). From February 2021 until 10 March, the survey was available to Dynata’s proprietary panel members. Two versions of the questionnaire were used: Appendix One and Appendix Two. Appendix One was administered to people of all ages in France, Germany, Hungary, Italy, the Netherlands, Poland, and Spain. Appendix Two was available to people aged 16-25 in Germany, the Netherlands and Spain.
This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No 872855.
Peer reviewed
Project: EC/H2020/872855




Verdad, desinformación y verificación: contexto de estudio y contribución al debate / Truth, Disinformation and Verification: Context of Study and Contribution to Debate

Digital.CSIC. Repositorio Institucional del CSIC
  • Wagner, A.S.
  • Degli Esposti, Sara
This article is licensed under CC BY-NC 4.0.
RESPONTRUST Uncertainty, Trust and Responsibility. Keys to Counteracting Disinformation, Infodemic and Conspiranoia during the COVID19 pandemic (SGL2104001, PTI Salud Global del CSIC), funded by the European Union “NextGeneration”/PRTR; INconRES Incertidumbre, confianza y responsabilidad. Claves ético-epistemológicas de las nuevas dinámicas sociales en la era digital (PID2020-117219GB-I00), funded by MCIN/AEI/10.13039/501100011033; (c) ON TRUST-CM (H2019/HUM-5699), funded by the Consejería de Educación e Investigación de la Comunidad de Madrid, European Social Fund.
H2020 project “TRESCA – Trustworthy, Reliable and Engaging Scientific Communication Approaches” (No. 872855); (b) “CYNAMON – Cybersecurity, Network Analysis and Monitoring for the Next Generation Internet”. Programas de Actividades de I+D entre grupos de investigación de la Comunidad de Madrid en tecnologías 2018 (P2018/TCS-4566; B.O.C.M. No. 304; 21 December 2018).
Peer reviewed




El rol del análisis de género en la reducción de los sesgos algorítmicos

Digital.CSIC. Repositorio Institucional del CSIC
  • Degli Esposti, Sara
This article is licensed under CC BY-NC-SA 4.0.
This article examines the role of gender analysis in combating the algorithmic biases that arise from the lack of inclusion and visibility of women both in big data and in artificial intelligence (AI) algorithms. To analyse this problem, the number of women enrolled in fields of study related to computer science, engineering and data science is estimated, and the barriers to interdisciplinarity that can facilitate the spread of algorithmic biases and limit the progress of the digital economy are analysed.
Peer reviewed
Project: EC/H2020/872855




On the Design of a Misinformation Widget (MsW) Against Cloaked Science

Digital.CSIC. Repositorio Institucional del CSIC
  • Arroyo Guardeño, David
  • Degli Esposti, Sara
  • Gómez Espés, Alberto
  • Palmero Muñoz, Santiago
  • Pérez Miguel, Luis
Amongst all types of fabricated information travelling on open social networks, scientific disinformation, or cloaked science, is both insidious and challenging to investigate. Here we present the design of the TRESCA misinformation widget (MsW), which is both a methodology and a toolbox for investigating disinformation operations leveraging scientific communications. In developing MsW we adopt a human-in-charge approach to AI: the automated tools included in the MsW REST API are meant to support, not to substitute or undermine, users’ decision-making capacity. On the journey toward information verification, the MsW AI toolbox helps users test both the veracity of claims and the reliability of sources. While the toolbox integrates open source intelligence solutions, the MsW methodology fosters users’ critical thinking.
This work was partially funded by European H2020 project TRESCA (Grant Agreement No 872855), national project XAI-DisInfodemics (grant PLEC2021-007681 funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGeneration EU/PRTR), and regional project “CYNAMON - Cybersecurity, Network Analysis and Monitoring for the Next Generation Internet” (funded by “Programas de Actividades de I+D entre grupos de investigación de la Comunidad de Madrid en tecnologías 2018” P2018/TCS-4566; BOCM. No. 304; 21/12/2018).
Peer reviewed