MODELOS GRAFICOS PROBABILISTICOS PARA CLASIFICADORES MULTIDIMENSIONALES CON APLICACIONES EN VISION POR COMPUTADOR

TIN2008-06815-C02-02

Nombre agencia financiadora Ministerio de Ciencia e Innovación
Acrónimo agencia financiadora MICINN
Programa Programa Nacional de Investigación Fundamental
Subprograma Investigación fundamental no-orientada
Convocatoria Investigación fundamental no-orientada
Año convocatoria 2008
Unidad de gestión Subdirección General de Proyectos de Investigación
Centro beneficiario UNIVERSIDAD COMPLUTENSE DE MADRID
Centro realización UNIVERSIDAD COMPLUTENSE DE MADRID (UCM)
Identificador persistente http://dx.doi.org/10.13039/501100004837

MODELOS GRAFICOS PROBABILISTICOS PARA CLASIFICADORES MULTIDIMENSIONALES CON APLICACIONES EN VISION POR COMPUTADOR

TIN2008-06815-C02-02

Nombre agencia financiadora Ministerio de Ciencia e Innovación
Acrónimo agencia financiadora MICINN
Programa Programa Nacional de Investigación Fundamental
Subprograma Investigación fundamental no-orientada
Convocatoria Investigación fundamental no-orientada
Año convocatoria 2008
Unidad de gestión Subdirección General de Proyectos de Investigación
Centro beneficiario UNIVERSIDAD POLITECNICA DE MADRID
Centro realización FACULTAD DE INFORMÁTICA
Identificador persistente http://dx.doi.org/10.13039/501100004837

MODELOS GRAFICOS PROBABILISTICOS PARA CLASIFICADORES MULTIDIMENSIONALES CON APLICACIONES EN VISION POR COMPUTADOR

TIN2008-06815-C02-02

Nombre agencia financiadora Ministerio de Ciencia e Innovación
Acrónimo agencia financiadora MICINN
Programa Programa Nacional de Investigación Fundamental
Subprograma Investigación fundamental no-orientada
Convocatoria Investigación fundamental no-orientada
Año convocatoria 2008
Unidad de gestión Subdirección General de Proyectos de Investigación
Centro beneficiario UNIVERSIDAD REY JUAN CARLOS (URJC)
Centro realización UNIVERSIDAD REY JUAN CARLOS (URJC)
Identificador persistente http://dx.doi.org/10.13039/501100004837

Publicaciones

Resultados totales (Incluyendo duplicados): 17
Encontrada(s) 1 página(s)

Mateda-2.0: estimation of distribution algorithms in MATLAB

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Larrañaga, Pedro
  • Santana, Roberto
  • Bielza, Concha
  • Lozano, José Antonio
  • Echegoyen Arruti, Carlos
  • Mendiburu, Alexander
  • Armañanzas, Rubén
  • Shakya, Siddartha
This paper describes Mateda-2.0, a MATLAB package for estimation of distribution algorithms (EDAs). This package can be used to solve single and multi-objective discrete and continuous optimization problems using EDAs based on undirected and directed probabilistic graphical models. The implementation contains several methods commonly employed by EDAs. It is also conceived as an open package to allow users to incorporate different combinations of selection, learning, sampling, and local search procedures. Additionally, it includes methods to extract, process and visualize the structures learned by the probabilistic models. This way, it can unveil previously unknown information about the optimization problem domain. Mateda-2.0 also incorporates a module for creating and validating function models based on the probabilistic models learned by EDAs., This work has been partially supported by the Saiotek and Research Groups 2007-2012 (IT-242-07) programs (Basque Government), TIN2008-06815-C02-01, TIN2008-06815-C02-02, TIN2007-62626 and Consolider Ingenio 2010 - CSD2007-00018 projects (Spanish Ministry of Science and Innovation), the CajalBlueBrain project, and the COMBIOMED network in computational biomedicine (Carlos III Health Institute).




Class-Conditional Probabilistic Principal Component Analysis: application to gender recognition

Archivo Digital UPM
  • Bekios, Juan
  • Buenaposada Biencinto, José Miguel
  • Baumela Molina, Luis
Este trabajo presenta una solución al problema del reconocimiento del género de un rostro humano a partir de una imagen. Adoptamos una aproximación que utiliza la cara completa a través de la textura de la cara normalizada y redimensionada como entrada a un clasificador Näive Bayes. Presentamos la técnica de Análisis de Componentes Principales Probabilístico Condicionado-a-la-Clase (CC-PPCA) para reducir la dimensionalidad de los vectores de características para la clasificación y asegurar la asunción de independencia para el clasificador. Esta nueva aproximación tiene la deseable propiedad de presentar un modelo paramétrico sencillo para las marginales. Además, este modelo puede estimarse con muy pocos datos. En los experimentos que hemos desarrollados mostramos que CC-PPCA obtiene un 90% de acierto en la clasificación, resultado muy similar al mejor presentado en la literatura---ABSTRACT---This paper presents a solution to the problem of recognizing the gender of a human face from an image. We adopt a holistic approach by using the cropped and normalized texture of the face as input to a Naïve Bayes classifier. First it is introduced the Class-Conditional Probabilistic Principal Component Analysis (CC-PPCA) technique to reduce the dimensionality of the classification attribute vector and enforce the independence assumption of the classifier. This new approach has the desirable property of a simple parametric model for the marginals. Moreover this model can be estimated with very few data. In the experiments conducted we show that using CCPPCA we get 90% classification accuracy, which is similar result to the best in the literature. The proposed method is very simple to train and implement.




Optimizing brain networks topologies using multi-objective evolutionary computation

Archivo Digital UPM
  • Santana, Roberto
  • Bielza Lozoya, María Concepción
  • Larrañaga Múgica, Pedro María
The analysis of brain network topological features has served to better understand these networks and reveal particular characteristics of their functional behavior. The distribution of brain network motifs is particularly useful for detecting and describing differences between brain networks and random and computationally optimized artificial networks. In this paper we use a multi-objective evolutionary optimization approach to generate optimized artificial networks that have a number of topological features resembling brain networks. The Pareto set approximation of the optimized networks is used to extract network descriptors that are compared to brain and random network descriptors. To analyze the networks, the clustering coefficient, the average path length, the modularity and the betweenness centrality are computed. We argue that the topological complexity of a brain network can be estimated using the number of evaluations needed by an optimization algorithm to output artificial networks of similar complexity. For the analyzed network examples, our results indicate that while original brain networks have a reduced structural motif number and a high functional motif number, they are not optimal with respect to these two topological features. We also investigate the correlation between the structural and functional motif numbers, the average path length and the clustering coefficient in random, optimized and brain networks.




Classifying evolving data streams with partially labeled data

Archivo Digital UPM
  • Borchani, Hanen
  • Larrañaga Múgica, Pedro María
  • Bielza Lozoya, María Concepción
Recently, several approaches have been proposed to deal with the increasingly challenging task of mining concept drifting data streams. However, most are based on supervised classification algorithms assuming that true labels are immediately and entirely available in the data streams. Unfortunately, such an assumption is often violated in real-world applications given that it is expensive or because it takes a long time to obtain all true labels. To deal with this problem, we propose in this paper a new semi-supervised approach for handling concept-drifting data streams containing both labeled and unlabeled instances. First, contrary to existing approaches, we monitor three possible kinds of drift: feature, conditional or dual drift. Drift detection is based on a hypothesis test comparing Kullback-Leibler divergence between old and recent data, whose distribution under the null hypothesis of coming from the same distribution is approximated via a bootstrap method. Then, if any drift occurs, anew classifier is learned from the recent data using the EM algorithm; otherwise, the current classifier is left unchanged. Our approach is so general that it can be applied to different classification models. Experimental studies, using the naive Bayes classifier and logistic regression, on both synthetic and real-world data sets demonstrate that our approach performs well.




Multi-dimensional classification with Bayesian networks

Archivo Digital UPM
  • Bielza Lozoya, María Concepción
  • Li, G.
  • Larrañaga Múgica, Pedro María
Multi-dimensional classification aims at finding a function that assigns a vector of class values to a given vector of features. In this paper, this problem is tackled by a general family of models, called multi-dimensional Bayesian network classifiers (MBCs). This probabilistic graphical model organizes class and feature variables as three different subgraphs: class subgraph, feature subgraph, and bridge (from class to features) subgraph. Under the standard0–1 loss function, the most probable explanation (MPE) must be computed, for which we provide theoretical results in both general MBCs and in MBCs decomposable into maximal connected components. Moreover, when computing the MPE, the vector of class values is covered by following a special ordering (gray code). Under other loss functions defined in accordance with a decomposable structure, we derive theoretical results on how to minimize the expected loss. Besides these inference issues, the paper presents flexible algorithms for learning MBC structures from data based on filter, wrapper and hybrid approaches. The cardinality of the search space is also given. New performance evaluation metrics adapted from the single-class setting are introduced. Experimental results with three benchmark data sets are encouraging, and they outperform state-of-the-art algorithms for multi-label classification.




Optimal row and column ordering to improve table interpretation using estimation of distribution algorithms

Archivo Digital UPM
  • Bengoetxea, Endika
  • Larrañaga Múgica, Pedro María
  • Bielza Lozoya, María Concepción
  • Fernández del Pozo de Salamanca, Juan Antonio
A common information representation task in research as well as educational and statistical practice is to comprehensively and intuitively express data in two-dimensional tables. Examples include tables in scientific papers, as well as reports and the popular press.
Data is often simple enough for users to reorder. In many other cases though, there are complex data patterns that make finding the best re-arrangement of rows and columns for optimum readability a tough problem.
We propose that row and column ordering should be regarded as a combinatorial optimization problem and solved using evolutionary computation techniques. The use of genetic algorithms has already been proposed in the literature. This paper proposes for the first time the use of estimation of distribution algorithms for table ordering. We also propose alternative ways of representing the problem in order to reduce its dimensionality. By learning a selective naive Bayes classifier, we can find out how to jointly combine the parameters of these algorithms to get good table orderings. Experimental examples in this paper are on 2D tables.




Learning factorizations in estimation of distribution algorithms using affinity propagation

Archivo Digital UPM
  • Santana, Roberto
  • Larrañaga Múgica, Pedro María
  • Lozano Alonso, José Antonio
Estimation of distribution algorithms (EDAs) that use marginal product model factorization shave been widely applied to a broad range of mainly binary optimization problems. In this paper, we introduce the affinity propagation EDA (AffEDA) which learns a marginal product model by clustering a matrix of mutual information learned from the data using a very efficient message-passing algorithm known as affinity propagation. The introduced algorithm is tested on a set of binary and nonbinary decomposable functions and using a hard combinatorial class of problem known as theHP protein model. The results show that the algorithm is a very efficient alternative to other EDAs that use marginal product model factorizations such as the extended compact genetic algorithm (ECGA) and improves the quality of the results achieved by ECGA when the cardinality of the variables is increased.




Mateda-2.0:a MATLAB package for the implementation and analysis of estimation of distribution algorithms

Archivo Digital UPM
  • Santana, Roberto
  • Bielza Lozoya, María Concepción
  • Larrañaga Múgica, Pedro María
  • Lozano Alonso, José Antonio
  • Echegoyen Urruti, Carlos
  • Mendiburu Alberro, Alexander
  • Armañanzas Arnedillo, Ruben
  • Shakya, Siddartha
This paper describes Mateda-2.0, a MATLAB package for estimation of distribution algorithms (EDAs). This package can be used to solve single and multi-objective discrete and continuous optimization problems using EDAs based on undirected and directed probabilistic graphical models. The implementation contains several methods commonly employed by EDAs. It is also conceived as an open package to allow users to incorpórate different combinations of selection, learning, sampling, and local search procedures. Additionally, it includes methods to extract, process and visualize the structures learned by the probabilistic models. This way, it can unveil previously unknown information about the optimization problem domain. Mateda-2.0 also incorporates a module for creating and validating function models based on the probabilistic models learned by EDAs.




Multidimensional statistical analysis of the parameterization of a genetic algorithm for the optimal ordering of tables

Archivo Digital UPM
  • Bielza Lozoya, María Concepción
  • Fernández del Pozo de Salamanca, Juan Antonio
  • Larrañaga Múgica, Pedro María
  • Bengoetxea, Endika
The optimal table row and column ordering can reveal useful patterns to improve reading and interpretation. Recently, genetic algorithms using standard crossover and mutation operators have been proposed to tackle this problem. In this paper, we carry out an experimental study that adds to this genetic algorithm crossover and mutation operators specially designed to deal with permutations and includes other parameters (initialization, replacement policy, mutation and crossover rates and stopping criteria) not examined in previous works. A proper analysis of the results must take into account all the parameters simultaneously, since the wrong conclusions can be drawn by studying each separately from the others. This is why we propose a framework for a multidimensional analysis of the results. This includes multiple hypothesis testing and a regression tree that builds a parsimonious and predictive model of the suitable configurations of the parameters.




Predicting citation count of Bioinformatics papers within four years of publication

Archivo Digital UPM
  • Ibáñez Martín, Alfonso
  • Larrañaga Múgica, Pedro María
  • Bielza Lozoya, María Concepción
Motivation: Nowadays, publishers of scientific journals face the tough task of selecting high-quality articles that will attract as many readers as possible from a pool of articles. This is due to the growth of scientific output and literature. The possibility of a journal having a tool capable of predicting the citation count of an article within the first few years after publication would pave the way for new assessment systems.

Results: This article presents a new approach based on buildin several prediction models for the Bioinformatics journal. These models predict the citation count of an article within 4 years after publication (global models). To build these models, tokens found in the abstracts of Bioinformatics papers have been used as predictive features, along with other features like the journal sections and 2-week post-publication periods. To improve the accuracy of the global models, specific models have been built for each Bioinformatics journal section (Data and Text Mining, Databases and Ontologies, Gene Expression, Genetics and Population Analysis, Genome Analysis, Phylogenetics, Sequence Analysis, Structural Bioinformatics and Systems Biology). In these new models, the average success rate for predictions using the naive Bayes and logistic regression supervised classification methods was 89.4% and 91.5%, respectively, within the nine sections and for 4-year time horizon.

Availability: Supplementary material on this experimental survey is available at http://www.dia.fi.upm.es/∼concha/bioinformatics.html




Selection of human embryos for transfer by Bayesian classifiers

Archivo Digital UPM
  • Morales Vega, Dinora A.
  • Bengoetxea Castro, Endika
  • Larrañaga Múgica, Pedro María
In this work we approach by Bayesian classifiers the selection of human embryos from images. This problem consists of choosing the embryos to be transferred in human-assisted reproduction treatments, which Bayesian classifiers address as a supervised classification problem. Different Bayesian classifiers capable of taking into account diverse dependencies between variables of this problem are tested in order to analyse their performance and validity for building a potential decision support system. The analysis by receiver operating characteristic (ROC) proves that the Bayesian classifiers presented in this paper are an appropriated and robust approach for this aim. From the Bayesian classifiers tested, the tree augmented naïve Bayes, k-dependence Bayesian and naïve Bayes classifiers showed to perform almost as well as the semi naïve Bayes and selective naïve Bayes classifiers.




Mining concept-drifting data streams containing labeled and unlabeled instances

Archivo Digital UPM
  • Borchani, Hanen
  • Larrañaga Múgica, Pedro María
  • Bielza Lozoya, María Concepción
Recently, mining data streams has attracted significant attention and has been considered as a challenging task in supervised classification. Most of the existing methods dealing with this problem assume the availability of entirely labeled data streams. Unfortunately, such assumption is often violated in real-world applications given that obtaining labels is a time-consuming and expensive task, while a large amount of unlabeled instances are readily available. In this paper, we propose a new approach for handling concept-drifting data streams containing labelled and unlabeled instances. First, we use KL divergence and bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any occurs, a new classifier is learned using the EM algorithm; otherwise, the current classifier is kept unchanged. Our approach is general so that it can be applied with different classification models. Experiments performed with naive Bayes and logistic regression, on two benchmark datasets, show the good performance of our approach using only limited amounts of labeled instances.




Using probabilistic dependencies improves the search of conductance-based compartmental neuron models

Archivo Digital UPM
  • Santana Hermida, Roberto
  • Bielza Lozoya, María Concepción
  • Larrañaga Múgica, Pedro María
Conductance-based compartmental neuron models are traditionally used to investigate the electrophysiological properties of neurons. These models require a number of parameters to be adjusted to biological experimental data and this question can be posed as an optimization problem. In this paper we investigate the behavior of different estimation of distribution algorithms (EDAs) for this problem. We focus on studying the influence that the interactions between the neuron model conductances have in the complexity of the optimization problem. We support evidence that the use of these interactions during the optimization process can improve the EDA behavior.




Bivariate empirical and n-variate Archimedean copulas in estimation of distribution algorithms

Bivariate empirical and n-variate Archimedean copulas in estimation of distribution algorithms-->
Archivo Digital UPM
  • Cuesta-Infante, Alfredo
  • Santana Hermida, Roberto
  • Hidalgo Peréz, José Ignacio
  • Bielza Lozoya, María Concepción
  • Larrañaga Múgica, Pedro María
This paper investigates the use of empirical and Archimedean copulas as probabilistic models of continuous estimation of distribution algorithms (EDAs). A method for learning and sampling empirical bivariate copulas to be used in the context of n-dimensional EDAs is first introduced. Then, by using Archimedean copulas instead of empirical makes possible to construct n-dimensional copulas with the same purpose. Both copula-based EDAs are compared to other known continuous EDAs on a set of 24 functions and different number of variables. Experimental results show that the proposed copula-based EDAs achieve a better behaviour than previous approaches in a 20% of the benchmark functions.




Synergies between network-based representation and probabilistic graphical models for classification, inference and optimization problems in neuroscience

Archivo Digital UPM
  • Santana Hermida, Roberto
  • Bielza Lozoya, María Concepción
  • Larrañaga Múgica, Pedro María
Neural systems network-based representations are useful tools to analyze numerous phenomena in neuroscience. Probabilistic graphical models (PGMs) give a concise and still rich representation of complex systems from different domains, including neural systems. In this paper we analyze the characteristics of a bidirectional relationship between networks-based representations and PGMs. We show the way in which this relationship can be exploited introducing a number of methods for the solution of classification, inference and optimization problems. To illustrate the applicability of the introduced methods, a number of problems from the field of neuroscience, in which ongoing research is conducted, are used.




Research topics in discrete estimation of distribution algorithms based on factorizations

Archivo Digital UPM
  • Santana Hermida, Roberto
  • Larrañaga Múgica, Pedro María
  • Lozano Alonso, José Antonio
In this paper, we identify a number of topics relevant for the improvement and development of discrete estimation of distribution algorithms. Focusing on the role of probability distributions and factorizations in estimation of distribution algorithms, we present a survey of current challenges where further research must provide answers that extend the potential and applicability of these algorithms. In each case we state the research topic and elaborate on the reasons that make it relevant for estimation of distribution algorithms. In some cases current work or possible alternatives for the solution of the problem are discussed.




Mining probabilistic models learned by EDAs in the optimization of multi-objective problems

Archivo Digital UPM
  • Santana Hermida, Roberto
  • Bielza Lozoya, María Concepción
  • Lozano Ruiz, José Antonio
  • Larrañaga Múgica, Pedro María
One of the uses of the probabilistic models learned by estimation of distribution algorithms is to reveal previous unknown information about the problem structure. In this paper we investigate the mapping between the problem structure and the dependencies captured in the probabilistic models learned by EDAs for a set of multi-objective satisfiability problems. We present and discuss the application of different data mining and visualization techniques for processing and visualizing relevant information from the structure of the learned probabilistic models. We show that also in the case of multi-objective optimization problems, some features of the original problem structure can be translated to the probabilistic models and unveiled by using algorithms that mine the model structures.