PROBABILISTIC GRAPHICAL MODELS IN MULTIDIMENSIONAL SUPERVISED CLASSIFICATION PROBLEMS. APPLICATIONS IN BIOINFORMATICS

TIN2008-06815-C02-01

Funding agency name: Ministerio de Ciencia e Innovación
Funding agency acronym: MICINN
Programme: National Programme for Fundamental Research
Subprogramme: Non-oriented fundamental research
Call: Non-oriented fundamental research
Call year: 2008
Managing unit: Subdirección General de Proyectos de Investigación
Beneficiary institution: UNIVERSIDAD DEL PAIS VASCO EUSKAL HERRIKO UNIBERTSITATEA
Host centre: FACULTAD DE INFORMÁTICA
Persistent identifier: http://dx.doi.org/10.13039/501100004837

Publications

Total results (including duplicates): 12

Mateda-2.0: estimation of distribution algorithms in MATLAB

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Larrañaga, Pedro
  • Santana, Roberto
  • Bielza, Concha
  • Lozano, José Antonio
  • Echegoyen Arruti, Carlos
  • Mendiburu, Alexander
  • Armañanzas, Rubén
  • Shakya, Siddartha
This paper describes Mateda-2.0, a MATLAB package for estimation of distribution algorithms (EDAs). This package can be used to solve single- and multi-objective discrete and continuous optimization problems using EDAs based on undirected and directed probabilistic graphical models. The implementation contains several methods commonly employed by EDAs. It is also conceived as an open package to allow users to incorporate different combinations of selection, learning, sampling, and local search procedures. Additionally, it includes methods to extract, process and visualize the structures learned by the probabilistic models. This way, it can unveil previously unknown information about the optimization problem domain. Mateda-2.0 also incorporates a module for creating and validating function models based on the probabilistic models learned by EDAs.
This work has been partially supported by the Saiotek and Research Groups 2007-2012 (IT-242-07) programs (Basque Government), TIN2008-06815-C02-01, TIN2008-06815-C02-02, TIN2007-62626 and Consolider Ingenio 2010 - CSD2007-00018 projects (Spanish Ministry of Science and Innovation), the CajalBlueBrain project, and the COMBIOMED network in computational biomedicine (Carlos III Health Institute).




Analyzing the k most probable solutions in EDAs based on Bayesian networks

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Echegoyen Arruti, Carlos
  • Mendiburu, Alexander
  • Santana, Roberto
  • Lozano, José Antonio
Closed access to this document. It is not available for public consultation. Deposited in Academica-e to meet the author's academic evaluation and accreditation requirements (sexenios, accreditations, etc.).
Estimation of distribution algorithms (EDAs) have been successfully applied to a wide variety of problems but, for the most complex approaches, there is no clear understanding of the way these algorithms complete the search. For that reason, in this work we exploit the probabilistic models that EDAs based on Bayesian networks are able to learn in order to provide new information about their behavior. In particular, we analyze the k solutions with the highest probability in the distributions estimated during the search. In order to study the relationship between the probabilistic model and the fitness function, we focus on calculating, for the k most probable solutions (MPSs), the probability values, the function values and the correlation between both sets of values at each step of the algorithm. Furthermore, the objective function values of the k MPSs are contrasted with those of the k best individuals in the population. We complete the analysis by calculating the position of the optimum in the k MPSs during the search and the genotypic diversity of these solutions. We carry out the analysis by optimizing functions of different natures such as Trap5, two variants of Ising spin glass and Max-SAT. The results not only show information about the relationship between the probabilistic model and the fitness function, but also allow us to observe characteristics of the search space, the quality of the setup of the parameters and even distinguish between successful and unsuccessful runs.
This work has been partially supported by the Saiotek and Research Groups 2007-2012 (IT-242-07) programs (Basque Government), TIN2008-06815-C02-01 and Consolider Ingenio 2010 - CSD2007-00018 projects (Spanish Ministry of Science and Innovation) and COMBIOMED network in computational biomedicine (Carlos III Health Institute). Carlos Echegoyen has a grant from UPV-EHU.
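The kind of analysis this abstract describes (probabilities of the k most probable solutions, their function values, and the correlation between the two) can be illustrated with a minimal sketch. It is not the authors' method: it assumes a hand-specified chain-structured network over five binary variables, finds the k MPSs by brute-force enumeration (feasible only for very small problems), and uses hypothetical helper names (`joint_probability`, `k_most_probable`, `pearson`). Only the Trap5 definition is standard.

```python
import itertools
import math


def trap5(bits):
    """Trap5: sum over 5-bit blocks; a block with u ones scores 5 if u == 5, else 4 - u."""
    total = 0
    for start in range(0, len(bits), 5):
        u = sum(bits[start:start + 5])
        total += 5 if u == 5 else 4 - u
    return total


def joint_probability(bits, p_first, p_cond):
    """Probability of `bits` under an assumed chain x1 -> x2 -> ... -> xn.

    p_first is P(x1 = 1); p_cond[i][v] is P(x_{i+2} = 1 | x_{i+1} = v).
    """
    prob = p_first if bits[0] == 1 else 1.0 - p_first
    for i in range(1, len(bits)):
        p_one = p_cond[i - 1][bits[i - 1]]
        prob *= p_one if bits[i] == 1 else 1.0 - p_one
    return prob


def k_most_probable(n, k, p_first, p_cond):
    """Return the k (probability, solution) pairs with the highest probability."""
    scored = [(joint_probability(x, p_first, p_cond), x)
              for x in itertools.product((0, 1), repeat=n)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Illustrative parameters only: each variable tends to copy its predecessor.
P_FIRST = 0.9
P_COND = [(0.2, 0.9)] * 4

if __name__ == "__main__":
    top = k_most_probable(5, 4, P_FIRST, P_COND)
    probabilities = [p for p, _ in top]
    fitness_values = [trap5(x) for _, x in top]
    for p, x in top:
        print(x, round(p, 5), trap5(x))
    print("correlation:", pearson(probabilities, fitness_values))
```

In an actual EDA run, the model parameters would be re-learned at every generation, and the enumeration would be replaced by an inference procedure that scales to larger networks.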




Toward understanding EDAs based on Bayesian networks through a quantitative analysis

Academica-e. Repositorio Institucional de la Universidad Pública de Navarra
  • Echegoyen Arruti, Carlos
  • Mendiburu, Alexander
  • Santana, Roberto
  • Lozano, José Antonio
The successful application of estimation of distribution algorithms (EDAs) to solve different kinds of problems has reinforced their candidature as promising black-box optimization tools. However, their internal behavior is still not completely understood and therefore it is necessary to work in this direction in order to advance their development. This paper presents a methodology of analysis which provides new information about the behavior of EDAs by quantitatively analyzing the probabilistic models learned during the search. We particularly focus on calculating the probabilities of the optimal solutions, the most probable solution given by the model and the best individual of the population at each step of the algorithm. We carry out the analysis by optimizing functions of different natures such as Trap5, two variants of Ising spin glass and Max-SAT. By using different structures in the probabilistic models, we also analyze the impact of the structural model accuracy on the quantitative behavior of EDAs. In addition, the objective function values of our analyzed key solutions are contrasted with their probability values in order to study the connection between function and probabilistic models. The results not only show information about the internal behavior of EDAs, but also about the quality of the optimization process and setup of the parameters, the relationship between the probabilistic model and the fitness function, and even about the problem itself. Furthermore, the results allow us to discover common patterns of behavior in EDAs and propose new ideas in the development of this type of algorithm.
This work has been partially supported by the Saiotek and Research Groups 2007-2012 (IT-242-07) programs (Basque Government), TIN2008-06815-C02-01 and Consolider Ingenio 2010 - CSD2007-00018 projects (Spanish Ministry of Science and Innovation) and COMBIOMED network in computational biomedicine (Carlos III Health Institute). Carlos Echegoyen holds a grant from UPV-EHU. The work of Roberto Santana has been partially supported by the TIN2010-20900-C04-04 and Cajal Blue Brain projects (Spanish Ministry of Science and Innovation).




Optimal row and column ordering to improve table interpretation using estimation of distribution algorithms

Archivo Digital UPM
  • Bengoetxea, Endika
  • Larrañaga Múgica, Pedro María
  • Bielza Lozoya, María Concepción
  • Fernández del Pozo de Salamanca, Juan Antonio
A common information representation task in research as well as educational and statistical practice is to comprehensively and intuitively express data in two-dimensional tables. Examples include tables in scientific papers, as well as reports and the popular press.
Data is often simple enough for users to reorder. In many other cases though, there are complex data patterns that make finding the best re-arrangement of rows and columns for optimum readability a tough problem.
We propose that row and column ordering should be regarded as a combinatorial optimization problem and solved using evolutionary computation techniques. The use of genetic algorithms has already been proposed in the literature. This paper proposes for the first time the use of estimation of distribution algorithms for table ordering. We also propose alternative ways of representing the problem in order to reduce its dimensionality. By learning a selective naive Bayes classifier, we can find out how to jointly combine the parameters of these algorithms to get good table orderings. Experimental examples in this paper are on 2D tables.




Learning factorizations in estimation of distribution algorithms using affinity propagation

Archivo Digital UPM
  • Santana, Roberto
  • Larrañaga Múgica, Pedro María
  • Lozano Alonso, José Antonio
Estimation of distribution algorithms (EDAs) that use marginal product model factorizations have been widely applied to a broad range of mainly binary optimization problems. In this paper, we introduce the affinity propagation EDA (AffEDA), which learns a marginal product model by clustering a matrix of mutual information learned from the data using a very efficient message-passing algorithm known as affinity propagation. The introduced algorithm is tested on a set of binary and nonbinary decomposable functions and on a hard combinatorial class of problems known as the HP protein model. The results show that the algorithm is a very efficient alternative to other EDAs that use marginal product model factorizations, such as the extended compact genetic algorithm (ECGA), and improves the quality of the results achieved by ECGA when the cardinality of the variables is increased.
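As a rough illustration of the input that such a clustering step works on, the sketch below estimates a pairwise mutual information matrix from a population of discrete solutions. It is an assumption-laden example, not the AffEDA implementation: the function name `mutual_information_matrix` is hypothetical, probabilities are plain empirical frequencies, and the affinity propagation clustering itself is not shown.

```python
import math


def mutual_information_matrix(population):
    """Empirical pairwise mutual information (in nats) between variables.

    `population` is a list of equal-length tuples of discrete values.
    The diagonal is left at 0.0 in this sketch.
    """
    n_samples = len(population)
    n_vars = len(population[0])
    mi = [[0.0] * n_vars for _ in range(n_vars)]
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            joint, count_i, count_j = {}, {}, {}
            for individual in population:
                a, b = individual[i], individual[j]
                joint[(a, b)] = joint.get((a, b), 0) + 1
                count_i[a] = count_i.get(a, 0) + 1
                count_j[b] = count_j.get(b, 0) + 1
            value = 0.0
            for (a, b), count in joint.items():
                # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with counts
                value += (count / n_samples) * math.log(
                    count * n_samples / (count_i[a] * count_j[b]))
            mi[i][j] = mi[j][i] = value
    return mi


if __name__ == "__main__":
    # Toy population: variables 0 and 1 always agree, variable 2 is unrelated.
    toy_population = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    for row in mutual_information_matrix(toy_population):
        print([round(v, 4) for v in row])
```

Variables with high mutual information would be natural candidates for the same factor of a marginal product model; how the clusters are actually formed is left to the clustering algorithm.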




Mateda-2.0: a MATLAB package for the implementation and analysis of estimation of distribution algorithms

Archivo Digital UPM
  • Santana, Roberto
  • Bielza Lozoya, María Concepción
  • Larrañaga Múgica, Pedro María
  • Lozano Alonso, José Antonio
  • Echegoyen Arruti, Carlos
  • Mendiburu Alberro, Alexander
  • Armañanzas Arnedillo, Ruben
  • Shakya, Siddartha
This paper describes Mateda-2.0, a MATLAB package for estimation of distribution algorithms (EDAs). This package can be used to solve single- and multi-objective discrete and continuous optimization problems using EDAs based on undirected and directed probabilistic graphical models. The implementation contains several methods commonly employed by EDAs. It is also conceived as an open package to allow users to incorporate different combinations of selection, learning, sampling, and local search procedures. Additionally, it includes methods to extract, process and visualize the structures learned by the probabilistic models. This way, it can unveil previously unknown information about the optimization problem domain. Mateda-2.0 also incorporates a module for creating and validating function models based on the probabilistic models learned by EDAs.




Selection of human embryos for transfer by Bayesian classifiers

Archivo Digital UPM
  • Morales Vega, Dinora A.
  • Bengoetxea Castro, Endika
  • Larrañaga Múgica, Pedro María
In this work we use Bayesian classifiers to approach the selection of human embryos from images. This problem consists of choosing the embryos to be transferred in human assisted reproduction treatments, which Bayesian classifiers address as a supervised classification problem. Different Bayesian classifiers capable of taking into account diverse dependencies between the variables of this problem are tested in order to analyse their performance and validity for building a potential decision support system. The receiver operating characteristic (ROC) analysis indicates that the Bayesian classifiers presented in this paper are an appropriate and robust approach for this aim. Among the Bayesian classifiers tested, the tree augmented naïve Bayes, k-dependence Bayesian and naïve Bayes classifiers were shown to perform almost as well as the semi naïve Bayes and selective naïve Bayes classifiers.
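For readers unfamiliar with the simplest of the classifiers named above, the following is a minimal sketch of a naïve Bayes classifier for discrete features. It is a generic textbook version with Laplace smoothing, not the classifiers or data used in the paper; the function names, the toy rows and the 0/1 labels are all hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(rows, labels, n_values=2, alpha=1.0):
    """Count class and per-feature value frequencies.

    n_values is the assumed number of values each feature can take;
    alpha is the Laplace smoothing constant.
    """
    class_counts = Counter(labels)
    feature_counts = Counter()  # (label, feature index, value) -> count
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            feature_counts[(label, index, value)] += 1
    return class_counts, feature_counts, n_values, alpha


def predict(model, row):
    """Return the label maximizing log P(label) + sum of log P(feature | label)."""
    class_counts, feature_counts, n_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for index, value in enumerate(row):
            smoothed = feature_counts.get((label, index, value), 0) + alpha
            score += math.log(smoothed / (count + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy data: the first feature alone determines the (hypothetical) label.
    toy_rows = [(1, 1), (1, 0), (0, 0), (0, 1)]
    toy_labels = [1, 1, 0, 0]
    toy_model = train_naive_bayes(toy_rows, toy_labels)
    print(predict(toy_model, (1, 1)), predict(toy_model, (0, 0)))
```

The other classifiers mentioned in the abstract relax the independence assumption between features in different ways, which this sketch does not attempt to show.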




EDA-PSO: a hybrid paradigm combining Estimation of Distribution Algorithms and particle swarm optimization

Archivo Digital UPM
  • Bengoetxea Castro, Endika
  • Larrañaga Múgica, Pedro María
Estimation of Distribution Algorithms (EDAs) are an evolutionary computation optimization paradigm in which each generation evolves by estimating, from the selected individuals of the population, a probabilistic graphical model that reflects the dependencies among variables. This has been shown to improve on the results obtained with genetic algorithms (GAs) for complex problems.

This paper presents a new hybrid approach combining EDAs and particle swarm optimization, with the aim of taking advantage of the capability of EDAs to learn from the dependencies between variables while profiting from the ability of particle swarm optimization to keep a sense of “direction” towards the most promising areas of the search space. Experimental results show the validity of this approach on widely known combinatorial optimization problems.




Gaussian-Stacking multiclassifiers for human embryo selection

Archivo Digital UPM
  • Morales Vega, Dinora A.
  • Bengoetxea Castro, Endika
  • Larrañaga Múgica, Pedro María
Infertility is currently considered an important social problem that has been the subject of special interest among medical doctors and biologists. Due to ethical reasons, different legislative restrictions apply in every country on human assisted reproduction techniques such as in-vitro fertilization (IVF). An essential problem in human assisted reproduction is the selection of suitable embryos to transfer to a patient, for which the application of artificial intelligence as well as data mining techniques can be helpful as decision-support systems. In this chapter we introduce a new multi-classification system using Gaussian networks to combine the outputs (probability distributions) of standard machine learning classification algorithms. Our method proposes to consider these outputs as inputs to a higher level and to apply a stacking scheme to provide a meta-level classification result. We provide a proof of the validity of the approach by applying this multi-classification technique to a complex real medical problem: the selection of the most promising embryo-batch for human in-vitro fertilization treatments.




Research topics in discrete estimation of distribution algorithms based on factorizations

Archivo Digital UPM
  • Santana Hermida, Roberto
  • Larrañaga Múgica, Pedro María
  • Lozano Alonso, José Antonio
In this paper, we identify a number of topics relevant for the improvement and development of discrete estimation of distribution algorithms. Focusing on the role of probability distributions and factorizations in estimation of distribution algorithms, we present a survey of current challenges where further research must provide answers that extend the potential and applicability of these algorithms. In each case we state the research topic and elaborate on the reasons that make it relevant for estimation of distribution algorithms. In some cases current work or possible alternatives for the solution of the problem are discussed.




Guest editorial: special issue on evolutionary algorithms based on probabilistic models

Archivo Digital UPM
  • Lozano Alonso, José Antonio
  • Zhang, Qingfu
  • Larrañaga Múgica, Pedro María
Evolutionary algorithms based on probabilistic models (EAPMs) have been recognized as a new computing paradigm in evolutionary computation. There is no traditional crossover or mutation in EAPMs. Instead, they explicitly extract global statistical information from their previous search and build a probability distribution model of promising solutions, based on the extracted information. New solutions are then sampled from the model thus built to replace old solutions. Instances of EAPMs include Population-Based Incremental Learning, the Univariate Marginal Distribution Algorithm (UMDA), Mutual Information Maximization for Input Clustering, the Factorized Distribution Algorithm, the Bayesian Optimization Algorithm, the Learnable Evolution Model and Estimation of Bayesian Networks Algorithms, to name a few. EAPMs have been successfully applied for solving many optimization and search problems.
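The loop described in this abstract (select promising solutions, estimate a probability model from them, sample new solutions from the model) can be sketched with the simplest instance listed, a univariate marginal model in the spirit of UMDA. This is a minimal illustration under stated assumptions: binary variables, truncation selection, full replacement of the population each generation, probabilities clamped away from 0 and 1, and OneMax as a stand-in objective. All function names and parameter defaults are hypothetical.

```python
import random


def onemax(bits):
    """Toy objective: number of ones in the solution."""
    return sum(bits)


def umda(fitness, n_vars, pop_size=100, n_selected=50, generations=30, seed=0):
    """Univariate EDA sketch: returns the best solution seen and the final model."""
    rng = random.Random(seed)
    probs = [0.5] * n_vars  # model: independent P(x_i = 1) for each variable
    lower, upper = 1.0 / n_vars, 1.0 - 1.0 / n_vars
    best = None
    for _ in range(generations):
        # Sampling step: draw a whole new population from the current model.
        population = [tuple(1 if rng.random() < p else 0 for p in probs)
                      for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        if best is None or fitness(population[0]) > fitness(best):
            best = population[0]
        # Selection step: keep the top n_selected solutions (truncation).
        selected = population[:n_selected]
        # Learning step: re-estimate each marginal from the selected set,
        # clamped so that no variable becomes permanently fixed.
        probs = [min(max(sum(ind[i] for ind in selected) / n_selected, lower), upper)
                 for i in range(n_vars)]
    return best, probs


if __name__ == "__main__":
    best_solution, final_probs = umda(onemax, 20)
    print(best_solution, onemax(best_solution))
```

The multivariate algorithms named in the abstract differ mainly in the learning step, where a model with dependencies between variables replaces the independent marginals.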




Mining probabilistic models learned by EDAs in the optimization of multi-objective problems

Archivo Digital UPM
  • Santana Hermida, Roberto
  • Bielza Lozoya, María Concepción
  • Lozano Ruiz, José Antonio
  • Larrañaga Múgica, Pedro María
One of the uses of the probabilistic models learned by estimation of distribution algorithms (EDAs) is to reveal previously unknown information about the problem structure. In this paper we investigate the mapping between the problem structure and the dependencies captured in the probabilistic models learned by EDAs for a set of multi-objective satisfiability problems. We present and discuss the application of different data mining and visualization techniques for processing and visualizing relevant information from the structure of the learned probabilistic models. We show that, also in the case of multi-objective optimization problems, some features of the original problem structure can be translated to the probabilistic models and unveiled by using algorithms that mine the model structures.