e-cienciaDatos, Repositorio de Datos del Consorcio Madroño
doi:10.21950/AQ1CVX
Dataset. 2018

WORD SIMILARITY BENCHMARKS OF RECENT WORD EMBEDDING MODELS AND ONTOLOGY-BASED SEMANTIC SIMILARITY MEASURES

  • Lastra-Díaz, Juan J.
  • Goikoetxea, Josu
  • Hadj Taieb, Mohamed Ali
  • Garcia-Serrano, Ana
  • Ben Aouicha, Mohamed
  • Agirre, Eneko
This dataset is a companion reproducibility package for the related paper, which has been submitted for publication. Its aim is to allow the exact replication of a very large experimental survey on word similarity comparing the families of ontology-based semantic similarity measures and word embedding models, as detailed in the ‘appendix-reproducible-experiments.pdf’ file. Our experiments are based on the evaluation of all methods with the HESML V1R4 semantic measures library and the recording of these experiments with ReproZip. HESML is a self-contained Java software library of semantic measures based on WordNet, whose latest version, HESML V1R4, also supports the evaluation of pre-trained word embedding files. HESML is also a self-contained experimentation platform on word similarity that is especially well suited to running large experimental surveys, because it supports the execution of automatic, reproducible experiment files written in an XML-based file format (*.exp). ReproZip, in turn, is a virtualisation tool whose aim is to guarantee the exact replication of experimental results on a system different from the one originally used to create them. ReproZip captures all the program dependencies and is able to reproduce the packaged experiments on any host platform, regardless of the hardware and software configuration used in their creation. Thus, ReproZip guarantees the long-term reproducibility of the experiments introduced herein. Finally, another very valuable feature of ReproZip is that it allows the input files of any ReproZip package to be modified in order to evaluate a set of experiments using methods, configuration parameters or datasets that were not originally considered. This dataset contains a ReproZip package to reproduce our experiments on any supported platform, as well as all pre-trained word embedding models and word similarity datasets used in our experiments.
In addition, this dataset contains all raw output files generated by our experiments, and an R script file that generates all the processed output files corresponding to the data tables in our related paper. Finally, the aforementioned PDF file provides a very detailed experimental setup so that all our experiments can be reproduced exactly.
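As a rough illustration of what a word similarity benchmark measures, the sketch below scores a handful of word pairs with the cosine similarity of their embedding vectors and then correlates those scores with human similarity ratings. It is not HESML code and does not reproduce the package's experiments: HESML is a Java library driven by (*.exp) files, whereas this is a plain Python sketch, and the `vectors` and `benchmark` values are invented toy data used only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy embeddings (real pre-trained models have hundreds of dimensions).
vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "fruit": [0.0, 0.9, 0.3],
    "banana": [0.3, 0.7, 0.5],
}

# Hypothetical benchmark rows: (word1, word2, human similarity rating).
benchmark = [
    ("car", "automobile", 9.0),
    ("fruit", "banana", 7.5),
    ("car", "banana", 1.0),
]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in benchmark]
human_scores = [rating for _, _, rating in benchmark]

rho = spearman(human_scores, model_scores)
r = pearson(human_scores, model_scores)
print(f"Spearman: {rho:.3f}  Pearson: {r:.3f}")
```

A higher correlation means the model orders the word pairs more like the human raters did; an ontology-based measure could be scored in the same way by replacing the `cosine` call with its own similarity function.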

DOI: https://doi.org/10.21950/AQ1CVX