HESML V1R5 JAVA SOFTWARE LIBRARY OF ONTOLOGY-BASED SEMANTIC SIMILARITY MEASURES AND INFORMATION CONTENT MODELS
- Lastra-Díaz, Juan J.
- Lara-Clares, Alicia
- Garcia-Serrano, Ana
WORD SIMILARITY BENCHMARKS OF RECENT WORD EMBEDDING MODELS AND ONTOLOGY-BASED SEMANTIC SIMILARITY MEASURES
- Lastra-Díaz, Juan J.
- Goikoetxea, Josu
- Hadj Taieb, Mohamed Ali
- Garcia-Serrano, Ana
- Ben Aouicha, Mohamed
- Agirre, Eneko
HESML V2R1 JAVA SOFTWARE LIBRARY OF SEMANTIC SIMILARITY MEASURES FOR THE BIOMEDICAL DOMAIN
- Lara-Clares, Alicia
- Lastra-Díaz, Juan J.
- Garcia-Serrano, Ana
QUAM-AFM LITE
- Carracedo-Cosme, Jaime
- Romero-Muñíz, Carlos
- Pou, Pablo
- Pérez, Rubén
QUAM-AFM Lite is the scaled-down version of QUAM-AFM, the largest dataset of simulated Atomic Force Microscopy (AFM) images. This reduced version was generated from a selection of 1,755 molecules that span the most relevant bonding structures and chemical species in organic chemistry. Like the full version, QUAM-AFM Lite contains 24 3D image stacks for each molecule, one for each of the 24 combinations of AFM operational parameters. Each stack consists of constant-height images simulated at 10 tip-sample distances covering the relevant imaging range, which spans 1 Å (0.1 nm). This results in a total of 421,200 images with a resolution of 256x256 pixels.
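The image count above follows from multiplying the number of molecules by the number of stacks per molecule and the number of tip-sample distances per stack. A minimal Python sketch of that arithmetic is shown below; the constant and function names are illustrative only and are not part of the dataset distribution.

```python
# Sanity check of the QUAM-AFM Lite image count (illustrative names, not a dataset API).
N_MOLECULES_LITE = 1_755
N_STACKS_PER_MOLECULE = 24    # one stack per combination of operational parameters
N_DISTANCES_PER_STACK = 10    # constant-height images per 3D stack
IMAGE_SIDE_PX = 256           # images are 256x256 pixels


def total_images(n_molecules: int,
                 n_stacks: int = N_STACKS_PER_MOLECULE,
                 n_distances: int = N_DISTANCES_PER_STACK) -> int:
    """Number of 2D AFM images: molecules x stacks x tip-sample distances."""
    return n_molecules * n_stacks * n_distances


def pixels_per_stack(n_distances: int = N_DISTANCES_PER_STACK,
                     side: int = IMAGE_SIDE_PX) -> int:
    """Number of pixels in one 3D stack of shape (distances, side, side)."""
    return n_distances * side * side


if __name__ == "__main__":
    print(total_images(N_MOLECULES_LITE))  # 421200
    print(pixels_per_stack())              # 655360
```

The same function applied to the 685,513 molecules of the full QUAM-AFM dataset gives 164,523,120 images, consistent with the "165 million" figure quoted for that version.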
The operational parameters comprise six values of the cantilever oscillation amplitude (0.40, 0.60, 0.80, 1.00, 1.20 and 1.40 Å) and four values of the elastic constant describing the tilting of the CO tip (0.40, 0.60, 0.80 and 1.00 N/m). The amplitude is freely chosen in experiments to enhance different features of the image, while the elastic constant reflects differences in the attachment of the CO molecule to the metal tip that are routinely observed and have been characterized experimentally.
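The 24 stacks per molecule correspond to the Cartesian product of the two parameter lists (6 amplitudes x 4 elastic constants). The following sketch enumerates that grid in Python. The order in which the combinations are numbered here is an assumption made for illustration and need not match the stack numbering used in the dataset files.

```python
from itertools import product

# Operational parameters as listed in the description.
AMPLITUDES_ANGSTROM = [0.40, 0.60, 0.80, 1.00, 1.20, 1.40]  # cantilever oscillation amplitude (Å)
CO_TIP_STIFFNESS_N_PER_M = [0.40, 0.60, 0.80, 1.00]         # elastic constant of the CO tip tilt (N/m)

# Cartesian product: every amplitude paired with every elastic constant.
# The resulting order (amplitude-major) is an illustrative assumption.
PARAMETER_GRID = list(product(AMPLITUDES_ANGSTROM, CO_TIP_STIFFNESS_N_PER_M))

if __name__ == "__main__":
    print(len(PARAMETER_GRID))  # 24
    for stack_id, (amplitude, stiffness) in enumerate(PARAMETER_GRID):
        print(f"stack {stack_id:2d}: A = {amplitude:.2f} Å, k = {stiffness:.2f} N/m")
```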
Besides the set of AFM images, the data provided for each molecule include the ball-and-stick depiction, the IUPAC name, the chemical formula, the atomic coordinates and the map of atom heights. To simplify the use of the collection as a source of information, we have developed a Graphical User Interface (GUI) that allows users to search for structures by CID number, IUPAC name or chemical formula.
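To illustrate the kind of lookup the GUI offers, here is a minimal in-memory search over per-molecule metadata. The `MoleculeRecord` fields and the `search` function are hypothetical and do not reflect the GUI's actual implementation; the two sample entries are only illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MoleculeRecord:
    """Hypothetical per-molecule metadata record (field names are illustrative)."""
    cid: int
    iupac_name: str
    formula: str


def search(records: List[MoleculeRecord], query: str) -> List[MoleculeRecord]:
    """Hypothetical search: a purely numeric query is matched exactly against
    the CID; any other query is matched as a case-insensitive substring of
    the IUPAC name or the chemical formula."""
    q = query.strip().lower()
    hits = []
    for rec in records:
        if q.isdigit():
            if int(q) == rec.cid:
                hits.append(rec)
        elif q in rec.iupac_name.lower() or q in rec.formula.lower():
            hits.append(rec)
    return hits


if __name__ == "__main__":
    catalog = [
        MoleculeRecord(241, "benzene", "C6H6"),
        MoleculeRecord(931, "naphthalene", "C10H8"),
    ]
    print([r.iupac_name for r in search(catalog, "931")])   # ['naphthalene']
    print([r.iupac_name for r in search(catalog, "c6h6")])  # ['benzene']
```

Treating purely numeric queries as CIDs avoids spurious matches against digits inside formulas such as "C6H6".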
This dataset is a product of research carried out in collaboration between Quasar Science Resources S.L. (https://quasarsr.com) and the Scanning Probe Microscopy Theory & Nanomechanics Research Group (SPMTH) (http://www.uam.es/spmth) at the Universidad Autónoma de Madrid (UAM), funded by the Comunidad de Madrid under the Industrial Doctorate Programme 2017 (project reference IND2017/IND-7793).
The main goal of this dataset is to provide a simplified version of QUAM-AFM that allows users to analyse the distribution of the information and/or explore the graphical interface without downloading the full dataset. The full version, QUAM-AFM, supports the development of deep learning methods for molecular identification through AFM imaging. Following the conclusion of this project, the dataset has been made freely accessible to facilitate and promote research in a range of fields, including Atomic Force Microscopy, on-surface synthesis and deep learning applications.
REPRODUCIBLE EXPERIMENTS ON WORD AND SENTENCE SIMILARITY MEASURES FOR THE BIOMEDICAL DOMAIN
- Lara-Clares, Alicia
- Lastra-Díaz, Juan J.
- Garcia-Serrano, Ana
This dataset introduces a set of reproducibility resources that allow the exact replication of the experiments introduced in our main paper, a reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems that prevent the evaluation of most current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate, for the first time, an unexplored benchmark called Corpus-Transcriptional-Regulation (CTR); (5) to study the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to address the lack of software and data reproducibility resources for methods and experiments in this line of research.
This dataset provides a self-contained reproducibility platform that contains the Java source code and binaries of our main benchmark program, as well as a Docker image that allows the exact replication of our experiments on any platform supported by Docker, such as all Linux-based operating systems, Windows or macOS. Our benchmark program is distributed with the UMLS SNOMED-CT and MeSH ontologies, courtesy of the US National Library of Medicine (NLM), together with all the software components needed to make the setup process easier. Our Docker image provides an exact virtual replica of the machine on which we ran our experiments, thus removing the need for any tedious setup process, such as setting up the Python virtual environments and other software components.
The HESML library is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, subject to citing the two main HESML papers [17] as an attribution requirement. However, the HESML distribution also includes other datasets, databases and data files whose use requires attribution by any HESML user. We therefore urge HESML users to comply with the licensing terms of the other resources distributed with the library, as detailed in its companion release notes.
REPRODUCIBILITY DATASET FOR A BENCHMARK OF BIOMEDICAL SEMANTIC MEASURES LIBRARIES
- Lastra-Díaz, Juan J.
- Lara-Clares, Alicia
- Garcia-Serrano, Ana
QUAM-AFM
- Carracedo-Cosme, Jaime
- Romero-Muñíz, Carlos
- Pou, Pablo
- Pérez, Rubén
QUAM-AFM is the largest dataset of simulated Atomic Force Microscopy (AFM) images, generated from a selection of 685,513 molecules that span the most relevant bonding structures and chemical species in organic chemistry. For each molecule, QUAM-AFM contains 24 3D image stacks, one for each of the 24 combinations of AFM operational parameters. Each stack consists of constant-height images simulated at 10 tip-sample distances covering the relevant imaging range, which spans 1 Å (0.1 nm). This results in a total of about 165 million images with a resolution of 256x256 pixels. The 3D stacks are especially well suited to the goal of chemical identification in AFM experiments using deep learning techniques.
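When feeding the 3D stacks to a deep learning pipeline, it can be convenient to address every image with a single integer index. The sketch below assumes a hypothetical molecule-major ordering (molecule, then parameter stack, then tip-sample distance); the dataset's actual file layout may differ. It also checks that 685,513 molecules yield roughly 165 million images.

```python
from typing import Tuple

N_MOLECULES = 685_513
N_STACKS = 24      # operational-parameter combinations per molecule
N_DISTANCES = 10   # constant-height images per 3D stack

# 685,513 x 24 x 10 = 164,523,120 images, i.e. about 165 million.
TOTAL_IMAGES = N_MOLECULES * N_STACKS * N_DISTANCES


def flat_index(molecule: int, stack: int, distance: int) -> int:
    """Map (molecule, stack, distance) to one image index, assuming a
    hypothetical molecule-major, then stack, then distance ordering."""
    if not (0 <= molecule < N_MOLECULES and 0 <= stack < N_STACKS
            and 0 <= distance < N_DISTANCES):
        raise IndexError("index out of range")
    return (molecule * N_STACKS + stack) * N_DISTANCES + distance


def unflatten(index: int) -> Tuple[int, int, int]:
    """Inverse of flat_index under the same assumed ordering."""
    molecule, rest = divmod(index, N_STACKS * N_DISTANCES)
    stack, distance = divmod(rest, N_DISTANCES)
    return molecule, stack, distance


if __name__ == "__main__":
    last = flat_index(N_MOLECULES - 1, N_STACKS - 1, N_DISTANCES - 1)
    print(last + 1 == TOTAL_IMAGES)  # True
    print(unflatten(last))           # (685512, 23, 9)
```

Keeping the 10 distances innermost means each 3D stack occupies a contiguous block of indices, which matches treating a stack as one multi-channel sample.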
The operational parameters comprise six values of the cantilever oscillation amplitude (0.40, 0.60, 0.80, 1.00, 1.20 and 1.40 Å) and four values of the elastic constant describing the tilting of the CO tip (0.40, 0.60, 0.80 and 1.00 N/m). The amplitude is freely chosen in experiments to enhance different features of the image, while the elastic constant reflects differences in the attachment of the CO molecule to the metal tip that are routinely observed and have been characterized experimentally.
Besides the set of AFM images, the data provided for each molecule include the ball-and-stick depiction, the IUPAC name, the chemical formula, the atomic coordinates and the map of atom heights. To simplify the use of the collection as a source of information, we have developed a Graphical User Interface (GUI) that allows users to search for structures by CID number, IUPAC name or chemical formula.
This dataset is a product of research carried out in collaboration between Quasar Science Resources S.L. (https://quasarsr.com) and the Scanning Probe Microscopy Theory & Nanomechanics Research Group (SPMTH) (http://www.uam.es/spmth) at the Universidad Autónoma de Madrid (UAM), funded by the Comunidad de Madrid under the Industrial Doctorate Programme 2017 (project reference IND2017/IND-7793).
The main goal of this dataset is to support the development of deep learning methods for molecular identification through AFM imaging. Following the conclusion of this project, the dataset has been made freely accessible to facilitate and promote research in a range of fields, including Atomic Force Microscopy, on-surface synthesis and deep learning applications.