Search

Total results (including duplicates): 34338
Found 3434 page(s)

CORA.Repositori de Dades de Recerca

doi:10.34810/data332

Dataset. 2012

PANACEA ENVIRONMENT BILINGUAL GLOSSARY EL-EN (GREEK-ENGLISH)

Dublin City University. School of Computing

This folder contains files for creating a bilingual glossary from factored phrase tables that include part-of-speech-tagged text for the EL-EN language pair. First, the tables are filtered using part-of-speech tag sequences for each language, so that entries with unsuitable part-of-speech sequences are removed. Then the feature scores from the phrase table are combined in a log-linear model to score each entry. The user specifies how large the output glossary should be (relative to the input), and the lowest-ranking entries are discarded to produce a glossary of the desired size.
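The three steps described above (POS-sequence filtering, log-linear scoring, size cut-off) can be sketched as follows. This is a minimal illustration, not the dataset's actual tooling: the `PhraseEntry` structure, the whitelist of allowed POS sequences, the feature weights, the toy Greek entries and the simplified tag set are all hypothetical. It also assumes the features are positive probabilities and that "relative to the input" means a fraction of the unfiltered table size.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

PosSeq = Tuple[str, ...]


@dataclass
class PhraseEntry:
    """Hypothetical in-memory form of one factored phrase-table entry."""
    source: str
    target: str
    source_pos: PosSeq       # POS tag sequence of the source phrase
    target_pos: PosSeq       # POS tag sequence of the target phrase
    features: List[float]    # phrase-table feature scores, assumed to be probabilities > 0


def log_linear_score(entry: PhraseEntry, weights: Sequence[float]) -> float:
    """Log-linear combination of the features: sum_i w_i * log(h_i)."""
    return sum(w * math.log(h) for w, h in zip(weights, entry.features))


def build_glossary(
    entries: List[PhraseEntry],
    allowed_source_pos: Set[PosSeq],
    allowed_target_pos: Set[PosSeq],
    weights: Sequence[float],
    keep_ratio: float,
) -> List[PhraseEntry]:
    # 1) POS filter: drop entries whose tag sequence is unsuitable on either side.
    candidates = [
        e for e in entries
        if e.source_pos in allowed_source_pos and e.target_pos in allowed_target_pos
    ]
    # 2) Rank the surviving entries by log-linear score, best first.
    candidates.sort(key=lambda e: log_linear_score(e, weights), reverse=True)
    # 3) Output size is a fraction of the input size (assumption); bottom-ranked entries are discarded.
    size = int(len(entries) * keep_ratio)
    return candidates[:size]


# Toy entries with a hypothetical simplified tag set (NOUN, ADJ, CONJ).
entries = [
    PhraseEntry("ρύπανση", "pollution", ("NOUN",), ("NOUN",), [0.6, 0.5]),
    PhraseEntry("και", "and", ("CONJ",), ("CONJ",), [0.9, 0.9]),
    PhraseEntry("περιβάλλον", "environment", ("NOUN",), ("NOUN",), [0.3, 0.2]),
    PhraseEntry("περιβαλλοντική πολιτική", "environmental policy",
                ("ADJ", "NOUN"), ("ADJ", "NOUN"), [0.7, 0.6]),
]
allowed = {("NOUN",), ("ADJ", "NOUN")}
glossary = build_glossary(entries, allowed, allowed, weights=[1.0, 1.0], keep_ratio=0.5)
print([e.target for e in glossary])  # ['environmental policy', 'pollution']
```

The high-scoring "και"/"and" pair is removed by the POS filter before ranking, which is the point of filtering first: function-word pairs can have strong translation scores but make poor glossary entries.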

Project: //

DOI: https://doi.org/10.34810/data332

CORA.Repositori de Dades de Recerca

doi:10.34810/data333

Dataset. 2023

PANACEA BILINGUAL GLOSSARY GERMAN-ENGLISH WITH CONTEXTUAL TRANSFER INFORMATION

Linguatec GmbH

Project: //

DOI: https://doi.org/10.34810/data333

CORA.Repositori de Dades de Recerca

doi:10.34810/data334

Dataset. 2023

PANACEA ITALIAN AUTOMATICALLY ACQUIRED LEXICON FOR LAB DOMAIN: SUBCATEGORIZATION FRAMES (V-SUBCAT)

Consiglio Nazionale delle Ricerche. Istituto di Linguistica Computazionale "Antonio Zampolli"

Project: //

DOI: https://doi.org/10.34810/data334

CORA.Repositori de Dades de Recerca

doi:10.34810/data335

Dataset. 2012

PANACEA ENVIRONMENT CORPUS N-GRAMS ES (SPANISH)

Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)

This data set contains Spanish word n-grams and Spanish word/tag/lemma n-grams in the "Environment" (ENV) domain. Each n-gram is accompanied by its observed frequency count. The n-grams range in length from unigrams (single words) to five-grams. The data were collected in the context of PANACEA (http://www.panacea-lr.eu), an EU FP7-funded project under Grant Agreement 248064. The n-gram counts were generated from crawled Web pages that were automatically detected to be in Spanish and automatically classified as relevant to the ENV domain. The ENV domain collection consisted of approximately 49.86 million tokens. Data collection took place in the summer of 2011.
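As a rough illustration of what "n-grams with observed frequency counts, from unigrams to five-grams" means, the sketch below counts every contiguous n-gram of length 1 to 5 in tokenized sentences. It is not the procedure used to build this data set: it assumes n-grams do not cross sentence boundaries and uses no boundary markers, and the function name `count_ngrams`, the toy sentences and the tags in the word/tag/lemma example are hypothetical placeholders.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_ngrams(sentences: Iterable[Sequence[str]], max_n: int = 5) -> Counter:
    """Count every n-gram of length 1..max_n, without crossing sentence boundaries."""
    counts: Counter = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            # Slide a window of width n over the sentence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


# Word n-grams from two toy tokenized sentences.
word_counts = count_ngrams([["el", "cambio", "climático"], ["el", "cambio"]])
print(word_counts[("el", "cambio")])  # 2

# Word/tag/lemma n-grams: each token becomes one "word/tag/lemma" string.
# The tags below are hypothetical placeholders, not the project's tag set.
tagged = [[("el", "DA", "el"), ("cambio", "NC", "cambio"), ("climático", "AQ", "climático")]]
factored_counts = count_ngrams(["/".join(tok) for tok in sent] for sent in tagged)
print(factored_counts[("el/DA/el", "cambio/NC/cambio")])  # 1
```

Treating each word/tag/lemma triple as a single token lets the same counting routine produce both kinds of n-grams.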

Project: //

DOI: https://doi.org/10.34810/data335

CORA.Repositori de Dades de Recerca

doi:10.34810/data336

Dataset. 2023

PANACEA ENVIRONMENT BILINGUAL GLOSSARY FRENCH-TO-ENGLISH

Linguatec GmbH

Project: //

DOI: https://doi.org/10.34810/data336

CORA.Repositori de Dades de Recerca

doi:10.34810/data337

Dataset. 2023

PANACEA SPANISH AUTOMATICALLY ACQUIRED LEXICON FOR ENV DOMAIN: SUBCATEGORIZATION FRAMES (V-SUBCAT)

Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)

This is a domain-specific lexicon of Spanish subcategorization frames for the environment (ENV) domain. It was created automatically using the PANACEA web service tpc_subcat_inductive (http://registry.elda.org/services/223) on the crawled data for this domain and language, which had previously been annotated with the Spanish Malt Parser web service (http://registry.elda.org/services/249).

Project: //

DOI: https://doi.org/10.34810/data337

CORA.Repositori de Dades de Recerca

doi:10.34810/data338

Dataset. 2012

PANACEA ENGLISH GOLD STANDARD FOR LEXICAL SEMANTIC CLASSIFICATION

Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)

We present a set of English gold standards for different noun classes, created in PANACEA to train and test automatic classifiers. To create these gold standards, we used the data from the SemEval 2007 workshop Task 07: Coarse-Grained English All-Words (Navigli et al., 2007). The words used in this task were first tagged with an automatic clustering method (Navigli, 2006) using senses based on the WordNet sense inventory, and later manually validated by expert lexicographers. For our experiments, we extracted all words from this inventory whose first sense corresponded to one of the lexical semantic classes, e.g. “people” in the case of the class HUMAN. These gold standards were created in the context of PANACEA (http://www.panacea-lr.eu), an EU FP7-funded project under Grant Agreement 248064.
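The first-sense selection rule described above can be sketched as a simple filter. This is a hypothetical illustration, not the authors' extraction script: the inventory is modelled as a dictionary from each word to its senses in rank order, and the sense labels, the class-to-sense mapping and the function name `extract_gold_standard` are invented for the example.

```python
from typing import Dict, List, Set


def extract_gold_standard(
    inventory: Dict[str, List[str]],
    class_senses: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Assign a word to a class only if its FIRST sense belongs to that class."""
    gold: Dict[str, List[str]] = {cls: [] for cls in class_senses}
    for word in sorted(inventory):
        senses = inventory[word]
        if not senses:
            continue  # words without senses cannot be classified
        first_sense = senses[0]
        for cls, senses_for_class in class_senses.items():
            if first_sense in senses_for_class:
                gold[cls].append(word)
    return gold


# Toy inventory: word -> senses in rank order (labels are hypothetical).
inventory = {
    "teacher": ["people", "profession"],
    "crowd": ["group", "people"],   # "people" is only its second sense
    "nurse": ["people"],
    "bank": ["institution", "slope"],
}
gold = extract_gold_standard(inventory, {"HUMAN": {"people"}})
print(gold)  # {'HUMAN': ['nurse', 'teacher']}
```

Restricting the check to the first sense excludes words like "crowd" here, where the class-relevant sense is not the dominant one.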

Project: //

DOI: https://doi.org/10.34810/data338

CORA.Repositori de Dades de Recerca

doi:10.34810/data339

Dataset. 2023

PANACEA SPANISH V-SUBCAT GOLD STANDARD LEXICON LAB DOMAIN

Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)

Project: //

DOI: https://doi.org/10.34810/data339

CORA.Repositori de Dades de Recerca

doi:10.34810/data33

Dataset. 2024

HOW2SIGN: A LARGE-SCALE MULTIMODAL DATASET FOR CONTINUOUS AMERICAN SIGN LANGUAGE

Cardoso Duarte, Amanda
Giró Nieto, Xavier
Palaskar, Shruti
Ghadiyaram, Deepti
Haan, Kenneth de
Metze, Florian
Torres Viñals, Jordi

How2Sign consists of a parallel corpus of 80 hours of sign language videos (collected with multi-view RGB and depth sensor data) with corresponding speech transcriptions and gloss annotations. In addition, a three-hour subset was further recorded in a geodesic dome setup using hundreds of cameras and sensors, which enables detailed 3D reconstruction and pose estimation and paves the way for vision systems to understand the 3D geometry of sign language.

Project: //

DOI: https://doi.org/10.34810/data33

CORA.Repositori de Dades de Recerca

doi:10.34810/data340

Dataset. 2023

PANACEA SPANISH V-SUBCAT GOLD STANDARD LEXICON ENV DOMAIN

Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)

This is a domain-specific gold standard for Spanish subcategorization frames (SCFs), in this case for the environment (ENV) domain. The gold standard was developed manually by selecting 30 verbs and 200 sentences for each verb. For each sentence, the SCFs of the studied verb were manually annotated. The sentences were selected from crawled Web pages that were automatically detected to be in Spanish and automatically classified as relevant to the ENV domain. Data collection took place in the summer of 2011. This gold standard was created in the context of PANACEA (http://www.panacea-lr.eu), an EU FP7-funded project under Grant Agreement 248064.

Project: //

DOI: https://doi.org/10.34810/data340
