Total results (including duplicates): 106
11 page(s) found
CORA.Repositori de Dades de Recerca
doi:10.34810/data263
Dataset. 2012

GRAF VERSION OF SPANISH PORTIONS OF WIKIPEDIA CORPUS

  • Universitat Politècnica de Catalunya. Research Group on Natural Language Processing
  • Gemma Boleda
  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
This is the stand-off GrAF version of the Spanish portions of Wikipedia (based on a 2006 dump). This Wikipedia Spanish Corpus contains 257,019 articles totalling about 150.1 million words of raw text. It was cleaned by removing disambiguation pages, stripping some XML tags and homogenizing list-ending tags. The corpus was then processed to add structural tagging (head, paragraph, sentence, list, etc.) and morphosyntactic information.
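In a stand-off representation, the primary text is stored untouched and each annotation (sentence, token, part of speech, and so on) lives in a separate layer that points back into the text, typically by character offsets. The Python sketch below illustrates only that general idea; the `Region` and `Annotation` classes, the example sentence and the tag labels are hypothetical and are not the GrAF XML schema or any GrAF library API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """A span of the primary text, addressed by character offsets [start, end)."""
    start: int
    end: int


@dataclass
class Annotation:
    """A label attached to a region, with optional features (hypothetical lemma/POS)."""
    label: str
    region: Region
    features: dict = field(default_factory=dict)


def covered_text(text: str, ann: Annotation) -> str:
    # The primary text is never modified; annotations only point into it.
    return text[ann.region.start:ann.region.end]


primary = "El gato duerme."
annotations = [
    Annotation("sentence", Region(0, 15)),
    Annotation("token", Region(0, 2), {"lemma": "el", "pos": "DET"}),
    Annotation("token", Region(3, 7), {"lemma": "gato", "pos": "NOUN"}),
    Annotation("token", Region(8, 14), {"lemma": "dormir", "pos": "VERB"}),
]

for ann in annotations:
    print(ann.label, repr(covered_text(primary, ann)), ann.features)
```

Keeping offsets instead of inline tags means new annotation layers can be added without rewriting the text or the existing layers.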

Project: //
DOI: https://doi.org/10.34810/data263

CORA.Repositori de Dades de Recerca
doi:10.34810/data264
Dataset. 2012

IULA SPANISH LSP TREEBANK

  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
-

Project: //
DOI: https://doi.org/10.34810/data264

CORA.Repositori de Dades de Recerca
doi:10.34810/data265
Dataset. 2012

IULA PENN TREEBANK

  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
This treebank consists of Spanish and English sentences that have been manually annotated with syntactic information. The sentences were chosen from the Penn Treebank corpus, a resource containing texts from the Wall Street Journal, originally compiled by the University of Pennsylvania.
It contains 805 sentences that were translated into Spanish by human translators. Each original English sentence and its Spanish translation share the same identification number. Sentences in both languages have been processed using the DELPH-IN environment (http://www.delph-in.net/).
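Since each English sentence and its Spanish translation share an identification number, the two sides can be paired by joining on that ID. A minimal Python sketch, assuming both sides have already been loaded into dictionaries keyed by sentence ID; the IDs and sentences are invented placeholders, not treebank data, and the loading step is not shown.

```python
def align_by_id(english, spanish):
    """Return (id, english, spanish) triples for IDs present on both sides, sorted by ID."""
    shared = sorted(english.keys() & spanish.keys())
    return [(sid, english[sid], spanish[sid]) for sid in shared]


def unmatched_ids(english, spanish):
    """IDs that appear on only one side, useful for spotting missing translations."""
    return sorted(english.keys() ^ spanish.keys())


# Invented placeholder sentences keyed by hypothetical IDs.
en = {"s001": "The market rose.", "s002": "Shares fell."}
es = {
    "s001": "El mercado subió.",
    "s002": "Las acciones cayeron.",
    "s003": "Frase sin original.",
}

for sid, src, tgt in align_by_id(en, es):
    print(sid, src, "|", tgt)
print(unmatched_ids(en, es))
```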

Project: //
DOI: https://doi.org/10.34810/data265

CORA.Repositori de Dades de Recerca
doi:10.34810/data266
Dataset. 2012

GRAF VERSION OF CATALAN PORTIONS OF WIKIPEDIA CORPUS

  • Universitat Politècnica de Catalunya. Research Group on Natural Language Processing
  • Gemma Boleda
  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
This is the stand-off GrAF version of the Catalan portions of Wikipedia (based on a 2006 dump). This Wikipedia Catalan Corpus contains 122,052 articles totalling about 47.3 million words of raw text. It was cleaned by removing disambiguation pages, stripping some XML tags and homogenizing list-ending tags. The corpus was then processed to add structural tagging (head, paragraph, sentence, list, etc.) and morphosyntactic information.

Project: //
DOI: https://doi.org/10.34810/data266

CORA.Repositori de Dades de Recerca
doi:10.34810/data268
Dataset. 2012

IULA SPANISH-ENGLISH TECHNICAL CORPUS

  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
The corpus consists of specialized texts (Law, Economics, Medicine, Environment and Computer Science domains) available in both Spanish and English. This LSP corpus was compiled from articles in specialized publications, PhD theses, etc.
It contains a total of about 2.1 million words in 127 documents in each language.

Project: //
DOI: https://doi.org/10.34810/data268

CORA.Repositori de Dades de Recerca
doi:10.34810/data269
Dataset. 2012

ENGLISH-GALICIAN CLUVI DICTIONARY

  • Universidade de Vigo. Grupo de investigación TALG
  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
This is the LMF version of the English-Galician CLUVI Dictionary developed under the direction of Xavier Gómez Guinovart (2005-2012) from parallel texts in the CLUVI Corpus of the University of Vigo.
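LMF resources are XML documents in which lexicons, lexical entries and lemmas carry their properties as `feat` elements with `att`/`val` attributes. A minimal Python sketch using the standard-library `xml.etree.ElementTree` to list the entries of each lexicon. The fragment is a hypothetical example in that style, not an excerpt from the CLUVI dictionary; the dataset's actual element layout (including how English and Galician senses are linked) may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the LMF feat/att/val style; not taken from the dataset.
SAMPLE = """
<LexicalResource>
  <Lexicon>
    <feat att="language" val="en"/>
    <LexicalEntry id="en_1">
      <feat att="partOfSpeech" val="noun"/>
      <Lemma><feat att="writtenForm" val="house"/></Lemma>
      <Sense id="en_house_1"/>
    </LexicalEntry>
  </Lexicon>
  <Lexicon>
    <feat att="language" val="gl"/>
    <LexicalEntry id="gl_1">
      <feat att="partOfSpeech" val="noun"/>
      <Lemma><feat att="writtenForm" val="casa"/></Lemma>
      <Sense id="gl_casa_1"/>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def feats(elem):
    """Map the direct <feat att="..." val="..."/> children of an element to a dict."""
    return {f.get("att"): f.get("val") for f in elem.findall("feat")}


def read_entries(xml_text):
    """Return one dict per LexicalEntry with its lexicon language, lemma, POS and sense IDs."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for lexicon in root.iter("Lexicon"):
        lang = feats(lexicon).get("language")
        for entry in lexicon.findall("LexicalEntry"):
            lemma = entry.find("Lemma")
            entries.append({
                "lang": lang,
                "lemma": feats(lemma).get("writtenForm") if lemma is not None else None,
                "pos": feats(entry).get("partOfSpeech"),
                "senses": [s.get("id") for s in entry.findall("Sense")],
            })
    return entries


for e in read_entries(SAMPLE):
    print(e)
```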

Project: //
DOI: https://doi.org/10.34810/data269

CORA.Repositori de Dades de Recerca
doi:10.34810/data270
Dataset. 2012

CORPUS92 CORPUS

  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
-

Project: //
DOI: https://doi.org/10.34810/data270

CORA.Repositori de Dades de Recerca
doi:10.34810/data271
Dataset. 2012

LMF VERSION OF THE SENSEM SPANISH DATA BASE

  • Grup de Recerca Interuniversitari en Aplicacions Lingüístiques (GRIAL)
  • Fernandez Montraveta, Ana
  • Vázquez, Glòria
  • Castellón, Irene
  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
This is the LMF version of the SenSem database created by the Spanish inter-university research group GRIAL. As part of the SenSem project, a corpus of sentences annotated at the semantic and syntactic levels was created. The source corpus consists of around 13 million words extracted from the online editions of a Spanish newspaper. From this corpus, 25,000 sentences were randomly selected: 100 for each of the 250 most frequent verbs in current Spanish. Each sentence was labeled with the verb sense it exemplifies, the type of complements it takes (arguments or adjuncts), and their syntactic category and function; finally, each argument was labeled with a semantic role. Each sentence was also annotated for its semantics, with respect to both aspectual information and the type of construction expressed.

From this annotated corpus, a lexical database of verbs was created that gathers all of the above information. The unit of description is the verb sense. Each verb description includes its argument structure: subcategorization patterns with their frequencies, semantic roles and information on sentence semantics. The lexicon and the corpus are linked at the sense level and together make up what we call the data bank of sentential semantics of Spanish verbs. Both resources are available on the web and, we hope, will be a valuable source of linguistic information for different areas of natural language processing and for linguistic research in general. The LMF conversion was done by the Universitat Pompeu Fabra.
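The selection step described above (100 randomly chosen sentences for each of 250 verbs, 250 × 100 = 25,000 sentences) is a per-verb stratified random sample. The Python sketch below shows that procedure under stated assumptions: the input is a hypothetical list of `(verb_lemma, sentence)` pairs, the ranked verb list is supplied by the caller (how verb frequency was measured is not described here), and the fixed seed exists only for reproducibility. It is not the SenSem project's actual code.

```python
import random
from collections import defaultdict


def sample_per_verb(pairs, top_verbs, per_verb=100, seed=0):
    """Draw up to `per_verb` sentences at random (without replacement) for each verb in `top_verbs`."""
    # Group candidate sentences by the verb lemma they exemplify.
    by_verb = defaultdict(list)
    for verb, sentence in pairs:
        by_verb[verb].append(sentence)

    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = {}
    for verb in top_verbs:
        candidates = by_verb.get(verb, [])
        # Take every candidate if fewer than per_verb are available.
        sample[verb] = rng.sample(candidates, min(per_verb, len(candidates)))
    return sample


# Toy corpus: invented placeholder sentences, not SenSem data.
corpus = (
    [("decir", f"decir #{i}") for i in range(5)]
    + [("hacer", f"hacer #{i}") for i in range(3)]
    + [("ir", "ir #0")]
)
sample = sample_per_verb(corpus, ["decir", "hacer", "ir"], per_verb=2, seed=42)
for verb, sentences in sample.items():
    print(verb, sentences)
```

With 250 verbs that each have at least 100 candidate sentences and `per_verb=100`, the sample would contain exactly 25,000 sentences.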

Project: //
DOI: https://doi.org/10.34810/data271

CORA.Repositori de Dades de Recerca
doi:10.34810/data272
Dataset. 2012

SPANISH LMF PAROLE/SIMPLE LEXICON

  • Universitat de Barcelona. Grup de recerca GILCUB
  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
This is the LMF version of the Spanish PAROLE/SIMPLE lexicon. The original PAROLE lexica (20,000 entries per language) were built according to a model based on the EAGLES guidelines and GENELEX results, using a common lexical tool adapted from the EUREKA-GENELEX project. This software tool was extended to support the PAROLE model and the conversion and management of the resulting resources. The languages covered by the PAROLE lexica are Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish and Swedish. The goal of the SIMPLE project was to add semantic information, selected for its relevance to language engineering (LE) applications, to the set of harmonised multifunctional lexica built for 12 European languages by the PAROLE consortium. The PAROLE+SIMPLE lexicons contain morphological, syntactic and semantic information, organised according to a common model and common linguistic specifications.

Project: //
DOI: https://doi.org/10.34810/data272

CORA.Repositori de Dades de Recerca
doi:10.34810/data273
Dataset. 2012

FRENCH-SPANISH LMF APERTIUM BILINGUAL DICTIONARY

  • Prompsit Language Engineering, S.L
  • Eleka Ingenieritza Linguistikoa S.L
  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
-

Project: //
DOI: https://doi.org/10.34810/data273
