Dataset. 2013

PANACEA English automatically acquired lexicon for ENV domain: Subcategorization Frames and Lexical Semantic classes for nouns

CORA.Repositori de Dades de Recerca
doi:10.34810/data375
  • University of Cambridge. Department of Theoretical and Applied Linguistics
  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
This is a domain-specific English lexicon for the environment (ENV) domain. It contains both subcategorization frames for verbs and lexical semantic classes for nouns. The lexicon was created automatically by running PANACEA web services on crawled data. The crawled data was obtained by crawling web pages that were automatically detected to be in English and automatically classified as relevant to the ENV domain. Data collection took place in the summer of 2011.
 
DOI: https://doi.org/10.34810/data375

CORA.Repositori de Dades de Recerca
doi:10.34810/data363
Dataset. 2023

PANACEA ENGLISH AUTOMATICALLY ACQUIRED LEXICON FOR ENV DOMAIN: SUBCATEGORIZATION FRAMES (V-SUBCAT)

CORA.Repositori de Dades de Recerca
  • University of Cambridge. Department of Theoretical and Applied Linguistics




CORA.Repositori de Dades de Recerca
doi:10.34810/data364
Dataset. 2023

PANACEA ENGLISH AUTOMATICALLY ACQUIRED LEXICON FOR LAB DOMAIN: SUBCATEGORIZATION FRAMES (V-SUBCAT)

CORA.Repositori de Dades de Recerca
  • University of Cambridge. Department of Theoretical and Applied Linguistics
This lexicon was produced with an inductive subcategorization frame (SCF) classifier, the tpc_subcat_inductive web service of the PANACEA project. It was built automatically from the PANACEA MCv2 crawled corpus: the data was parsed with the RASP parser (Third Release, Open-Source Version, February 2001, available from http://ilexir.co.uk; see also E. Briscoe, J. Carroll, and R. Watson, 2006, The Second Release of the RASP System, in Proceedings of COLING/ACL Interactive Presentation Sessions), and the parsed data was then processed with tpc_subcat_inductive. Only verb lemmas with at least 200 instances in MCv2 were retained.
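The frequency cut-off mentioned above (at least 200 instances per verb lemma) can be illustrated with a minimal sketch. This is not the PANACEA or tpc_subcat_inductive code; the function name `frequent_verb_lemmas`, the input format (one lemma string per verb instance), and the toy data are assumptions made only for illustration.

```python
from collections import Counter

# Threshold taken from the description; everything else here is hypothetical.
MIN_INSTANCES = 200


def frequent_verb_lemmas(verb_lemmas, min_instances=MIN_INSTANCES):
    """Return the set of lemmas that occur at least min_instances times.

    verb_lemmas is assumed to be an iterable with one lemma string per
    verb instance found in the parsed corpus.
    """
    counts = Counter(verb_lemmas)
    return {lemma for lemma, n in counts.items() if n >= min_instances}


# Toy input standing in for lemmas extracted from parsed corpus data.
toy_instances = ["reduce"] * 250 + ["emit"] * 200 + ["recycle"] * 199
kept = frequent_verb_lemmas(toy_instances)
print(sorted(kept))  # ['emit', 'reduce']
```

In this toy run "recycle" is dropped because it has 199 instances, one short of the threshold, while "emit" is kept because the cut-off is inclusive.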




CORA.Repositori de Dades de Recerca
doi:10.34810/data375
Dataset. 2013

PANACEA ENGLISH AUTOMATICALLY ACQUIRED LEXICON FOR ENV DOMAIN: SUBCATEGORIZATION FRAMES AND LEXICAL SEMANTIC CLASSES FOR NOUNS

CORA.Repositori de Dades de Recerca
  • University of Cambridge. Department of Theoretical and Applied Linguistics
  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
This is a domain-specific English lexicon for the environment (ENV) domain. It contains both subcategorization frames for verbs and lexical semantic classes for nouns. The lexicon was created automatically by running PANACEA web services on crawled data. The crawled data was obtained by crawling web pages that were automatically detected to be in English and automatically classified as relevant to the ENV domain. Data collection took place in the summer of 2011.




CORA.Repositori de Dades de Recerca
doi:10.34810/data378
Dataset. 2013

PANACEA ENGLISH AUTOMATICALLY ACQUIRED LEXICON FOR LAB DOMAIN: SUBCATEGORIZATION FRAMES AND LEXICAL SEMANTIC CLASSES FOR NOUNS

CORA.Repositori de Dades de Recerca
  • University of Cambridge. Department of Theoretical and Applied Linguistics
  • Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
This is a domain-specific English lexicon for the labour (LAB) domain. It contains both subcategorization frames for verbs and lexical semantic classes for nouns. The lexicon was created automatically by running PANACEA web services on crawled data. The crawled data was obtained by crawling web pages that were automatically detected to be in English and automatically classified as relevant to the LAB domain. Data collection took place in the summer of 2011.