Dataset. 2023

PANACEA Labour Bilingual Glossary FR-EN (French-English)

CORA.Repositori de Dades de Recerca
doi:10.34810/data349
  • Dublin City University. School of Computing
This folder contains files for creating a bilingual glossary from factored phrase tables that include part-of-speech-tagged text for the FR-EN language pair. The tables are first filtered using part-of-speech tag sequences for each language, so that entries with unsuitable tag sequences are removed. The feature scores from the phrase table are then combined in a log-linear model to score each entry. The user specifies how large the output glossary should be (relative to the input), and the lowest-ranking entries are discarded to produce a glossary of the desired size.
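The three steps described above (part-of-speech filtering, log-linear scoring, and truncation to a requested size) can be illustrated with a minimal Python sketch. It is not the code shipped in this dataset. The `Entry` layout, the allowed tag sequences in `ALLOWED_FR` and `ALLOWED_EN`, the feature weights, and the reading of "relative to the input" as a fraction of the entries that survive the filter are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    """One phrase-table entry (hypothetical layout, not the dataset's format)."""
    src: str          # French phrase
    tgt: str          # English phrase
    src_pos: tuple    # part-of-speech tags of the French phrase
    tgt_pos: tuple    # part-of-speech tags of the English phrase
    features: tuple   # phrase-table feature scores, assumed to be in (0, 1]


# Assumed examples of suitable tag sequences; the real lists are not given here.
ALLOWED_FR = {("NOUN",), ("NOUN", "ADJ"), ("NOUN", "ADP", "NOUN")}
ALLOWED_EN = {("NOUN",), ("ADJ", "NOUN"), ("NOUN", "NOUN")}


def pos_filter(entries):
    """Keep only entries whose tag sequences are allowed on both sides."""
    return [e for e in entries
            if e.src_pos in ALLOWED_FR and e.tgt_pos in ALLOWED_EN]


def loglinear_score(entry, weights):
    """Log-linear combination: weighted sum of the log feature scores."""
    return sum(w * math.log(f) for w, f in zip(weights, entry.features))


def build_glossary(entries, weights, keep_fraction):
    """Filter, rank by score, and keep the top share of the filtered entries."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    filtered = pos_filter(entries)
    ranked = sorted(filtered,
                    key=lambda e: loglinear_score(e, weights),
                    reverse=True)
    # Target size, rounded to the nearest integer (assumed interpretation).
    keep = round(keep_fraction * len(filtered))
    return ranked[:keep]
```

Calling `build_glossary(entries, weights=(1.0, 1.0), keep_fraction=0.5)` would return the better-scoring half of the entries that pass the tag filter; in practice the weights would be tuned rather than set uniformly.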
 
DOI: https://doi.org/10.34810/data349

HANDLE: https://doi.org/10.34810/data349
 
View at: https://doi.org/10.34810/data349
