Creating a Hybrid Rule and Neural Network Based Semantic Tagger using Silver Standard Data: the PyMUSAS framework for Multilingual Semantic Annotation
Andrew Moore, Paul Rayson, Dawn Archer, Tim Czerniak, Dawn Knight, Daisy Lal, Gearóid Ó Donnchadha, Mícheál Ó Meachair, Scott Piao, Elaine Uí Dhonnchadha, Johanna Vuorinen, Yan Yabo, Xiaobin Yang

TL;DR
This paper introduces a hybrid rule-based and neural network semantic tagger for multilingual semantic annotation, utilizing a new silver standard dataset and extensive evaluation across five languages.
Contribution
It presents the creation of a large silver standard dataset for USAS, and demonstrates how neural models can enhance rule-based semantic tagging in multiple languages.
Findings
Neural models outperform rule-based systems in semantic tagging accuracy.
The hybrid approach improves multilingual semantic annotation performance.
Open resources including datasets and code are released for community use.
Abstract
Word Sense Disambiguation (WSD) has been widely evaluated using the semantic frameworks of WordNet, BabelNet, and the Oxford Dictionary of English. However, for the UCREL Semantic Analysis System (USAS) framework, no open extensive evaluation has been performed beyond lexical coverage or single language evaluation. In this work, we perform the largest semantic tagging evaluation of the rule based system that uses the lexical resources in the USAS framework, covering five different languages using four existing datasets and one novel Chinese dataset. To overcome the lack of manually tagged training data, we create a new silver labelled English dataset, with which we train and evaluate various mono and multilingual neural models in both mono and cross-lingual evaluation setups, with comparisons to their rule based counterparts, and show how a rule based system can be enhanced with a neural…
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
