Multilingual Multiword Expressions
Lahari Poddar

TL;DR
This paper presents a semi-supervised, multilingual approach combining rule-based and statistical methods to automatically extract and validate multiword expressions across English and Indian languages.
Contribution
It introduces a novel extraction pipeline that integrates linguistic rules, semantic analysis, and statistical techniques for identifying multiword expressions in multiple languages.
Findings
Effective extraction of various types of multiword expressions.
Integration of linguistic filters with semantic validation improves accuracy.
System supports multilingual processing including Indian languages.
Abstract
The project aims to provide a semi-supervised approach to identify multiword expressions in a multilingual context consisting of English and most of the major Indian languages. Multiword expressions are groups of words that refer to some conventional or regional way of saying things. If they are translated literally from one language to another, the expression loses its inherent meaning. To automatically extract multiword expressions from a corpus, an extraction pipeline has been constructed, which consists of a combination of rule-based and statistical approaches. There are several types of multiword expressions, which differ widely from each other in construction. We employ different methods to detect different types of multiword expressions. Given a POS-tagged corpus in English or any Indian language, the system initially applies some regular expression filters to narrow down…
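The general idea of a regular expression filter over POS tags followed by a statistical score can be illustrated with a minimal sketch. This is not the paper's implementation: the tag pattern (adjective-noun and noun-noun bigrams with Penn Treebank-style tags), the choice of pointwise mutual information as the statistical measure, and the function names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumed pattern: adjective-noun or noun-noun bigrams, Penn Treebank-style tags.
# The paper's actual filters are not given in the abstract.
CANDIDATE_PATTERN = re.compile(r"^(JJ|NN\w*) NN\w*$")


def candidate_bigrams(tagged_tokens):
    """Yield adjacent word pairs whose tag sequence matches the pattern."""
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if CANDIDATE_PATTERN.match(f"{t1} {t2}"):
            yield (w1.lower(), w2.lower())


def pmi_scores(tagged_tokens):
    """Score each filtered candidate with pointwise mutual information.

    PMI is a stand-in here for whatever statistical measure the
    pipeline uses; higher values mean the two words co-occur more
    often than their individual frequencies would predict.
    """
    words = [w.lower() for w, _ in tagged_tokens]
    unigram = Counter(words)
    bigram = Counter(zip(words, words[1:]))
    n_uni = len(words)
    n_bi = max(len(words) - 1, 1)

    scores = {}
    for pair in set(candidate_bigrams(tagged_tokens)):
        p_xy = bigram[pair] / n_bi
        p_x = unigram[pair[0]] / n_uni
        p_y = unigram[pair[1]] / n_uni
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy POS-tagged input (hypothetical data, not from the paper's corpus).
tagged = [
    ("The", "DT"), ("prime", "JJ"), ("minister", "NN"), ("met", "VBD"),
    ("the", "DT"), ("prime", "JJ"), ("minister", "NN"), ("of", "IN"),
    ("India", "NNP"),
]
scores = pmi_scores(tagged)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

The POS filter runs first so that the statistical score is computed only for a small set of linguistically plausible candidates, which mirrors the narrowing step the abstract describes.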
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
