To Be or Not To Be a Verbal Multiword Expression: A Quest for Discriminating Features
Caroline Pasquer (1), Agata Savary (1), Jean-Yves Antoine (1), Carlos Ramisch (2), Nicolas Labroche (1), Arnaud Giacometti (1) ((1) University of Tours, France; (2) Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France)

TL;DR
This paper investigates features for identifying verbal multiword expressions (VMWEs), finding that a simple frequency-based feature selection method outperforms standard selection methods, and that an SVM classifier using the selected features outperforms the best systems from a recent shared task.
Contribution
It introduces a simple frequency-based feature selection approach that improves VMWE identification performance over standard methods and state-of-the-art systems.
Findings
Frequency-based feature selection outperforms Chi-squared, information gain, and decision trees.
A 6-feature SVM classifier surpasses recent shared task systems.
Surface variability in VMWEs can be effectively captured with minimal features.
Abstract
Automatic identification of multiword expressions (MWEs) is a prerequisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. However, this variability is usually more restricted than in regular (non-VMWE) constructions, which leads to various variability profiles. We use this fact to determine the optimal set of features which could be used in a supervised classification setting to solve a subproblem of VMWE identification: the identification of occurrences of previously seen VMWEs. Surprisingly, a simple custom frequency-based feature selection method proves more efficient than other standard methods such as Chi-squared test, information gain or decision trees. An SVM classifier using the optimal set of only 6 features outperforms the best systems from a recent shared task on the French…
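To make the feature selection setting concrete, here is a minimal sketch of the Chi-squared test, one of the standard baselines the abstract names. It is not the authors' frequency-based method, which this summary does not describe. The feature names (`same_determiner`, `noun_singular`, `sentence_initial`), the toy candidates, and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def chi2_score(feature_values, labels):
    """Pearson chi-squared statistic between a categorical feature and the class label."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feat_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feat_totals.items():
        for y, y_count in label_totals.items():
            expected = f_count * y_count / n   # count expected under independence
            observed = joint.get((f, y), 0)
            score += (observed - expected) ** 2 / expected
    return score

def select_top_k(rows, labels, k):
    """Rank feature names by chi-squared score and keep the k highest."""
    names = list(rows[0].keys())
    scores = {name: chi2_score([r[name] for r in rows], labels) for name in names}
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: each row describes one candidate occurrence of a
# previously seen VMWE; the label is True for a genuine VMWE occurrence.
rows = [
    {"same_determiner": True,  "noun_singular": True,  "sentence_initial": True},
    {"same_determiner": True,  "noun_singular": True,  "sentence_initial": False},
    {"same_determiner": True,  "noun_singular": True,  "sentence_initial": True},
    {"same_determiner": True,  "noun_singular": False, "sentence_initial": False},
    {"same_determiner": False, "noun_singular": False, "sentence_initial": True},
    {"same_determiner": False, "noun_singular": False, "sentence_initial": False},
    {"same_determiner": False, "noun_singular": True,  "sentence_initial": True},
    {"same_determiner": False, "noun_singular": False, "sentence_initial": False},
]
labels = [True, True, True, True, False, False, False, False]

selected, scores = select_top_k(rows, labels, k=2)
print(selected)
print(scores)
```

In this toy set, the feature that separates the two classes perfectly receives the highest score, while the feature distributed identically across both classes scores zero and is dropped. The selected features would then be passed to a classifier such as an SVM.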
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
Methods: Feature Selection · Support Vector Machine
