From Lemmas to Dependencies: What Signals Drive Light Verbs Classification?
Sercan Karakaş, Yusuf Şimşek

TL;DR
This paper investigates which signals drive the classification of light verb constructions (LVCs) in Turkish. It restricts model inputs to lemmas, to morphosyntax, or to the full input, and compares the resulting classifiers to separate the roles of morphology and lexical cues.
Contribution
It systematically compares lemma-only, morphosyntax-only, and full-input models on LVC classification, highlighting the importance of normalization and lexical identity.
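As a rough illustration of what restricting model inputs can look like, the sketch below derives three views (full input, lemma-only, grammar-only) from one UD-style token list. The token dictionaries, the function names, and the toy parse of "karar verdi" are assumptions made for this example; they are not taken from the paper's code or data.

```python
from collections import Counter
from typing import Dict, List

# Toy UD-style parse (hypothetical, simplified annotation) of "karar verdi".
TOKENS: List[Dict[str, str]] = [
    {"form": "karar", "lemma": "karar", "upos": "NOUN",
     "deprel": "obj", "feats": "Case=Nom|Number=Sing"},
    {"form": "verdi", "lemma": "ver", "upos": "VERB",
     "deprel": "root", "feats": "Aspect=Perf|Tense=Past"},
]

def full_view(tokens: List[Dict[str, str]]) -> str:
    """Surface forms, as a full-input model would see them."""
    return " ".join(t["form"] for t in tokens)

def lemma_view(tokens: List[Dict[str, str]]) -> str:
    """Lemma sequence only: inflection is stripped, lexical identity kept."""
    return " ".join(t["lemma"] for t in tokens)

def grammar_view(tokens: List[Dict[str, str]]) -> Dict[str, int]:
    """Counts of UPOS/DEPREL/MORPH features only: no lexical identity."""
    counts: Counter = Counter()
    for t in tokens:
        counts["UPOS=" + t["upos"]] += 1
        counts["DEPREL=" + t["deprel"]] += 1
        for feat in t["feats"].split("|"):
            if feat and feat != "_":  # "_" marks an empty FEATS column in CoNLL-U
                counts["MORPH=" + feat] += 1
    return dict(counts)

full_text = full_view(TOKENS)
lemma_text = lemma_view(TOKENS)
grammar_feats = grammar_view(TOKENS)
```

The lemma view could feed a TF-IDF vectorizer or a lemma-sequence encoder, and the grammar view could feed a linear classifier; which library performs those steps is left open here.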
Findings
Coarse morphosyntax alone is insufficient for robust detection.
Lexical identity supports LVC judgments but is sensitive to normalization.
Lemma-only representations depend critically on normalization choices.
Abstract
Light verb constructions (LVCs) are a challenging class of verbal multiword expressions, especially in Turkish, where rich morphology and productive complex predicates create minimal contrasts between idiomatic predicate meanings and literal verb-argument uses. This paper asks what signals drive LVC classification by systematically restricting model inputs. Using UD-derived supervision, we compare lemma-driven baselines (lemma TF-IDF + Logistic Regression; BERTurk trained on lemma sequences), a grammar-only Logistic Regression over UD morphosyntax (UPOS/DEPREL/MORPH), and a full-input BERTurk baseline. We evaluate on a controlled diagnostic set with Random negatives, lexical controls (NLVC), and LVC positives, reporting split-wise performance to expose decision-boundary behavior. Results show that coarse morphosyntax alone is insufficient for robust LVC detection under controlled…
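The split-wise reporting mentioned in the abstract can be sketched as follows: accuracy is computed separately for each diagnostic split rather than pooled. The split names come from the abstract; the function name, the tuple layout, and the labels and predictions are made up for illustration and are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def splitwise_accuracy(
    examples: Iterable[Tuple[str, int, int]]
) -> Dict[str, float]:
    """Accuracy per split; each example is (split, gold_label, predicted_label)."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for split, gold, pred in examples:
        total[split] += 1
        correct[split] += int(gold == pred)
    return {split: correct[split] / total[split] for split in total}

# Hypothetical predictions: label 1 = LVC, label 0 = not an LVC.
toy_examples = [
    ("Random", 0, 0), ("Random", 0, 0),
    ("NLVC", 0, 1), ("NLVC", 0, 0),
    ("LVC", 1, 1), ("LVC", 1, 0),
]
scores = splitwise_accuracy(toy_examples)
```

Reporting per split keeps a model that is perfect on Random negatives from hiding errors on the lexical controls, which a single pooled score would average away.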
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
