From Lemmas to Dependencies: What Signals Drive Light Verbs Classification?

Sercan Karakaş; Yusuf Şimşek

arXiv:2602.04127·cs.CL·February 5, 2026

TL;DR

This paper investigates which signals drive the classification of light verb constructions (LVCs) in Turkish. It compares models whose inputs are restricted to lemmas, to morphosyntax, or to the full text, in order to isolate the roles of morphology and lexical cues.

Contribution

The paper systematically compares lemma-only, morphosyntax-only, and full-input models on LVC classification, highlighting the importance of normalization and lexical identity.
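The three input conditions can be pictured as different views of the same UD-annotated sentence. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the CoNLL-U rows are a simplified, hand-written toy example (not taken from any treebank), and the helper names `parse_conllu`, `lemma_view`, `grammar_view`, and `full_view` are hypothetical.

```python
# Toy CoNLL-U rows (hand-written, simplified annotations; an assumption for illustration).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
CONLLU = (
    "1\tKomite\tkomite\tNOUN\t_\tCase=Nom|Number=Sing\t3\tnsubj\t_\t_\n"
    "2\tkarar\tkarar\tNOUN\t_\tCase=Nom|Number=Sing\t3\tobj\t_\t_\n"
    "3\tverdi\tver\tVERB\t_\tNumber=Sing|Person=3|Tense=Past\t0\troot\t_\t_\n"
)


def parse_conllu(block):
    """Parse tab-separated CoNLL-U token lines into dictionaries."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        tokens.append(
            {
                "form": cols[1],
                "lemma": cols[2],
                "upos": cols[3],
                "feats": cols[5],
                "deprel": cols[7],
            }
        )
    return tokens


def lemma_view(tokens):
    """Lemma-only input: surface morphology is stripped away."""
    return " ".join(t["lemma"] for t in tokens)


def grammar_view(tokens):
    """Grammar-only input: UPOS, DEPREL, and MORPH features, with no lexical items."""
    feats = []
    for t in tokens:
        feats.append("UPOS=" + t["upos"])
        feats.append("DEPREL=" + t["deprel"])
        if t["feats"] != "_":
            feats.extend("MORPH=" + f for f in t["feats"].split("|"))
    return feats


def full_view(tokens):
    """Full input: the inflected surface forms as written."""
    return " ".join(t["form"] for t in tokens)


tokens = parse_conllu(CONLLU)
print(lemma_view(tokens))    # lemma sequence
print(grammar_view(tokens))  # bag of morphosyntactic features
print(full_view(tokens))     # surface sentence
```

Each view could then feed a separate classifier, so that any gap in performance is attributable to the information each view withholds.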

Findings

01. Coarse morphosyntax alone is insufficient for robust detection.

02. Lexical identity supports LVC judgments but is sensitive to normalization.

03. Lemma-only representations depend critically on normalization choices.

Abstract

Light verb constructions (LVCs) are a challenging class of verbal multiword expressions, especially in Turkish, where rich morphology and productive complex predicates create minimal contrasts between idiomatic predicate meanings and literal verb-argument uses. This paper asks what signals drive LVC classification by systematically restricting model inputs. Using UD-derived supervision, we compare lemma-driven baselines (lemma TF-IDF + Logistic Regression; BERTurk trained on lemma sequences), a grammar-only Logistic Regression over UD morphosyntax (UPOS/DEPREL/MORPH), and a full-input BERTurk baseline. We evaluate on a controlled diagnostic set with Random negatives, lexical controls (NLVC), and LVC positives, reporting split-wise performance to expose decision-boundary behavior. Results show that coarse morphosyntax alone is insufficient for robust LVC detection under controlled…
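To illustrate why split-wise reporting matters, here is a small sketch that scores a classifier separately on the Random, NLVC, and LVC splits. It assumes accuracy as the metric (the truncated abstract says only "performance"), and the labels and predictions are invented toy values, not results from the paper. The function name `splitwise_accuracy` is hypothetical.

```python
from collections import defaultdict


def splitwise_accuracy(examples):
    """Return accuracy per split for (split, gold, pred) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for split, gold, pred in examples:
        total[split] += 1
        correct[split] += int(gold == pred)
    return {split: correct[split] / total[split] for split in total}


# Invented toy predictions (assumption): 1 = LVC, 0 = not an LVC.
toy = [
    ("Random", 0, 0),
    ("Random", 0, 0),
    ("NLVC", 0, 1),  # a lexical control mistaken for an LVC
    ("NLVC", 0, 0),
    ("LVC", 1, 1),
    ("LVC", 1, 1),
]

scores = splitwise_accuracy(toy)
overall = sum(int(g == p) for _, g, p in toy) / len(toy)

for split, acc in scores.items():
    print(f"{split}: {acc:.2f}")
print(f"overall: {overall:.2f}")
```

In this toy case the pooled score looks strong, while the per-split view shows that the errors concentrate on the lexical controls. That is the kind of decision-boundary behavior a single aggregate number would hide.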

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling