Text Embeddings Should Capture Implicit Semantics, Not Just Surface Meaning
Yiqun Sun, Qiang Huang, Anthony K. H. Tung, Jun Yu

TL;DR
This paper advocates for a shift in text embedding research towards capturing implicit semantics, emphasizing deeper linguistic understanding over surface-level meaning to improve interpretive NLP tasks.
Contribution
The paper highlights the limitations of current embedding models in capturing implicit semantics and proposes a paradigm shift built on new data, benchmarks, and training objectives.
Findings
State-of-the-art models perform poorly on implicit semantics tasks.
Current benchmarks favor surface-level semantic capture.
A pilot study shows that even state-of-the-art models improve only marginally over simple baselines on implicit semantics tasks.
Abstract
This position paper argues that the text embedding research community should move beyond surface meaning and embrace implicit semantics as a central modeling goal. Text embedding models have become foundational in modern NLP, powering a wide range of applications and drawing increasing research attention. Yet, much of this progress remains narrowly focused on surface-level semantics. In contrast, linguistic theory emphasizes that meaning is often implicit, shaped by pragmatics, speaker intent, and sociocultural context. Current embedding models are typically trained on data that lacks such depth and evaluated on benchmarks that reward the capture of surface meaning. As a result, they struggle with tasks requiring interpretive reasoning, speaker stance, or social meaning. Our pilot study highlights this gap, showing that even state-of-the-art models perform only marginally better than…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution
