Category-Theoretic Quantitative Compositional Distributional Models of Natural Language Semantics
Edward Grefenstette

TL;DR
This thesis advances compositional distributional semantics by extending categorical models, implementing them practically, and demonstrating their competitive performance in natural language understanding tasks.
Contribution
It introduces new syntactic and semantic extensions to the DisCoCat framework, along with learning procedures for concrete models and empirical evaluations showing improved performance.
Findings
Models match or outperform existing approaches on NLP evaluation tasks
Extended framework incorporates diverse syntactic formalisms
Category theory provides a solid foundation for linguistic modeling
Abstract
This thesis is about the problem of compositionality in distributional semantics. Distributional semantics presupposes that the meanings of words are a function of their occurrences in textual contexts. It models words as distributions over these contexts and represents them as vectors in high dimensional spaces. The problem of compositionality for such models concerns itself with how to produce representations for larger units of text by composing the representations of smaller units of text. This thesis focuses on a particular approach to this compositionality problem, namely using the categorical framework developed by Coecke, Sadrzadeh, and Clark, which combines syntactic analysis formalisms with distributional semantic representations of meaning to produce syntactically motivated composition operations. This thesis shows how this approach can be theoretically extended and…
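The idea of a syntactically motivated composition operation can be illustrated with a minimal sketch. In the categorical framework of Coecke, Sadrzadeh, and Clark, a noun is a vector and a transitive verb is a higher-order tensor that is contracted with its subject and object to give a sentence vector. The code below is an illustration under stated assumptions, not the thesis's implementation: the function name `compose_transitive`, the two-dimensional spaces, and all numeric values are made up for the example, and no learning procedure is shown.

```python
from itertools import product


def compose_transitive(subject, verb, obj):
    """Contract an order-3 verb tensor with a subject and an object vector.

    verb[i][k][j] is indexed by subject dimension i, sentence dimension k,
    and object dimension j. The result is a vector in the sentence space.
    """
    sentence_dim = len(verb[0])
    sentence = [0.0] * sentence_dim
    for i, k, j in product(range(len(subject)), range(sentence_dim), range(len(obj))):
        sentence[k] += subject[i] * verb[i][k][j] * obj[j]
    return sentence


# Hypothetical toy vectors in a 2-dimensional noun space.
dogs = [1.0, 0.0]
cats = [0.0, 1.0]

# Hypothetical verb tensor: 2 (subject) x 2 (sentence) x 2 (object).
chase = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [1.0, 0.0]],
]

dogs_chase_cats = compose_transitive(dogs, chase, cats)
cats_chase_dogs = compose_transitive(cats, chase, dogs)

print(dogs_chase_cats)
print(cats_chase_dogs)
```

With these toy values the two word orders give different sentence vectors, which a commutative operation such as plain vector addition could not distinguish.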
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
