Ambiguity and Incomplete Information in Categorical Models of Language
Dan Marsden (University of Oxford)

TL;DR
This paper develops a categorical framework to model ambiguity and incomplete information in natural language, extending existing models to handle various forms of imprecise data while preserving key structural properties.
Contribution
It introduces a systematic method to enrich dagger compact closed categories for modeling linguistic ambiguity and partial information, unifying different approaches through monads and subconvex algebra enrichment.
Findings
Each form of ambiguity and incompleteness is captured by a suitable enrichment of the category in which language is modelled.
A model built from convex combinations of relations is constructed.
Subconvex algebra enrichment is shown to cover all of the effects considered.
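As a rough illustration of what "convex combinations of relations" could look like in practice, the sketch below represents a mixture as a dictionary from relations to weights and composes two mixtures by composing the underlying relations pairwise and multiplying their weights. This is a minimal sketch under stated assumptions, not the paper's construction: the names `compose` and `mix_compose`, the dictionary representation, and the toy "bank" data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def compose(r, s):
    """Relational composition: (a, c) is related when some b has (a, b) in r and (b, c) in s."""
    return frozenset((a, c) for (a, b) in r for (b2, c) in s if b == b2)


def mix_compose(p, q):
    """Compose two weighted mixtures of relations (assumed representation: dict relation -> weight).

    Every pair of relations is composed and the weights are multiplied,
    so composition is bilinear in the weights.
    """
    out = defaultdict(Fraction)
    for r, w in p.items():
        for s, v in q.items():
            out[compose(r, s)] += w * v
    return dict(out)


# Hypothetical toy data: an ambiguous word "bank" as a mixture of two
# relations from a one-point set {"*"} to a set of meanings.
river = frozenset({("*", "river")})
finance = frozenset({("*", "finance")})
bank = {river: Fraction(1, 3), finance: Fraction(2, 3)}

# An unambiguous relation from meanings to topics, as a mixture with weight 1.
topic = {frozenset({("river", "nature"), ("finance", "money")}): Fraction(1)}

result = mix_compose(bank, topic)
for relation, weight in result.items():
    print(sorted(relation), weight)
```

In this representation the weights of a composite still sum to one whenever both inputs do. Allowing the weights to sum to at most one would be one way to represent the subconvex case, where the missing mass stands for incomplete information; that reading is an assumption of this sketch.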
Abstract
We investigate notions of ambiguity and partial information in categorical distributional models of natural language. Probabilistic ambiguity has previously been studied using Selinger's CPM construction. This construction works well for models built upon vector spaces, as has been shown in quantum computational applications. Unfortunately, it doesn't seem to provide a satisfactory method for introducing mixing in other compact closed categories such as the category of sets and binary relations. We therefore lack a uniform strategy for extending a category to model imprecise linguistic information. In this work we adopt a different approach. We analyze different forms of ambiguous and incomplete information, both with and without quantitative probabilistic data. Each scheme then corresponds to a suitable enrichment of the category in which we model language. We view different monads…
