High-dimensional distributed semantic spaces for utterances
Jussi Karlgren, Pentti Kanerva

TL;DR
This paper introduces a high-dimensional, mathematically principled model for representing diverse linguistic features of utterances and texts, bridging symbolic and continuous representations for improved language processing.
Contribution
The paper extends Random Indexing to build a single vector space of fixed dimensionality in which many kinds of linguistic features can be represented together, enabling efficient integration of symbolic and machine learning methods.
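Random Indexing is the technique the paper extends. The sketch below shows only plain Random Indexing, not the paper's extended model. Each word gets a sparse ternary random index vector, and a word's context vector is the sum of the index vectors of its neighbours within a sliding window. The dimensionality (2000), the sparsity (10 entries of +1 and 10 of -1), the window size and the toy corpus are assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict

DIM = 2000      # assumed dimensionality of the space
NONZERO = 10    # assumed number of +1 entries (and of -1 entries) per index vector


def make_index_vector(rng, dim=DIM, nonzero=NONZERO):
    """Sparse ternary random vector: `nonzero` entries of +1, `nonzero` of -1, the rest 0."""
    vec = [0] * dim
    positions = rng.sample(range(dim), 2 * nonzero)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i < nonzero else -1
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def random_indexing(sentences, window=2, seed=0):
    """Return (index_vectors, context_vectors) for every whitespace-separated token."""
    rng = random.Random(seed)
    index = {}
    context = defaultdict(lambda: [0] * DIM)
    for sentence in sentences:
        tokens = sentence.split()
        for tok in tokens:
            if tok not in index:
                index[tok] = make_index_vector(rng)
        for i, tok in enumerate(tokens):
            # Add the index vector of every neighbour within `window` positions.
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    neighbour = index[tokens[j]]
                    ctx = context[tok]
                    for pos in range(DIM):
                        ctx[pos] += neighbour[pos]
    return index, dict(context)


corpus = [
    "the cat chased the mouse",
    "the dog chased the mouse",
    "the cat ate fish",
    "the dog ate meat",
    "she drove the car fast",
    "he parked the car outside",
]
index, vectors = random_indexing(corpus)
# "cat" and "dog" occur in similar contexts, so their similarity should exceed that of "cat" and "car".
print(cosine(vectors["cat"], vectors["dog"]), cosine(vectors["cat"], vectors["car"]))
```

The dimensionality is fixed in advance. Adding new words or new observations therefore never enlarges the representation. This is what makes a single, fixed-dimensional space for many feature types feasible.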
Findings
Successfully represents a broad range of linguistic features in a common vector space
Demonstrates the model's applicability to different linguistic data types
Provides a computationally feasible approach for linguistic feature integration
Abstract
High-dimensional distributed semantic spaces have proven useful and effective for aggregating and processing visual, auditory, and lexical information for many tasks related to human-generated data. Human language makes use of a large and varying number of features, lexical and constructional items as well as contextual and discourse-specific data of various types, which all interact to represent various aspects of communicative information. Some of these features are mostly local and useful for the organisation of e.g. argument structure of a predication; others are persistent over the course of a discourse and necessary for achieving a reasonable level of understanding of the content. This paper describes a model for high-dimensional representation for utterance and text level data including features such as constructions or contextual data, based on a mathematically principled and…
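The abstract is cut off before it specifies the model's operations. The following is therefore only a sketch of how heterogeneous features of a single utterance can share one fixed-dimensional vector. It uses operations that are standard in hyperdimensional computing: dense random bipolar vectors, binding by elementwise multiplication, and bundling by elementwise majority (sum, then sign). These operations, the dimensionality of 10,000, and the role and filler names (`predicate`, `agent`, `construction`, `topic`, `tense`, `cat`, and so on) are hypothetical and are not taken from the paper.

```python
import random

DIM = 10_000  # assumed dimensionality


def random_hv(rng, dim=DIM):
    """Dense random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Elementwise product; for bipolar vectors bind(bind(a, b), a) == b."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors):
    """Elementwise majority: sum the vectors, then take the sign (ties go to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalised dot product in [-1, 1]; close to 0 for unrelated random vectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
# Hypothetical roles and fillers mixing local (argument structure) and
# discourse-level (topic) features of one utterance.
roles = {name: random_hv(rng) for name in ("predicate", "agent", "construction", "topic", "tense")}
fillers = {name: random_hv(rng) for name in ("chase", "cat", "dog", "transitive", "pets", "past")}

utterance = {
    "predicate": "chase",
    "agent": "cat",
    "construction": "transitive",
    "topic": "pets",
    "tense": "past",
}
# Bind each role to its filler, then superpose all pairs into one vector.
u = bundle([bind(roles[r], fillers[f]) for r, f in utterance.items()])

# Unbind with the "agent" role, then clean up by picking the nearest known filler.
probe = bind(u, roles["agent"])
best = max(fillers, key=lambda name: similarity(probe, fillers[name]))
print(best)
```

Every role-filler pair is superposed into one vector of the same length. Lexical, constructional and discourse-level features can therefore be added without changing the dimensionality. Probing with a role's vector returns a noisy copy of its filler, and a nearest-neighbour lookup cleans it up. The example uses an odd number of pairs so that the majority vote never ties.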
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
