Exploring Italian sentence embeddings properties through multi-tasking
Vivi Nastase, Giuseppe Samo, Chunyang Jiang, Paola Merlo

TL;DR
This paper examines how Italian sentence embeddings from large language models encode syntactic and semantic information by using multi-task learning on synthetic data, revealing that abstract linguistic notions are not strongly represented.
Contribution
It introduces a multi-task framework to analyze the encoding of linguistic information in Italian sentence embeddings from pre-trained models, highlighting limitations in representing abstract linguistic concepts.
Findings
Sentence embeddings encode different clues for various tasks.
Pre-trained embeddings lack strong representation of abstract linguistic notions.
Multi-task approach reveals encoding differences across tasks.
Abstract
We investigate to what degree existing LLMs encode abstract linguistic information in Italian in a multi-task setting. We exploit curated synthetic data on a large scale -- several Blackbird Language Matrices (BLMs) problems in Italian -- and use them to study how sentence representations built using pre-trained language models encode specific syntactic and semantic information. We use a two-level architecture to model separately a compression of the sentence embeddings into a representation that contains relevant information for a task, and a BLM task. We then investigate whether we can obtain compressed sentence representations that encode syntactic and semantic information relevant to several BLM tasks. While we expected that the sentence structure -- in terms of sequence of phrases/chunks -- and chunk properties could be shared across tasks, performance and error analysis show that…
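To make the two-level idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify layers or training, so the linear projections, the averaging of context sentences, the cosine scoring, the untrained random weights, and every name (`TwoLevelSketch`, `compress`, `choose`) are illustrative assumptions. It also assumes a BLM problem is posed as a sequence of context sentences plus a set of candidate answers.

```python
import random


def linear(vec, weights):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoLevelSketch:
    """Hypothetical two-level model: compress sentences, then solve the task."""

    def __init__(self, embed_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        # Level 1: compress a sentence embedding into a small task-relevant vector.
        self.compress_w = random_matrix(latent_dim, embed_dim, rng)
        # Level 2: map the summarized context to a predicted answer vector.
        self.task_w = random_matrix(latent_dim, latent_dim, rng)

    def compress(self, embedding):
        return linear(embedding, self.compress_w)

    def predict_answer(self, context_embeddings):
        latents = [self.compress(e) for e in context_embeddings]
        # Assumption: summarize the context by averaging compressed sentences.
        mean = [sum(col) / len(latents) for col in zip(*latents)]
        return linear(mean, self.task_w)

    def choose(self, context_embeddings, candidate_embeddings):
        """Return the index of the candidate closest to the predicted answer."""
        target = self.predict_answer(context_embeddings)
        scores = [cosine(target, self.compress(c)) for c in candidate_embeddings]
        return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy stand-ins for pre-trained sentence embeddings (dimension 8).
    context = [[rng.random() for _ in range(8)] for _ in range(7)]
    candidates = [[rng.random() for _ in range(8)] for _ in range(4)]
    model = TwoLevelSketch(embed_dim=8, latent_dim=3)
    print("chosen candidate index:", model.choose(context, candidates))
```

In a multi-task variant of this sketch, the compression level would be shared across several BLM tasks while each task keeps its own second level, which is one way to test whether a single compressed representation serves all of them.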
Taxonomy
Topics: Text Readability and Simplification
