A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese
Leandro B. dos Santos, Magali S. Duran, Nathan S. Hartmann, Arnaldo Candido Jr., Gustavo H. Paetzold, Sandra M. Aluisio

TL;DR
This paper introduces a lightweight regression method to automatically infer psycholinguistic properties for Brazilian Portuguese words using limited features, enabling resource creation for less-resourced languages.
Contribution
It proposes a novel, resource-efficient approach to estimate psycholinguistic properties in Brazilian Portuguese, reducing reliance on costly surveys and extensive datasets.
Findings
Correlations between the inferred properties are close to those reported in related work.
The resource covers 26,874 words with multiple psycholinguistic annotations.
The method uses simple features: word length, frequency lists, lexical databases built from school dictionaries, and word embeddings.
Abstract
Psycholinguistic properties of words have been used in various approaches to Natural Language Processing tasks, such as text simplification and readability assessment. Most of these properties are subjective, and gathering them requires costly and time-consuming surveys. Recent approaches use the limited datasets of psycholinguistic properties to extend them automatically to large lexicons. However, some of the resources used by such approaches are not available for most languages. This study presents a method to infer psycholinguistic properties for Brazilian Portuguese (BP) using regressors built with a light set of features usually available for less-resourced languages: word length, frequency lists, lexical databases composed of school dictionaries, and word embedding models. The correlations between the properties inferred are close to those obtained by related works. The resulting…
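The feature set described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: ridge regression stands in for whichever regressor the paper uses, the lexical-database features are left out, and every word rating, log-frequency, and two-dimensional "embedding" below is an invented toy value rather than data from the paper. The helper names (`build_features`, `fit_ridge`, `pearson`) are hypothetical.

```python
import math

def build_features(word, log_freq, embedding):
    """Bias term, word length, log frequency, then the embedding dimensions."""
    return [1.0, float(len(word)), log_freq] + list(embedding)

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ridge(rows, targets, alpha=1e-3):
    """Closed-form ridge: solve (X^T X + alpha I) w = X^T y.
    For brevity the bias weight is penalised as well."""
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (alpha if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    return solve_linear(xtx, xty)

def predict(weights, row):
    return sum(w * v for w, v in zip(weights, row))

def pearson(xs, ys):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented toy data: (word, log frequency, 2-d embedding, concreteness rating).
TOY = [
    ("casa", 5.1, [0.8, 0.1], 6.2),
    ("gato", 4.3, [0.7, 0.2], 6.5),
    ("mesa", 4.0, [0.9, 0.0], 6.4),
    ("liberdade", 4.2, [0.1, 0.9], 2.1),
    ("justiça", 4.4, [0.2, 0.8], 2.4),
    ("pensamento", 4.1, [0.1, 0.7], 2.6),
]

rows = [build_features(w, f, e) for w, f, e, _ in TOY]
ratings = [r for _, _, _, r in TOY]
weights = fit_ridge(rows, ratings)
predicted = [predict(weights, row) for row in rows]
print("in-sample Pearson r:", pearson(ratings, predicted))
```

The correlation printed here is computed on the training words and only shows the mechanics. A comparison with related work, as the abstract describes, would need predictions on held-out words, for example through cross-validation.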
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques
