Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database
Edgar Altszyler, Mariano Sigman, Sidarta Ribeiro, Diego Fernández Slezak

TL;DR
This study compares LSA and Word2vec (Skip-gram) embeddings trained on small corpora. LSA performs better than Skip-gram in this setting and captures relevant semantic patterns in dream reports, which could aid psychological research.
Contribution
It provides a comparative analysis of LSA and Word2vec on small datasets, highlighting LSA's effectiveness for semantic analysis of dream reports.
Findings
LSA outperforms Skip-gram on two semantic tests when the training corpus is small
LSA captures relevant word associations in dream report series, even with few dreams or low-frequency words
LSA can be used to explore word associations in dream reports for psychology studies
Abstract
Word embeddings have been extensively studied in large text datasets. However, only a few studies analyze semantic representations of small corpora, which are particularly relevant in single-person text production studies. In the present paper, we compare Skip-gram and LSA capabilities in this scenario, and we test both techniques to extract relevant semantic patterns in single-series dream reports. LSA showed better performance than Skip-gram with small training corpora in two semantic tests. As a case study, we show that LSA can capture relevant word associations in dream report series, even with a small number of dreams or low-frequency words. We propose that LSA can be used to explore word associations in dream reports, which could bring new insight into this classic research area of psychology.
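To make the idea of LSA word associations concrete, here is a minimal sketch of how LSA vectors can be obtained from a small set of documents. It is an illustration under stated assumptions, not the authors' pipeline: it uses raw term counts (LSA is often applied after a weighting step such as TF-IDF or log-entropy), a hypothetical dimension `k=2`, and four invented toy "dream reports". The function names `lsa_word_vectors`, `cosine`, and `most_similar` are made up for this example.

```python
import numpy as np

def lsa_word_vectors(docs, k=2):
    """Return (vocab, vectors) from a truncated SVD of the term-document counts."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    index = {w: i for i, w in enumerate(vocab)}
    # Rows are words, columns are documents (raw counts: a simplification).
    counts = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(tokenized):
        for w in tokens:
            counts[index[w], j] += 1.0
    # Keep only the k largest singular values and their left singular vectors.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    return vocab, u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def most_similar(word, vocab, vectors, topn=3):
    """Rank the other vocabulary words by cosine similarity to `word`."""
    i = vocab.index(word)
    sims = [(w, cosine(vectors[i], vectors[j]))
            for j, w in enumerate(vocab) if j != i]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:topn]

# Toy "dream reports", invented for illustration only.
docs = [
    "i was flying over the sea",
    "the sea was dark and i was falling",
    "my mother cooked in the kitchen",
    "the kitchen was warm and my mother smiled",
]
vocab, vectors = lsa_word_vectors(docs, k=2)
neighbors = most_similar("sea", vocab, vectors, topn=3)
```

Because the word vectors come from a global factorization of the whole count matrix, every word that occurs at least once gets a vector in the reduced space. With a corpus this small the resulting similarities are only a toy demonstration.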
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
