RusLICA: A Russian-Language Platform for Automated Linguistic Inquiry and Category Analysis
Elina Sigdel, Anastasia Panfilova

TL;DR
This paper presents RusLICA, a novel Russian-language platform for automated linguistic analysis that adapts LIWC methodology to Russian, incorporating linguistic, statistical, and model-based features for psycholinguistic research.
Contribution
It introduces a Russian-specific LIWC adaptation with 96 categories, combining linguistic resources and pre-trained language models for comprehensive text analysis.
Findings
Developed a Russian LIWC adaptation comprising 96 categories
Implemented a web-based analyzer as part of RusLICA
Mapped lemmas to 42 psycholinguistic categories
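As a rough illustration of the lemma-to-category mapping mentioned above, the following is a minimal sketch of LIWC-style dictionary counting. It is not the RusLICA implementation: the lexicon entries, category labels, and the function name `category_shares` are hypothetical, and the input is assumed to be already lemmatized by some external tool.

```python
from collections import Counter

# Hypothetical toy lexicon: lemma -> set of category labels.
# These entries are illustrative only and are not taken from the RusLICA dictionary.
TOY_LEXICON = {
    "радость": {"positive_emotion"},
    "грусть": {"negative_emotion"},
    "я": {"first_person_singular"},
    "думать": {"cognitive_process"},
}


def category_shares(lemmas, lexicon):
    """Return each matched category's share of the total lemma count."""
    counts = Counter()
    for lemma in lemmas:
        # A lemma may belong to several categories; unknown lemmas match none.
        for category in lexicon.get(lemma.lower(), ()):
            counts[category] += 1
    total = len(lemmas)
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Assumption: the text has already been tokenized and lemmatized.
sample = ["я", "думать", "о", "радость"]
shares = category_shares(sample, TOY_LEXICON)
```

Reporting shares rather than raw counts keeps scores comparable across texts of different lengths, which is the usual convention for LIWC-style output.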
Abstract
Identifying psycholinguistic characteristics in written texts is a task gaining increasing attention from researchers. One of the most widely used tools in this field is Linguistic Inquiry and Word Count (LIWC), which was originally developed to analyze English texts and has since been translated into multiple languages. Our approach adapts the LIWC methodology to the Russian language, taking into account its grammatical and cultural specificities. The suggested approach comprises 96 categories, integrating syntactic, morphological, lexical, and general statistical features, as well as predictions obtained from pre-trained language models (LMs) for text analysis. Rather than directly translating existing thesauri, we built the dictionary specifically for the Russian language, based on content from several lexicographic resources, semantic dictionaries, and corpora. The paper…
Taxonomy
Topics: Mental Health via Writing · Text Readability and Simplification · Authorship Attribution and Profiling
