Semantic clustering of Russian web search results: possibilities and problems
Andrey Kutuzov

TL;DR
This paper explores methods for clustering Russian web search results based on word sense induction using lexical co-occurrence graphs, comparing different models and corpora to improve semantic search accuracy.
Contribution
It introduces novel approaches for clustering search results by word sense in Russian, utilizing large corpora and distributional semantics models.
Findings
Effective clustering methods identified for Russian search results
Comparison of different corpora shows impact on clustering quality
Models of distributional semantics are applied to large linguistic data
Abstract
The paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs from large Russian corpora and use them to cluster Mail.ru Search results according to the meanings of the query. We compare different clustering methods and different source corpora, and describe models for applying distributional semantics to big linguistic data.
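The general idea of inducing word senses from a co-occurrence graph and using them to group search results can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the toy corpus, the function names (`build_cooccurrence_graph`, `induce_senses`, `assign_snippet`), and the use of connected components as the graph clustering step are all assumptions made for illustration. The ambiguous query word here is "лук", which can mean either "onion" or "bow".

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences, min_count=1):
    """Build an undirected graph: words are nodes, and two words are linked
    if they share at least `min_count` sentences."""
    weights = defaultdict(int)
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            weights[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), weight in weights.items():
        if weight >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def induce_senses(graph, target):
    """Drop the target word and treat each connected component among its
    neighbours as one induced sense (a simplifying assumption)."""
    neighbours = set(graph.get(target, set()))
    senses, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend((graph[node] & neighbours) - component)
        seen |= component
        senses.append(component)
    return senses


def assign_snippet(snippet_tokens, senses):
    """Return the index of the sense sharing the most words with the snippet,
    or None if the snippet overlaps with no sense."""
    if not senses:
        return None
    overlaps = [len(set(snippet_tokens) & sense) for sense in senses]
    best = max(range(len(senses)), key=overlaps.__getitem__)
    return best if overlaps[best] > 0 else None


# Invented toy corpus: already tokenized and lemmatized sentences.
sentences = [
    ["лук", "суп", "морковь"],
    ["лук", "суп", "соль"],
    ["лук", "стрела", "тетива"],
    ["лук", "стрела", "охота"],
]
graph = build_cooccurrence_graph(sentences)
senses = induce_senses(graph, "лук")

# Invented search snippets for the query "лук".
snippets = [["рецепт", "суп"], ["купить", "стрела", "тетива"], ["погода"]]
clusters = [assign_snippet(snippet, senses) for snippet in snippets]
```

On real corpora the graph is far denser, so a dedicated graph clustering algorithm and edge weighting would likely replace the connected-components step used here.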
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
