Semantic clustering of Russian web search results: possibilities and problems

Andrey Kutuzov

arXiv:1409.1612·cs.CL·October 28, 2014

Open Access

TL;DR

This paper induces word senses from lexical co-occurrence graphs built on large Russian corpora and uses them to cluster Mail.ru Search results by the meanings of the query, comparing clustering methods and source corpora.

Contribution

It applies word sense induction over large Russian corpora to the clustering of web search results, and describes models for applying distributional semantics to big linguistic data.

Findings

01

Effective clustering methods identified for Russian search results

02

Comparison of different corpora shows impact on clustering quality

03

Models of distributional semantics are applied to large linguistic data

Abstract

The paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs on large Russian corpora and then apply this data to cluster Mail.ru Search results according to meanings of the query. We compare different methods of performing such clustering and different source corpora. Models of applying distributional semantics to big linguistic data are described.
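To make the idea in the abstract concrete, here is a minimal sketch of word sense induction from a co-occurrence graph, followed by assigning search snippets to the induced senses. It is not the paper's actual pipeline: the toy English corpus, the sentence-level co-occurrence window, the use of connected components as the clustering step, and the function names `build_cooccurrence_graph`, `induce_senses`, and `assign_snippet` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences, min_count=1):
    """Link two words when they share a sentence at least min_count times."""
    counts = defaultdict(int)
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def induce_senses(graph, target):
    """Treat each connected component of the target's neighbourhood
    (with the target itself removed) as one induced sense."""
    neighbours = set(graph.get(target, ()))
    senses, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            word = stack.pop()
            if word in component:
                continue
            component.add(word)
            # Follow only edges that stay inside the neighbourhood.
            stack.extend((graph[word] & neighbours) - component)
        seen |= component
        senses.append(component)
    return senses


def assign_snippet(snippet_tokens, senses):
    """Index of the sense sharing the most words with the snippet, or None."""
    overlaps = [len(set(snippet_tokens) & sense) for sense in senses]
    best = max(overlaps, default=0)
    return overlaps.index(best) if best > 0 else None


# Toy corpus (an assumption; the paper works with large Russian corpora).
sentences = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "speed"],
    ["jaguar", "cat", "jungle"],
    ["jaguar", "cat", "prey"],
]
graph = build_cooccurrence_graph(sentences)
senses = induce_senses(graph, "jaguar")
for tokens in (["fast", "car", "engine"], ["jungle", "cat"], ["stock", "price"]):
    print(tokens, "->", assign_snippet(tokens, senses))
```

On a real corpus, plain connected components would likely merge most neighbours into one cluster, so a frequency threshold (`min_count`) or a proper graph clustering algorithm would be needed; the sketch only shows the overall shape of the approach.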

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling