Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain
Andrew Yates, Michael Unterkalmsteiner

TL;DR
This paper adapts a relevance-ranked synonym discovery method to identify Swedish domain-specific synonyms in the building construction sector, demonstrating the effectiveness of learning to rank approaches and feature analysis in a new language and domain.
Contribution
It extends prior synonym discovery work from the consumer health domain to the Swedish language and the building construction domain, introducing two new features and evaluating their impact within a learning to rank framework.
Findings
Learning to rank approach outperforms baseline methods
fastText embeddings serve as a strong baseline
Feature analysis reveals domain and language-specific differences
Abstract
Domain-specific synonyms occur in many specialized search tasks, such as when searching medical documents, legal documents, and software engineering artifacts. We replicate prior work on ranking domain-specific synonyms in the consumer health domain by applying the approach to a new language and domain: identifying Swedish language synonyms in the building construction domain. We chose this setting because identifying synonyms in this domain is helpful for downstream systems, where different users may query for documents (e.g., engineering requirements) using different terminology. We consider two new features inspired by the change in language and methodological advances since the prior work's publication. An evaluation using data from the building construction domain supports the finding from the prior work that synonym discovery is best approached as a learning to rank task in which…
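To make the "learning to rank" framing in the abstract concrete, the following is a minimal sketch in which candidate synonyms for a query term are ranked by a learned linear score. It is not the authors' implementation: the pairwise perceptron update is a simple stand-in for whichever learner the paper uses, and the two feature columns (an embedding similarity and a character overlap score), the candidate names, and all numeric values are hypothetical placeholders.

```python
from itertools import product


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def train_pairwise_ranker(queries, epochs=20, lr=0.1):
    """Learn linear weights so that synonyms outscore non-synonyms.

    queries: one list per query term, each holding (features, label)
    tuples, where label is 1 for a true synonym and 0 otherwise.
    """
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for candidates in queries:
            pos = [f for f, y in candidates if y == 1]
            neg = [f for f, y in candidates if y == 0]
            # Compare every synonym against every non-synonym of the same query.
            for p, n in product(pos, neg):
                diff = [a - b for a, b in zip(p, n)]
                # Perceptron update when the pair is ordered incorrectly (or tied).
                if dot(w, diff) <= 0:
                    w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(w, candidates):
    """Return candidate names sorted by descending learned score."""
    return [name for name, feats in
            sorted(candidates, key=lambda c: dot(w, c[1]), reverse=True)]


# Hypothetical training data: [embedding_similarity, char_overlap] per candidate.
train = [
    [([0.9, 0.2], 1), ([0.4, 0.1], 0), ([0.3, 0.6], 0)],
    [([0.8, 0.7], 1), ([0.5, 0.9], 0)],
]
weights = train_pairwise_ranker(train)

# Hypothetical unseen candidates for a new query term.
ranking = rank(weights, [
    ("cand_a", [0.2, 0.9]),
    ("cand_b", [0.85, 0.3]),
    ("cand_c", [0.5, 0.5]),
])
print(weights, ranking)
```

Treating the task as ranking rather than binary classification means the model only needs to order candidates for each query term correctly, which matches how a downstream search system would consume a synonym list.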
Taxonomy
Methods: fastText
