ATR4S: Toolkit with State-of-the-art Automatic Terms Recognition Methods in Scala
N. Astrakhantsev

TL;DR
ATR4S is an open-source Scala toolkit that implements over 15 methods for automatic terminology recognition, facilitating comprehensive evaluation and comparison across diverse datasets.
Contribution
The paper introduces ATR4S, a scalable, modular toolkit with extensive method implementations and a thorough comparison of state-of-the-art ATR methods on multiple datasets.
Findings
No single method outperforms others across all datasets
Existing tools lack the best ATR methods
ATR4S enables reliable benchmarking of ATR techniques
Abstract
Automatically recognized terminology is widely used in domain-specific text processing tasks such as machine translation, information retrieval, and sentiment analysis. However, there is still no agreement on which methods are best suited to particular settings and, moreover, no reliable comparison of existing methods. We believe that one of the main reasons is the lack of implementations of state-of-the-art methods, which are usually non-trivial to recreate. To address these issues, we present ATR4S, open-source software written in Scala that comprises more than 15 methods for automatic terminology recognition (ATR) and implements the whole pipeline: text document preprocessing, term candidate collection, term candidate scoring, and term candidate ranking. It is a highly scalable, modular and configurable tool with support of…
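The four pipeline stages named in the abstract (preprocessing, candidate collection, scoring, ranking) can be illustrated with a minimal sketch. This is not ATR4S code and does not use its Scala API: it is written in Python for brevity, every function name and the tiny stop-word list are hypothetical, and raw frequency stands in as a placeholder for the scoring methods the toolkit actually provides.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to", "is", "are"}


def preprocess(document):
    """Step 1: lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", document.lower())


def collect_candidates(tokens, max_len=3):
    """Step 2: collect n-grams (1..max_len words) that contain no stop words."""
    candidates = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            ngram = tokens[i:i + n]
            if not any(word in STOP_WORDS for word in ngram):
                candidates.append(" ".join(ngram))
    return candidates


def score_candidates(candidates):
    """Step 3: score each distinct candidate by raw frequency (placeholder measure)."""
    return dict(Counter(candidates))


def rank_candidates(scores, top_k=5):
    """Step 4: sort by descending score, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


def recognize_terms(documents, top_k=5):
    """Run the four steps over a list of documents and return the top candidates."""
    candidates = []
    for document in documents:
        candidates.extend(collect_candidates(preprocess(document)))
    return rank_candidates(score_candidates(candidates), top_k)
```

Keeping each stage as a separate function mirrors the modular design the abstract describes: a different scoring measure could replace score_candidates without touching the other stages.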
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
