Harbsafe-162. A Domain-Specific Data Set for the Intrinsic Evaluation of Semantic Representations for Terminological Data
Susanne Arndt, Dieter Schnäpp

TL;DR
Harbsafe-162 is a specialized dataset designed to evaluate semantic models for terminological data in the electrotechnical domain, aiding standardization and harmonization efforts.
Contribution
The paper introduces Harbsafe-162, a domain-specific dataset for intrinsic evaluation of semantic models applied to terminological entries in standards.
Findings
Distributional semantic models perform satisfactorily on domain-specific terminological data.
The dataset supports evaluation of models for harmonization of technical standards.
Intrinsic evaluation via similarity ratings is feasible for complex lexical data.
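A common way to run an intrinsic evaluation against similarity ratings works as follows. For every rated pair of items, compute the model's similarity score. Then correlate those scores with the human ratings. The sketch below assumes that each terminological entry already has a vector. It uses cosine similarity and reports Spearman's rank correlation. These are widespread choices for similarity data sets, but the paper's exact protocol is not stated here. The entry IDs, vectors and ratings are hypothetical toy values, not Harbsafe-162 data.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical gold item: (entry_id_a, entry_id_b, mean human similarity rating).
GoldPair = Tuple[str, str, float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(vectors: Dict[str, Sequence[float]], gold: List[GoldPair]) -> float:
    """Correlate model cosine similarities with human ratings over all gold pairs."""
    model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in gold]
    human_scores = [rating for _, _, rating in gold]
    return spearman(model_scores, human_scores)


if __name__ == "__main__":
    # Toy vectors and ratings (hypothetical, not taken from Harbsafe-162).
    vectors = {"e1": [1.0, 0.0], "e2": [0.9, 0.1], "e3": [0.0, 1.0], "e4": [0.5, 0.5]}
    gold = [("e1", "e2", 4.5), ("e1", "e3", 0.5), ("e1", "e4", 2.5)]
    # Model and human orderings agree, so rho is 1.0 up to floating-point rounding.
    print(evaluate(vectors, gold))
```

Rank correlation is a natural fit here because it asks only whether the model orders the pairs the way the raters did. It does not require cosine values and rating-scale values to lie on the same scale.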
Abstract
The article presents Harbsafe-162, a domain-specific data set for evaluating distributional semantic models. It originates from the Harbsafe project, a cooperation between Technische Universität Braunschweig and the German Commission for Electrical, Electronic & Information Technologies of DIN and VDE. One objective of the project is to apply distributional semantic models to terminological entries. These are complex lexical data comprising one or more terms or term phrases and a definition. This application is needed to solve a more complex problem: the harmonization of the terminologies of standards and standards bodies (i.e. the resolution of doublets and inconsistencies). Because evaluation data sets for terminological entries were lacking, creating Harbsafe-162 was a necessary step towards harmonization assistance. Harbsafe-162 covers data from nine electrotechnical standards…
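The abstract describes applying distributional models to whole terminological entries to help resolve doublets. The sketch below shows one simple, assumed way this could work. Each entry's vector is the average of the word vectors of its terms and its definition. Entry pairs whose cosine similarity reaches a threshold are then flagged as candidate doublets for a terminologist to review. Several details are hypothetical choices made for illustration, not the Harbsafe method: the averaging scheme, the `word_vectors` table, the entry layout (`terms`, `definition`) and the 0.9 threshold.

```python
import math
import re
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
# Hypothetical entry layout: (list of terms / term phrases, definition text).
Entry = Tuple[List[str], str]


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word-character runs (keeps umlauts)."""
    return re.findall(r"\w+", text.lower())


def entry_vector(entry: Entry, word_vectors: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of all known tokens in the terms and the definition."""
    terms, definition = entry
    tokens = [tok for term in terms for tok in tokenize(term)] + tokenize(definition)
    known = [word_vectors[tok] for tok in tokens if tok in word_vectors]
    if not known:
        return None  # no token has a vector, so the entry cannot be compared
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_doublets(
    entries: Dict[str, Entry],
    word_vectors: Dict[str, Vector],
    threshold: float = 0.9,  # hypothetical cut-off, would need tuning
) -> List[Tuple[str, str, float]]:
    """Return entry pairs whose similarity reaches `threshold`, most similar first."""
    vecs = {}
    for entry_id, entry in entries.items():
        vec = entry_vector(entry, word_vectors)
        if vec is not None:
            vecs[entry_id] = vec
    pairs = []
    for a, b in combinations(sorted(vecs), 2):
        sim = cosine(vecs[a], vecs[b])
        if sim >= threshold:
            pairs.append((a, b, sim))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional word vectors (hypothetical, for illustration only).
    word_vectors = {
        "insulation": [1.0, 0.0, 0.0],
        "material": [0.8, 0.2, 0.0],
        "isolating": [0.9, 0.1, 0.0],
        "rated": [0.0, 0.9, 0.1],
        "voltage": [0.0, 1.0, 0.0],
        "value": [0.1, 0.8, 0.1],
    }
    entries = {
        "std1:e1": (["insulation"], "material isolating live parts"),
        "std2:e4": (["insulating material"], "isolating material for live parts"),
        "std3:e2": (["rated voltage"], "value of voltage assigned by the manufacturer"),
    }
    # Only the two insulation entries should clear the threshold.
    for a, b, sim in candidate_doublets(entries, word_vectors):
        print(f"{a} <-> {b}: {sim:.3f}")
```

Plain averaging ignores word order and weights every known token equally. A real system might weight tokens or use a stronger composition method. An intrinsic data set such as Harbsafe-162 is what lets such alternatives be compared.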
Taxonomy
Topics: linguistics and terminology studies · Natural Language Processing Techniques · Lexicography and Language Studies
