JRC EuroVoc Indexer JEX - A freely available multi-label categorisation tool
Ralf Steinberger, Mohamed Ebrahim, Marco Turchi

TL;DR
The paper introduces JEX, a multilingual, multi-label classification tool for automatically assigning EuroVoc descriptors to documents, enhancing document retrieval and classification in EU institutions.
Contribution
It presents JEX, software that learns from manually labelled data to assign EuroVoc descriptors to documents automatically. The release covers 22 languages and lets users customise the document representation and re-train the tool on their own document collections.
Findings
Supports 22 EU languages with trained classifiers.
Enables both automatic and interactive categorization.
Provides language-independent feature vectors for other NLP tasks.
Abstract
EuroVoc (2012) is a highly multilingual thesaurus consisting of over 6,700 hierarchically organised subject domains used by European Institutions and many authorities in Member States of the European Union (EU) for the classification and retrieval of official documents. JEX is JRC-developed multi-label classification software that learns from manually labelled data to automatically assign EuroVoc descriptors to new documents in a profile-based category-ranking task. The JEX release consists of trained classifiers for 22 official EU languages, of parallel training data in the same languages, of an interface that allows viewing and amending the assignment results, and of a module that allows users to re-train the tool on their own document collections. JEX allows advanced users to change the document representation so as to possibly improve the categorisation result through linguistic…
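The profile-based category-ranking idea mentioned in the abstract can be illustrated with a minimal sketch. It is not JEX's actual implementation: the TF-IDF-style weighting, the cosine similarity, the toy training data, and all function names (`build_profiles`, `rank_descriptors`, and so on) are assumptions chosen for illustration. Each descriptor gets a "profile" (a weighted term vector built from the documents manually labelled with it), and a new document is scored against every profile to produce a ranked list of candidate descriptors.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_profiles(training_docs):
    """Build one weighted term vector (profile) per descriptor.

    training_docs: list of (text, set_of_descriptors) pairs.
    The weighting is a plain smoothed TF-IDF, an assumption of this sketch.
    """
    doc_freq = Counter()
    for text, _ in training_docs:
        doc_freq.update(set(tokenize(text)))
    n_docs = len(training_docs)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    profiles = defaultdict(Counter)
    for text, labels in training_docs:
        counts = Counter(tokenize(text))
        # A multi-label document contributes to the profile of each of its labels.
        for label in labels:
            for term, count in counts.items():
                profiles[label][term] += count * idf[term]
    return dict(profiles), idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_descriptors(text, profiles, idf, top_k=3):
    """Return the top_k (descriptor, score) pairs, best score first."""
    counts = Counter(tokenize(text))
    # Terms never seen in training carry no weight.
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [(label, cosine(vec, profile)) for label, profile in profiles.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy, made-up training data; the labels only imitate the style of thesaurus descriptors.
TRAIN = [
    ("fishing quota for cod and herring in the north sea",
     {"fishery policy", "marine resources"}),
    ("subsidy scheme for dairy farmers and milk production",
     {"agricultural policy"}),
    ("cod stocks and fishing fleet reduction",
     {"fishery policy"}),
]

profiles, idf = build_profiles(TRAIN)
ranked = rank_descriptors("new fishing quota for cod", profiles, idf)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

Because the output is a ranked list rather than a yes/no decision per descriptor, a user or a downstream threshold decides how many of the top candidates to keep, which fits the interactive review of assignments the abstract describes.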
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Natural Language Processing Techniques
