Efficient Multilingual Name Type Classification Using Convolutional Networks
Davor Lauc

TL;DR
This paper introduces Onomas-CNN X, a CNN-based model that classifies multilingual proper names by language and entity type. It is accurate and highly efficient, and it outperforms transformer models in speed and energy efficiency on CPU hardware.
Contribution
The paper presents a novel CNN architecture for multilingual name classification that achieves high accuracy and efficiency. It shows that a specialized model can remain competitive with large transformer models.
Findings
Achieves 92.1% accuracy on a large multilingual dataset covering 104 languages and four entity types.
Processes 2,813 names per second on a single CPU core, 46 times faster than fine-tuned XLM-RoBERTa.
Reduces energy consumption by a factor of 46 compared to transformer baselines.
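The throughput and energy figures can be related with simple arithmetic. The sketch below converts the reported 2,813 names per second into per-name latency. It also derives the baseline throughput implied by the 46× speedup over fine-tuned XLM-RoBERTa reported in the abstract. The energy calculation rests on a simplifying assumption that the paper does not state: both models draw the same average power on the same core. Under that assumption, energy per name scales with processing time, and the energy ratio equals the speed ratio. The helper names and the placeholder power value are hypothetical.

```python
# Reported figures from the paper summary
CNN_NAMES_PER_SEC = 2813  # Onomas-CNN X, single CPU core
SPEEDUP = 46              # vs. fine-tuned XLM-RoBERTa

def latency_ms(names_per_sec: float) -> float:
    """Average processing time per name, in milliseconds."""
    return 1000.0 / names_per_sec

def energy_ratio(baseline_rate: float, model_rate: float,
                 baseline_watts: float, model_watts: float) -> float:
    """Baseline energy per name divided by model energy per name.

    Energy per name (joules) = average power (W) / throughput (names/s).
    """
    return (baseline_watts / baseline_rate) / (model_watts / model_rate)

# Throughput implied for the baseline by the reported speedup
baseline_rate = CNN_NAMES_PER_SEC / SPEEDUP

# Hypothetical assumption: equal average power draw for both models.
# The placeholder value cancels out, so any equal value gives the same ratio.
power_w = 1.0

print(f"CNN latency:      {latency_ms(CNN_NAMES_PER_SEC):.3f} ms/name")
print(f"Implied baseline: {baseline_rate:.1f} names/s "
      f"({latency_ms(baseline_rate):.2f} ms/name)")
print(f"Energy ratio:     "
      f"{energy_ratio(baseline_rate, CNN_NAMES_PER_SEC, power_w, power_w):.1f}x")
```

If the transformer drew more power per core than the CNN, the energy ratio would exceed the speed ratio. The fact that both reported factors are 46 is therefore consistent with an equal-power accounting.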
Abstract
We present a convolutional neural network approach for classifying proper names by language and entity type. Our model, Onomas-CNN X, combines parallel convolution branches with depthwise-separable operations and hierarchical classification to process names efficiently on CPU hardware. We evaluate the architecture on a large multilingual dataset covering 104 languages and four entity types (person, organization, location, other). Onomas-CNN X achieves 92.1% accuracy while processing 2,813 names per second on a single CPU core. This is 46 times faster than fine-tuned XLM-RoBERTa, with comparable accuracy. The model reduces energy consumption by a factor of 46 compared to transformer baselines. Our experiments demonstrate that specialized CNN architectures remain competitive with large pre-trained models for focused NLP tasks when sufficient training data exists.
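The sketch below illustrates why depthwise-separable convolutions suit CPU inference. It first compares the weight counts of a standard 1D convolution and a depthwise-separable 1D convolution over character embeddings. It then runs a toy forward pass with parallel branches of different kernel widths, max-pooling each branch over time and concatenating the results. This is a minimal stdlib-only illustration, not the paper's implementation. The embedding size, channel counts, kernel widths (3, 5, 7), random weights, and helper names are all hypothetical. The hierarchical classification heads are not modeled.

```python
import random

def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights (bias omitted) of a standard 1D convolution."""
    return k * c_in * c_out

def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise (one k-wide filter per input channel) plus pointwise (1x1) weights."""
    return k * c_in + c_in * c_out

def depthwise_separable_conv1d(x, depth_w, point_w):
    """x: T positions, each a list of c_in floats ('valid' padding, no bias).
    depth_w: c_in filters of width k. point_w: c_out rows of c_in weights."""
    k = len(depth_w[0])
    c_in = len(x[0])
    out = []
    for t in range(len(x) - k + 1):
        # Depthwise step: each channel is filtered independently.
        d = [sum(depth_w[c][j] * x[t + j][c] for j in range(k)) for c in range(c_in)]
        # Pointwise step: a 1x1 convolution mixes channels, followed by ReLU.
        out.append([max(0.0, sum(w * v for w, v in zip(row, d))) for row in point_w])
    return out

def parallel_branches(x, kernel_sizes, c_out, rng):
    """One separable conv per kernel width, global max-pool over time, concatenate.
    Assumes the input is at least as long as the widest kernel."""
    c_in = len(x[0])
    features = []
    for k in kernel_sizes:
        depth_w = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(c_in)]
        point_w = [[rng.uniform(-1, 1) for _ in range(c_in)] for _ in range(c_out)]
        y = depthwise_separable_conv1d(x, depth_w, point_w)
        features.extend(max(col) for col in zip(*y))  # max over time, per channel
    return features

# Weight counts for hypothetical 64-dim embeddings and 128 output channels,
# e.g. k=3: 24576 standard vs 8384 separable weights.
EMB, OUT = 64, 128
for k in (3, 5, 7):
    print(k, standard_conv_params(k, EMB, OUT), separable_conv_params(k, EMB, OUT))

# Toy forward pass: random 8-dim character embeddings for a 10-character name.
rng = random.Random(0)
name = "Davor Lauc"
x = [[rng.uniform(-1, 1) for _ in range(8)] for _ in name]
feats = parallel_branches(x, kernel_sizes=(3, 5, 7), c_out=4, rng=rng)
print(len(feats))  # 3 branches x 4 channels
```

Once the output width dominates, the separable variant's weight count grows only by `c_in` per extra unit of kernel width. This is why several wide parallel branches stay cheap.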
