AutoML-guided Fusion of Entity and LLM-based Representations for Document Classification
Boshko Koloski, Senja Pollak, Roberto Navigli, Blaž Škrlj

TL;DR
This paper presents an AutoML-guided method that fuses entity-based knowledge base embeddings with LLM representations to improve document classification accuracy. Low-dimensional projections of the fused space enable faster classifiers with minimal performance loss.
Contribution
It introduces a novel fusion of knowledge base embeddings with LLM representations, optimized via AutoML, for improved and faster document classification.
Findings
Fusion improves classification accuracy across datasets.
Low-dimensional projections retain performance with faster classifiers.
AutoML effectively optimizes the fused representation space.
Abstract
Large semantic knowledge bases are grounded in factual knowledge. However, recent approaches to dense text representations (i.e. embeddings) do not efficiently exploit these resources. Dense and robust representations of documents are essential for effectively solving downstream classification and retrieval tasks. This work demonstrates that injecting embedded information from knowledge bases can augment the performance of contemporary Large Language Model (LLM)-based representations for the task of text classification. Further, by considering automated machine learning (AutoML) with the fused representation space, we demonstrate it is possible to improve classification accuracy even if we use low-dimensional projections of the original representation space obtained via efficient matrix factorization. This result shows that significantly faster classifiers can be achieved with minimal…
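The fusion and projection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes simple concatenation of L2-normalized per-document vectors as the fusion operator and a truncated SVD as the matrix factorization, and it uses random arrays in place of real LLM and knowledge base embeddings. The names `l2_normalize`, `fuse`, and `svd_project` are hypothetical, and the AutoML search over representations and classifiers is not shown.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so neither source dominates the fusion."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def fuse(llm_emb, kb_emb):
    """Assumed fusion operator: concatenate normalized document vectors."""
    return np.hstack([l2_normalize(llm_emb), l2_normalize(kb_emb)])


def svd_project(x, k):
    """Project rows of x onto the top-k right singular vectors of centered x."""
    mean = x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:k]
    return (x - mean) @ components.T, components, mean


# Stand-in data (assumption): 20 documents, 16-dim LLM vectors, 8-dim KB vectors.
rng = np.random.default_rng(0)
llm_emb = rng.normal(size=(20, 16))
kb_emb = rng.normal(size=(20, 8))

fused = fuse(llm_emb, kb_emb)                     # shape (20, 24)
low_dim, components, mean = svd_project(fused, k=4)  # shape (20, 4)
```

In a setting like the one the abstract describes, the target dimension `k` and the choice of downstream classifier would be left to the AutoML search rather than fixed by hand, and `low_dim` would be the input to that classifier.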
Taxonomy
Topics: Advanced Computational Techniques and Applications · Advanced Data Processing Techniques
