Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification

Shijing Chen; Mohamed Reda Bouadjenek; Shoaib Jameel; Usman Naseem; Basem Suleiman; Flora D. Salim; Hakim Hacid; Imran Razzak

arXiv:2501.06827·cs.AI·January 14, 2025


TL;DR

This paper introduces a novel framework that integrates taxonomy information with Large Language Models to improve hierarchical classification accuracy across multiple modalities, ensuring consistency within complex class structures.

Contribution

It presents a taxonomy-embedded, LLM-agnostic framework that enforces hierarchical consistency in multimodal classification tasks, a novel approach in Multi-level Hierarchical Classification (MLHC).

Findings

01

Significant performance improvement on the MEP-3M dataset compared to conventional LLM structures.

02

Enhanced hierarchical consistency in predictions.

03

Framework is LLM-agnostic and adaptable.

Abstract

Multi-level Hierarchical Classification (MLHC) tackles the challenge of categorizing items within a complex, multi-layered class structure. However, traditional MLHC classifiers often rely on a backbone model with independent output layers, which tend to ignore the hierarchical relationships between classes. This oversight can lead to inconsistent predictions that violate the underlying taxonomy. Leveraging Large Language Models (LLMs), we propose a novel taxonomy-embedded transitional LLM-agnostic framework for multimodality classification. The cornerstone of this advancement is the ability of models to enforce consistency across hierarchical levels. Our evaluations on the MEP-3M dataset - a multi-modal e-commerce product dataset with various hierarchical levels - demonstrated a significant performance improvement compared to conventional LLM structures.
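To make the consistency problem concrete, the sketch below shows how independent output layers can produce a parent/child pair that violates a taxonomy, and how restricting the child choice to the predicted parent's subtree avoids it. This is a generic illustration and not the framework proposed in the paper: the two-level taxonomy, the label names, the scores, and the helper functions `is_consistent` and `constrained_child` are all hypothetical.

```python
# Hypothetical two-level taxonomy: child label -> parent label.
# These labels are illustrative only; they are not taken from MEP-3M.
PARENT = {
    "laptop": "electronics",
    "phone": "electronics",
    "t-shirt": "clothing",
    "jeans": "clothing",
}


def is_consistent(parent_pred, child_pred):
    """Return True if the child label belongs to the predicted parent."""
    return PARENT.get(child_pred) == parent_pred


def constrained_child(parent_pred, child_scores):
    """Pick the highest-scoring child among the children of parent_pred."""
    allowed = {
        child: score
        for child, score in child_scores.items()
        if PARENT.get(child) == parent_pred
    }
    if not allowed:
        raise ValueError(f"no scored child under parent {parent_pred!r}")
    return max(allowed, key=allowed.get)


# Made-up outputs of two independent heads for a single item.
parent_pred = "electronics"
child_scores = {"laptop": 0.30, "phone": 0.25, "t-shirt": 0.40, "jeans": 0.05}

# The independent child head takes the global argmax, ignoring the parent.
independent_child = max(child_scores, key=child_scores.get)

# The constrained choice only considers children of the predicted parent.
consistent_child = constrained_child(parent_pred, child_scores)

print(independent_child, is_consistent(parent_pred, independent_child))
print(consistent_child, is_consistent(parent_pred, consistent_child))
```

In this toy case the independent head selects a clothing label under an electronics parent, which is the kind of taxonomy violation the abstract describes. Masking by the parent's subtree is only one simple way to restore consistency; the abstract does not state which mechanism the authors use.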
