An Enhanced Large Language Model For Cross Modal Query Understanding System Using DL-KeyBERT Based CAZSSCL-MPGPT

Shreya Singh

arXiv:2502.17000·cs.CV·February 25, 2025

Open Access

TL;DR

This paper introduces an enhanced large language model framework that integrates deep learning and knowledge graph techniques to improve cross-modal query understanding, achieving high accuracy in image captioning tasks.

Contribution

The paper presents a novel system combining DL-KeyBERT, CAZSSCL-MPGPT, and knowledge graph methods to address redundancy issues and improve cross-modal understanding.

Findings

01. Achieved 99.14% accuracy on the COCO 2017 dataset.

02. Achieved 98.43% accuracy on the VQAv2 validation dataset.

03. Effectively reduces the echo chamber effect in cross-modal models.

Abstract

Large Language Models (LLMs) are advanced deep-learning models designed to understand and generate human language. Paired with models that process other kinds of data, such as images, they enable cross-modal understanding. However, existing approaches often suffer from the echo chamber effect, in which redundant visual patterns reduce model generalization and accuracy. To address this limitation, the proposed system is an enhanced LLM-based framework for cross-modal query understanding using DL-KeyBERT-based CAZSSCL-MPGPT. The collected dataset consists of preprocessed images and texts. The preprocessed images then undergo object segmentation using Easom-You Only Look Once (E-YOLO). An object skeleton is generated, along with a knowledge graph built using a Conditional Random Knowledge Graph (CRKG) technique. Further, features are extracted from the knowledge graph, generated…

Taxonomy

Topics: Topic Modeling