An Enhanced Large Language Model For Cross Modal Query Understanding System Using DL-KeyBERT Based CAZSSCL-MPGPT
Shreya Singh

TL;DR
This paper introduces an enhanced large language model framework that integrates deep learning and knowledge graph techniques to improve cross-modal query understanding, achieving high accuracy in image captioning tasks.
Contribution
The paper presents a novel system combining DL-KeyBERT, CAZSSCL-MPGPT, and knowledge graph methods to address redundancy issues and improve cross-modal understanding.
Findings
Achieved 99.14% accuracy on the COCO 2017 dataset.
Achieved 98.43% accuracy on the VQAv2 validation dataset.
Effectively reduces the echo chamber effect in cross-modal models.
Abstract
Large Language Models (LLMs) are advanced deep-learning models designed to understand and generate human language. Combined with models that process other kinds of data, such as images, they enable cross-modal understanding. However, existing approaches often suffer from the echo chamber effect, in which redundant visual patterns reduce model generalization and accuracy. To address this limitation, this work proposes an enhanced LLM-based framework for cross-modal query understanding using DL-KeyBERT-based CAZSSCL-MPGPT. The collected dataset consists of pre-processed images and texts. The pre-processed images then undergo object segmentation using Easom-You Only Look Once (E-YOLO). An object skeleton is then generated, along with a knowledge graph built using a Conditional Random Knowledge Graph (CRKG) technique. Further, features are extracted from the knowledge graph, generated…
Taxonomy
Topics: Topic Modeling
