A Graph-based Approach for Multi-Modal Question Answering from Flowcharts in Telecom Documents
Sumit Soman, H. G. Ranjani, Sujoy Roychowdhury, Venkata Dharma Surya Narayana Sastry, Akshat Jain, Pranav Gangrade, Ayaaz Khan

TL;DR
This paper introduces a graph-based method leveraging visual language models to improve question-answering accuracy from flowcharts in telecom documents, integrating image and text data for better retrieval.
Contribution
It presents an end-to-end system that combines graph representations of flowcharts with text-based retrieval, reducing reliance on costly visual models during inference.
Findings
Graph representations obtained from a fine-tuned VLM have a lower edit distance to the ground truth.
The approach improves retrieval performance in telecom QA.
The approach is cost-effective because it reduces dependence on VLM inference.
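The edit-distance finding can be illustrated with a minimal sketch. The summary does not say which edit distance the paper uses, so this assumes plain Levenshtein distance over a serialized edge list; the helper names `levenshtein` and `serialize_edges` and the `(source, label, target)` triple format are hypothetical, not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def serialize_edges(edges):
    """Assumed format: (source, label, target) triples, sorted so that
    edge order does not affect the comparison."""
    return "\n".join(f"{s} -[{l}]-> {t}" for s, l, t in sorted(edges))


def graph_edit_score(predicted_edges, ground_truth_edges):
    """Lower is better: 0 means the serialized graphs are identical."""
    return levenshtein(serialize_edges(predicted_edges),
                       serialize_edges(ground_truth_edges))
```

Under this assumption, a VLM whose extracted edges match the annotated flowchart more closely yields a smaller score, which is the sense in which one model's graphs are "closer" to the ground truth than another's.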
Abstract
Question-Answering (QA) from technical documents often involves questions whose answers are present in figures, such as flowcharts or flow diagrams. Text-based Retrieval Augmented Generation (RAG) systems may fail to answer such questions. We leverage graph representations of flowcharts obtained from Visual Large Language Models (VLMs) and incorporate them in a text-based RAG system to show that this approach can enable image retrieval for QA in the telecom domain. We present the end-to-end approach from processing technical documents, classifying image types, building graph representations, and incorporating them with the text embedding pipeline for efficient retrieval. We benchmark the same on a QA dataset created based on proprietary telecom product information documents. Results show that the graph representations obtained using a fine-tuned VLM have lower edit distance with…
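The idea of feeding flowchart graphs into a text-based retrieval pipeline can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the edge-triple format, the sentence template in `flowchart_to_text`, and the bag-of-words cosine similarity (standing in for a real text embedding model) are all hypothetical.

```python
import math
import re
from collections import Counter


def flowchart_to_text(title, edges):
    """Serialize a flowchart graph into plain sentences so it can be
    indexed alongside ordinary text chunks. Edges are assumed to be
    (source, label, target) triples; an empty label means unconditional."""
    lines = [f"Flowchart: {title}"]
    for source, label, target in edges:
        if label:
            lines.append(f"From '{source}', if '{label}', go to '{target}'.")
        else:
            lines.append(f"From '{source}', go to '{target}'.")
    return "\n".join(lines)


def bag_of_words(text):
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks):
    """Return the chunk most similar to the query."""
    q = bag_of_words(query)
    return max(chunks, key=lambda c: cosine(q, bag_of_words(c)))
```

Because the graph is converted to text once, at indexing time, answering a question only requires text retrieval; no VLM call is needed per query in this sketch.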
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies · Service-Oriented Architecture and Web Services
