LLMs to Support a Domain Specific Knowledge Assistant

Maria-Flavia Lovin

arXiv:2502.04095·cs.CL·February 7, 2025

TL;DR

This paper develops a domain-specific knowledge assistant for IFRS sustainability reporting. It builds a synthetic QA dataset with LLMs and designs two question-answering architectures: a RAG pipeline that reaches 85.32% accuracy on single-industry questions and an LLM-based pipeline that reaches 93.45% accuracy.

Contribution

The paper introduces a novel generation and evaluation pipeline for creating a high-quality IFRS QA dataset, and develops two QA architectures tailored to sustainability reporting.

Findings

01. QA dataset averages 8.16/10 on quality metrics

02. RAG pipeline achieves 85.32% accuracy on single-industry questions

03. LLM-based pipeline achieves 93.45% accuracy, outperforming baselines

Abstract

This work presents a custom approach to developing a domain-specific knowledge assistant for sustainability reporting using the International Financial Reporting Standards (IFRS). In this domain, there is no publicly available question-answer dataset, which has impeded the development of a high-quality chatbot to support companies with IFRS reporting. The two key contributions of this project are therefore: (1) A high-quality synthetic question-answer (QA) dataset based on IFRS sustainability standards, created using a novel generation and evaluation pipeline leveraging Large Language Models (LLMs). This comprises 1,063 diverse QA pairs that address a wide spectrum of potential user queries in sustainability reporting. Various LLM-based techniques are employed to create the dataset, including chain-of-thought reasoning and few-shot prompting. A custom evaluation framework is developed…

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Byte Pair Encoding · WordPiece · Layer Normalization · Residual Connection · Dense Connections