Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models

Armin Berger; Lars Hillebrand; David Leonhard; Tobias Deußer; Thiago Bell Felix de Oliveira; Tim Dilmaghani; Mohamed Khaled; Bernd Kliem; Rüdiger Loitz; Christian Bauckhage; Rafet Sifa

arXiv:2507.16642 · cs.CL · July 23, 2025

TL;DR

This study evaluates the effectiveness of various large language models, including open-source and proprietary ones, in automating regulatory compliance verification in financial auditing, highlighting their strengths and limitations across different scenarios.

Contribution

The paper provides a comparative analysis of open-source and proprietary LLMs for regulatory compliance verification in financial auditing, drawing on two custom datasets to assess how each model performs.

Findings

01. Llama-2 70B excels at detecting non-compliance.

02. GPT-4 performs best across diverse scenarios.

03. Open-source models show promise in specific compliance tasks.

Abstract

The auditing of financial documents, historically a labor-intensive process, stands on the precipice of transformation. AI-driven solutions have made inroads into streamlining this process by recommending pertinent text passages from financial reports to align with the legal requirements of accounting standards. However, a glaring limitation remains: these systems commonly fall short in verifying if the recommended excerpts indeed comply with the specific legal mandates. Hence, in this paper, we probe the efficiency of publicly available Large Language Models (LLMs) in the realm of regulatory compliance across different model configurations. We place particular emphasis on comparing cutting-edge open-source LLMs, such as Llama-2, with their proprietary counterparts like OpenAI's GPT models. This comparative analysis leverages two custom datasets provided by our partner…
