A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification
Ana Begnini, Matheus Vicente, Leonardo Souza

TL;DR
This paper introduces a two-stage AI architecture using LLMs and transformers to automate NDA document segmentation and clause classification, improving efficiency and accuracy in legal document analysis.
Contribution
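It presents a novel two-model pipeline for NDA analysis that combines LLaMA-3.1-8B-Instruct for segmentation with a fine-tuned Legal-Roberta-Large for clause classification, reaching a ROUGE F1 of 0.95 on segmentation and a weighted F1 of 0.85 on classification.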
Findings
Segmentation ROUGE F1 of 0.95
Clause classification weighted F1 of 0.85
Demonstrates effective automation of NDA analysis
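For readers unfamiliar with the classification metric, the following is a minimal sketch of how a weighted F1 is usually computed: per-class F1 scores averaged with weights proportional to each class's support. The function name `weighted_f1` and the clause labels are illustrative assumptions; the paper's actual label set and evaluation code are not given here.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    support = Counter(y_true)          # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n / total) * f1      # weight each class by its share of examples
    return score

# Hypothetical clause labels, chosen only for illustration.
y_true = ["confidentiality", "confidentiality", "confidentiality", "term"]
y_pred = ["confidentiality", "confidentiality", "term", "term"]
example_score = weighted_f1(y_true, y_pred)
```

Because the weights follow class frequency, frequent clause types dominate the score; a macro-averaged F1 would treat every class equally instead.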
Abstract
In business-to-business relations, it is common to establish Non-Disclosure Agreements (NDAs). However, these documents exhibit significant variation in format, structure, and writing style, making manual analysis slow and error-prone. We propose an architecture based on LLMs to automate segmentation and clause classification within these contracts. We employed two models: LLaMA-3.1-8B-Instruct for NDA segmentation (clause extraction) and a fine-tuned Legal-Roberta-Large for clause classification. In the segmentation task, we achieved a ROUGE F1 of 0.95 ± 0.0036; for classification, we obtained a weighted F1 of 0.85, demonstrating the feasibility and precision of the approach.
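The two-stage structure described in the abstract can be sketched as a simple composition: a segmenter turns the raw NDA into clauses, and a classifier labels each clause. The sketch below only shows the data flow. `analyze_nda`, `toy_segment`, and `toy_classify` are hypothetical names, and the toy functions are plain-Python stand-ins for the LLaMA-3.1-8B-Instruct segmenter and the fine-tuned Legal-Roberta-Large classifier, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class LabeledClause:
    text: str
    label: str

def analyze_nda(document: str,
                segment: Callable[[str], List[str]],
                classify: Callable[[str], str]) -> List[LabeledClause]:
    """Stage 1: split the document into clauses. Stage 2: label each clause."""
    clauses = segment(document)
    return [LabeledClause(text=c, label=classify(c)) for c in clauses]

def toy_segment(document: str) -> List[str]:
    # Placeholder for the LLM segmenter: split on blank lines.
    return [block.strip() for block in document.split("\n\n") if block.strip()]

def toy_classify(clause: str) -> str:
    # Placeholder for the fine-tuned classifier: keyword lookup
    # over a hypothetical label set.
    lowered = clause.lower()
    if "confidential" in lowered:
        return "confidentiality"
    if "terminat" in lowered:
        return "term"
    return "other"

sample = (
    "1. The Receiving Party shall keep all Confidential Information secret.\n\n"
    "2. This Agreement terminates two years after the Effective Date."
)
result = analyze_nda(sample, toy_segment, toy_classify)
for clause in result:
    print(f"[{clause.label}] {clause.text}")
```

Passing the two stages in as callables keeps them independently replaceable, so the segmenter and classifier can be evaluated separately, as the paper does with ROUGE F1 and weighted F1.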
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Artificial Intelligence in Law
