A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification

Ana Begnini; Matheus Vicente; Leonardo Souza

arXiv:2603.09990·cs.CL·March 12, 2026

TL;DR

This paper introduces a two-stage AI architecture using LLMs and transformers to automate NDA document segmentation and clause classification, improving efficiency and accuracy in legal document analysis.

Contribution

The paper presents a two-model pipeline for NDA analysis that pairs LLaMA-3.1-8B-Instruct (segmentation) with a fine-tuned Legal-Roberta-Large (clause classification), reporting a segmentation ROUGE F1 of 0.95 and a classification weighted F1 of 0.85.

Findings

01 Segmentation ROUGE F1 of 0.95

02 Clause classification weighted F1 of 0.85

03 Demonstrates effective automation of NDA analysis
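
For readers unfamiliar with the two metrics in the findings, the sketch below shows one way they can be computed. It is an illustration, not the authors' evaluation code: the listing does not say which ROUGE variant was used, so unigram ROUGE-1 with whitespace tokenization is assumed here, and the function names `rouge1_f1` and `weighted_f1` are hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between two strings (assumed ROUGE-1 variant)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Per-class F1, averaged with weights equal to each class's support."""
    if not y_true:
        return 0.0
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)
```

Weighting by support means frequent clause types dominate the score, so a weighted F1 can look healthy even when a rare clause type is classified poorly.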

Abstract

In business-to-business relations, it is common to establish Non-Disclosure Agreements (NDAs). However, these documents vary significantly in format, structure, and writing style, making manual analysis slow and error-prone. We propose an LLM-based architecture to automate segmentation and clause classification in these contracts. We employed two models: LLaMA-3.1-8B-Instruct for NDA segmentation (clause extraction) and a fine-tuned Legal-Roberta-Large for clause classification. In the segmentation task, we achieved a ROUGE F1 of 0.95 ± 0.0036; for classification, we obtained a weighted F1 of 0.85, demonstrating the feasibility and precision of the approach.
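
The shape of the two-stage design described in the abstract can be sketched as a segmenter feeding a classifier. This is a structural illustration only. In the paper, stage 1 is LLaMA-3.1-8B-Instruct and stage 2 is a fine-tuned Legal-Roberta-Large; the regex segmenter, keyword classifier, label names, and `analyze_nda` function below are hypothetical stand-ins chosen so the example runs without any model.

```python
import re
from typing import Callable

Segmenter = Callable[[str], list[str]]
Classifier = Callable[[str], str]


def toy_segmenter(document: str) -> list[str]:
    """Stand-in for stage 1: split before lines that start with '1.', '2.', ..."""
    parts = re.split(r"^(?=\d+\.\s)", document, flags=re.MULTILINE)
    return [p.strip() for p in parts if p.strip()]


def toy_classifier(clause: str) -> str:
    """Stand-in for stage 2: keyword lookup over hypothetical labels."""
    text = clause.lower()
    if "confidential" in text:
        return "confidentiality"
    if "terminat" in text:
        return "term"
    return "other"


def analyze_nda(
    document: str, segment: Segmenter, classify: Classifier
) -> list[tuple[str, str]]:
    """Run stage 1, then label every extracted clause with stage 2."""
    return [(clause, classify(clause)) for clause in segment(document)]


if __name__ == "__main__":
    nda = (
        "1. The Receiving Party shall keep all Confidential Information secret.\n"
        "2. This Agreement terminates after two years.\n"
        "3. Governing law is that of Brazil."
    )
    for clause, label in analyze_nda(nda, toy_segmenter, toy_classifier):
        print(f"{label}: {clause}")
```

Keeping the stages behind plain callables means either one can be swapped for a real model without touching the other, which is the main practical appeal of a two-stage layout.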

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Artificial Intelligence in Law