SAFT: Structure-Aware Fine-Tuning of LLMs for AMR-to-Text Generation

Rafiq Kamel; Filippo Guerranti; Simon Geisler; Stephan Günnemann

arXiv:2507.13381·cs.CL·December 11, 2025

Open Access 3 Reviews

TL;DR

SAFT is a novel fine-tuning method that incorporates graph structure into large language models for improved AMR-to-text generation, achieving state-of-the-art results without changing the model architecture.

Contribution

Introduces SAFT, a structure-aware fine-tuning technique that injects graph topology into pretrained LLMs using magnetic Laplacian-based positional encodings, enhancing structured data processing.

Findings

01. Sets new state-of-the-art on AMR 3.0 with 3.5 BLEU improvement

02. Gains increase with graph complexity

03. Effective without architectural modifications

Abstract

Large Language Models (LLMs) are increasingly applied to tasks involving structured inputs such as graphs. Abstract Meaning Representations (AMRs), which encode rich semantics as directed graphs, offer a rigorous testbed for evaluating LLMs on text generation from such structures. Yet, current methods often arbitrarily linearize AMRs, discarding key structural cues, or rely on architectures incompatible with standard LLMs. We introduce SAFT, a structure-aware fine-tuning approach that injects graph topology into pretrained LLMs without architectural changes. We compute direction-sensitive positional encodings from the magnetic Laplacian of transformed AMRs and project them into the embedding space of the LLM. While possibly applicable to any graph-structured inputs, we focus on AMR-to-text generation as a representative and challenging benchmark. SAFT sets a new state-of-the-art on AMR…
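The encoding step described in the abstract can be illustrated with a minimal sketch. It uses the standard definition of the magnetic Laplacian, L = D_s − A_s ⊙ exp(iΘ) with Θ_uv = 2πq(A_uv − A_vu), where A_s is the symmetrized adjacency matrix and D_s its degree matrix. Everything else here is an assumption made for illustration and is not taken from the paper: the toy graph, the function names `magnetic_laplacian` and `magnetic_pe`, the potential `q = 0.25`, the number of eigenvectors `k`, and the random matrix standing in for a learned projection into the LLM embedding space. The AMR transformation the authors apply before this step is not reproduced.

```python
import numpy as np


def magnetic_laplacian(adj, q=0.25):
    """Unnormalized magnetic Laplacian of a directed graph (q is an assumed value)."""
    adj = np.asarray(adj, dtype=float)
    sym = np.maximum(adj, adj.T)             # undirected support: edge in either direction
    deg = np.diag(sym.sum(axis=1))           # degree matrix of the symmetrized graph
    theta = 2.0 * np.pi * q * (adj - adj.T)  # antisymmetric phase encodes edge direction
    return deg - sym * np.exp(1j * theta)    # Hermitian by construction


def magnetic_pe(adj, k=2, q=0.25):
    """Return the k smallest eigenvalues and real-valued node encodings of width 2k."""
    lap = magnetic_laplacian(adj, q)
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigh: Hermitian input, ascending eigenvalues
    vecs = eigvecs[:, :k]
    # Stack real and imaginary parts so the encoding can feed a real-valued projection.
    return eigvals[:k], np.concatenate([vecs.real, vecs.imag], axis=1)


# Hypothetical toy graph with edges 0 -> 1, 1 -> 2, 0 -> 2.
adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]])

lap = magnetic_laplacian(adj)
eigvals, pe = magnetic_pe(adj, k=2)

# Stand-in for a learned linear projection into the embedding space (random weights here).
rng = np.random.default_rng(0)
d_model = 8
proj = rng.normal(size=(pe.shape[1], d_model))
node_embeddings = pe @ proj                  # one d_model-sized vector per graph node
```

Reversing every edge conjugates the Laplacian, which is what makes the encoding direction-sensitive, whereas the ordinary Laplacian of the symmetrized graph would be unchanged. Eigenvectors of a Hermitian matrix are only defined up to a complex phase, so any practical use needs some normalization or invariance convention; the sketch does not choose one.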

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

- Beyond AMRs, the proposed graph structure-aware positional embeddings could be a useful contribution for the community, improving LLM tasks that require long-context processing.

Weaknesses

- Experiments show improvements when fine-tuning with the proposed graph positional embeddings. However, it is not clear from these experiments how the results would generalize to larger-scale LLMs (experiments were done with LLMs up to 3B parameters).
- The authors miss some important related work on graph encoding that deals in particular with encoding edge labels and applies semantics-preserving transformation/reification of edges as nodes. These should be added into the related work di

Reviewer 02 · Rating 2 · Confidence 5

Strengths

The paper proposes a novel graph structure encoding method that effectively encodes the structural information of AMR graphs. Furthermore, this method appears to be compatible with any Decoder-only LLM and holds potential for adaptation to other graph structures.

Weaknesses

1. **Lack of Important Baselines:** Prompt-based methods are also model-agnostic and, crucially, do not require additional training, potentially offering stronger generalization. The authors do not compare their method against such approaches in the main text. Although a comparison with a GPT-4o-mini zero-shot method is included in Appendix C.4, this is insufficient because: (a) The comparison is unfair: a comparison with *few-shot* methods is more appropriate, given that the proposed SAFT in

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The paper is generally clearly presented and well motivated. The magnetic Laplacian idea is elegant, and this reviewer learned something new from the paper. The method is easy-to-follow and uses graph theory tools like magnetic Laplacian. The math formulae in the paper are helpful. (Though the reviewer did need to spend a few minutes chatting with a language model to get some intuition about the magnetic Laplacian; the paper could do more to help readers here.) The experimental results are g

Weaknesses

There is no discussion of the computational cost of the approach in the main paper, either in absolute terms (complexity of calculating the position embeddings – though this is in the appendix, increase in input token count, etc.) or relative to the baseline (at finetuning time, at inference time – this comparison is not even in the appendix). Readers will want to know what they have to pay to get the quality improvements, both up front to finetune and at inference time. I think some of this d

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications