Evaluation of Finetuned LLMs in AMR Parsing
Shu Han Ho

TL;DR
This paper evaluates the effectiveness of finetuning various decoder-only large language models for AMR parsing, showing that simple finetuning can achieve performance comparable to complex state-of-the-art methods.
Contribution
It provides a comprehensive comparison of four LLM architectures for AMR parsing, demonstrating that straightforward finetuning can rival complex models.
Findings
LLaMA 3.2 achieves competitive SMATCH F1 scores.
Finetuning LLMs can match state-of-the-art AMR parsers.
LLaMA 3.2 excels in semantic accuracy, Phi 3.5 in structural validity.
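SMATCH compares a predicted AMR with the gold AMR. It breaks both graphs into (source, relation, target) triples and measures how much they overlap once the variables are aligned. The sketch below is a simplification and not the official scorer. It takes the variable mapping as given, whereas the reference SMATCH tool searches over candidate mappings (hill climbing with restarts) and keeps the best-scoring one. It also leaves out the root (TOP) triple that the reference scorer adds. The function name `smatch_f1_fixed` and both example graphs are hypothetical illustrations.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (source, relation, target)


def smatch_f1_fixed(pred: Set[Triple], gold: Set[Triple],
                    mapping: Dict[str, str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of triple overlap under ONE given variable mapping.

    Simplification (assumption): the reference SMATCH scorer searches over
    mappings and keeps the best; here `mapping` (pred var -> gold var) is fixed.
    """
    # Rename predicted variables into the gold namespace; concepts pass through unchanged.
    renamed = {(mapping.get(s, s), r, mapping.get(t, t)) for s, r, t in pred}
    matched = len(renamed & gold)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    if matched == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative gold AMR for "The boy wants to go":
#   (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
gold = {
    ("w", "instance", "want-01"), ("b", "instance", "boy"),
    ("g", "instance", "go-01"), ("w", "ARG0", "b"),
    ("w", "ARG1", "g"), ("g", "ARG0", "b"),
}
# Hypothetical parser output: its own variable names, and the reentrant edge is missing.
pred = {
    ("x", "instance", "want-01"), ("y", "instance", "boy"),
    ("z", "instance", "go-01"), ("x", "ARG0", "y"), ("x", "ARG1", "z"),
}
p, r, f1 = smatch_f1_fixed(pred, gold, {"x": "w", "y": "b", "z": "g"})
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=1.000 R=0.833 F1=0.909
```

In this example the predicted graph drops the reentrant edge `g :ARG0 b`. Every predicted triple is correct, so precision is 1.0, but recall is 5/6. That gives F1 = 10/11 ≈ 0.909.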
Abstract
AMR (Abstract Meaning Representation) is a semantic formalism that encodes sentence meaning as rooted, directed, acyclic graphs, where nodes represent concepts and edges denote semantic relations. Finetuning decoder-only Large Language Models (LLMs) represents a promising and straightforward new direction for AMR parsing. This paper presents a comprehensive evaluation of finetuning four distinct LLM architectures (Phi 3.5, Gemma 2, LLaMA 3.2, and DeepSeek R1 LLaMA Distilled) using the LDC2020T02 Gold AMR3.0 test set. Our results show that straightforward finetuning of decoder-only LLMs can achieve performance comparable to complex state-of-the-art (SOTA) AMR parsers. Notably, LLaMA 3.2 is competitive with SOTA AMR parsers under a straightforward finetuning approach. We achieved a SMATCH F1 of 0.804 on the full LDC2020T02 test split, on par with APT + Silver (IBM) at…
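The sketch below makes the graph definition concrete. It stores an AMR as a map from variables to concepts plus a list of labelled edges. It then checks the two structural properties named above: every node must be reachable from the root, and the graph must contain no directed cycle. This is a hypothetical illustration; the function `is_rooted_dag` and the example graph do not come from the paper. It also assumes that only variable-to-variable edges matter and ignores constant-valued attributes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source variable, role, target variable)


def is_rooted_dag(concepts: Dict[str, str], edges: List[Edge], root: str) -> bool:
    """True if all nodes are reachable from `root` and no directed cycle exists."""
    children: Dict[str, List[str]] = defaultdict(list)
    for src, _role, tgt in edges:
        if src not in concepts or tgt not in concepts:
            return False  # edge refers to an undeclared variable
        children[src].append(tgt)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current DFS path / finished
    color = {v: WHITE for v in concepts}

    def dfs(v: str) -> bool:
        color[v] = GRAY
        for c in children[v]:
            if color[c] == GRAY:  # back edge: the graph has a cycle
                return False
            if color[c] == WHITE and not dfs(c):
                return False
        color[v] = BLACK
        return True

    if root not in concepts or not dfs(root):
        return False
    # Rootedness: the DFS from the root must have visited every node.
    return all(state == BLACK for state in color.values())


# Illustrative AMR: (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
concepts = {"w": "want-01", "b": "boy", "g": "go-01"}
edges = [("w", "ARG0", "b"), ("w", "ARG1", "g"), ("g", "ARG0", "b")]

print(is_rooted_dag(concepts, edges, "w"))                         # True
print(is_rooted_dag(concepts, edges + [("b", "poss", "w")], "w"))  # False (cycle)
print(is_rooted_dag(concepts, edges, "b"))                         # False (w, g unreachable)
```

Node `b` has two parents, `w` and `g`. This is a reentrancy, and a DAG allows it. Adding an edge from `b` back to `w`, however, creates a cycle, so the check fails.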
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis
