Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

Jiazheng Li; Hainiu Xu; Zhaoyue Sun; Yuxiang Zhou; David West; Cesare Aloisi; Yulan He

arXiv:2406.19949·cs.CL·October 15, 2024

TL;DR

This paper introduces a framework that uses LLM-generated thought trees and preference optimization to produce more faithful rationales for science question scoring, while matching the performance of classifier-based black-box scoring systems.

Contribution

The paper calibrates LLMs on synthetic rationale and rationale preference data derived from thought trees, using supervised fine-tuning followed by preference optimization, to improve both rationale quality and scoring accuracy.

Findings

01. 38% improvement in assessment performance (QWK score) compared to prior work

02. Higher-quality rationales, as judged by human evaluators and LLMs

03. Effective use of synthetic preference data derived from thought trees
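
The QWK score in the first finding is quadratic weighted kappa, a standard agreement measure between two sets of ordinal scores. The sketch below shows one common way to compute it in plain Python. The function name and the assumption that scores are integers in the range 0 to num_labels - 1 are illustrative choices, not taken from the paper or its repository.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, num_labels):
    """Agreement between two integer score lists with labels 0..num_labels-1.

    Returns 1.0 for perfect agreement, about 0.0 for chance-level
    agreement, and negative values for systematic disagreement.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    if num_labels < 2:
        raise ValueError("need at least two score labels")

    n = len(human)
    observed = Counter(zip(human, model))  # joint counts of (human, model)
    hist_human = Counter(human)            # marginal counts per label
    hist_model = Counter(model)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            # Quadratic penalty: larger score gaps cost more.
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_human[i] * hist_model[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters used a single identical label everywhere.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

Because the penalty grows with the square of the score gap, a prediction that is off by two points costs four times as much as one that is off by one, which suits ordinal marking schemes.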

Abstract

Generating rationales that justify scoring decisions has been a promising way to facilitate explainability in automated scoring systems. However, existing methods do not match the accuracy of classifier-based methods. Plus, the generated rationales often contain hallucinated information. To address these issues, we propose a novel framework capable of generating more faithful rationales and, more importantly, matching performance with classifier-based black-box scoring systems. We first mimic the human assessment process by querying Large Language Models (LLMs) to generate a thought tree. We then summarise intermediate assessment decisions from each thought tree path for creating synthetic rationale data and rationale preference data. Finally, we utilise the generated synthetic data to calibrate LLMs through a two-step training process: supervised fine-tuning and preference…
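
The data-construction step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `TreePath` class, the `build_training_data` function, and the rule that a path is preferred when its final score matches the human-assigned score are all hypothetical. The LLM querying and summarisation steps are omitted, and each path is assumed to already carry a summarised rationale.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TreePath:
    """One root-to-leaf path of a thought tree (hypothetical structure)."""
    decisions: tuple   # intermediate assessment decisions along the path
    score: int         # final score the path arrives at
    rationale: str     # rationale summarised from the decisions


def build_training_data(prompt, paths, gold_score):
    """Split paths by agreement with the gold score (an assumed criterion).

    Returns (sft_examples, preference_pairs):
      - sft_examples: rationales from paths that reach the gold score
      - preference_pairs: every (agreeing, disagreeing) rationale pair
    """
    agreeing = [p for p in paths if p.score == gold_score]
    disagreeing = [p for p in paths if p.score != gold_score]

    sft_examples = [
        {"prompt": prompt, "completion": p.rationale} for p in agreeing
    ]
    preference_pairs = [
        {"prompt": prompt, "chosen": good.rationale, "rejected": bad.rationale}
        for good, bad in product(agreeing, disagreeing)
    ]
    return sft_examples, preference_pairs
```

In a two-step setup like the one the abstract outlines, the first list would feed supervised fine-tuning and the second would feed the preference stage. The prompt/chosen/rejected layout is a common convention for preference datasets and is used here only as an assumption.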

Code & Models

Repositories

lijiazheng99/thought_tree_assessment (Official)

Taxonomy

Topics: Advanced Text Analysis Techniques · Educational Technology and Assessment