LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models

Tiesunlong Shen; Rui Mao; Jin Wang; Heming Sun; Jian Zhang; Xuejie Zhang; Erik Cambria

arXiv:2601.10416 · cs.AI · January 16, 2026

Open Access

TL;DR

LLMdoctor introduces a token-level, flow-guided preference optimization framework for efficient, diverse, and precise test-time alignment of large language models, outperforming existing methods and full fine-tuning.

Contribution

The paper proposes a novel token-level, flow-guided preference optimization approach for test-time alignment, enabling efficient and diverse alignment without full fine-tuning.

Findings

01. Outperforms existing test-time alignment methods.

02. Surpasses full fine-tuning approaches like DPO.

03. Preserves generative diversity of the base model.

Abstract

Aligning Large Language Models (LLMs) with human preferences is critical, yet traditional fine-tuning methods are computationally expensive and inflexible. While test-time alignment offers a promising alternative, existing approaches often rely on distorted trajectory-level signals or inefficient sampling, fundamentally capping performance and failing to preserve the generative diversity of the base model. This paper introduces LLMdoctor, a novel framework for efficient test-time alignment that operates via a patient-doctor paradigm. It integrates token-level reward acquisition with token-level flow-guided preference optimization (TFPO) to steer a large, frozen patient LLM with a smaller, specialized doctor model. Unlike conventional methods that rely on trajectory-level rewards, LLMdoctor first extracts fine-grained, token-level preference signals from the patient model's behavioral…
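
The abstract above is cut off before it explains how the doctor model's signal is combined with the patient model's output, so the following is only a generic sketch of test-time guidance and not the paper's TFPO procedure. It assumes, purely for illustration, that a small guide model produces next-token logits over the same vocabulary as the frozen model, and that the two are mixed by adding scaled log-probabilities. The function names and the `beta` weight are hypothetical and do not come from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def guided_next_token_logprobs(patient_logits, doctor_logits, beta=1.0):
    """Mix a frozen model's next-token distribution with a guide model's.

    Hypothetical mixing rule (an assumption, not taken from the paper):
        log p(token) is proportional to
        log p_patient(token) + beta * log p_doctor(token)
    With beta = 0 the frozen patient distribution is returned unchanged.
    """
    if len(patient_logits) != len(doctor_logits):
        raise ValueError("both models must share the same vocabulary size")
    patient_lp = log_softmax(patient_logits)
    doctor_lp = log_softmax(doctor_logits)
    combined = [p + beta * d for p, d in zip(patient_lp, doctor_lp)]
    # Renormalize so the result is again a valid log-probability vector.
    return log_softmax(combined)


def greedy_pick(logprobs):
    """Return the index of the highest-scoring token."""
    return max(range(len(logprobs)), key=logprobs.__getitem__)


# Toy three-token vocabulary: the patient prefers token 0, the guide token 1.
patient = [2.0, 1.0, 0.0]
doctor = [0.0, 3.0, 0.0]
unguided_choice = greedy_pick(guided_next_token_logprobs(patient, doctor, beta=0.0))
guided_choice = greedy_pick(guided_next_token_logprobs(patient, doctor, beta=1.0))
```

In this toy setup the patient's weights are never touched; only the per-step distribution changes, which is the general property that makes test-time alignment cheaper than fine-tuning. How LLMdoctor actually derives and applies its token-level signal is not recoverable from the truncated abstract.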

Taxonomy

Topics: Machine Learning in Healthcare · Multimodal Machine Learning Applications · Topic Modeling