History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM

Andrew Kiruluta, Andreas Lemos, and Priscilla Burity

arXiv:2506.11108·cs.CL·June 16, 2025

TL;DR

This paper introduces CAGSR-vLLM-MTC, a framework that fine-tunes large language models with self-supervised reinforcement learning for multi-turn dialogue and chain-of-thought reasoning, using cross-attention weights captured during generation as the reward signal.

Contribution

It extends the CAGSR framework to vLLM, enabling asynchronous attention capture and self-supervised training for complex multi-turn and chain-of-thought tasks.

Findings

01 Effective attention signal accumulation over conversations

02 Improved multi-turn dialogue reasoning capabilities

03 Practical mechanisms to prevent attention collapse

Abstract

We present CAGSR-vLLM-MTC, an extension of our Self-Supervised Cross-Attention-Guided Reinforcement (CAGSR) framework, now implemented on the high-performance vLLM runtime, to address both multi-turn dialogue and chain-of-thought reasoning. Building upon our original single-turn approach, we first instrumented vLLM's C++/CUDA kernels to asynchronously capture per-layer, per-head cross-attention weights during generation. We then generalized our self-supervised reward function to accumulate attention signals over entire conversation histories and intermediate chain-of-thought steps. We discuss practical trade-offs, including an entropy-based clamping mechanism to prevent attention collapse on early context, and outline future directions for multi-party dialogues and hierarchical reasoning.
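
The abstract does not give the reward formula, so the following is only a minimal sketch of how an attention-derived reward with an entropy clamp might be accumulated over a conversation. Everything in it is an assumption made for illustration: the function names `step_reward` and `history_reward`, the `min_entropy` floor, the choice of negative entropy as the focus term, and the use of a plain mean over steps are hypothetical and are not taken from the paper or from vLLM.

```python
import math


def normalize(weights):
    """Turn raw attention mass over history tokens into a distribution."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("attention weights must have positive total mass")
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def step_reward(weights, min_entropy=0.5, focus_weight=1.0):
    """Hypothetical reward for one generated turn or reasoning step.

    `weights` is the attention mass placed on each history token, assumed to
    be already aggregated over layers and heads. Sharper attention (lower
    entropy) earns a higher reward, but the entropy is clamped from below at
    `min_entropy`, so collapsing onto a single token earns nothing extra.
    """
    clamped = max(entropy(normalize(weights)), min_entropy)
    return -focus_weight * clamped


def history_reward(per_step_weights, min_entropy=0.5, focus_weight=1.0):
    """Average the step rewards over all turns and chain-of-thought steps.

    Each entry may have a different length because the history grows as the
    conversation proceeds.
    """
    rewards = [
        step_reward(w, min_entropy=min_entropy, focus_weight=focus_weight)
        for w in per_step_weights
    ]
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    collapsed = [1.0, 0.0, 0.0, 0.0]
    near_collapsed = [0.97, 0.01, 0.01, 0.01]
    # Both fall below the entropy floor, so the clamp gives them equal reward.
    print(step_reward(collapsed) == step_reward(near_collapsed))
```

Under these assumptions, the clamp removes the incentive to push attention entropy below the floor: once attention is sharper than `min_entropy`, further concentration on one (for example, early) token no longer increases the reward.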

Taxonomy

Topics: Neural Networks and Reservoir Computing · Advanced Memory and Neural Computing · EEG and Brain-Computer Interfaces