Hindsight Preference Optimization for Financial Time Series Advisory

Yanwei Cui; Guanghui Wang; Xing Zhang; Peiyang He; Ziyuan Li; Bing Zhu; Wei Qiu; Xusheng Wang; Zheng Yu; Anqi Xin

arXiv:2604.23988·cs.LG·April 28, 2026

TL;DR

This paper introduces Hindsight Preference Optimization, a method in which an LLM judge uses retrospectively observed outcomes to rank candidate advisories, producing DPO preference pairs for training financial advisory models on S&P 500 equity time series.

Contribution

It combines two ideas from reinforcement learning, hindsight-generated training signal and preference alignment, to train financial advisory models on outcome-based feedback without human annotation.

Findings

01

A 4B model outperforms its 235B teacher on both accuracy and advisory quality.

02

With observed outcomes in hand, an LLM judge can rank advisories on dimensions that scalar metrics cannot capture.

03

Retrospective outcome evaluation produces preference pairs for DPO without human annotation.

Abstract

Time series models predict numbers; decision-makers need advisory -- directional signals with reasoning, actionable suggestions, and risk management. Training language models for such predictive advisory faces a fundamental challenge: quality depends on outcomes unknown at prediction time. We bridge two ideas from reinforcement learning -- using information unavailable during execution to retrospectively generate training signal, and preference alignment -- and propose Hindsight Preference Optimization: observed outcomes let an LLM judge rank candidate advisories on dimensions that scalar metrics cannot capture, producing preference pairs for DPO without human annotation. We apply this to Vision-Language-Model-based predictive advisories on S&P 500 equity time series, demonstrated by a 4B model outperforming its 235B teacher on both accuracy and advisory quality.
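
The pipeline described in the abstract (judge candidates in hindsight, form preference pairs, train with DPO) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` type, the `judge_score` field, the best-versus-worst pairing rule, and the `min_margin` threshold are all hypothetical, chosen only for illustration. The loss function is the standard per-pair DPO objective, computed here on scalar log-probabilities rather than on real model outputs.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str           # advisory text sampled for one time series window
    judge_score: float  # hypothetical score from an LLM judge that saw the realized outcome


def build_preference_pair(candidates, min_margin=1.0):
    """Pick (chosen, rejected) from hindsight judge scores.

    Assumed rule: best vs. worst candidate, skipped when the score gap
    is below min_margin. The paper's actual selection rule may differ.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: c.judge_score, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best.judge_score - worst.judge_score < min_margin:
        return None  # gap too small to count as a clear preference
    return best, worst


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    loss = -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)) = log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    pool = [
        Candidate("Hold; momentum is weak, set a stop below support.", 8.0),
        Candidate("Buy aggressively.", 3.0),
        Candidate("Reduce exposure ahead of earnings.", 5.0),
    ]
    pair = build_preference_pair(pool)
    if pair is not None:
        chosen, rejected = pair
        print("chosen:", chosen.text)
        print("rejected:", rejected.text)
    # When the policy equals the reference, the loss is log(2).
    print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
```

In a real setup the four log-probabilities would come from the policy and a frozen reference model evaluated on the chosen and rejected advisories; the sketch only shows how the hindsight scores feed the pairwise objective.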
