Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture

Tianhao Fu; Xinxin Xu; Weichen Xu; Jue Chen; Ruilong Ren; Bowen Deng; Xinyu Zhao; Jian Cao; Xixin Cao

arXiv:2511.07110·cs.AI·January 30, 2026

TL;DR

This paper introduces a novel framework called Cooperative Market Making (CMM) that distills large language model features into smaller models for financial market making, improving efficiency and performance.

Contribution

The paper proposes a decoupled, collaborative distillation method for LLM features, including a Hájek-MoE that integrates the student models' outputs, tailored to financial market making.

Findings

01 CMM outperforms existing distillation methods in experiments.

02 CMM achieves better market-making performance than RL-based strategies.

03 Decoupling features enhances the interpretability and efficiency of small models.

Abstract

Market making (MM) through Reinforcement Learning (RL) has attracted significant attention in financial trading. With the development of Large Language Models (LLMs), more and more attempts are being made to apply LLMs to financial areas. A simple, direct application of an LLM as an agent shows strong performance, but such methods are hindered by slow inference speed, and most current research has not studied LLM distillation for this specific task. To address this, we first propose the normalized fluorescent probe to study the mechanism of the LLM's features. Based on the observations from this investigation, we propose Cooperative Market Making (CMM), a novel framework that decouples LLM features across three orthogonal dimensions: layer, task, and data. Various student models collaboratively learn simple LLM features along different dimensions, with each model…
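
As a rough illustration of the decoupling idea in the abstract, the sketch below assigns one small student to each teacher layer and fits it to that layer's features only. It is not the CMM implementation: the teacher is a synthetic stand-in for LLM hidden states, the students are plain linear maps fit by least squares, only the layer dimension is shown, and the names `make_teacher_features`, `fit_linear_student`, and `student_predict` are hypothetical. The step that integrates the students' outputs is left out, since the abstract excerpt here is truncated before describing it.

```python
import numpy as np

def make_teacher_features(x, n_layers=3, width=8, seed=0):
    """Synthetic stand-in for LLM hidden states: one feature matrix per 'layer'.
    (Assumption: a real setup would read these from a frozen LLM.)"""
    rng = np.random.default_rng(seed)
    feats, h = [], x
    for _ in range(n_layers):
        w = rng.normal(size=(h.shape[1], width)) / np.sqrt(h.shape[1])
        h = np.tanh(h @ w)  # each layer transforms the previous layer's output
        feats.append(h)
    return feats

def _with_bias(x):
    """Append a constant column so the student can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_linear_student(x, target):
    """Fit one small student (linear map plus bias) to a single teacher feature
    by minimising the mean squared error in closed form."""
    w, *_ = np.linalg.lstsq(_with_bias(x), target, rcond=None)
    return w

def student_predict(w, x):
    return _with_bias(x) @ w

rng = np.random.default_rng(1)
x = rng.normal(size=(256, 4))          # toy inputs (hypothetical market states)
teacher_feats = make_teacher_features(x)

# One student per teacher layer: each student is responsible for one feature.
students = [fit_linear_student(x, f) for f in teacher_feats]

# Per-student distillation loss, compared with predicting the feature mean.
losses = [float(np.mean((student_predict(w, x) - f) ** 2))
          for w, f in zip(students, teacher_feats)]
baseline = [float(np.mean((f - f.mean(axis=0)) ** 2)) for f in teacher_feats]

for i, (loss, base) in enumerate(zip(losses, baseline)):
    print(f"layer {i}: student MSE {loss:.4f} vs mean-predictor MSE {base:.4f}")
```

Because each student has a bias term, its training error can never exceed that of a constant mean predictor, which makes the comparison a simple sanity check rather than a performance claim.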

Taxonomy

Topics: Stock Market Forecasting Methods · Complex Systems and Time Series Analysis · Explainable Artificial Intelligence (XAI)