Distilling System 2 into System 1

Ping Yu; Jing Xu; Jason Weston; Ilia Kulikov

arXiv:2407.06023 · cs.CL · July 26, 2024 · 1 citation

TL;DR

This paper explores methods to distill the reasoning capabilities of System 2 techniques in large language models into more efficient System 1 responses, improving performance while reducing inference costs.

Contribution

The paper introduces self-supervised distillation methods that embed System 2 reasoning into System 1 outputs, improving both efficiency and performance.

Findings

01. Distillation improves System 1 performance over the baseline.
02. Inference cost is reduced compared to System 2 techniques.
03. Self-supervised methods are effective for reasoning distillation.

Abstract

Large language models (LLMs) can spend extra compute during inference to generate intermediate thoughts, which helps to produce better final responses. Since Chain-of-Thought (Wei et al., 2022), many such System 2 techniques have been proposed, such as Rephrase and Respond (Deng et al., 2023a), System 2 Attention (Weston and Sukhbaatar, 2023) and Branch-Solve-Merge (Saha et al., 2023). In this work we investigate self-supervised methods to "compile" (distill) higher quality outputs from System 2 techniques back into LLM generations without intermediate reasoning token sequences, as this reasoning has been distilled into System 1. We show that several such techniques can be successfully distilled, resulting in improved results compared to the original System 1 performance, and with less inference cost than System 2. We posit that such System 2 distillation will be an important feature…
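
The data-curation step behind this kind of self-supervised distillation can be sketched as follows. This is a minimal, illustrative sketch, assuming a self-consistency (majority-vote) filter as the self-supervised signal: the System 2 method is sampled several times per input, and the final answer is kept only when a majority of samples agree. The intermediate reasoning is then discarded, so the target contains only the final answer. The names `system2`, `toy_system2`, `build_distillation_set`, `n_samples` and `min_agreement` are hypothetical, and the fine-tuning step that would train the model on the resulting (input, answer) pairs is omitted.

```python
from collections import Counter
from typing import Callable, Optional


def majority_answer(answers: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common answer if its share strictly exceeds min_agreement, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > min_agreement else None


def build_distillation_set(
    inputs: list[str],
    system2: Callable[[str], tuple[str, str]],  # hypothetical: returns (reasoning, final_answer)
    n_samples: int = 8,
    min_agreement: float = 0.5,  # assumed threshold, not taken from the paper
) -> list[tuple[str, str]]:
    """Build (input, final answer) pairs for System 1 fine-tuning.

    Inputs whose sampled System 2 answers lack a majority are dropped,
    and the intermediate reasoning is never stored in the target.
    """
    dataset = []
    for x in inputs:
        finals = [system2(x)[1] for _ in range(n_samples)]
        y = majority_answer(finals, min_agreement)
        if y is not None:
            dataset.append((x, y))
    return dataset


def toy_system2(x: str) -> tuple[str, str]:
    # Hypothetical stand-in for an LLM prompted with a System 2 method (e.g. Chain-of-Thought).
    a, b = map(int, x.split("+"))
    return (f"{a} plus {b} is {a + b}", str(a + b))


pairs = build_distillation_set(["2+3", "10+4"], toy_system2)
print(pairs)  # [('2+3', '5'), ('10+4', '14')]
```

Because the toy stand-in is deterministic, every input passes the filter. With a real, stochastic model, inputs whose sampled answers disagree would be dropped, which is what makes the filter a quality signal that requires no human labels.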

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling

Methods: Softmax · Attention Is All You Need · Focus