Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression
Xingwu Chen, Miao Lu, Beining Wu, Difan Zou

TL;DR
This paper develops a theoretical framework for understanding how test-time computation techniques like sampling and noise injection improve language model inference, focusing on in-context linear regression as a case study.
Contribution
It introduces a novel theoretical approach incorporating randomness and sampling to analyze transformer inference behaviors, bridging practical methods and theoretical understanding.
Findings
The framework analyzes widely adopted inference techniques by modeling decoding as noise injection and binary coefficient sampling.
Empirical results support the theoretical framework.
The analysis offers potential insights into inference behaviors in real-world language models.
Abstract
Using more test-time computation during language model inference, such as generating more intermediate thoughts or sampling multiple candidate answers, has proven effective in significantly improving model performance. This paper takes an initial step toward bridging the gap between practical language model inference and theoretical transformer analysis by incorporating randomness and sampling. We focus on in-context linear regression with continuous/binary coefficients, where our framework simulates language model decoding through noise injection and binary coefficient sampling. Through this framework, we provide detailed analyses of widely adopted inference techniques. Supported by empirical results, our theoretical framework and analysis demonstrate the potential for offering new insights into understanding inference behaviors in real-world language models.
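To make the setting in the abstract concrete, the following is a minimal sketch of in-context linear regression with noise injection and multi-candidate sampling, covering only the continuous-coefficient case. It is not the authors' construction. As assumptions for illustration, the transformer is replaced by an ordinary least-squares estimate, one stochastic decode is modeled by adding Gaussian noise to that estimate, and candidates are aggregated by averaging. All function names and parameters (`make_prompt`, `noisy_prediction`, `sigma`, `k`) are hypothetical.

```python
import numpy as np


def make_prompt(rng, n=32, d=4):
    """Build one in-context regression prompt: n labelled pairs plus a query."""
    w = rng.standard_normal(d)        # hidden continuous coefficients
    X = rng.standard_normal((n, d))   # in-context inputs
    y = X @ w                         # noiseless in-context labels
    x_query = rng.standard_normal(d)  # query input to predict on
    return X, y, x_query, w


def noisy_prediction(rng, X, y, x_query, sigma=0.5):
    """One stochastic 'decode': least-squares estimate plus injected noise.

    Assumption: least squares stands in for the transformer's in-context
    estimate, and Gaussian noise stands in for decoding randomness.
    """
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    w_noisy = w_hat + sigma * rng.standard_normal(w_hat.shape)
    return float(x_query @ w_noisy)


def averaged_prediction(rng, X, y, x_query, k, sigma=0.5):
    """Sample k candidate answers and aggregate them by averaging."""
    samples = [noisy_prediction(rng, X, y, x_query, sigma) for _ in range(k)]
    return float(np.mean(samples))


def mean_squared_error(k, trials=500, seed=0):
    """Average squared query error over many prompts when using k samples."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        X, y, x_query, w = make_prompt(rng)
        pred = averaged_prediction(rng, X, y, x_query, k)
        errors.append((pred - float(x_query @ w)) ** 2)
    return float(np.mean(errors))


if __name__ == "__main__":
    for k in (1, 4, 16):
        print(f"k={k:2d}  mse={mean_squared_error(k):.4f}")
```

Under these assumptions the injected noise is zero-mean and independent across samples, so averaging more candidates shrinks the query error. This is only the simplest illustration of why extra test-time sampling can help and makes no claim about the paper's actual results.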
Taxonomy
Topics: Neural Networks and Applications
