Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning
Hanjun Cho, Gahyun Yoo, Hanseong Kim, Jay-Yoon Lee

TL;DR
TaNOS is a self-supervised continual pre-training framework that improves the transferability of numerical reasoning over tables by decoupling domain semantics from numerical operation structure, yielding better robustness and accuracy.
Contribution
TaNOS introduces operation sketches and header anonymization within a self-supervised setting to improve cross-domain numerical reasoning in instruction-tuned models.
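The two ideas can be illustrated with a small sketch. This summary does not specify the paper's placeholder format, table layout, or sketch syntax, so the following rests on assumptions. The table is a hypothetical list of rows with column headers in row 0 and row headers in column 0. Programs use FinQA-style syntax such as `subtract(5829, 5735), divide(#0, 5735)`, where `#k` refers to the result of step k. The function names `anonymize_headers` and `operation_sketch` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical table layout: row 0 holds column headers, column 0 holds row headers.
Table = List[List[str]]


def anonymize_headers(table: Table) -> Tuple[Table, Dict[str, str]]:
    """Replace row and column headers with neutral placeholders (assumed format).

    Returns the anonymized copy and a placeholder -> original-header mapping,
    so domain words like "Revenue" cannot be memorized as operation cues.
    """
    mapping: Dict[str, str] = {}
    anon = [row[:] for row in table]  # copy so the input table is left untouched
    for j, name in enumerate(table[0][1:], start=1):
        placeholder = f"COL_{j}"
        mapping[placeholder] = name
        anon[0][j] = placeholder
    for i, row in enumerate(table[1:], start=1):
        placeholder = f"ROW_{i}"
        mapping[placeholder] = row[0]
        anon[i][0] = placeholder
    return anon, mapping


def operation_sketch(program: str) -> str:
    """Keep only the operation skeleton of a FinQA-style program.

    Literal arguments become "_"; step references such as "#0" are kept
    because they encode how operations compose (the structure).
    """
    steps = re.findall(r"(\w+)\(([^)]*)\)", program)
    sketched = []
    for op, args in steps:
        masked = [a.strip() if a.strip().startswith("#") else "_" for a in args.split(",")]
        sketched.append(f"{op}({', '.join(masked)})")
    return ", ".join(sketched)
```

For example, anonymizing `[["", "2019", "2018"], ["Revenue", "5829", "5735"]]` yields `[["", "COL_1", "COL_2"], ["ROW_1", "5829", "5735"]]`, and `operation_sketch("subtract(5829, 5735), divide(#0, 5735)")` returns `"subtract(_, _), divide(#0, _)"`. The sketch keeps the minimal structural cue (which operations, in what order) while dropping the lexical content.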
Findings
Achieves 80.13% execution accuracy on FinQA using only 10% of the training data.
Shows a cross-domain transferability gap of under 2 percentage points (pp).
Outperforms the supervised fine-tuning (SFT) baseline and proprietary models.
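The FinQA figure is an execution accuracy. Under this metric, a predicted program counts as correct when executing it reproduces the gold answer. The following is a minimal sketch of that metric under several assumptions. It supports only the four arithmetic operations in FinQA-style syntax with numeric literals and `#k` step references. The relative tolerance of 1e-3 is a hypothetical choice, not the official evaluation rule. The names `execute` and `execution_accuracy` are illustrative.

```python
import math
from typing import List

# Only four arithmetic ops are supported in this sketch.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}


def execute(program: str) -> float:
    """Run a program like 'subtract(5829, 5735), divide(#0, 5735)'.

    '#k' refers to the result of step k; other arguments are numeric literals.
    """
    results: List[float] = []
    for step in program.split("),"):
        step = step.strip().rstrip(")")
        name, args = step.split("(", 1)
        vals = []
        for a in args.split(","):
            a = a.strip()
            vals.append(results[int(a[1:])] if a.startswith("#") else float(a))
        results.append(OPS[name.strip()](*vals))
    return results[-1]


def execution_accuracy(programs: List[str], gold: List[float], rel_tol: float = 1e-3) -> float:
    """Fraction of programs whose executed value matches the gold answer."""
    correct = 0
    for prog, answer in zip(programs, gold):
        try:
            value = execute(prog)
        except (ValueError, KeyError, IndexError, TypeError, ZeroDivisionError):
            continue  # an unexecutable program counts as wrong
        if math.isclose(value, answer, rel_tol=rel_tol):
            correct += 1
    return correct / len(gold)
```

Because scoring runs the program rather than comparing text, two different programs that compute the same value are both counted as correct.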
Abstract
Numerical reasoning over expert-domain tables often exhibits high in-domain accuracy but limited robustness to domain shift. Models trained with supervised fine-tuning (SFT) on specific datasets tend to rely on header-operation shortcuts rather than structural reasoning. We introduce TaNOS, a continual pre-training framework comprising three components: (i) header anonymization to reduce lexical memorization, (ii) operation sketches that provide minimal structural cues, and (iii) self-supervised pretraining that constructs correctness-guaranteed program-question pairs from given tables in a program-first manner. By decoupling domain semantics from numerical operation structure, TaNOS improves the transferability of numerical reasoning. Applied to an 8B instruction-tuned model, TaNOS achieves 80.13% execution accuracy on FinQA with only 10% of the training data, outperforming the SFT baseline (73.97%)…
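The program-first idea in component (iii) can be sketched as follows. In this approach, the generator samples a program over table cells first and computes its answer by execution, and only then writes a question around it. The answer is never generated by a model, so it is correct by construction. The paper's actual sampling strategy and question generation are not described here. The templates, the restriction to binary operations, and the name `make_pair` are hypothetical. The sketch assumes a table whose headers are already anonymized.

```python
import random
from typing import Dict, List, Tuple

Table = List[List[str]]  # row 0: column headers; column 0: row headers

# Hypothetical question templates, one per operation.
TEMPLATES = {
    "add": "What is the sum of {a} and {b}?",
    "subtract": "What is the difference between {a} and {b}?",
    "divide": "What is the ratio of {a} to {b}?",
}
OPS = {"add": lambda x, y: x + y, "subtract": lambda x, y: x - y, "divide": lambda x, y: x / y}


def make_pair(table: Table, rng: random.Random) -> Dict[str, object]:
    """Program first: pick an op and two numeric cells, execute, then verbalize."""
    cells: List[Tuple[str, str, float]] = []  # (description, literal text, value)
    for row in table[1:]:
        for j, cell in enumerate(row[1:], start=1):
            text = cell.replace(",", "")
            try:
                value = float(text)
            except ValueError:
                continue  # skip non-numeric cells such as "n/a"
            cells.append((f"{row[0]} in {table[0][j]}", text, value))
    if len(cells) < 2:
        raise ValueError("need at least two numeric cells")
    (desc_a, text_a, a), (desc_b, text_b, b) = rng.sample(cells, 2)
    op = rng.choice(sorted(OPS))
    if op == "divide" and b == 0:
        op = "subtract"  # avoid division by zero
    program = f"{op}({text_a}, {text_b})"
    answer = OPS[op](a, b)  # computed by execution, so correct by construction
    question = TEMPLATES[op].format(a=desc_a, b=desc_b)
    return {"question": question, "program": program, "answer": answer}
```

For instance, `make_pair([["", "COL_1", "COL_2"], ["ROW_1", "5829", "5735"]], random.Random(0))` returns a dictionary with a question that mentions only the placeholders `ROW_1`, `COL_1`, and `COL_2`. The question comes paired with the program that answers it. Because generation starts from the program, every training pair is self-consistent without human labels.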
