Study of the Proper NNUE Dataset
Daniel Tan, Neftali Watkinson Medina

TL;DR
This paper introduces a new algorithm for creating high-quality, stable datasets for training NNUE models in chess, leading to improved engine performance and providing a clear, replicable methodology.
Contribution
The paper presents a novel dataset generation and filtering algorithm specifically for NNUE training in chess, addressing a key challenge in dataset quality and stability.
Findings
Significant performance improvements in chess engines using the new dataset method.
The proposed approach effectively filters out tactical volatility, producing stable evaluation datasets.
Methodology is generalizable across different evaluation functions.
Abstract
NNUE (Efficiently Updatable Neural Networks) has revolutionized chess engine development, with nearly all top engines adopting NNUE models to maintain competitive performance. A key challenge in NNUE training is the creation of high-quality datasets, particularly in complex domains like chess, where tactical and strategic evaluations are essential. However, methods for constructing effective datasets remain poorly understood and under-documented. In this paper, we propose an algorithm for generating and filtering datasets composed of "quiet" positions that are stable and free from tactical volatility. Our approach provides a clear methodology for dataset creation, which can be replicated and generalized across various evaluation functions. Testing demonstrates significant improvements in engine performance, confirming the effectiveness of our method.
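The abstract does not spell out the filtering rule, so the following is only a minimal sketch of one common way to approximate "quiet" positions: keep a position when its static evaluation and a searched evaluation agree to within a threshold. The names `Sample`, `filter_quiet`, `static_eval`, `search_eval`, and the 50-centipawn default are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    fen: str       # position in FEN notation
    score_cp: int  # training label in centipawns


def filter_quiet(
    positions: Iterable[str],
    static_eval: Callable[[str], int],
    search_eval: Callable[[str], int],
    max_diff_cp: int = 50,  # assumed threshold, not from the paper
) -> List[Sample]:
    """Keep positions whose static and searched evaluations agree.

    A large gap between the two scores suggests pending tactics
    (captures, checks, threats), so such positions are dropped.
    """
    kept: List[Sample] = []
    for fen in positions:
        static_cp = static_eval(fen)
        searched_cp = search_eval(fen)
        if abs(static_cp - searched_cp) <= max_diff_cp:
            # Label the sample with the searched score.
            kept.append(Sample(fen, searched_cp))
    return kept
```

Passing the two evaluators in as callables keeps the sketch independent of any particular engine, which is in the spirit of the claim that the methodology generalizes across evaluation functions.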
Taxonomy
Topics: Speech Recognition and Synthesis · Neural Networks and Applications
