Clustering-based aggregate value regression

Kei Hirose; Hidetoshi Matsui; Hiroki Masuda

arXiv:2508.15567 · stat.ME · August 22, 2025

TL;DR

This paper proposes an aggregate value regression method that combines clustering with linear regression to forecast total values, using clusters to curb overparameterization and to manage the bias-variance trade-off.

Contribution

It introduces AVR-C, a hierarchical clustering-based approach for aggregate value regression, and develops a bias-variance trade-off theory under model misspecification.

Findings

01. Demonstrates the effectiveness of AVR-C through Monte Carlo simulations.

02. Shows how the number of clusters affects forecast accuracy.

03. Validates the approach with electricity demand forecasting data.

Abstract

In various practical situations, forecasting of aggregate values rather than individual ones is often our main focus. For instance, electricity companies are interested in forecasting the total electricity demand in a specific region to ensure reliable grid operation and resource allocation. However, to our knowledge, statistical learning specifically for forecasting aggregate values has not yet been well-established. In particular, the relationship between forecast error and the number of clusters has not been well studied, as clustering is usually treated as unsupervised learning. This study introduces a novel forecasting method specifically focused on the aggregate values in the linear regression model. We call it the Aggregate Value Regression (AVR), and it is constructed by combining all regression models into a single model. With the AVR, we must estimate a huge number of…
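The idea of regressing an aggregate on clustered individual models can be pictured with a small sketch. This is not the authors' implementation: the choice to cluster individuals by their separately fitted OLS coefficients using Ward linkage, the choice to share one coefficient vector per cluster by summing predictors within a cluster, the simulated data, and every function name below are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_individuals(X, Y, n_clusters):
    """Hierarchically cluster individuals by their own OLS coefficients.

    X: (n, J, p) predictors for J individuals; Y: (n, J) responses.
    Clustering on per-individual OLS fits is an assumption of this sketch.
    """
    J = X.shape[1]
    betas = np.stack(
        [np.linalg.lstsq(X[:, j, :], Y[:, j], rcond=None)[0] for j in range(J)]
    )
    tree = linkage(betas, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def clustered_design(X, labels):
    """Sum predictors within each cluster and place the sums side by side.

    Returns an (n, K * p) matrix, so the aggregate model has K * p
    coefficients instead of J * p (assumed way of sharing coefficients).
    """
    ids = np.unique(labels)
    return np.concatenate([X[:, labels == k, :].sum(axis=1) for k in ids], axis=1)


def fit_aggregate(design, total):
    """Least-squares fit of the aggregate response on the given design."""
    coef, *_ = np.linalg.lstsq(design, total, rcond=None)
    return coef


# Hypothetical data: 12 individuals whose coefficients fall into 3 loose groups.
rng = np.random.default_rng(0)
n_train, n_test, J, p = 200, 200, 12, 3
centers = rng.normal(size=(3, p))
beta = centers[np.arange(J) % 3] + 0.05 * rng.normal(size=(J, p))
X = rng.normal(size=(n_train + n_test, J, p))
Y = np.einsum("njp,jp->nj", X, beta) + 0.5 * rng.normal(size=(n_train + n_test, J))
X_tr, X_te = X[:n_train], X[n_train:]
Y_tr, Y_te = Y[:n_train], Y[n_train:]


def forecast_mse(n_clusters):
    """Out-of-sample squared error of the aggregate forecast for a given K."""
    labels = cluster_individuals(X_tr, Y_tr, n_clusters)
    coef = fit_aggregate(clustered_design(X_tr, labels), Y_tr.sum(axis=1))
    pred = clustered_design(X_te, labels) @ coef
    return float(np.mean((pred - Y_te.sum(axis=1)) ** 2))


mse_by_k = {k: forecast_mse(k) for k in (1, 3, J)}
print(mse_by_k)
```

In this sketch, a single cluster regresses the total on the summed predictors, while asking for as many clusters as individuals typically gives each individual its own coefficient vector, which is the fully stacked model. Varying the number of clusters between those extremes is one way to explore the trade-off between bias and variance that the summary describes.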

Taxonomy

Topics: Energy Load and Power Forecasting · Advanced Clustering Algorithms Research · Customer churn and segmentation