A practical generalization metric for deep networks benchmarking

Mengqing Huang; Hongchuan Yu; Jianjun Zhang

PMC · DOI:10.1038/s41598-025-93005-5 · March 21, 2025

Open Access

TL;DR

This paper introduces a practical metric to evaluate how well deep learning models generalize, revealing a gap between theory and practice.

Contribution

A novel generalization metric and benchmark testbed for evaluating deep networks' generalization capacity.

Findings

01

Generalization in deep networks depends on classification accuracy and unseen data diversity.

02

Most theoretical generalization estimates do not align with practical measurements.

03

The proposed metric provides an intuitive trade-off between accuracy and data diversity.

Abstract

There is an ongoing and dedicated effort to estimate bounds on the generalization error of deep learning models, coupled with an increasing interest in practical metrics that can be used to experimentally evaluate a model’s ability to generalize. This interest is not only driven by practical considerations but is also vital for theoretical research, as theoretical estimations require practical validation. However, there is currently a lack of research on benchmarking the generalization capacity of various deep networks and verifying these theoretical estimations. This paper aims to introduce a practical generalization metric for benchmarking different deep networks and proposes a novel testbed for the verification of theoretical estimations. Our findings indicate that a deep network’s generalization capacity in classification tasks is contingent upon both classification accuracy and…

Linked entities

CIFAR-100 (image classification dataset)

SSIM (structural similarity index measure)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Machine Learning in Materials Science