Cross-functional Analysis of Generalisation in Behavioural Learning
Pedro Henrique Luz de Araujo, Benjamin Roth

TL;DR
This paper introduces BeLUGA, a method for analysing how well models trained with behavioural learning generalise across functionalities, highlighting the risk of overfitting to the behavioural test suite and the effects of different regularisation techniques.
Contribution
The paper presents BeLUGA, a novel analysis framework for evaluating behavioural learning and generalization across multiple dimensions and phenomena in NLP tasks.
Findings
BeLUGA measures generalisation to functionalities not seen during training.
Regularisation methods influence generalisation performance.
Models can overfit behavioural test suites, which overstates their robustness.
Abstract
In behavioural testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimising performance on the behavioural tests during training (behavioural learning) would improve coverage of phenomena not sufficiently represented in the i.i.d. data and could lead to seemingly more robust models. However, there is the risk that the model narrowly captures spurious correlations from the behavioural test suite, leading to overestimation and misrepresentation of model performance -- one of the original pitfalls of traditional evaluation. In this work, we introduce BeLUGA, an analysis method for evaluating behavioural learning considering generalisation across dimensions of different granularity levels. We optimise behaviour-specific loss functions and evaluate models on several partitions of…
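The idea of evaluating on partitions of a behavioural test suite can be illustrated with a minimal sketch. It assumes each test case is tagged with the functionality it probes, and it holds out one functionality at a time so that a model tuned on the remaining cases is scored on a functionality it never saw. The function name `leave_one_functionality_out`, the dictionary fields, and the toy cases are hypothetical and chosen for illustration; this is not the authors' implementation, and the paper's actual partitioning scheme may differ.

```python
from collections import defaultdict


def leave_one_functionality_out(cases):
    """Yield (held_out, seen_cases, unseen_cases), one split per functionality.

    `cases` is a list of dicts with a "functionality" key (hypothetical schema).
    """
    by_func = defaultdict(list)
    for case in cases:
        by_func[case["functionality"]].append(case)

    # Iterate in sorted order so the splits are deterministic.
    for held_out in sorted(by_func):
        seen = [c for name in sorted(by_func) if name != held_out
                for c in by_func[name]]
        unseen = list(by_func[held_out])
        yield held_out, seen, unseen


# Toy suite with made-up cases, for illustration only.
suite = [
    {"functionality": "negation", "text": "The food was not good.", "label": "negative"},
    {"functionality": "negation", "text": "The staff was not rude.", "label": "positive"},
    {"functionality": "typos", "text": "The fod was great.", "label": "positive"},
    {"functionality": "vocabulary", "text": "The meal was superb.", "label": "positive"},
]

splits = list(leave_one_functionality_out(suite))
for held_out, seen, unseen in splits:
    print(f"held out: {held_out:<10} seen: {len(seen)} unseen: {len(unseen)}")
```

In such a setup, a large gap between performance on the seen and the held-out cases would suggest that the model fitted suite-specific patterns rather than the underlying behaviour.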
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Test
