# Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study

**Authors:** John P. Lalor, Hao Wu, Tsendsuren Munkhdalai, Hong Yu

arXiv: 1702.04811 · 2018-09-11

## TL;DR

This paper examines how the difficulty of individual test set questions relates to deep learning model performance. Difficulty is modeled with psychometric methods applied to human response patterns, and its effect is measured on Natural Language Inference (NLI) and Sentiment Analysis (SA) tasks.

## Contribution

The paper applies well-studied psychometric methods to human response patterns to quantify the difficulty of test set questions, then examines how that difficulty relates to what DNNs learn and how accurately they answer.

## Key findings

- A question's difficulty affects the likelihood that a model answers it correctly.
- As DNNs are trained with more data, they learn easy examples more quickly than hard examples.
- The relationship between difficulty and performance holds in both tasks studied: NLI and SA.
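The first finding can be illustrated with a minimal sketch. This summary does not name the specific psychometric model, so the logistic item response curve below (in the style of Item Response Theory) is an assumption chosen for illustration, not the authors' exact formulation. The names `p_correct` and `accuracy_by_difficulty`, and the parameters `ability`, `discrimination`, `guessing`, and `threshold`, are hypothetical.

```python
import math


def p_correct(ability, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct answer under a logistic item response curve.

    Assumed form (not taken from the summary):
        p = guessing + (1 - guessing) / (1 + exp(-discrimination * (ability - difficulty)))
    """
    z = discrimination * (ability - difficulty)
    return guessing + (1.0 - guessing) / (1.0 + math.exp(-z))


def accuracy_by_difficulty(difficulties, is_correct, threshold=0.0):
    """Split test items into easy/hard at a hypothetical threshold and
    report accuracy within each bucket."""
    easy = [c for d, c in zip(difficulties, is_correct) if d <= threshold]
    hard = [c for d, c in zip(difficulties, is_correct) if d > threshold]

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return {"easy": mean(easy), "hard": mean(hard)}
```

Under this assumed curve, a model with fixed ability is more likely to answer an easy item (low `difficulty`) correctly than a hard one. Reporting accuracy per difficulty bucket, instead of one aggregate number, is one simple way to avoid treating every data point as equal.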

## Abstract

Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. We examine the impact of a test set question's difficulty to determine if there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods on human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is impacted by the question's difficulty. As DNNs are trained with more data, easy examples are learned more quickly than hard examples.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.04811/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1702.04811/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1702.04811/full.md

---
Source: https://tomesphere.com/paper/1702.04811