# Distinguishing mirror from glass: A 'big data' approach to material perception

**Authors:** Hideki Tamura, Konrad E. Prokott, Roland W. Fleming

arXiv: 1903.01671 · 2022-03-14

## TL;DR

This study trains thousands of convolutional neural networks on more than 750,000 simulated images to investigate how humans distinguish mirror from glass. Relatively shallow networks predict human judgments best, but none correlates with humans better than 0.6, which is below the agreement between human observers.

## Contribution

The paper demonstrates that shallow neural networks better predict human material perception in a challenging task, but highlights limitations in current models' ability to fully replicate human judgments.

## Key findings

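- On randomly chosen images, humans and all classifiers were highly accurate, so their responses correlated highly with one another and could not tell the models apart.
- On a diagnostic image set for which humans make highly systematic errors, relatively shallow networks predicted human judgments better than any other model.
- No network correlated better than 0.6 with human judgments, which is below the correlation between human observers.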
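
The findings rest on separating two measures: how often a model is correct, and how closely its responses track human responses. The following is a minimal sketch of that distinction, not the authors' analysis code. The function names, the 0.5 decision threshold, the use of Pearson correlation, and all numbers in the toy arrays are assumptions made for illustration.

```python
import numpy as np


def accuracy(pred_prob_mirror, true_is_mirror):
    """Fraction of images whose thresholded prediction matches ground truth.

    The 0.5 threshold is an assumption for this sketch.
    """
    pred = np.asarray(pred_prob_mirror) >= 0.5
    truth = np.asarray(true_is_mirror, dtype=bool)
    return float(np.mean(pred == truth))


def human_correlation(pred_prob_mirror, human_prop_mirror):
    """Pearson correlation between model outputs and the per-image
    proportion of human observers who responded 'mirror'."""
    return float(np.corrcoef(pred_prob_mirror, human_prop_mirror)[0, 1])


# Toy diagnostic set (invented values): 1 = mirror, 0 = glass.
true_is_mirror = np.array([1, 1, 1, 0, 0, 0])

# Hypothetical proportion of 'mirror' responses per image.
# Observers systematically err on the second and fifth images.
human = np.array([0.9, 0.2, 0.8, 0.1, 0.7, 0.3])

# Model A is always correct but does not share the human errors.
model_a = np.array([0.9, 0.8, 0.9, 0.1, 0.2, 0.1])

# Model B makes the same errors as the observers.
model_b = np.array([0.8, 0.3, 0.7, 0.2, 0.6, 0.4])

for name, model in [("A", model_a), ("B", model_b)]:
    acc = accuracy(model, true_is_mirror)
    r = human_correlation(model, human)
    print(f"model {name}: accuracy={acc:.2f}, correlation with humans={r:.2f}")
```

In this toy example the more accurate model is the less human-like one. This is the kind of dissociation a diagnostic image set makes visible and a set of randomly chosen, easily classified images does not.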

## Abstract

Visually identifying materials is crucial for many tasks, yet material perception remains poorly understood. Distinguishing mirror from glass is particularly challenging as both materials derive their appearance from their surroundings, yet we rarely experience difficulties telling them apart. Here we took a 'big data' approach to uncovering the underlying visual cues and processes, leveraging recent advances in neural network models of vision. We trained thousands of convolutional neural networks on >750,000 simulated mirror and glass objects, and compared their performance with human judgments, as well as alternative classifiers based on 'hand-engineered' image features. For randomly chosen images, all classifiers and humans performed with high accuracy, and therefore correlated highly with one another. To tease the models apart, we then painstakingly assembled a diagnostic image set for which humans make highly systematic errors, allowing us to decouple accuracy from human-like performance. A large-scale, systematic search through feedforward neural architectures revealed that relatively shallow networks predicted human judgments better than any other models. However, surprisingly, no network correlated better than 0.6 with humans (below inter-human correlations). Thus, although the model sets new standards for simulating human vision in a challenging material perception task, the results cast doubt on recent claims that such architectures are generally good models of human vision.

---
Source: https://tomesphere.com/paper/1903.01671