# Does a Neural Network Really Encode Symbolic Concepts?

**Authors:** Mingjie Li, Quanshi Zhang

arXiv: 2302.13080 · 2024-09-16

## TL;DR

This paper investigates whether neural networks genuinely encode meaningful symbolic concepts by examining the trustworthiness of interaction-based concepts through empirical analysis.

## Contribution

It evaluates the trustworthiness of interaction concepts in DNNs from four perspectives and reports empirical evidence for their sparsity, transferability, and discriminative power.

## Key findings

- A well-trained DNN usually encodes sparse concepts
- The encoded concepts are transferable
- The encoded concepts are discriminative

## Abstract

Recently, a series of studies have tried to extract interactions between input variables modeled by a DNN and define such interactions as concepts encoded by the DNN. However, strictly speaking, there is still no solid guarantee that such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. Extensive empirical studies have verified that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which is partially aligned with human intuition.
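The abstract does not say how an interaction is computed. A minimal sketch follows. It assumes the Harsanyi-interaction definition used in this line of work: for a set N of n input variables and a subset S, I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(x_T). Here v(x_T) is the model output when only the variables in T are kept and the rest are masked. The code computes every I(S) by brute force for a toy 4-variable function and keeps the "salient" ones whose |I(S)| exceeds a threshold, which illustrates what a sparse set of concepts means. The names `toy_v`, `harsanyi_interactions`, `salient_concepts`, and the threshold `tau` are hypothetical illustrations. The masking scheme and thresholds the paper actually uses are not reproduced here.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterator, Sequence

Subset = FrozenSet[int]


def all_subsets(items: Sequence[int]) -> Iterator[Subset]:
    """Yield every subset of `items` as a frozenset, from smallest to largest."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_interactions(v: Callable[[Subset], float], n: int) -> Dict[Subset, float]:
    """Return I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T), for every S of n variables."""
    values = {s: v(s) for s in all_subsets(range(n))}  # evaluate v once per subset (2^n calls)
    return {
        s: sum((-1) ** (len(s) - len(t)) * values[t] for t in all_subsets(sorted(s)))
        for s in values
    }


def salient_concepts(interactions: Dict[Subset, float], tau: float) -> Dict[Subset, float]:
    """Keep only interactions whose absolute effect exceeds the threshold tau (hypothetical choice)."""
    return {s: e for s, e in interactions.items() if abs(e) > tau}


def toy_v(present: Subset) -> float:
    """Hypothetical stand-in for a masked model output v(x_T); not a real DNN."""
    score = 3.0 if {0, 1} <= present else 0.0  # AND-style pattern: fires only when 0 and 1 are both kept
    score += 1.0 if 2 in present else 0.0       # variable 2 contributes on its own
    score += 0.05 * len(present)                # small additive effect of every kept variable
    return score


I = harsanyi_interactions(toy_v, n=4)
# Efficiency: the effects of all 2^n subsets sum back to the full-input output v(N).
print(math.isclose(sum(I.values()), toy_v(frozenset(range(4)))))  # → True
# Only 2 of the 16 subsets carry a large effect, i.e. the toy "concepts" are sparse.
print(sorted(tuple(sorted(s)) for s in salient_concepts(I, tau=0.5)))  # → [(0, 1), (2,)]
```

The brute-force form needs 2^n evaluations of v and O(3^n) additions, so it is only practical for a handful of input variables. It is meant to show the definition, not an efficient way to extract concepts from a real network.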

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.13080/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/2302.13080/full.md

## References

69 references — full list in the complete paper: https://tomesphere.com/paper/2302.13080/full.md

---
Source: https://tomesphere.com/paper/2302.13080