# Differentiable Disentanglement Filter: an Application Agnostic Core Concept Discovery Probe

**Authors:** Guntis Barzdins, Eduards Sidorovics

arXiv: 1907.07507 · 2019-07-25

## TL;DR

This paper introduces a novel neural network nonlinearity called Differentiable Disentanglement Filter (DDF) that can be integrated into existing networks to automatically disentangle core concepts, aiding interpretability and understanding of neural representations.

## Contribution

The paper proposes DDF, a new nonlinearity inspired by hyper-dimensional computing theory. DDF can be inserted into a wide range of existing models to disentangle the core concepts used by the layer it is placed in.
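This summary does not describe the DDF mechanism itself. As a general illustration of the kind of property hyper-dimensional computing relies on, and explicitly not the authors' DDF, the sketch below shows that random high-dimensional bipolar vectors are nearly orthogonal. A superposition of a few of them can therefore be decomposed back into its components with dot products. The codebook names, the dimension of 10,000, and the 0.5 threshold are illustrative assumptions.

```python
import random


def random_hypervector(dim, rng):
    """Random bipolar (+1/-1) hypervector; independent ones are nearly orthogonal in high dimension."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def superpose(vectors):
    """Bundle several hypervectors into one by element-wise addition."""
    return [sum(values) for values in zip(*vectors)]


def decode(bundle, codebook, threshold=0.5):
    """Names of codebook vectors whose similarity to the bundle (dot / dim) exceeds the threshold."""
    dim = len(bundle)
    return sorted(
        name for name, vec in codebook.items() if dot(bundle, vec) / dim > threshold
    )


# Hypothetical "concept" codebook; the names are illustrative only.
rng = random.Random(0)
DIM = 10_000
codebook = {
    name: random_hypervector(DIM, rng)
    for name in ("red", "green", "cube", "sphere", "left", "right")
}

# An entangled "scene" vector is the sum of three concept vectors.
scene = superpose([codebook["red"], codebook["cube"], codebook["left"]])
print(decode(scene, codebook))  # expected: ['cube', 'left', 'red']
```

With this dimension, a component's normalized similarity to the bundle is close to 1, while an absent concept's similarity is close to 0. That gap is why the simple threshold can separate them.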

## Key findings

- DDF can be inserted into neural networks to disentangle core concepts.
- A proof-of-concept DDF implementation disentangles concepts within a neural 3D scene representation.
- Exposing the core concepts a layer uses makes that layer's representations easier to interpret.

## Abstract

It has long been speculated that deep neural networks function by discovering a hierarchical set of domain-specific core concepts or patterns, which are further combined to recognize even more elaborate concepts for classification or other machine learning tasks. Meanwhile, disentangling the actual core concepts ingrained in word embeddings (like word2vec or BERT) or in deep convolutional image recognition neural networks (like PG-GAN) is difficult, and some success there has been achieved only recently. In this paper we propose a novel neural network nonlinearity named Differentiable Disentanglement Filter (DDF), which can be transparently inserted into any existing neural network layer to automatically disentangle the core concepts used by that layer. The DDF probe is inspired by the obscure properties of hyper-dimensional computing theory. The DDF proof-of-concept implementation is shown to disentangle concepts within a neural 3D scene representation, a task vital for the visual grounding of natural language narratives.

---
Source: https://tomesphere.com/paper/1907.07507