Mutual information estimation for graph convolutional neural networks

Marius C. Landverk; Signe Riemer-Sørensen

arXiv:2203.16887 · cs.LG · April 1, 2022

TL;DR

This paper introduces an architecture-agnostic method to estimate mutual information in neural networks, specifically applied to graph convolutional networks, to analyze internal representations and information flow during training.

Contribution

The paper presents an architecture-agnostic approach for tracking mutual information in neural networks and applies it to graph-based models to better understand how information flows through them.

Findings

01

Mutual information varies with network architecture and training.

02

Graph-based neural networks show distinct information flow patterns.

03

The method provides insights into how inductive biases affect internal representations.

Abstract

Measuring model performance is a key issue for deep learning practitioners. However, we often lack the ability to explain why a specific architecture attains superior predictive accuracy for a given data set. Often, validation accuracy is used as a performance heuristic quantifying how well a network generalizes to unseen data, but it does not capture anything about the information flow in the model. Mutual information can be used as a measure of the quality of internal representations in deep learning models, and the information plane may provide insights into whether the model exploits the available information in the data. The information plane has previously been explored for fully connected neural networks and convolutional architectures. We present an architecture-agnostic method for tracking a network's internal representations during training, which are then used to create the…
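The abstract does not say which estimator the authors use. The sketch below shows one common way to turn a layer's activations into a point on the information plane: the equal-width binning approach used in earlier information-plane studies of fully connected networks. It is not the paper's method. The function names, the default `n_bins=30`, and the assumption that every input sample is distinct (so that I(X;T) = H(T)) are illustrative choices. For a graph convolutional network, each row could be one node's embedding from a given layer, which is also an assumption.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(symbols: Sequence[Hashable]) -> float:
    """Plug-in (empirical) Shannon entropy in bits."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def discretize(activations: Sequence[Sequence[float]], n_bins: int) -> list[tuple[int, ...]]:
    """Map each activation vector to a tuple of equal-width bin indices."""
    flat = [a for row in activations for a in row]
    lo, hi = min(flat), max(flat)
    width = (hi - lo) / n_bins or 1.0  # guard against constant activations
    return [
        tuple(min(int((a - lo) / width), n_bins - 1) for a in row)
        for row in activations
    ]


def information_plane_point(
    activations: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    n_bins: int = 30,  # hypothetical default, not taken from the paper
) -> tuple[float, float]:
    """Return (I(X;T), I(T;Y)) in bits for one layer's binned representation T."""
    t = discretize(activations, n_bins)
    h_t = entropy(t)
    # Assumption: inputs are distinct and T is a deterministic function of X,
    # so I(X;T) = H(T) - H(T|X) = H(T).
    i_xt = h_t
    # I(T;Y) = H(T) + H(Y) - H(T, Y)
    i_ty = h_t + entropy(labels) - entropy(list(zip(t, labels)))
    return i_xt, i_ty


if __name__ == "__main__":
    acts = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
    ys = [0, 0, 1, 1]
    print(information_plane_point(acts, ys, n_bins=2))
```

Calling `information_plane_point` on each layer after every epoch yields the trajectory that an information plane plots. The bin count strongly affects the estimate, so treat these values as relative rather than absolute.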
