Mutual information and the encoding of contingency tables

Maximilian Jerdee; Alec Kirkley; M. E. J. Newman

arXiv:2405.05393 · cs.SI · July 17, 2025



TL;DR

This paper improves the encoding of contingency tables to reduce bias in mutual information calculations, providing more accurate similarity measures for labelings in classification and community detection tasks.

Contribution

The paper introduces an improved method for encoding contingency tables that gives a substantially better bound on their information cost, and hence on the reduced mutual information, especially when the labelings are similar.

Findings

01 The new encoding method gives a substantially better bound in typical cases.

02 The bound approaches the ideal value when the labelings are closely similar.

03 Extensive numerical experiments demonstrate the method's effectiveness.

Abstract

Mutual information is commonly used as a measure of similarity between competing labelings of a given set of objects, for example to quantify performance in classification and community detection tasks. As argued recently, however, the mutual information as conventionally defined can return biased results because it neglects the information cost of the so-called contingency table, a crucial component of the similarity calculation. In principle the bias can be rectified by subtracting the appropriate information cost, leading to the modified measure known as the reduced mutual information, but in practice one can only ever compute an upper bound on this information cost, and the value of the reduced mutual information depends crucially on how good a bound is established. In this paper we describe an improved method for encoding contingency tables that gives a substantially better bound…
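
As a rough illustration of the quantities the abstract refers to, the sketch below builds the contingency table of two labelings and computes the conventional (unreduced) mutual information from it, in bits. It is not the paper's encoding method and does not compute the reduced mutual information; the function names `contingency_table` and `mutual_information` are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def contingency_table(labels_a, labels_b):
    """Count how many objects carry each pair of labels (r, s).

    Hypothetical helper for illustration; returns a Counter keyed by (r, s).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    return Counter(zip(labels_a, labels_b))


def mutual_information(labels_a, labels_b):
    """Conventional mutual information (in bits) between two labelings.

    This is the standard definition only; it ignores the information
    cost of the contingency table that the abstract discusses.
    """
    n = len(labels_a)
    table = contingency_table(labels_a, labels_b)
    row_sums = Counter(labels_a)   # group sizes in the first labeling
    col_sums = Counter(labels_b)   # group sizes in the second labeling
    mi = 0.0
    for (r, s), count in table.items():
        # Each nonzero table entry contributes p(r,s) * log2(p(r,s) / (p(r) p(s))).
        mi += (count / n) * log2(count * n / (row_sums[r] * col_sums[s]))
    return mi


truth = [0, 0, 1, 1]
identical = mutual_information(truth, [0, 0, 1, 1])    # exact copy of truth
unrelated = mutual_information(truth, [0, 1, 0, 1])    # independent of truth
singletons = mutual_information(truth, [0, 1, 2, 3])   # every object alone
```

In this toy case the labeling that puts every object in its own group scores the same as an exact copy of the reference labeling, even though it conveys nothing about the grouping. This is one simple example of the kind of bias that accounting for the cost of the contingency table is meant to correct.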


Code & Models

Repositories

maxjerdee/reduced_mutual_information (official)


Taxonomy

Topics: Data Management and Algorithms