Oh That Looks Familiar: A Novel Similarity Measure for Spreadsheet Template Discovery

Anand Krishnakumar; Vengadesh Ravikumaran

arXiv:2511.06973 · cs.LG · November 12, 2025

Open Access

TL;DR

This paper introduces a hybrid similarity measure for spreadsheets that combines semantic, data type, and spatial information, enabling more accurate template discovery and clustering.

Contribution

A novel hybrid distance metric for spreadsheets that improves template clustering by integrating semantic embeddings, data types, and spatial layouts.

Findings

01

Achieved perfect template reconstruction on the FUSTE dataset, with an Adjusted Rand Index of 1.00.

02

Outperformed the graph-based Mondrian baseline in unsupervised clustering (Adjusted Rand Index of 1.00 versus 0.90).

03

Facilitates large-scale automated template discovery, supporting downstream applications such as retrieval-augmented generation.
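
For readers unfamiliar with the Adjusted Rand Index cited in the findings, here is a minimal sketch of the standard pair-counting formula in plain Python. It is not the authors' evaluation code. The function name and the toy labels are illustrative assumptions. The point is that a score of 1.00 means the predicted clusters match the true template families exactly, whatever the cluster IDs are called.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard pair-counting ARI (illustrative helper, not from the paper)."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated here as trivially identical

    # Pairs that share a cluster in both labelings, in the truth, and in the prediction.
    both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    in_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    in_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = in_true * in_pred / total_pairs  # agreement expected by chance
    maximum = (in_true + in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (both - expected) / (maximum - expected)


# Toy data (assumed): six spreadsheets from three template families.
truth = ["invoice", "invoice", "budget", "budget", "roster", "roster"]
exact = [2, 2, 0, 0, 1, 1]    # same grouping, different cluster IDs
flawed = [0, 0, 1, 1, 1, 2]   # one roster sheet merged into the budget cluster

ari_exact = adjusted_rand_index(truth, exact)
ari_flawed = adjusted_rand_index(truth, flawed)
```

Because the index is corrected for chance, a single misassigned spreadsheet in a small sample already lowers the score well below 1.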

Abstract

Traditional methods for identifying structurally similar spreadsheets fail to capture the spatial layouts and type patterns defining templates. To quantify spreadsheet similarity, we introduce a hybrid distance metric that combines semantic embeddings, data type information, and spatial positioning. In order to calculate spreadsheet similarity, our method converts spreadsheets into cell-level embeddings and then uses aggregation techniques like Chamfer and Hausdorff distances. Experiments across template families demonstrate superior unsupervised clustering performance compared to the graph-based Mondrian baseline, achieving perfect template reconstruction (Adjusted Rand Index of 1.00 versus 0.90) on the FUSTE dataset. Our approach facilitates large-scale automated template discovery, which in turn enables downstream applications such as retrieval-augmented generation over tabular…
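
The aggregation step named in the abstract can be sketched as follows, under stated assumptions. Each cell becomes one vector that concatenates a semantic embedding, a one-hot data type, and its (row, column) position. Two spreadsheets are then compared as sets of such vectors. The feature layout, the weights `w_sem`, `w_type`, and `w_pos`, and all function names are hypothetical. The Chamfer variant shown (the sum of the two mean nearest-neighbour distances) is one common definition and may differ from the one used in the paper.

```python
import math


def cell_vector(semantic, type_one_hot, position, w_sem=1.0, w_type=1.0, w_pos=1.0):
    """Concatenate weighted semantic, type, and spatial features (assumed layout)."""
    row, col = position
    return (
        [w_sem * x for x in semantic]
        + [w_type * x for x in type_one_hot]
        + [w_pos * row, w_pos * col]
    )


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(point, others):
    """Distance from one cell vector to its closest counterpart in the other sheet."""
    return min(euclidean(point, other) for other in others)


def chamfer(sheet_a, sheet_b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    forward = sum(nearest(p, sheet_b) for p in sheet_a) / len(sheet_a)
    backward = sum(nearest(q, sheet_a) for q in sheet_b) / len(sheet_b)
    return forward + backward


def hausdorff(sheet_a, sheet_b):
    """Symmetric Hausdorff distance: the worst-case nearest-neighbour distance."""
    forward = max(nearest(p, sheet_b) for p in sheet_a)
    backward = max(nearest(q, sheet_a) for q in sheet_b)
    return max(forward, backward)


# Toy sheets (assumed): a text header cell and a numeric cell below it.
header = cell_vector([1.0, 0.0], [1, 0], (0, 0))
sheet_a = [header, cell_vector([0.0, 1.0], [0, 1], (1, 0))]
# Same content, but the numeric cell sits one row lower.
sheet_b = [header, cell_vector([0.0, 1.0], [0, 1], (2, 0))]

d_chamfer = chamfer(sheet_a, sheet_b)
d_hausdorff = hausdorff(sheet_a, sheet_b)
```

Chamfer averages over all cells, so it tolerates a few outlying cells. Hausdorff reports only the single worst mismatch, which makes it more sensitive to one misplaced cell.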

Taxonomy

Topics: Data Visualization and Analytics · Spreadsheets and End-User Computing · Data Quality and Management