TabSim: A Siamese Neural Network for Accurate Estimation of Table Similarity
Maryam Habibi, Johannes Starlinger, Ulf Leser

TL;DR
TabSim is a neural network-based method that accurately measures the semantic similarity of tables by representing them through learned embeddings of their caption, content, and structure, outperforming existing measures.
Contribution
The paper introduces TabSim, a novel Siamese neural network approach for table similarity estimation, with a new dataset and improved accuracy over existing methods.
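The Siamese idea can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding sizes, the single dense layer, the random stand-in embeddings, and the cosine-based score are all assumptions made for illustration. What it takes from the summary is only that a table is represented by concatenated embeddings of its caption, content, and structure, and that both tables pass through the same network.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 8   # hypothetical size of each part embedding (assumption)
OUT_DIM = 4   # hypothetical size of the final table vector (assumption)

# Shared parameters: both tables are encoded with the SAME weights,
# which is what makes the network "Siamese".
W = rng.normal(scale=0.1, size=(3 * EMB_DIM, OUT_DIM))
b = np.zeros(OUT_DIM)


def encode_table(caption_emb, content_emb, structure_emb):
    """Concatenate the three part embeddings and project them to one vector."""
    x = np.concatenate([caption_emb, content_emb, structure_emb])
    return np.tanh(x @ W + b)


def similarity(table_a, table_b):
    """Score two tables in [0, 1] via cosine similarity (an assumed choice)."""
    va = encode_table(*table_a)
    vb = encode_table(*table_b)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    cos = float(va @ vb / denom) if denom > 0 else 0.0
    # Rescale from [-1, 1] to [0, 1] and clip away floating-point overshoot.
    return float(np.clip(0.5 * (cos + 1.0), 0.0, 1.0))


# Random stand-ins for learned caption, content, and structure embeddings.
table_1 = tuple(rng.normal(size=EMB_DIM) for _ in range(3))
table_2 = tuple(rng.normal(size=EMB_DIM) for _ in range(3))
print(similarity(table_1, table_2))
```

In a trained model the part embeddings and the shared weights would be learned from labeled table pairs. Here they are random, so the printed score carries no meaning beyond showing the data flow.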
Findings
TabSim achieves an F1-score approximately 7% higher than existing table similarity measures in binary similarity classification.
TabSim improves ranking performance over existing measures by about 1.5%.
The method effectively captures semantic similarities in biomedical tables.
Abstract
Tables are a popular and efficient means of presenting structured information. They are used extensively in various kinds of documents including web pages. Tables display information as a two-dimensional matrix, the semantics of which is conveyed by a mixture of structure (rows, columns), headers, caption, and content. Recent research has started to consider tables as first class objects, not just as an addendum to texts, yielding interesting results for problems like table matching, table completion, or value imputation. All of these problems inherently rely on an accurate measure for the semantic similarity of two tables. We present TabSim, a novel method to compute table similarity scores using deep neural networks. Conceptually, TabSim represents a table as a learned concatenation of embeddings of its caption, its content, and its structure. Given two tables in this representation,…
