RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild
Weiyao Wang, Byung-Hak Kim, Varun Ganapathi

TL;DR
RegCLR is a novel self-supervised learning framework for tabular and document image applications, combining contrastive and regularized methods to improve representation quality in real-world scenarios.
Contribution
Introduces RegCLR, a self-supervised framework that integrates contrastive and regularized approaches and is compatible with the standard Vision Transformer architecture, aimed at tabular and structured document images.
Findings
Significant average precision (AP) improvements in table and GUI object detection.
Effective in diverse real-world document image scenarios.
Enhances downstream performance over supervised baselines.
Abstract
Recent advances in self-supervised learning (SSL) using large models to learn visual representations from natural images are rapidly closing the gap between the results produced by fully supervised learning and those produced by SSL on downstream vision tasks. Inspired by this advancement and primarily motivated by the emergence of tabular and structured document image applications, we investigate which self-supervised pretraining objectives, architectures, and fine-tuning strategies are most effective. To address these questions, we introduce RegCLR, a new self-supervised framework that combines contrastive and regularized methods and is compatible with the standard Vision Transformer architecture. Then, RegCLR is instantiated by integrating masked autoencoders as a representative example of a contrastive method and enhanced Barlow Twins as a representative example of a regularized…
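As a rough illustration of the two-part objective the abstract describes, the sketch below adds a masked-patch reconstruction term (in the spirit of masked autoencoders) to the standard Barlow Twins redundancy-reduction loss. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses plain Barlow Twins rather than the "enhanced" variant the abstract mentions, and the function names, `off_diag_weight`, and `reg_weight` are hypothetical choices made for illustration.

```python
import math


def standardize(z):
    """Scale each embedding dimension to zero mean and unit std over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) + 1e-8
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, off_diag_weight=5e-3):
    """Standard Barlow Twins loss on two batches of embeddings (lists of rows).

    Pushes the cross-correlation matrix of the two views toward the identity:
    diagonal entries toward 1, off-diagonal entries toward 0.
    """
    n, d = len(z_a), len(z_a[0])
    za, zb = standardize(z_a), standardize(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c = sum(za[k][i] * zb[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c) ** 2
            else:
                loss += off_diag_weight * c ** 2
    return loss


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error over masked patches only (mask[i] truthy = hidden)."""
    total, count = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if m:
            total += sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)
            count += 1
    return total / count if count else 0.0


def combined_loss(pred, target, mask, z_a, z_b, reg_weight=1.0):
    """Hypothetical weighted sum of the two terms; the weighting is an assumption."""
    recon = masked_reconstruction_loss(pred, target, mask)
    return recon + reg_weight * barlow_twins_loss(z_a, z_b)
```

In practice the embeddings `z_a` and `z_b` would come from two differently augmented views of the same document image passed through a shared encoder, and the loops would be replaced by batched tensor operations in a deep learning framework.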
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization
