Out-of-Manifold Regularization in Contextual Embedding Space for Text Classification

Seonghyeon Lee; Dongha Lee; Hwanjo Yu

arXiv:2105.06750·cs.CL·May 17, 2021

TL;DR

This paper introduces a novel regularization method for text classification that leverages out-of-manifold embeddings, generated and discriminated through a joint model, improving model robustness and performance.

Contribution

It proposes a new out-of-manifold regularization technique using a generator-discriminator framework to enhance neural network training in text classification.

Findings

01. Improves classification accuracy on multiple benchmarks.

02. Compatible with existing data augmentation methods.

03. Effectively regularizes the embedding space outside the manifold.

Abstract

Recent studies on neural networks with pre-trained weights (e.g., BERT) have mainly focused on a low-dimensional subspace, where the embedding vectors computed from input words (or their contexts) are located. In this work, we propose a new approach to finding and regularizing the remainder of the space, referred to as out-of-manifold, which cannot be accessed through the words. Specifically, we synthesize the out-of-manifold embeddings based on two embeddings obtained from actually-observed words, to utilize them for fine-tuning the network. A discriminator is trained to detect whether an input embedding is located inside the manifold or not, and simultaneously, a generator is optimized to produce new embeddings that can be easily identified as out-of-manifold by the discriminator. These two modules successfully collaborate in a unified and end-to-end manner for regularizing the…
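
The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that a new embedding is synthesized by linearly interpolating two observed embeddings with a coefficient `lam`, and that the discriminator is a simple logistic scorer; the abstract only says the synthesis is "based on two embeddings". All function names here are hypothetical.

```python
import math


def mix(h_a, h_b, lam):
    """Synthesize an embedding from two observed ones.

    Assumption: plain linear interpolation with coefficient lam.
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]


def discriminator_score(h, w, b):
    """Toy logistic discriminator (hypothetical).

    Returns the estimated probability that h lies outside the manifold.
    """
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, label):
    """Binary cross-entropy for a single prediction."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def discriminator_loss(observed, synthesized, w, b):
    """Observed embeddings are labeled 0 (in-manifold), synthesized ones 1."""
    losses = [bce(discriminator_score(h, w, b), 0) for h in observed]
    losses += [bce(discriminator_score(h, w, b), 1) for h in synthesized]
    return sum(losses) / len(losses)


def generator_loss(synthesized, w, b):
    """Cooperative objective: the generator is rewarded when its outputs
    are easy for the discriminator to flag as out-of-manifold."""
    losses = [bce(discriminator_score(h, w, b), 1) for h in synthesized]
    return sum(losses) / len(losses)


# Two observed embeddings (toy 2-d vectors) and one synthesized from them.
h_a, h_b = [1.0, 0.0], [0.0, 1.0]
h_mix = mix(h_a, h_b, 0.25)
untrained_loss = discriminator_loss([h_a, h_b], [h_mix], [0.0, 0.0], 0.0)
```

Unlike an adversarial GAN, both losses here push in the same direction on the synthesized embeddings, which matches the abstract's statement that the two modules collaborate. In a real setup the coefficient and the discriminator would be learned jointly with the classifier rather than fixed as above.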

Code & Models

Repositories

sh0416/oommix
pytorch

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Handwritten Text Recognition Techniques