Do Convolutional Networks need to be Deep for Text Classification ?

Hoa T. Le; Christophe Cerisara; Alexandre Denis

arXiv:1707.04108·cs.CL·July 14, 2017·72 cites

TL;DR

This paper investigates how depth affects convolutional networks for text classification. Deep models perform better with character inputs, while a shallow-and-wide model performs better with word inputs and sets new state-of-the-art results on Yelp Binary and Yelp Full.

Contribution

The paper demonstrates that a shallow-and-wide network can outperform deep models such as DenseNet on text classification with word inputs, challenging the assumption that depth is always beneficial.

Findings

01

Deep models outperform shallow ones with character inputs.

02

Shallow-and-wide networks outperform deep models with word inputs.

03

The shallow word model sets a new state of the art on Yelp Binary (95.9%) and Yelp Full (64.9%).

Abstract

We study in this work the importance of depth in convolutional models for text classification, either when character or word inputs are considered. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed give better performances than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms deep models such as DenseNet with word inputs. Our shallow word model further establishes new state-of-the-art performances on two datasets: Yelp Binary (95.9%) and Yelp Full (64.9%).
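
To make the phrase "shallow-and-wide" concrete, the following is a minimal sketch of one common reading of it: a single convolutional layer applied to word embeddings with several filter widths in parallel, each followed by ReLU and max-over-time pooling, with the pooled outputs concatenated into one feature vector. The filter widths (3, 4, 5), the number of filters, the embedding size, and all function names are assumptions chosen for illustration. This page does not state the authors' hyperparameters, and the sketch is not their implementation.

```python
import random


def conv1d_max_over_time(embeddings, kernel, bias=0.0):
    """Slide one filter over the word sequence, apply ReLU, keep the maximum.

    embeddings: list of word vectors (each a list of floats).
    kernel: list of weight vectors, one per position in the filter window.
    """
    width = len(kernel)
    best = 0.0  # ReLU floor: the pooled activation is never negative
    for start in range(len(embeddings) - width + 1):
        activation = bias
        for offset in range(width):
            word_vector = embeddings[start + offset]
            for value, weight in zip(word_vector, kernel[offset]):
                activation += value * weight
        best = max(best, activation)
    return best


def make_filter_banks(widths, filters_per_width, embed_dim, rng):
    """Build randomly initialised filters for each window width (hypothetical helper)."""
    return {
        width: [
            [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)] for _ in range(width)]
            for _ in range(filters_per_width)
        ]
        for width in widths
    }


def shallow_wide_features(embeddings, filter_banks):
    """One layer deep, many filters wide: concatenate pooled outputs of all filters."""
    features = []
    for width in sorted(filter_banks):
        for kernel in filter_banks[width]:
            features.append(conv1d_max_over_time(embeddings, kernel))
    return features


rng = random.Random(0)
embed_dim = 8  # assumed embedding size, for illustration only
sentence = [[rng.uniform(-1.0, 1.0) for _ in range(embed_dim)] for _ in range(12)]
banks = make_filter_banks(widths=(3, 4, 5), filters_per_width=4, embed_dim=embed_dim, rng=rng)
features = shallow_wide_features(sentence, banks)
print(len(features))  # 3 widths x 4 filters = 12 pooled features
```

In a full classifier, the concatenated feature vector would typically feed a linear layer and a softmax over the classes. The contrast with a deep model such as DenseNet is that capacity here comes from adding filters side by side rather than from stacking layers.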

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Advanced Text Analysis Techniques

Methods: Batch Normalization · Convolution · Average Pooling · Concatenated Skip Connection · Global Average Pooling · Dense Block · Kaiming Initialization · 1x1 Convolution · Dropout