BertGCN: Transductive Text Classification by Combining GCN and BERT
Yuxiao Lin, Yuxian Meng, Xiaofei Sun, Qinghong Han, Kun Kuang, Jiwei Li, Fei Wu

TL;DR
BertGCN combines BERT's large-scale pretraining with a graph convolutional network (GCN) for transductive text classification, drawing on both the raw text seen during pretraining and the unlabeled test documents during training.
Contribution
It introduces a method that integrates BERT and GCN in a transductive learning framework for improved text classification performance.
Findings
Achieves state-of-the-art results on multiple datasets.
Effectively propagates label information through graph convolution.
Combines pretraining and transductive learning advantages.
Abstract
In this work, we propose BertGCN, a model that combines large-scale pretraining and transductive learning for text classification. BertGCN constructs a heterogeneous graph over the dataset and represents documents as nodes using BERT representations. By jointly training the BERT and GCN modules within BertGCN, the proposed model is able to leverage the advantages of both worlds: large-scale pretraining, which takes advantage of the massive amount of raw data, and transductive learning, which jointly learns representations for both training data and unlabeled test data by propagating label influence through graph convolution. Experiments show that BertGCN achieves state-of-the-art performance on a wide range of text classification datasets. Code is available at https://github.com/ZeroRin/BertGCN.
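To make the propagation idea in the abstract concrete, here is a minimal sketch of graph convolution over a toy graph. It is not the authors' implementation: the function names are hypothetical, the graph is an assumed three-node example (two documents linked through one shared word node), the 2-dimensional vectors merely stand in for BERT document embeddings, and the symmetric normalization is the standard GCN form, which the summary above does not spell out.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


# Assumed toy graph: node 0 = labeled training document,
# node 1 = unlabeled test document, node 2 = a word both documents contain.
adjacency = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

# Stand-ins for BERT document embeddings; the word node starts at zero
# (an assumption made for this illustration).
features = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

# Identity weights keep the arithmetic easy to follow; a real model learns them.
identity = [[1.0, 0.0], [0.0, 1.0]]

a_norm = normalize_adjacency(adjacency)
h1 = gcn_layer(a_norm, features, identity)
h2 = gcn_layer(a_norm, h1, identity)
```

After one layer, each document has only mixed with the shared word node. After two layers, the training document's representation carries a component from the test document, and vice versa, which is the sense in which training and test documents influence each other through the graph.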
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Sentiment Analysis and Opinion Mining
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Linear Warmup With Linear Decay · Softmax · Multi-Head Attention · Residual Connection · WordPiece · Weight Decay
