SemImage: Semantic Image Representation for Text, a Novel Framework for Embedding Disentangled Linguistic Features
Mohammad Zare

TL;DR
SemImage transforms text into a 2D semantic image with disentangled linguistic features, enabling CNNs to classify documents effectively while providing interpretability of topic and sentiment shifts.
Contribution
We introduce SemImage, a novel 2D image representation of text that encodes linguistic features in a disentangled HSV space and highlights semantic boundaries for improved interpretability.
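The row layout described in the abstract (one row of word pixels per sentence, with a boundary row inserted between consecutive sentences) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Words are assumed to be already mapped to 4-channel pixels (H_cos, H_sin, S, V), and shorter sentences are zero-padded to the longest one. The boundary-row fill (absolute difference of the mean pixels of the two adjacent sentences) is a hypothetical choice made only for illustration. `build_semimage` is a hypothetical helper name.

```python
import numpy as np

CHANNELS = 4  # (H_cos, H_sin, S, V) per pixel, following the abstract's channel list


def build_semimage(sentences: list[np.ndarray]) -> np.ndarray:
    """Stack per-sentence pixel rows into a (rows, width, CHANNELS) image.

    Each element of `sentences` has shape (num_words, CHANNELS). Sentences are
    right-padded with zeros to the longest sentence (assumption), and one
    boundary row is inserted between every pair of consecutive sentences.
    """
    if not sentences:
        raise ValueError("need at least one sentence")
    width = max(s.shape[0] for s in sentences)
    rows = []
    for i, sent in enumerate(sentences):
        if i > 0:
            # Hypothetical boundary fill: how far the average pixel moved
            # between the previous and current sentence, repeated across the row.
            prev = sentences[i - 1]
            diff = np.abs(sent.mean(axis=0) - prev.mean(axis=0))
            rows.append(np.tile(diff, (width, 1)))
        # Zero-pad the sentence row to the common width.
        padded = np.zeros((width, CHANNELS))
        padded[: sent.shape[0]] = sent
        rows.append(padded)
    return np.stack(rows)
```

For a document with n sentences this yields 2n - 1 rows, so a CNN sees sentence rows and boundary rows interleaved.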
Findings
Achieves accuracy competitive with BERT and hierarchical models.
Enhances interpretability through visual boundary detection.
Demonstrates effective encoding of topic and sentiment features.
Abstract
We propose SemImage, a novel method for representing a text document as a two-dimensional semantic image to be processed by convolutional neural networks (CNNs). In a SemImage, each word is represented as a pixel in a 2D image: rows correspond to sentences, and an additional boundary row is inserted between sentences to mark semantic transitions. Each pixel is not a typical RGB value but a vector in a disentangled HSV color space that encodes different linguistic features: Hue, represented by two components H_cos and H_sin to account for its circularity, encodes the topic; Saturation encodes the sentiment; and Value encodes intensity or certainty. We enforce this disentanglement via a multi-task learning framework: a ColorMapper network maps each word embedding to the HSV space, and auxiliary supervision is applied to the Hue and Saturation channels to predict topic and sentiment labels, alongside the…
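To make the channel layout and the auxiliary supervision concrete, the following NumPy sketch maps word embeddings to (H_cos, H_sin, S, V) pixels and computes a topic loss that reads only the Hue channels and a sentiment loss that reads only Saturation. Everything here is an assumption for illustration. `ColorMapperSketch` is a hypothetical single linear layer standing in for the ColorMapper, whose architecture is not given in the abstract. The heads, the loss weights `lam_topic` and `lam_sent`, and the sigmoid squashing of S and V are invented. The primary objective (truncated in the abstract above) is not modeled.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ColorMapperSketch:
    """Hypothetical single-layer stand-in for the ColorMapper (not the paper's model)."""

    def __init__(self, embed_dim: int, num_topics: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(embed_dim, 4))  # -> raw (h1, h2, s, v)
        self.b = np.zeros(4)
        self.W_topic = rng.normal(scale=0.1, size=(2, num_topics))  # topic head on hue only
        self.w_sent = rng.normal(scale=0.1)  # sentiment head on saturation only
        self.b_sent = 0.0

    def to_hsv(self, emb: np.ndarray) -> np.ndarray:
        """Map (n, embed_dim) embeddings to (n, 4) pixels (H_cos, H_sin, S, V)."""
        raw = emb @ self.W + self.b
        hue = raw[:, :2]
        # Project hue onto the unit circle so H_cos**2 + H_sin**2 is 1;
        # hue angles just above 0 and just below 2*pi then land close together.
        norm = np.linalg.norm(hue, axis=1, keepdims=True)
        hue = hue / np.maximum(norm, 1e-8)
        s = sigmoid(raw[:, 2:3])  # saturation squashed into (0, 1)
        v = sigmoid(raw[:, 3:4])  # value squashed into (0, 1)
        return np.concatenate([hue, s, v], axis=1)

    def auxiliary_loss(self, pixels, topic_labels, sentiment_labels,
                       lam_topic=1.0, lam_sent=1.0):
        """Weighted sum of a hue-only topic loss and a saturation-only sentiment loss."""
        # Topic: softmax cross-entropy from (H_cos, H_sin) only.
        logits = pixels[:, :2] @ self.W_topic
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        topic_ce = -log_probs[np.arange(len(topic_labels)), topic_labels].mean()
        # Sentiment: binary cross-entropy from S only.
        p = sigmoid(self.w_sent * pixels[:, 2] + self.b_sent)
        eps = 1e-8
        sent_bce = -(sentiment_labels * np.log(p + eps)
                     + (1 - sentiment_labels) * np.log(1 - p + eps)).mean()
        return lam_topic * topic_ce + lam_sent * sent_bce
```

Representing hue as the pair (H_cos, H_sin) rather than a single angle avoids the artificial discontinuity at 0/2π, since hue is circular; the angle itself can be recovered with `np.arctan2(H_sin, H_cos)`. Restricting each auxiliary head to one channel group is what pushes topic information into Hue and sentiment information into Saturation.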
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Text and Document Classification Technologies
