Text classification in shipping industry using unsupervised models and Transformer based supervised models
Ying Xie, Dongping Song

TL;DR
This paper presents an unsupervised text classification method for the shipping industry that outperforms Transformer-based supervised models when labeled data is scarce, using word embeddings and cosine similarity.
Contribution
The study introduces a simple unsupervised classification approach using pretrained embeddings and similarity measures, demonstrating its effectiveness over supervised Transformer models in low-data scenarios.
Findings
Unsupervised model outperforms Transformer models with limited labeled data.
Increasing training data size does not significantly improve Transformer performance.
Unsupervised classification is a viable alternative when labeled data is scarce.
Abstract
Obtaining labelled data in a particular context can be expensive and time-consuming. Although different approaches, including unsupervised learning, semi-supervised learning, and self-learning, have been adopted, the performance of text classification varies with context. Given the lack of a labelled dataset, we proposed a novel and simple unsupervised text classification model to classify cargo content in the international shipping industry using the Standard International Trade Classification (SITC) codes. Our method stems from representing words using pretrained GloVe word embeddings and finding the most likely label using cosine similarity. To compare the unsupervised text classification model with supervised classification, we also applied several Transformer models to classify cargo content. Due to the lack of training data, the SITC numerical codes and the corresponding textual descriptions were…
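The embedding-and-similarity idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional vectors stand in for pretrained GloVe embeddings, the label descriptions are hypothetical rather than actual SITC text, and averaging word vectors to represent a phrase is an assumption made here for illustration. The names `TOY_EMBEDDINGS`, `LABELS`, `embed`, `cosine`, and `classify` are likewise invented for this example.

```python
import math

# Toy 3-dimensional vectors standing in for pretrained GloVe embeddings
# (hypothetical values, for illustration only).
TOY_EMBEDDINGS = {
    "frozen": [0.9, 0.1, 0.0],
    "fish": [0.8, 0.2, 0.1],
    "meat": [0.7, 0.3, 0.0],
    "steel": [0.0, 0.9, 0.2],
    "iron": [0.1, 0.8, 0.3],
    "pipes": [0.1, 0.7, 0.4],
    "cotton": [0.1, 0.2, 0.9],
    "textile": [0.0, 0.3, 0.8],
    "fabric": [0.1, 0.1, 0.9],
}

# Hypothetical label descriptions; these are not actual SITC descriptions.
LABELS = {
    "food": "fish meat",
    "metals": "iron steel",
    "textiles": "textile fabric",
}


def embed(text, embeddings):
    """Average the vectors of known words (an assumed pooling choice).

    Returns None when no word in the text has an embedding.
    """
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text, labels, embeddings):
    """Return the label whose description is most similar to the text."""
    query = embed(text, embeddings)
    if query is None:
        return None
    scores = {}
    for name, description in labels.items():
        label_vector = embed(description, embeddings)
        if label_vector is not None:
            scores[name] = cosine(query, label_vector)
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    for cargo in ["frozen fish", "steel pipes", "cotton fabric"]:
        print(cargo, "->", classify(cargo, LABELS, TOY_EMBEDDINGS))
```

No labelled training examples are needed in this setup: the only inputs are the embeddings and a textual description per label, which is why such an approach is attractive when labelled data is scarce.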
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Maritime Navigation and Safety
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Adam · Byte Pair Encoding · GloVe Embeddings · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer
