Deep Contextual Embeddings for Address Classification in E-commerce
Shreyas Mangalgi, Lakshya Kumar, Ravindra Babu Tallamraju

TL;DR
This paper introduces a novel NLP-based approach that uses pre-trained language models, particularly RoBERTa, to understand and classify unstructured e-commerce shipping addresses in India, so that shipments can be routed more accurately.
Contribution
It presents the first application of pre-trained language models such as RoBERTa to address understanding in e-commerce, and shows that these models can be fine-tuned effectively for several downstream tasks.
Findings
RoBERTa achieves around 90% accuracy in sub-region classification.
Pre-trained models generalize well to downstream supply-chain tasks.
The proposed approach outperforms traditional methods in address classification.
Abstract
E-commerce customers in developing nations such as India tend to follow no fixed format when entering shipping addresses. Parsing such addresses is challenging because they lack inherent structure or hierarchy. It is imperative to understand the language of addresses so that shipments can be routed without delays. In this paper, we propose a novel approach to understanding customer addresses that draws on recent advances in Natural Language Processing (NLP). We also formulate different pre-processing steps for addresses using a combination of edit distance and phonetic algorithms. We then create vector representations of addresses using Word2Vec with TF-IDF, Bi-LSTM, and BERT-based approaches. We compare these approaches on a sub-region classification task for North and South Indian cities. Through experiments, we demonstrate the…
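The abstract says the pre-processing combines edit distance with a phonetic algorithm, but it does not say which algorithms or how they are combined. The following is a minimal Python sketch under stated assumptions: it uses American Soundex as the phonetic key and Levenshtein distance as the edit distance, and it snaps each misspelled token to the closest entry in a small locality vocabulary that shares its Soundex code. The functions `soundex`, `levenshtein`, `normalize_token`, and `normalize_address`, the `max_distance=2` threshold, and the `LOCALITIES` set are all hypothetical illustrations, not the authors' pipeline.

```python
import re

# Classic American Soundex digit classes; vowels, y, h and w get no digit.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}

# Hypothetical vocabulary of canonical locality names (illustration only).
LOCALITIES = {"koramangala", "indiranagar", "whitefield", "jayanagar"}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word` ("" if it has no letters)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    encoded = [letters[0].upper()]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        # h and w do not separate equal codes; vowels (code "") do.
        if ch not in "hw":
            prev = code
    return ("".join(encoded) + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,               # delete ca
                           row[j - 1] + 1,                # insert cb
                           prev_row[j - 1] + (ca != cb))) # substitute
        prev_row = row
    return prev_row[-1]


def normalize_token(token: str, vocabulary: set[str], max_distance: int = 2) -> str:
    """Map a token to the nearest same-sounding vocabulary word, if close enough."""
    token = token.lower()
    if token in vocabulary:
        return token
    # Phonetic blocking: only compare against names with the same Soundex code.
    code = soundex(token)
    scored = [(levenshtein(token, w), w) for w in vocabulary if soundex(w) == code]
    if scored:
        distance, best = min(scored)  # ties broken alphabetically
        if distance <= max_distance:
            return best
    return token


def normalize_address(address: str, vocabulary: set[str], max_distance: int = 2) -> list[str]:
    """Lowercase, split into alphanumeric tokens, and normalize each token."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return [normalize_token(t, vocabulary, max_distance) for t in tokens]


if __name__ == "__main__":
    raw = "Flat 12, Kormangala 4th Block, near Whitefeild Road"
    print(normalize_address(raw, LOCALITIES))
    # → ['flat', '12', 'koramangala', '4th', 'block', 'near', 'whitefield', 'road']
```

Grouping candidates by Soundex code first limits the more expensive edit-distance comparison to a small set of phonetically similar names, and the distance threshold stops unrelated words that happen to share a code from being rewritten. Soundex was designed for English surnames, so a system for Indian addresses might prefer a different phonetic scheme.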
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies · Human Mobility and Location-Based Analysis
Methods: Linear Layer · Adam · Multi-Head Attention · Layer Normalization · Residual Connection · Attention Is All You Need · RoBERTa · Attention Dropout · Weight Decay
