Combining Deep Learning and String Kernels for the Localization of Swiss German Tweets
Mihaela Gaman, Radu Tudor Ionescu

TL;DR
This paper explores various machine learning models, including deep learning and string kernels, for geolocating Swiss German tweets, finding that ensemble models combining different approaches yield the best results.
Contribution
It introduces a multi-perspective approach to dialect geolocation, combining handcrafted string kernel features with deep learning models for improved accuracy.
Findings
String kernel-based models outperform the deep neural networks when each is used on its own.
Ensemble models combining handcrafted and deep learning features achieve the best performance.
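The ensemble finding can be illustrated with a minimal sketch. The abstract names XGBoost as a meta-learner; to stay dependency-free, this sketch swaps in a much simpler stand-in, a least-squares blend of two base models' predictions for a single coordinate. The function names, the two-model setup, and the toy numbers are assumptions made for illustration, not the authors' implementation.

```python
def fit_blend_weight(preds_a, preds_b, targets):
    """Least-squares weight w for the blend w * a + (1 - w) * b.

    Fit on held-out predictions so the weight reflects how well each
    base model generalizes, not how well it memorized the training set.
    """
    num = sum((a - b) * (y - b) for a, b, y in zip(preds_a, preds_b, targets))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    # If both models agree everywhere, any weight works; use an even split.
    return num / den if den else 0.5


def blend(preds_a, preds_b, w):
    """Combine two prediction lists with the fitted weight."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


# Toy held-out latitude predictions from two hypothetical base models.
kernel_model_lat = [47.1, 46.9, 47.4]
neural_model_lat = [47.5, 46.5, 47.0]
true_lat = [47.2, 46.8, 47.3]

w_lat = fit_blend_weight(kernel_model_lat, neural_model_lat, true_lat)
blended_lat = blend(kernel_model_lat, neural_model_lat, w_lat)
```

In a double regression setting the same procedure would be repeated for longitude, giving one blend weight per coordinate.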
Abstract
In this work, we introduce the methods proposed by the UnibucKernel team in solving the Social Media Variety Geolocation task featured in the 2020 VarDial Evaluation Campaign. We address only the second subtask, which targets a data set composed of nearly 30 thousand Swiss German Jodels. The dialect identification task is about accurately predicting the latitude and longitude of test samples. We frame the task as a double regression problem, employing a variety of machine learning approaches to predict both latitude and longitude. From simple models for regression, such as Support Vector Regression, to deep neural networks, such as Long Short-Term Memory networks and character-level convolutional neural networks, and, finally, to ensemble models based on meta-learners, such as XGBoost, our interest is focused on approaching the problem from a few different perspectives, in an attempt to…
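The double regression framing with a string kernel can be sketched as follows. This is a minimal illustration under stated assumptions: the kernel shown is a presence-bits kernel over character 3- to 5-grams (the abstract does not specify which string kernel or n-gram range is used), and a kernel-weighted average stands in for Support Vector Regression so that the example needs only the standard library. All function names and the toy data are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=3, n_max=5):
    """Count all character n-grams of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def presence_kernel(a, b):
    """Presence-bits kernel: number of distinct n-grams shared by a and b."""
    return len(char_ngrams(a).keys() & char_ngrams(b).keys())


def normalized_kernel(a, b):
    """Scale the kernel to [0, 1] so that long texts do not dominate."""
    denom = (presence_kernel(a, a) * presence_kernel(b, b)) ** 0.5
    return presence_kernel(a, b) / denom if denom else 0.0


def predict_coordinate(text, train_texts, train_values):
    """Kernel-weighted average of one coordinate (a stand-in for SVR)."""
    weights = [normalized_kernel(text, t) for t in train_texts]
    total = sum(weights)
    if total == 0:
        # No n-gram overlap with any training text: fall back to the mean.
        return sum(train_values) / len(train_values)
    return sum(w * v for w, v in zip(weights, train_values)) / total


def predict_location(text, train_texts, train_lats, train_lons):
    """Double regression: latitude and longitude are predicted separately."""
    return (predict_coordinate(text, train_texts, train_lats),
            predict_coordinate(text, train_texts, train_lons))


# Made-up toy data, not taken from the Swiss German Jodel data set.
toy_texts = ["abcde", "abcdx", "vwxyz"]
toy_lats = [47.0, 46.0, 46.5]
toy_lons = [8.5, 7.5, 9.0]

print(predict_location("abcde", toy_texts, toy_lats, toy_lons))
```

Treating the two coordinates as independent targets keeps each regressor simple; the cost is that the model ignores any correlation between latitude and longitude.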
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Authorship Attribution and Profiling
Methods: Linear Layer · Layer Normalization · Dense Connections · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Attention Is All You Need
