TL;DR
This paper introduces a new multilingual email zoning benchmark of 625 emails in Portuguese, Spanish, and French, together with OKAPI, a language-agnostic email segmentation model that generalizes to unseen languages and sets a new state of the art on English domain adaptation tasks.
Contribution
It presents a new multilingual email zoning benchmark and OKAPI, the first multilingual email segmentation model, which is built on a language-agnostic sentence encoder.
Findings
OKAPI generalizes well to unseen languages
OKAPI is competitive on current English benchmarks and reaches a new state of the art on English domain adaptation tasks
The benchmark comprises 625 emails in Portuguese, Spanish, and French
Abstract
The segmentation of emails into functional zones (also dubbed email zoning) is a relevant preprocessing step for most NLP tasks that deal with emails. However, despite the multilingual character of emails and their applications, previous literature on email zoning corpora and systems was developed essentially for English. In this paper, we analyse the existing email zoning corpora and propose a new multilingual benchmark composed of 625 emails in Portuguese, Spanish and French. Moreover, we introduce OKAPI, the first multilingual email segmentation model based on a language-agnostic sentence encoder. Besides generalizing well to unseen languages, our model is competitive with current English benchmarks and reaches new state-of-the-art performance on domain adaptation tasks in English.
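To make the task concrete, the following is a minimal toy sketch of what email zoning produces: one zone label per non-empty line. It is not OKAPI and not the paper's method. The zone names (`greeting`, `body`, `signoff`, `quoted`), the keyword lists, and the function `zone_email` are all hypothetical and chosen only for illustration; the benchmark's actual label set may differ.

```python
import re
from typing import List, Tuple

# Hypothetical zone labels, for illustration only.
QUOTED, GREETING, SIGNOFF, BODY = "quoted", "greeting", "signoff", "body"

# Hand-written keyword lists (assumed, not from the paper).
GREETING_RE = re.compile(r"^\s*(hi|hello|dear|olá|hola|bonjour)\b", re.IGNORECASE)
SIGNOFF_RE = re.compile(
    r"^\s*(regards|best|thanks|cumprimentos|saludos|cordialement)\b", re.IGNORECASE
)


def zone_email(text: str) -> List[Tuple[str, str]]:
    """Assign a zone label to every non-empty line of an email."""
    zones = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no zone in this sketch
        if line.lstrip().startswith(">"):
            label = QUOTED  # conventional marker for quoted replies
        elif GREETING_RE.match(line):
            label = GREETING
        elif SIGNOFF_RE.match(line):
            label = SIGNOFF
        else:
            label = BODY
        zones.append((label, line))
    return zones


if __name__ == "__main__":
    email = "Hola Ana,\nTe envío el informe.\n\nSaludos, Luis\n> mensaje anterior"
    for label, line in zone_email(email):
        print(f"{label:9s}| {line}")
```

Note that the keyword lists have to be extended by hand for every new language. A model built on a language-agnostic sentence encoder, as the abstract describes, would presumably classify segments from their embeddings instead, which is what makes generalization to unseen languages plausible.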
