Fighting Redundancy and Model Decay with Embeddings

Dan Shiebler; Luca Belli; Jay Baxter; Hanchen Xiong; Abhishek Tayal

arXiv:1809.07703 · cs.SI · September 21, 2018 · 1 cites

Open Access

TL;DR

This paper describes how Twitter keeps models effective on rapidly changing data by using regularly refreshed learned embeddings, and how sharing those embeddings across teams reduces redundancy and improves cross-team productivity.

Contribution

The paper details the tools, algorithms, and pipelines Twitter developed to regularly update and share high-quality embeddings across teams, countering both model decay and redundant work.

Findings

01. Developed pipelines for continuous embedding updates

02. Shared embeddings improve modeling efficiency across teams

03. Regularly refreshed embeddings mitigate performance degradation

Abstract

Every day, hundreds of millions of new Tweets containing over 40 languages of ever-shifting vernacular flow through Twitter. Models that attempt to extract insight from this firehose of information must face the torrential covariate shift that is endemic to the Twitter platform. While regularly-retrained algorithms can maintain performance in the face of this shift, fixed model features that fail to represent new trends and tokens can quickly become stale, resulting in performance degradation. To mitigate this problem we employ learned features, or embedding models, that can efficiently represent the most relevant aspects of a data distribution. Sharing these embedding models across teams can also reduce redundancy and multiplicatively increase cross-team modeling productivity. In this paper, we detail the commoditized tools, algorithms and pipelines that we have developed and are…

Taxonomy

Topics: Data Quality and Management · Business Process Modeling and Analysis · Software System Performance and Reliability