The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models

Dimitris Spathis; Fahim Kawsar

arXiv:2309.06236 · cs.LG · September 13, 2023 · 2 cites

Open Access

TL;DR

This paper examines the challenges of representing and tokenizing temporal data in Large Language Models, highlighting issues with current tokenizers and proposing potential solutions like prompt tuning and multimodal adapters.

Contribution

It identifies the pitfalls of tokenizing temporal data in LLMs and discusses methods to improve their understanding of numerical and temporal information.

Findings

01. Popular LLMs tokenize temporal data incorrectly

02. Tokenization issues hinder understanding of temporal relationships

03. Proposed solutions include prompt tuning and multimodal adapters

Abstract

Large Language Models (LLMs) have demonstrated remarkable generalization across diverse tasks, leading individuals to increasingly use them as personal assistants and universal computing engines. Nevertheless, a notable obstacle emerges when feeding numerical/temporal data into these models, such as data sourced from wearables or electronic health records. LLMs employ tokenizers in their input that break down text into smaller units. However, tokenizers are not designed to represent numerical values and might struggle to understand repetitive patterns and context, treating consecutive values as separate tokens and disregarding their temporal relationships. Here, we discuss recent works that employ LLMs for human-centric tasks such as in mobile health sensing and present a case study showing that popular LLMs tokenize temporal data incorrectly. To address that, we highlight potential…
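
The splitting behavior the abstract describes can be illustrated with a toy example. The sketch below is not the paper's case study and does not use any real LLM tokenizer: `TOY_VOCAB` and `greedy_tokenize` are hypothetical names, the vocabulary is made up, and greedy longest-match segmentation is only a simplified stand-in for subword schemes such as BPE. It shows how neighboring sensor readings can end up with different token counts depending on which digit strings happen to be in the vocabulary.

```python
# Hypothetical toy vocabulary (an assumption for illustration only):
# single digits plus a few multi-digit strings that happen to be "known".
TOY_VOCAB = set("0123456789") | {"10", "72", "100"}


def greedy_tokenize(text, vocab):
    """Segment text by repeatedly taking the longest vocabulary entry.

    A simplified stand-in for subword tokenization, not a real BPE.
    """
    tokens = []
    max_len = max(len(entry) for entry in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Character not in the vocabulary: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


# Consecutive readings (e.g. heart rate values) from a hypothetical wearable.
readings = [72, 73, 100, 101]
for value in readings:
    print(value, greedy_tokenize(str(value), TOY_VOCAB))
```

Under this toy vocabulary, 72 and 100 each map to a single token while 73 and 101 are split into two, so values that are adjacent numerically are not adjacent, or even the same length, in token space.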

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques · AI in Service Interactions