# An open dataset of Chinese duration expressions

**Authors:** Si-Qi Zhang, Jia-Wen Niu, Xiaoqian Liu, Xiao-Yang Sui, Li-Lin Rao

PMC · DOI: 10.1038/s41597-025-06016-2 · Scientific Data · 2025-11-03

## TL;DR

This paper introduces an open dataset of 2,101 Chinese duration expressions with numerical annotations and frequency data, supporting research in language and cognition.

## Contribution

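The paper presents an open lexicon of Chinese duration expressions that maps each expression to a numerical duration and reports its corpus frequency, a conversion resource the authors say the literature has not yet provided.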

## Key findings

- The dataset includes 2,101 Chinese duration expressions annotated with numerical durations.
- Word frequencies were obtained from the BLCU Corpus Center (BCC) Corpus, a corpus of 10 billion Chinese characters, and an adjusted frequency was computed for each expression.
- The dataset supports research in natural language processing, psychology, and linguistics.
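To illustrate the kind of conversion the dataset is meant to support, here is a minimal Python sketch that maps a duration expression to seconds. Everything in it is an assumption for illustration: the function names, the choice of seconds as the unit, the small unit table, and the idea of passing the lexicon as a plain dictionary are not taken from the paper, and the dataset's actual file format and columns may differ. Numeric expressions are handled by simple arithmetic; verbal expressions are looked up in whatever lexicon the caller supplies.

```python
import re

# Assumed unit table (plain arithmetic, not dataset values).
UNIT_SECONDS = {"秒": 1, "分钟": 60, "小时": 3600, "天": 86400}

# Matches an Arabic numeral followed by one of the units above, e.g. "2小时".
NUMERIC_PATTERN = re.compile(r"(\d+(?:\.\d+)?)(秒|分钟|小时|天)")


def numeric_duration_seconds(expr):
    """Return seconds for a numeric expression like '2小时', else None."""
    match = NUMERIC_PATTERN.fullmatch(expr)
    if match is None:
        return None
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def to_seconds(expr, lexicon):
    """Look up expr in a caller-supplied lexicon, then fall back to parsing.

    `lexicon` is a hypothetical dict of expression -> seconds, e.g. one
    built by the reader from the published dataset.
    """
    if expr in lexicon:
        return lexicon[expr]
    return numeric_duration_seconds(expr)


if __name__ == "__main__":
    # With an empty lexicon only numeric expressions resolve.
    print(to_seconds("2小时", {}))  # 7200.0
    print(to_seconds("不久", {}))   # None
```

In practice the lexicon argument would be populated from the published annotations rather than left empty, so that verbal expressions resolve as well.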

## Abstract

Duration information is essential for understanding and analyzing our world. In textual contexts, duration information is typically conveyed in two formats: numeric (e.g., 1 hour) and verbal (e.g., shortly). To analyze duration information in text, it is crucial to understand how people map duration expressions to corresponding numerical duration. However, the literature has yet to provide lexicons supporting such conversion. Furthermore, existing databases of time-related expressions often lack information about word frequency – a robust predictor of information processing. This article reports an open dataset of 2,101 Chinese duration expressions, each annotated with its corresponding numerical duration. To obtain high-quality data for word frequency, we obtained the frequency of each duration expression from a large-scale corpus of 10 billion Chinese characters (BLCU Corpus Center (BCC) Corpus) and computed an adjusted frequency for each expression. This dataset provides a valuable resource for research on temporal information in Chinese, facilitating studies in natural language processing, psychology, and linguistics.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12583718/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12583718/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC12583718/full.md

---
Source: https://tomesphere.com/paper/PMC12583718