# The Austronesian and the Micronesian Comparative Dictionaries as CLDF datasets

**Authors:** Alexander D. Smith, Robert Forkel, Lev Blumenfeld

PMC · DOI: 10.1038/s41597-025-05301-4 · Scientific Data · 2025-06-17

## TL;DR

This paper converts the Austronesian Comparative Dictionary and the Micronesian Comparative Dictionary into CLDF (Cross-Linguistic Data Formats) datasets, preserving their content and enabling cross-dictionary research.

## Contribution

The paper presents the first CLDF datasets for the Austronesian and Micronesian Comparative Dictionaries.

## Key findings

- The Austronesian and Micronesian Comparative Dictionaries have been converted into CLDF datasets.
- The datasets enable programmatic access and cross-dictionary interoperability.
- This conversion supports future comparative linguistic research on Austronesian languages.
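
To make the interoperability point concrete, here is a minimal sketch of combining two CLDF-style form tables, using only the Python standard library. It assumes the default CLDF FormTable column names (`ID`, `Language_ID`, `Parameter_ID`, `Form`). The rows are invented placeholders, not data from either dictionary, and the helper names (`read_forms`, `forms_by_language`, `shared_languages`) are hypothetical. In real data the link between datasets would more plausibly run through a shared identifier such as a Glottocode in each dataset's language table; identical `Language_ID` strings stand in for that here.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows shaped like a CLDF FormTable (forms.csv).
# All IDs and forms are invented; none come from the actual dictionaries.
ACD_FORMS = """ID,Language_ID,Parameter_ID,Form
acd-1,lang-a,concept-1,form-a1
acd-2,lang-b,concept-1,form-b1
acd-3,lang-a,concept-2,form-a2
"""

MCD_FORMS = """ID,Language_ID,Parameter_ID,Form
mcd-1,lang-b,concept-1,form-b1x
mcd-2,lang-c,concept-2,form-c2
"""


def read_forms(text, dataset):
    """Parse CSV text into row dicts, tagging each row with its dataset name."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Dataset"] = dataset
        rows.append(row)
    return rows


def forms_by_language(*tables):
    """Index rows from any number of form tables by their Language_ID."""
    index = defaultdict(list)
    for table in tables:
        for row in table:
            index[row["Language_ID"]].append(row)
    return dict(index)


def shared_languages(index):
    """Return the sorted language IDs attested in more than one dataset."""
    return sorted(
        lang
        for lang, rows in index.items()
        if len({r["Dataset"] for r in rows}) > 1
    )


acd = read_forms(ACD_FORMS, "acd")
mcd = read_forms(MCD_FORMS, "mcd")
index = forms_by_language(acd, mcd)
print(shared_languages(index))
```

Both tables share one column layout, so the same few lines read either one and the cross-dataset step reduces to a dictionary lookup. That is the practical benefit of a common format, whatever tooling is used on the published datasets.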

## Abstract

The Austronesian Comparative Dictionary has served as an important resource for the comparative study of Austronesian languages since Robert Blust started its compilation in 1990. Likewise, the Micronesian Comparative Dictionary – an online database of Proto-Micronesian reconstructions previously published in Oceanic Linguistics by Byron Bender and colleagues – is an important reference point for comparative linguistics. The legacy online versions of both dictionaries share an uncertain future, and neither has been available in a structured format amenable to quantitative methods. Thus, to preserve the content of both dictionaries for the scientific record and to increase interoperability of the data, we undertook a conversion of the dictionaries to CLDF datasets. While programmatic access to the data within each dictionary already provides a new level of usability, the true potential of data in CLDF lies in interoperability across datasets. This is particularly useful for the two dictionaries presented here, because Micronesian languages belong to the Austronesian family and so the Micronesian data could potentially complement the Austronesian Comparative Dictionary. With the CLDF datasets we lay the groundwork for tackling this challenge.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12174365/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12174365/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC12174365/full.md

---
Source: https://tomesphere.com/paper/PMC12174365