# A Systematic Approach to Building High-Impact Aging Datasets for Researchers

**Authors:** Riya Goyal, Yater Henry, Tasnim Raisa, Jessica Fleck, Duo Wei

PMC · DOI: 10.1093/geroni/igaf122.3085 · 2025-12-31

## TL;DR

This paper introduces a systematic method to identify and curate high-quality datasets for aging research, helping researchers access valuable data more efficiently.

## Contribution

A novel approach that combines expert-defined domains, AI-assisted search, and a Python analysis of how often each dataset appears across search combinations to curate impactful aging datasets.

## Key findings

- Using expert-defined domains and AI, the study identified datasets frequently used in aging research.
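- A Python algorithm ranked datasets by how often they appeared across search combinations; matching one example set of 100 datasets against PubMed returned 7,561 relevant publications, 7,030 of which (about 93%) specifically address aging research.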
- The resulting database offers a structured resource for aging-related secondary data.
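
The summary does not say how the PubMed matching was implemented. As a rough sketch only, one way to look up a publication count for a dataset name is NCBI's public E-utilities `esearch` endpoint. The function name `pubmed_count_url`, the aging filter string, and the dataset name in the usage line are illustrative assumptions, not details taken from the paper. The code only builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public API).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count_url(dataset_name: str, aging_only: bool = False) -> str:
    """Build an esearch URL that asks PubMed for a hit count.

    Hypothetical helper: the paper does not describe its query syntax.
    """
    # Quote the dataset name so PubMed treats it as a phrase.
    term = f'"{dataset_name}"'
    if aging_only:
        # Assumed filter for aging-related papers; adjust as needed.
        term += ' AND ("older adults" OR aging)'
    params = {
        "db": "pubmed",       # search the PubMed database
        "term": term,         # the search expression
        "rettype": "count",   # return only the number of hits
        "retmode": "json",    # JSON response instead of XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Illustrative dataset name, not one reported in the paper.
print(pubmed_count_url("Health and Retirement Study", aging_only=True))
```

Comparing the count returned with and without `aging_only=True` would give a pair of totals analogous to the overall and aging-specific figures reported above.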

## Abstract

Access to relevant, high-quality secondary datasets is essential for advancing research on the older population across diverse domains such as mental health, health services, and life satisfaction. This study presents a systematic approach to identifying the most significant secondary datasets in these fields and curating them into a searchable database. Our method begins with domain experts defining key content domains and their respective subdomains. Using Claude AI, we conduct targeted searches that combine each subdomain with data analytics keywords and the term “older adults” to retrieve relevant datasets. To assess the importance of each dataset, we develop a Python algorithm that analyzes how often the dataset occurs across multiple search combinations. For instance, we use Claude AI to generate 100 datasets from a query combining “older adults,” “Alzheimer’s,” and “regression,” then match these datasets against PubMed’s repository, which returns 7,561 relevant publications, 7,030 of them specifically addressing aging research. This iterative process allows us to identify the most commonly used and impactful datasets for researchers in aging studies. The resulting database will provide a structured, accessible, and efficient resource for scholars and practitioners seeking high-value secondary data to support evidence-based research on aging-related topics.
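
The abstract describes combining subdomains with analytics keywords and then counting how often each dataset appears across the resulting searches. The authors' code is not shown, so the following is only a minimal sketch of that idea. The function names `build_queries` and `rank_datasets`, the rule of counting a dataset once per query, and the placeholder dataset names are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def build_queries(subdomains, analytics_keywords, population="older adults"):
    """Pair every subdomain with every analytics keyword, plus the population term."""
    return [(population, s, k) for s, k in product(subdomains, analytics_keywords)]


def rank_datasets(results_by_query):
    """Rank datasets by the number of queries in which they appear.

    results_by_query maps a query tuple to the list of dataset names
    returned for it (for example, from an AI-assisted search).
    """
    counts = Counter()
    for datasets in results_by_query.values():
        # Assumption: a dataset counts once per query, even if listed twice.
        counts.update(set(datasets))
    # Sort by descending frequency, then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


queries = build_queries(["Alzheimer's", "depression"], ["regression"])

# Placeholder results; real entries would come from the search step.
results = {
    queries[0]: ["Dataset A", "Dataset B", "Dataset A"],
    queries[1]: ["Dataset A", "Dataset C"],
}

for name, frequency in rank_datasets(results):
    print(name, frequency)
```

Datasets that recur across many subdomain and keyword combinations rise to the top of the ranking, which mirrors the abstract's notion of identifying the most commonly used datasets.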

---
Source: https://tomesphere.com/paper/PMC12761570