A Hierarchical Approach to exploiting Multiple Datasets from TalkBank

Man Ho Wong

arXiv:2306.12596 · cs.DB · June 23, 2023 · 1 citation

TL;DR

This paper presents a hierarchical pipeline framework for efficient data selection, integration, and analysis across multiple TalkBank datasets, working around the limited filtering and batch-processing capabilities of TalkBank's existing API.

Contribution

It introduces a novel hierarchical search and data integration framework that improves data filtering, indexing, and cross-study analysis in TalkBank and similar platforms.
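A hierarchical search narrows the data in two passes: a quick screen over corpus-level metadata, then an in-depth search over the files of the corpora that survive. The sketch below illustrates that idea only; it is not the talkbank-pipeline code and does not call TalkBank's API. The `Corpus` and `FileRecord` records, their fields, the helper functions, and the sample corpora are all hypothetical stand-ins, assuming metadata has already been loaded into memory.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical file-level metadata for one CHAT transcript."""
    path: str
    language: str
    age_months: float
    participants: list = field(default_factory=list)


@dataclass
class Corpus:
    """Hypothetical corpus-level metadata plus the corpus's file records."""
    name: str
    languages: set
    has_audio: bool
    files: list = field(default_factory=list)


def screen_corpora(corpora, language, need_audio):
    """Stage 1: quick screening that looks only at corpus-level metadata."""
    return [
        c for c in corpora
        if language in c.languages and (c.has_audio or not need_audio)
    ]


def search_files(corpora, language, min_age, max_age, role):
    """Stage 2: in-depth search over the files of the shortlisted corpora."""
    hits = []
    for corpus in corpora:
        for f in corpus.files:
            if (f.language == language
                    and min_age <= f.age_months <= max_age
                    and role in f.participants):
                hits.append(f.path)
    return hits


def hierarchical_search(corpora, language, need_audio, min_age, max_age, role):
    """Run the cheap corpus screen first, then the file-level search."""
    shortlisted = screen_corpora(corpora, language, need_audio)
    return search_files(shortlisted, language, min_age, max_age, role)


# Hypothetical sample metadata; "CHI", "MOT", "INV" follow CHAT participant codes.
SAMPLE_CORPORA = [
    Corpus("CorpusA", {"eng"}, True, [
        FileRecord("CorpusA/01.cha", "eng", 30.0, ["CHI", "MOT"]),
        FileRecord("CorpusA/02.cha", "eng", 60.0, ["CHI", "MOT"]),
    ]),
    Corpus("CorpusB", {"fra"}, True, [
        FileRecord("CorpusB/01.cha", "fra", 30.0, ["CHI", "MOT"]),
    ]),
    Corpus("CorpusC", {"eng"}, False, [
        FileRecord("CorpusC/01.cha", "eng", 28.0, ["CHI", "INV"]),
    ]),
]

if __name__ == "__main__":
    # English files with audio, target child aged 24 to 36 months.
    print(hierarchical_search(SAMPLE_CORPORA, "eng", True, 24, 36, "CHI"))
    # ['CorpusA/01.cha']
```

Screening first means the more detailed file-level checks run only on corpora that could contain matches; here CorpusB is dropped for language and CorpusC for lacking audio before any of their files are inspected.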

Findings

01. Enhanced data filtering and batch processing capabilities.
02. Facilitated integration of datasets through metadata standardization.
03. Improved access to and analysis of large, complex linguistic datasets.
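Metadata standardization, the basis of finding 02, can be illustrated with a small sketch: each study's columns are renamed to one shared schema, values are cleaned, and the rows are pooled. The column aliases, the sex-value table, the function names, and the study rows are hypothetical; only the `years;months.days` age notation (e.g. `2;06.15`) comes from the CHAT format used in TalkBank. The conversion of days to months assumes an approximate 30-day month.

```python
import re

# Hypothetical aliases mapping each study's column names onto one shared schema.
COLUMN_ALIASES = {
    "file": "file", "filename": "file",
    "age": "age", "Age": "age", "child_age": "age",
    "sex": "sex", "Sex": "sex", "gender": "sex",
}

# Hypothetical normalization table for free-text sex values.
SEX_VALUES = {"f": "female", "female": "female", "m": "male", "male": "male"}

# CHAT-style age "years;months.days", e.g. "2;06.15"; months and days may be absent.
CHAT_AGE = re.compile(r"^(\d+);(\d{1,2})?(?:\.(\d{1,2})?)?$")


def age_to_months(value):
    """Return age in months from a CHAT age string or a number already in months."""
    if isinstance(value, (int, float)):
        return float(value)
    if not isinstance(value, str):
        return None
    match = CHAT_AGE.match(value.strip())
    if match is None:
        return None  # keep unparseable ages missing instead of guessing
    years, months, days = (int(g) if g else 0 for g in match.groups())
    # Days are converted with an approximate 30-day month.
    return round(years * 12 + months + days / 30, 2)


def standardize_record(row, study):
    """Rename columns to the shared schema, drop unmapped ones, clean values."""
    out = {"study": study}
    for key, value in row.items():
        name = COLUMN_ALIASES.get(key)
        if name is not None:
            out[name] = value
    out["age"] = age_to_months(out.get("age"))
    sex = str(out.get("sex") or "").strip().lower()
    out["sex"] = SEX_VALUES.get(sex)  # None when the value is missing or unknown
    return out


def integrate(studies):
    """Combine metadata rows from several studies into one list of clean records."""
    return [
        standardize_record(row, study)
        for study, rows in studies.items()
        for row in rows
    ]


if __name__ == "__main__":
    studies = {
        "StudyA": [{"filename": "a01.cha", "child_age": "2;06.15", "gender": "F"}],
        "StudyB": [{"file": "b01.cha", "Age": 36, "Sex": "male", "notes": "pilot"}],
    }
    for record in integrate(studies):
        print(record)
```

Unmapped columns such as `notes` are dropped rather than carried along, so every pooled record has the same keys and can be analyzed as one dataset.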

Abstract

TalkBank is an online database that facilitates the sharing of linguistics research data. However, TalkBank's existing API has limited data filtering and batch processing capabilities. To overcome these limitations, this paper introduces a pipeline framework that employs a hierarchical search approach, enabling efficient selection of complex data. This approach involves a quick preliminary screening of the corpora a researcher may need, followed by an in-depth search for target data based on specific criteria. The identified files are then indexed, providing easier access for future analysis. Furthermore, the paper demonstrates how data from different studies curated with the framework can be integrated by standardizing and cleaning metadata, allowing researchers to extract insights from a large, integrated dataset. Although designed for TalkBank, the framework can…
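Indexing the identified files can be as simple as writing the search results to a table that later analyses reload instead of repeating the search. The sketch below assumes a CSV index; the column names in `INDEX_FIELDS`, the functions `write_index` and `load_index`, and the sample rows are hypothetical and are not taken from the paper or its repository.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical index schema: one row per file selected by the search.
INDEX_FIELDS = ["corpus", "path", "language", "age_months"]


def write_index(hits, index_path):
    """Save selected files as a sorted CSV index so later analyses can skip the search."""
    index_path = Path(index_path)
    index_path.parent.mkdir(parents=True, exist_ok=True)
    with index_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=INDEX_FIELDS)
        writer.writeheader()
        for row in sorted(hits, key=lambda r: (r["corpus"], r["path"])):
            writer.writerow({k: row[k] for k in INDEX_FIELDS})
    return index_path


def load_index(index_path, **filters):
    """Reload the index, keeping rows whose columns equal the given values.

    CSV stores text, so every value (including age_months) comes back as str.
    """
    with Path(index_path).open(newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    return [r for r in rows if all(r.get(k) == str(v) for k, v in filters.items())]


if __name__ == "__main__":
    hits = [
        {"corpus": "CorpusB", "path": "CorpusB/01.cha", "language": "fra", "age_months": 30.0},
        {"corpus": "CorpusA", "path": "CorpusA/01.cha", "language": "eng", "age_months": 30.0},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        index = write_index(hits, Path(tmp) / "index" / "selected_files.csv")
        print(load_index(index, language="eng"))
```

A plain CSV keeps the index readable in any spreadsheet or analysis tool; the trade-off is that numeric columns must be converted back from text after loading.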

Code & Models

Repositories

manhowong/talkbank-pipeline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies