Automated Code Extraction from Discussion Board Text Dataset

Sina Mahdipour Saravani, Sadaf Ghaffari, Yanye Luther, James Folkestad, and Marcia Moraes

arXiv:2210.17495 · cs.LG · April 20, 2023

Open Access

TL;DR

This paper evaluates three text mining methods for automating code extraction from small discussion board datasets, demonstrating their potential to assist instructors in analyzing student discussions efficiently.

Contribution

It compares Latent Semantic Analysis, Latent Dirichlet Analysis, and Clustering Word Vectors for automated coding of a small discussion board dataset, checking each method's output against codes assigned manually by two human raters.
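Of the three methods, clustering word vectors is the easiest to illustrate without external libraries. The sketch below is a minimal example, not the authors' implementation: it assumes plain k-means with Euclidean distance, and the function name `kmeans` and the 2-D `toy_vectors` are invented for illustration. This summary does not say which embedding model or clustering algorithm the paper used.

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means over a dict of word -> vector; returns a list of word clusters."""
    rng = random.Random(seed)
    words = sorted(vectors)
    # Start from k distinct words chosen at random as initial centroids.
    centroids = [vectors[w] for w in rng.sample(words, k)]
    assignment = {}
    for _ in range(iters):
        # Assign each word to its nearest centroid (Euclidean distance).
        assignment = {
            w: min(range(k), key=lambda c: math.dist(vectors[w], centroids[c]))
            for w in words
        }
        # Move each centroid to the mean of its members; leave it alone if empty.
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    clusters = [[] for _ in range(k)]
    for w in words:
        clusters[assignment[w]].append(w)
    return clusters


# Toy 2-D vectors, invented for illustration only (not data from the paper).
toy_vectors = {
    "exam": (0.9, 0.1), "grade": (1.0, 0.2), "quiz": (0.8, 0.0),
    "team": (0.1, 0.9), "group": (0.0, 1.0), "peer": (0.2, 0.8),
}
clusters = kmeans(toy_vectors, k=2)
```

Each resulting cluster is a group of related words that an instructor could inspect and label as a candidate discussion code. In practice the vectors would come from a pretrained or corpus-trained embedding model rather than hand-written coordinates.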

Findings

01. Automated methods can extract meaningful discussion codes.

02. All three approaches show promise despite small dataset size.

03. Automated coding supports Epistemic Network Analysis.

Abstract

This study introduces and investigates the capabilities of three different text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet Analysis, and Clustering Word Vectors, for automating code extraction from a relatively small discussion board dataset. We compare the outputs of each algorithm with a previous dataset that was manually coded by two human raters. The results show that even with a relatively small dataset, automated approaches can be an asset to course instructors by extracting some of the discussion codes, which can be used in Epistemic Network Analysis.
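The abstract does not state how algorithm outputs were compared with the human-coded dataset. One simple way to think about such a comparison is to ask which human codes reappear among the terms a method extracts. The sketch below assumes exact, case-insensitive string matching; the function `match_codes` and both input lists are hypothetical and do not come from the paper.

```python
def match_codes(extracted_topics, human_codes):
    """Return the human codes found among the extracted terms, plus recall."""
    # Flatten every topic's term list into one lowercase set.
    extracted_terms = {term.lower() for topic in extracted_topics for term in topic}
    # A human code counts as recovered if it appears verbatim among the terms.
    recovered = sorted(code for code in human_codes if code.lower() in extracted_terms)
    recall = len(recovered) / len(human_codes) if human_codes else 0.0
    return recovered, recall


# Hypothetical inputs for illustration; neither list comes from the paper.
topics = [["exam", "grade", "quiz"], ["group", "peer", "team"]]
codes = ["grade", "team", "feedback"]
recovered, recall = match_codes(topics, codes)
```

Exact matching is deliberately strict: it would miss synonyms and multi-word codes, so a real comparison would likely need stemming, a synonym list, or human judgment on top.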

Taxonomy

Topics: Software Engineering Research · Natural Language Processing Techniques · Advanced Text Analysis Techniques