Automated Code Extraction from Discussion Board Text Dataset
Sina Mahdipour Saravani, Sadaf Ghaffari, Yanye Luther, James Folkestad, and Marcia Moraes

TL;DR
This paper evaluates three text mining methods for automating code extraction from small discussion board datasets, demonstrating their potential to assist instructors in analyzing student discussions efficiently.
Contribution
It compares Latent Semantic Analysis, Latent Dirichlet Allocation, and Clustering Word Vectors for automated coding of a small discussion board dataset, checking each method's output against codes assigned by two human raters.
Findings
Automated methods can extract some of the discussion codes identified by human raters.
All three approaches show promise despite the small dataset size.
The extracted codes can be used in Epistemic Network Analysis.
Abstract
This study introduces and investigates the capabilities of three different text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet Allocation, and Clustering Word Vectors, for automating code extraction from a relatively small discussion board dataset. We compare the outputs of each algorithm with a previous dataset that was manually coded by two human raters. The results show that even with a relatively small dataset, automated approaches can be an asset to course instructors by extracting some of the discussion codes, which can be used in Epistemic Network Analysis.
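To make the third approach concrete, the following is a minimal sketch of clustering word vectors with k-means, where each resulting cluster is read as a candidate discussion code. It is not the authors' implementation: the 2-D vectors in `WORD_VECTORS`, the example words, the `kmeans` helper, and the choice of `k=2` are all hypothetical and chosen only so the example runs without external libraries. A real pipeline would presumably use pretrained embeddings with many more dimensions.

```python
import math

# Hypothetical 2-D "embeddings" invented for illustration only.
# Real word vectors typically have hundreds of dimensions.
WORD_VECTORS = {
    "deadline": (0.90, 0.10),
    "schedule": (0.80, 0.20),
    "calendar": (0.85, 0.15),
    "feedback": (0.10, 0.90),
    "comment": (0.20, 0.80),
    "review": (0.15, 0.85),
}


def kmeans(vectors, k, iterations=20):
    """Plain k-means over a {word: vector} mapping; returns k word lists."""
    words = list(vectors)
    # Deterministic start: use k evenly spaced words as initial centroids.
    centroids = [vectors[words[i * len(words) // k]] for i in range(k)]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: attach each word to its nearest centroid.
        new_assignment = {
            w: min(range(k), key=lambda c: math.dist(vectors[w], centroids[c]))
            for w in words
        }
        if new_assignment == assignment:
            break  # converged: no word changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    clusters = [[] for _ in range(k)]
    for w in words:
        clusters[assignment[w]].append(w)
    return clusters


clusters = kmeans(WORD_VECTORS, k=2)
for i, cluster in enumerate(clusters):
    print(f"candidate code {i}: {cluster}")
```

In this toy setup the scheduling-related words and the feedback-related words fall into separate clusters. Whether such clusters correspond to the human-assigned codes is exactly what a comparison against manually coded data would need to establish.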
Taxonomy
Topics: Software Engineering Research · Natural Language Processing Techniques · Advanced Text Analysis Techniques
