SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison
Anjali Rawal, Hui Wang, Youjia Zheng, Yu-Hsuan Lin, Shanu Sushmita

TL;DR
This study evaluates the effectiveness of machine learning models in detecting texts generated by large language models across different text lengths and analyzes their linguistic and moral characteristics.
Contribution
It provides a comprehensive comparison of detection accuracy for LLM-generated texts of varying sizes and explores the linguistic and moral differences between human and machine texts.
Findings
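Detection accuracy is high (96% and above) for texts from smaller LLMs (762 million parameters or fewer).
Detection is harder for very large LLMs: accuracy falls to 74% for the XL-1542 variant of GPT-2 (1542 million parameters).
Machine-generated texts are more readable and express similar moral judgments, but differ from human texts in personality traits.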
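The readability finding can be made concrete with a small sketch. The summary does not say which readability measure the authors used, so this assumes the standard Flesch Reading Ease formula with a crude vowel-group syllable counter; the function names and the sample sentences are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Crude estimate (assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical samples, not taken from the paper's datasets.
simple_text = "The cat sat. The dog ran."
dense_text = "Institutional heterogeneity complicates interpretability considerations."

simple_score = flesch_reading_ease(simple_text)
dense_score = flesch_reading_ease(dense_text)
```

Comparing average scores over human and machine samples of the same length class would give one rough view of the readability gap; a real analysis would likely use a dictionary-based syllable counter instead of the vowel-group heuristic.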
Abstract
Large language models (LLMs) have gained significant attention due to their ability to mimic human language. Identifying texts generated by LLMs is crucial for understanding their capabilities and mitigating potential consequences. This paper analyzes datasets of varying text lengths: small, medium, and large. We compare the performance of machine learning algorithms on four datasets: (1) small (tweets from Election, FIFA, and Game of Thrones), (2) medium (Wikipedia introductions and PubMed abstracts), and (3) large (OpenAI web text dataset). Our results indicate that texts from LLMs with very large parameter counts (such as the XL-1542 variant of GPT-2, with 1542 million parameters) were harder to detect using traditional machine learning methods (74% accuracy). However, texts of varying lengths from LLMs with fewer parameters (762 million or fewer) can be detected with high accuracy (96% and above). We…
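As a rough illustration of what "traditional machine learning methods" for this task can look like, the sketch below trains a bag-of-words multinomial Naive Bayes classifier using only the Python standard library. This is a stand-in chosen for brevity, not the authors' pipeline (the taxonomy on this page lists Support Vector Machine among the methods); the class name, helper names, and the tiny training set are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesDetector:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, texts, labels):
        self.word_counts = {}          # label -> Counter of token counts
        self.doc_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy data; the paper's datasets are far larger.
train_texts = [
    "omg that match was wild lol",
    "cant believe the finale wow lol",
    "the match was an exciting event for many fans",
    "the finale was a significant event for many viewers",
]
train_labels = ["human", "human", "machine", "machine"]
model = NaiveBayesDetector().fit(train_texts, train_labels)

test_texts = ["lol that finale was wild", "the event was significant for many fans"]
test_labels = ["human", "machine"]
test_accuracy = accuracy(model, test_texts, test_labels)
```

The accuracy figures quoted in the abstract are percentages of this kind, computed on held-out texts; with a toy set this small the number says nothing about real detection difficulty.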
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Softmax · Attention Is All You Need · Support Vector Machine
