Transforming Role Classification in Scientific Teams Using LLMs and Advanced Predictive Analytics
Wonduk Seo, Yi Bu

TL;DR
This paper introduces an approach to classifying author roles in scientific teams using large language models and predictive analytics. It outperforms traditional methods such as XGBoost and BERT and gives a more nuanced view of research contributions.
Contribution
It presents a new methodology combining LLMs and deep learning for author role classification, enhancing accuracy over existing clustering and self-report techniques.
Findings
GPT-4 outperforms other LLMs in role categorization
The deep learning model achieves an F1 score of 0.76
Traditional approaches such as XGBoost and BERT are less effective than LLM-based classification
Abstract
Scientific team dynamics are critical in determining the nature and impact of research outputs. However, existing methods for classifying author roles based on self-reports and clustering lack comprehensive contextual analysis of contributions. Thus, we present a transformative approach to classifying author roles in scientific teams using advanced large language models (LLMs), which offers a more refined analysis compared to traditional clustering methods. Specifically, we seek to complement and enhance these traditional methods by utilizing open-source and proprietary LLMs, such as GPT-4, Llama3 70B, Llama2 70B, and Mixtral 8x7B, for role classification. Using few-shot prompting, we categorize author roles and demonstrate that GPT-4 outperforms other models across multiple categories, surpassing traditional approaches such as XGBoost and BERT. Our methodology also includes…
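The few-shot prompting step mentioned in the abstract could look roughly like the sketch below. It only illustrates the general technique and is not the authors' prompt. The role labels in ROLES, the Example type, and the helper names build_few_shot_prompt and parse_role are hypothetical, since this excerpt does not list the paper's role categories or prompt wording. No model is called; the prompt string would be sent to whichever LLM is being evaluated.

```python
from dataclasses import dataclass

# Hypothetical role labels, for illustration only; this excerpt does not
# list the categories actually used in the paper.
ROLES = ("leader", "support")


@dataclass
class Example:
    """One labeled demonstration shown to the model (hypothetical structure)."""
    description: str  # short text describing the author's contribution
    role: str         # gold label, one of ROLES


def build_few_shot_prompt(examples, query, roles=ROLES):
    """Assemble an instruction, the labeled demonstrations, and the unlabeled query."""
    lines = [
        "Classify the author's role in the paper as one of: " + ", ".join(roles) + ".",
        "",
    ]
    for ex in examples:
        lines.append(f"Author contribution: {ex.description}")
        lines.append(f"Role: {ex.role}")
        lines.append("")
    # The query is left unlabeled so the model completes the final "Role:" line.
    lines.append(f"Author contribution: {query}")
    lines.append("Role:")
    return "\n".join(lines)


def parse_role(completion, roles=ROLES):
    """Map a raw model completion to a known label, or None if no label matches."""
    text = completion.strip().lower()
    for role in roles:
        if text.startswith(role):
            return role
    return None


demo = [
    Example("Conceived the study and supervised the project.", "leader"),
    Example("Collected data and ran the statistical analysis.", "support"),
]
prompt = build_few_shot_prompt(demo, "Wrote the first draft and secured funding.")
```

Returning None from parse_role for unrecognized completions keeps malformed model outputs out of the metrics rather than silently mapping them to a default label.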
