Transforming Role Classification in Scientific Teams Using LLMs and Advanced Predictive Analytics

Wonduk Seo; Yi Bu

arXiv:2501.07267·cs.DL·February 26, 2025

TL;DR

This paper introduces a novel approach to classifying scientific team author roles using large language models and predictive analytics, outperforming traditional methods and providing more nuanced insights into research contributions.

Contribution

It presents a new methodology combining LLMs and deep learning for author role classification, enhancing accuracy over existing clustering and self-report techniques.

Findings

01

GPT-4 outperforms other LLMs in role categorization

02

The deep learning model achieves an F1 score of 0.76

03

Traditional methods are less effective than LLM-based approaches
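For readers unfamiliar with the metric in the second finding, the sketch below shows how an F1 score can be computed for a multi-class role classifier. It is only an illustration: the page does not say whether the reported 0.76 is macro-, micro-, or weighted-averaged, so the macro average used here is an assumption, and the labels and predictions in the usage lines are made up.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Made-up labels, purely to exercise the functions.
example_true = ["a", "a", "b", "b"]
example_pred = ["a", "b", "b", "b"]
example_score = macro_f1(example_true, example_pred)
```

Macro averaging weights every role equally, which matters when some roles are rare; a micro or weighted average would give a different number on imbalanced data.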

Abstract

Scientific team dynamics are critical in determining the nature and impact of research outputs. However, existing methods for classifying author roles based on self-reports and clustering lack comprehensive contextual analysis of contributions. Thus, we present a transformative approach to classifying author roles in scientific teams using advanced large language models (LLMs), which offers a more refined analysis compared to traditional clustering methods. Specifically, we seek to complement and enhance these traditional methods by utilizing open-source and proprietary LLMs, such as GPT-4, Llama3 70B, Llama2 70B, and Mixtral 8x7B, for role classification. Utilizing few-shot prompting, we categorize author roles and demonstrate that GPT-4 outperforms other models across multiple categories, surpassing traditional approaches such as XGBoost and BERT. Our methodology also includes…
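The few-shot prompting mentioned in the abstract can be pictured roughly as follows. This is a minimal sketch, not the authors' prompt: the role labels, the example contribution statements, and the helper names `build_few_shot_prompt` and `parse_role` are all hypothetical, and no model is actually called.

```python
from dataclasses import dataclass


@dataclass
class LabeledExample:
    contribution: str  # free-text description of what the author did
    role: str          # role label assigned to that description


# Hypothetical role labels; the paper's actual categories may differ.
ROLES = ["leader", "supporter", "specialist"]

# Hypothetical demonstrations shown to the model before the query.
DEMOS = [
    LabeledExample("Conceived the study and supervised the project.", "leader"),
    LabeledExample("Ran the statistical analysis.", "specialist"),
]


def build_few_shot_prompt(examples, query, roles):
    """Assemble an instruction, labeled demonstrations, and the unlabeled query."""
    lines = [
        "Classify the author's role from the contribution statement.",
        "Answer with exactly one of: " + ", ".join(roles) + ".",
        "",
    ]
    for ex in examples:
        lines += [f"Contribution: {ex.contribution}", f"Role: {ex.role}", ""]
    lines += [f"Contribution: {query}", "Role:"]
    return "\n".join(lines)


def parse_role(reply, roles):
    """Map a raw model reply onto a known role, or None if it matches nothing."""
    cleaned = reply.strip().lower()
    for role in roles:
        if cleaned.startswith(role.lower()):
            return role
    return None


prompt = build_few_shot_prompt(DEMOS, "Collected samples and edited the draft.", ROLES)
```

The resulting `prompt` string would be sent to whichever LLM is being evaluated, and `parse_role` guards against replies that fall outside the allowed label set.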
