Scalable Scientific Interest Profiling Using Large Language Models

Yilun Liang; Gongbo Zhang; Edward Sun; Betina Idnay; Yilu Fang; Fangyi Chen; Casey Ta; Yifan Peng; Chunhua Weng

arXiv:2508.15834·cs.CL·January 7, 2026

TL;DR

This study evaluates large language models for automated, scalable scientific interest profiling. It compares MeSH term-based and abstract-based methods against human-written profiles and finds moderate semantic similarity but clear differences in keyword usage and concept selection.

Contribution

The paper introduces and assesses two LLM-based methods for generating scientific profiles, highlighting their potential and limitations compared to human-curated profiles.

Findings

01. Moderate semantic similarity between machine and human profiles (F1 ~0.54-0.56).

02. MeSH-based profiles are more readable than abstract-based ones.

03. Machine summaries differ significantly in keyword usage and concept choice from human profiles.

Abstract

Research profiles highlight scientists' research focus, enabling talent discovery and collaboration, but they are often outdated. Automated, scalable methods are urgently needed to keep profiles current. We design and evaluate two large language model (LLM)-based methods for generating scientific interest profiles, one summarizing PubMed abstracts and the other using Medical Subject Headings (MeSH) terms, and compare them with researchers' self-summarized interests. We collected titles, MeSH terms, and abstracts of PubMed publications for 595 faculty at Columbia University Irving Medical Center, obtaining human-written profiles for 167. GPT-4o-mini was prompted to summarize each researcher's interests. Manual and automated evaluations characterized similarities between machine-generated and self-written profiles. The similarity study showed low ROUGE-L, BLEU, and METEOR scores, reflecting…
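To make the lexical-overlap metrics in the abstract more concrete, here is a minimal sketch of a ROUGE-L style score based on the longest common subsequence (LCS) of word tokens. It is not the authors' evaluation code: the function names, the whitespace tokenization, the balanced F1 weighting, and the two example profiles are all assumptions chosen for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L style F1 on lowercased, whitespace-split tokens.

    Assumption: balanced F1 (equal weight on precision and recall);
    published ROUGE implementations may weight recall differently.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical profiles, invented for illustration only.
human_profile = "my lab studies how machine learning can improve clinical trial recruitment"
machine_profile = "research focuses on artificial intelligence methods for patient enrollment in trials"

score = rouge_l_f1(machine_profile, human_profile)
print(f"ROUGE-L style F1: {score:.3f}")
```

The two invented profiles describe similar interests but share no exact word tokens, so an LCS-based score treats them as unrelated. This is the kind of gap between lexical and semantic similarity that the findings above describe.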
