Malicious Repurposing of Open Science Artefacts by Using Large Language Models
Zahra Hashemi, Zhiqiang Zhong, Jun Pang, Wei Zhao

TL;DR
This paper shows how large language models can be exploited to maliciously repurpose open science artefacts, and finds that LLMs are unreliable evaluators of the resulting harms.
Contribution
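The authors introduce an end-to-end pipeline that bypasses LLM safeguards through persuasion-based jailbreaking, repurposes the artefacts of NLP papers (datasets, methods, and tools) into harmful research proposals, and assesses those proposals for harmfulness, feasibility of misuse, and technical soundness.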
Findings
LLMs can produce harmful proposals by exploiting open artefacts
LLMs acting as evaluators show significant disagreement on safety assessments
Human judgment remains essential for credible risk evaluation
Abstract
The rapid evolution of large language models (LLMs) has fuelled enthusiasm about their role in advancing scientific discovery, with studies exploring LLMs that autonomously generate and evaluate novel research ideas. However, little attention has been given to the possibility that such models could be exploited to produce harmful research by repurposing open science artefacts for malicious ends. We fill the gap by introducing an end-to-end pipeline that first bypasses LLM safeguards through persuasion-based jailbreaking, then reinterprets NLP papers to identify and repurpose their artefacts (datasets, methods, and tools) by exploiting their vulnerabilities, and finally assesses the safety of these proposals using our evaluation framework across three dimensions: harmfulness, feasibility of misuse, and soundness of technicality. Overall, our findings demonstrate that LLMs can generate…
Taxonomy
Topics: Misinformation and Its Impacts · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education
