Translating the Force Concept Inventory in the age of AI
Marina Babayeva, Justin Dunlap, Marie Snětinová, Ralf Widenhorn

TL;DR
This study evaluates GPT-4o's ability to translate the Force Concept Inventory into different languages, highlighting both its potential to expand educational access and the challenges in maintaining scientific accuracy and clarity.
Contribution
It demonstrates GPT-4o's effectiveness in translating scientific assessments while identifying language-specific issues that can alter the meaning of physics problems.
Findings
GPT-4o answers the questions well in both the translated versions and the back-translations into English
Translation issues include scientific term mistranslations and formatting problems
Language-specific nuances can significantly change problem meanings
Abstract
We present a study that translates the Force Concept Inventory (FCI) using OpenAI GPT-4o and assess the specific difficulties of translating a science-focused topic using Large Language Models (LLMs). The FCI is a physics exam meant to evaluate outcomes of a student cohort before and after instruction in Newtonian physics. We examine the problem-solving ability of the LLM in both the translated document and the translation back into English, detailing the language-dependent issues that complicate the translation. While ChatGPT performs remarkably well on answering the questions in both the translated language and the back-translation into English, problems arise with language-specific nuances and formatting. Pitfalls include words or phrases that lack one-to-one matching terms in another language, especially discipline-specific scientific terms, or outright mistranslations.…
