Benchmarking Large Language Models on Homework Assessment in Circuit Analysis
Liangliang Chen, Zhihao Qin, Yiming Guo, Jacqueline Rohde, Ying Zhang

TL;DR
This paper benchmarks various large language models on their ability to assess undergraduate circuit analysis homework, highlighting their strengths, limitations, and potential for educational applications.
Contribution
It introduces a novel dataset and evaluation framework for assessing LLMs in engineering education, providing benchmarks and insights for future development.
Findings
GPT-4o and Llama 3 70B outperform GPT-3.5 Turbo across all metrics
Different models show distinct strengths in solution evaluation aspects
Current LLMs have limitations in reliably assessing circuit analysis homework
Abstract
Large language models (LLMs) have the potential to revolutionize various fields, including code development, robotics, finance, and education, due to their extensive prior knowledge and rapid advancements. This paper investigates how LLMs can be leveraged in engineering education. Specifically, we benchmark the capabilities of different LLMs, including GPT-3.5 Turbo, GPT-4o, and Llama 3 70B, in assessing homework for an undergraduate-level circuit analysis course. We have developed a novel dataset consisting of official reference solutions and real student solutions to problems from various topics in circuit analysis. To overcome the limitations of image recognition in current state-of-the-art LLMs, the solutions in the dataset are converted to LaTeX format. Using this dataset, a prompt template is designed to test five metrics of student solutions: completeness, method, final answer,…
Taxonomy
Topics: Machine Learning in Materials Science · Innovative Teaching and Learning Methods · Career Development and Diversity
Methods: Cosine Annealing · Layer Normalization · Linear Warmup With Cosine Annealing · Attention Dropout · Byte Pair Encoding · Softmax · Dropout · Dense Connections
