Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research
Boyan Xu, Liang Wen, Zihao Li, Yuxing Yang, Guanlan Wu, Xiongpeng, Tang, Yu Li, Zihao Wu, Qingxian Su, Xueqing Shi, Yue Yang, Rui Tong, How Yong, Ng

TL;DR
This paper introduces WaterER, a benchmark suite to evaluate large language models' effectiveness in water engineering tasks, highlighting GPT-4's strengths and the potential for LLMs to serve as water experts.
Contribution
First comprehensive evaluation of LLMs in water engineering using the WaterER benchmark, establishing a standardized framework for assessing their domain-specific capabilities.
Findings
GPT-4 excels in handling complex water engineering tasks
Gemini shows strong academic research capabilities
Chinese-oriented models perform well on Chinese water questions
Abstract
Recent advancements in Large Language Models (LLMs) have sparked interest in their potential applications across various fields. This paper embarked on a pivotal inquiry: Can existing LLMs effectively serve as "water expert models" for water engineering and research tasks? This study was the first to evaluate LLMs' contributions across various water engineering and research tasks by establishing a domain-specific benchmark suite, namely, WaterER. Herein, we prepared 983 tasks related to water engineering and research, categorized into "wastewater treatment", "environmental restoration", "drinking water treatment and distribution", "sanitation", "anaerobic digestion" and "contaminants assessment". We evaluated the performance of seven LLMs (i.e., GPT-4, GPT-3.5, Gemini, GLM-4, ERNIE, QWEN and Llama3) on these tasks. We highlighted the strengths of GPT-4 in handling diverse and complex…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · 15 Ways to Contact How can i speak to someone at Delta Airlines · Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Attention Dropout · Label Smoothing · Residual Connection · Transformer
