ElectriQ: A Benchmark for Assessing the Response Capability of Large Language Models in Power Marketing

Jinzhi Wang; Qingke Peng; Haozhou Li; Zeyuan Zeng; Jiangbo Zhang; Kaixuan Yang; Ningyong Wu; Qinfeng Song; Ruimeng Li; Biyi Zhou

arXiv:2507.22911 · cs.CL · February 2, 2026

TL;DR

ElectriQ introduces a comprehensive benchmark and evaluation framework for assessing large language models in electric power marketing, emphasizing sector-specific knowledge, regulatory reasoning, and multi-turn dialogue stability.

Contribution

The paper presents ElectriQ, a large-scale, sector-specific benchmark for LLMs in power marketing, and proposes SEEK-RAG, a retrieval-augmented fine-tuning method intended to improve model performance and compliance.

Findings

01

7B models fine-tuned with SEEK-RAG outperform larger models on electric power marketing (EPM) tasks.

02

ElectriQ includes over 550k dialogues and multiple evaluation metrics.

03

Domain-aligned models reduce computational costs while maintaining performance.

Abstract

As power systems decarbonise and digitalise, high penetrations of distributed energy resources and flexible tariffs make electric power marketing (EPM) a key interface between regulation, system operation and sustainable-energy deployment. Many utilities still rely on human agents and rule- or intent-based chatbots with fragmented knowledge bases that struggle with long, cross-scenario dialogues and fall short of requirements for compliant, verifiable and DR-ready interactions. Meanwhile, frontier large language models (LLMs) show strong conversational ability but are evaluated on generic benchmarks that underweight sector-specific terminology, regulatory reasoning and multi-turn process stability. To address this gap, we present ElectriQ, a large-scale benchmark and evaluation framework for LLMs in EPM. ElectriQ contains over 550k dialogues across six service domains and 24…

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques