TL;DR
This paper introduces FinChat, a Finnish conversational corpus and evaluation setup, highlighting the challenges of developing effective Finnish chatbots and providing a benchmark for future research.
Contribution
It presents the creation of the Finnish chat corpus FinChat and an associated evaluation task, filling a resource gap for Finnish open-domain chatbot research.
Findings
Off-the-shelf models perform no better than chance on the evaluation task.
Humans achieve near-perfect accuracy on the same task.
Chatbot-generated responses are often marked as incoherent in human evaluations.
Abstract
Creating open-domain chatbots requires large amounts of conversational data and related benchmark tasks to evaluate them. Standardized evaluation tasks are crucial for creating automatic evaluation metrics for model development; otherwise, comparing the models would require resource-expensive human evaluation. While chatbot challenges have recently managed to provide a plethora of such resources for English, resources in other languages are not yet available. In this work, we provide a starting point for Finnish open-domain chatbot research. We describe our collection efforts to create the Finnish chat conversation corpus FinChat, which is made available publicly. FinChat includes unscripted conversations on seven topics from people of different ages. Using this corpus, we also construct a retrieval-based evaluation task for Finnish chatbot development. We observe that off-the-shelf…
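The retrieval-based evaluation mentioned in the abstract can be pictured with a minimal sketch: a model scores a small set of candidate responses for each conversation context, and accuracy is the share of cases where the true response ranks first. Everything below is an illustrative assumption rather than the paper's actual setup: the function names, the word-overlap scorer, the number of candidates, and the English toy data are all hypothetical.

```python
# Minimal sketch of a retrieval-style (hits@1) evaluation.
# All names, the scorer, and the toy data are hypothetical, not from FinChat.

def hits_at_1(score_fn, examples):
    """Fraction of examples where the true response gets the top score.

    Each example is (context, candidates, true_index). Ties are resolved
    in favour of the earliest candidate.
    """
    correct = 0
    for context, candidates, true_index in examples:
        scores = [score_fn(context, c) for c in candidates]
        best = max(range(len(candidates)), key=scores.__getitem__)
        correct += best == true_index
    return correct / len(examples)


def chance_level(num_candidates):
    """Expected hits@1 of a model that picks a candidate uniformly at random."""
    return 1.0 / num_candidates


def word_overlap_score(context, candidate):
    """Toy scorer: number of distinct words shared by context and candidate."""
    return len(set(context.lower().split()) & set(candidate.lower().split()))


# Invented toy data: one true response and two distractors per context.
TOY_EXAMPLES = [
    ("do you like ice hockey",
     ["yes I watch hockey every week", "the soup is too salty", "my train was late"],
     0),
    ("what music do you listen to",
     ["I need new shoes", "mostly rock music and some jazz", "it rained all day"],
     1),
]

if __name__ == "__main__":
    print("hits@1:", hits_at_1(word_overlap_score, TOY_EXAMPLES))
    print("chance:", chance_level(3))
```

Under this framing, a model that performs "no better than chance" has hits@1 close to `chance_level(n)` for `n` candidates per context, while near-perfect human accuracy corresponds to hits@1 close to 1.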
