FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and Summarisation
Hung Nguyen Huy, Mo El-Haj, Dawn Knight, Paul Rayson

TL;DR
FreeTxt-Vi is an open-source bilingual toolkit that combines corpus analysis and transformer-based NLP for Vietnamese-English text analysis, enabling accessible, reproducible research in underrepresented languages.
Contribution
It introduces a unified bilingual NLP pipeline that combines hybrid segmentation, sentiment analysis, and summarisation, with each component evaluated against existing baselines.
Findings
Achieves competitive or superior performance relative to existing baselines in segmentation, sentiment analysis, and summarisation.
Supports reproducible research and development of Vietnamese NLP resources.
Reduces technical barriers for multilingual text analysis.
Abstract
FreeTxt-Vi is a free and open-source web-based toolkit for creating and analysing bilingual Vietnamese-English text collections. Positioned at the intersection of corpus linguistics and natural language processing (NLP), it enables users to build, explore, and interpret free-text data without requiring programming expertise. The system combines corpus analysis features such as concordancing, keyword analysis, word relation exploration, and interactive visualisation with transformer-based NLP components for sentiment analysis and summarisation. A key contribution of this work is the design of a unified bilingual NLP pipeline that integrates a hybrid VnCoreNLP and Byte Pair Encoding (BPE) segmentation strategy, a fine-tuned TabularisAI sentiment classifier, and a fine-tuned Qwen2.5 model for abstractive summarisation. Unlike existing text analysis platforms, FreeTxt-Vi is evaluated as a set of…
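As a rough illustration of the concordancing feature mentioned in the abstract, the following is a minimal keyword-in-context (KWIC) sketch. It is not FreeTxt-Vi's implementation: the function name `kwic`, the `window` parameter, and the regex tokeniser are all assumptions made for this example, and a real Vietnamese workflow would presumably run a word segmenter first so that multi-syllable words are kept together.

```python
import re
import unicodedata


def kwic(text, keyword, window=3):
    """Return (left, match, right) context triples for each keyword hit.

    Hypothetical helper for illustration only; not part of FreeTxt-Vi.
    """
    # Normalise to NFC so Vietnamese diacritics are precomposed characters
    # and are matched by \w instead of being split off as combining marks.
    text = unicodedata.normalize("NFC", text).lower()
    keyword = unicodedata.normalize("NFC", keyword).lower()

    # Naive tokenisation on runs of word characters (assumption: no
    # dedicated word segmenter is applied before this step).
    tokens = re.findall(r"\w+", text)

    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    sample = "The food was good. The service was slow."
    for left, match, right in kwic(sample, "was", window=2):
        print(f"{left:>15} [{match}] {right}")
```

Each hit is returned with up to `window` tokens of context on either side, which is the basic shape of a concordance line; sorting or aligning the lines for display is left out of the sketch.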
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Computational and Text Analysis Methods · Topic Modeling
