Towards Comprehensive Vietnamese Retrieval-Augmented Generation and Large Language Models
Nguyen Quang Duc, Le Hai Son, Nguyen Duc Nhan, Nguyen Dich Nhat Minh, Le Thanh Huong, Dinh Viet Sang

TL;DR
This paper advances Vietnamese language understanding and generation by developing open datasets and pre-trained models for Retrieval-Augmented Generation and Large Language Models.
Contribution
It introduces new open datasets and pre-trained models specifically designed for Vietnamese RAG and LLMs, enhancing language technology resources.
Findings
Improved Vietnamese language understanding capabilities
Enhanced generation quality for Vietnamese texts
Availability of open datasets and models for research
Abstract
This paper presents our contributions towards advancing the state of Vietnamese language understanding and generation through the development and dissemination of open datasets and pre-trained models for Vietnamese Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs).
Code & Models
- bkai-foundation-models/vietnamese-bi-encoder (model): 244k downloads, 71 likes
- bkai-foundation-models/vietnamese-llama2-7b-40GB (model): 87 downloads, 46 likes
- bkai-foundation-models/vietnamese-llama2-7b-120GB (model): 36 likes
- nold/vietnamese-llama2-7b-120GB-GGUF (model): 70 downloads, 1 like
- nhatminh/vietnamese_bi_encoder (model): 3 downloads
- Bachhoang/DATN-vietnamese-bi-encoder (model): 1 download
- phamduyphuong251/test-deploy-dify (model): 1 download
- RichardErkhov/bkai-foundation-models_-_vietnamese-llama2-7b-40GB-8bits (model)
- hoangnb/vietnamese-bi-encoder (model): 1 download
- bkai-foundation-models/BKAINewsCorpus (dataset): 471 downloads
- bkai-foundation-models/vi-alpaca (dataset): 110 downloads
- bkai-foundation-models/vi-alpaca-input-output-format (dataset): 37 downloads
- bkai-foundation-models/vi-self-chat-sharegpt-format (dataset): 37 downloads
- bkai-foundation-models/vietnamese-roleplay-realm (dataset): 38 downloads
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
