Intellecta Cognitiva: A Comprehensive Dataset for Advancing Academic Knowledge and Machine Reasoning

Ajmal PS; Ditto PS; Jithin VG

arXiv:2404.13065 · cs.CL · April 23, 2024

TL;DR

Intellecta Cognitiva is a large hybrid dataset that combines synthetic and textbook data to improve language models' reasoning and educational narrative generation.

Contribution

The paper introduces a novel hybrid dataset designed to enhance reasoning and educational content generation in language models.

Findings

01

Enables complex reasoning and detailed explanations in language models

02

Combines 8.01 billion synthetic tokens with 3.52 billion textbook tokens

03

Supports advanced cognitive processing in AI models

Abstract

The Intellecta dataset is an innovative synthetic dataset engineered to enhance the cognitive processing capabilities of contemporary language models. Comprising 11.53 billion tokens, 8.01 billion of synthetic data and 3.52 billion of rich textbook data, Intellecta is crafted to foster advanced reasoning and comprehensive educational narrative generation. Using the Mixtral-8x7B-Instruct-v0.1 model to generate complex thought processes and detailed, textbook-style explanations, the dataset enables language models to engage in both critical thinking and in-depth educational discourse. This hybrid dataset illustrates the potential of synthetic data to push the boundaries of AI, offering a repository that is vast and varied while also refined to align with ethical standards and intellectual rigor.
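As a quick check on the composition stated in the abstract, the two sources sum to the reported 11.53 billion tokens, and the mix works out to roughly 70% synthetic and 30% textbook. The snippet below is a minimal arithmetic sketch that uses only the figures given in the abstract; the constant and function names are illustrative and are not part of the dataset's tooling.

```python
# Token counts reported in the abstract, in billions of tokens.
SYNTHETIC_TOKENS_B = 8.01
TEXTBOOK_TOKENS_B = 3.52


def composition(synthetic: float, textbook: float) -> dict:
    """Return the total token count and each source's share of the mix."""
    total = synthetic + textbook
    return {
        # Round to two decimals to absorb floating-point noise in the sum.
        "total": round(total, 2),
        "synthetic_share": synthetic / total,
        "textbook_share": textbook / total,
    }


mix = composition(SYNTHETIC_TOKENS_B, TEXTBOOK_TOKENS_B)
print(f"total: {mix['total']}B tokens")          # total: 11.53B tokens
print(f"synthetic: {mix['synthetic_share']:.1%}")  # synthetic: 69.5%
print(f"textbook: {mix['textbook_share']:.1%}")    # textbook: 30.5%
```

The shares are computed from the reported totals only; per-subset or per-domain breakdowns would require the dataset itself.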

Code & Models

Datasets

budecosystem/intellecta

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics

Methods: ALIGN