No Saved Kaleidosope: an 100% Jitted Neural Network Coding Language with Pythonic Syntax

Augusto Seben da Rosa; Marlon Daniel Angeli; Jorge Aikes Junior; Alef Iury Ferreira; Lucas Rafael Gris; Anderson da Silva Soares; Arnaldo Candido Junior; Frederico Santos de Oliveira; Gabriel Trevisan Damke; Rafael Teixeira Sousa

arXiv:2409.11600 · cs.PL · September 23, 2024

TL;DR

This paper introduces a fully jitted neural network coding language with Pythonic syntax, combining object-oriented features, automatic differentiation, and high-performance GPU operations, achieving comparable results to PyTorch on CIFAR-10.

Contribution

It presents a jitted compiler for neural networks with Python-like syntax, built on C++, LLVM and CUDA, that integrates VRAM caching and pooling with automatic differentiation, and it matches PyTorch in accuracy and speed on CIFAR-10.

Findings

01 Achieves similar speed and accuracy to PyTorch on CIFAR-10.

02 Reaches similar speed but degraded performance on ImageNet with residual convolutional networks, and similar accuracy but degraded speed on GRU tasks.

03 Provides an open-source implementation for further research and development.

Abstract

We developed a jitted compiler for training Artificial Neural Networks using C++, LLVM and CUDA. It features object-oriented characteristics, strong typing, parallel workers for data pre-processing, Pythonic syntax for expressions, PyTorch-like model declaration and Automatic Differentiation. We implement caching and pooling mechanisms to manage VRAM, and use cuBLAS for high-performance matrix multiplication and cuDNN for convolutional layers. In our experiments with Residual Convolutional Neural Networks on ImageNet, we reach similar speed but degraded performance. The GRU network experiments show similar accuracy, but our compiler has degraded speed on that task. However, our compiler demonstrates promising results on the CIFAR-10 benchmark, where we reach the same performance and about the same speed as PyTorch. We make the code publicly available at:…
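
The abstract mentions cache and pooling mechanisms for managing VRAM but gives no implementation details. As a rough illustration of the general idea only, the sketch below assumes a size-bucketed free list, a common design for caching allocators. The class `CachingPool`, its methods, and the use of a host `bytearray` in place of a device allocation are all hypothetical and are not taken from the paper or its repository.

```python
from collections import defaultdict


class CachingPool:
    """Toy size-bucketed buffer pool (hypothetical; not the paper's code)."""

    def __init__(self, allocate=bytearray):
        # `allocate` is a stand-in for a device allocation call;
        # here it simply creates a host buffer of the requested size.
        self._allocate = allocate
        self._free = defaultdict(list)  # size in bytes -> idle buffers
        self.fresh_allocations = 0      # counts cache misses

    def acquire(self, nbytes):
        """Return an idle buffer of exactly `nbytes`, allocating only on a miss."""
        bucket = self._free[nbytes]
        if bucket:
            return bucket.pop()
        self.fresh_allocations += 1
        return self._allocate(nbytes)

    def release(self, buf):
        """Hand a buffer back to the pool instead of freeing it."""
        self._free[len(buf)].append(buf)


pool = CachingPool()
a = pool.acquire(1024)
pool.release(a)
b = pool.acquire(1024)  # same size as a released buffer, so it is reused
c = pool.acquire(2048)  # no idle buffer of this size, so a fresh allocation
```

Reusing released buffers in this way avoids repeated allocation calls inside a training loop, where the same tensor shapes tend to recur on every step. Whether the compiler described here buckets by exact size, rounds sizes up, or uses another policy is not stated in the abstract.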

Code & Models

Repositories

nosaveddata/nosavedkaleidoscope
pytorch (Official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques

Methods: Gated Recurrent Unit · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings