TL;DR
PyTupli is a scalable Python tool that simplifies creating, sharing, and managing datasets and benchmarks for offline reinforcement learning, promoting collaboration and reproducibility.
Contribution
It introduces a standardized infrastructure for dataset management in offline RL, enabling easier sharing and curation of custom benchmarks.
Findings
Supports fine-grained filtering of datasets
Provides secure, scalable deployment options
Facilitates collaborative offline RL research
Abstract
Offline reinforcement learning (RL) has gained traction as a powerful paradigm for learning control policies from pre-collected data, eliminating the need for costly or risky online interactions. While many open-source libraries offer robust implementations of offline RL algorithms, they all rely on datasets composed of experience tuples consisting of state, action, next state, and reward. Managing, curating, and distributing such datasets requires suitable infrastructure. Although static datasets exist for established benchmark problems, no standardized or scalable solution supports developing and sharing datasets for novel or user-defined benchmarks. To address this gap, we introduce PyTupli, a Python-based tool to streamline the creation, storage, and dissemination of benchmark environments and their corresponding tuple datasets. PyTupli includes a lightweight client library with…
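As a rough illustration of the experience tuples and fine-grained filtering mentioned above, here is a minimal Python sketch. It does not use PyTupli's actual API: the names ExperienceTuple and filter_tuples and the toy data are hypothetical, chosen only to show the (state, action, next state, reward) structure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ExperienceTuple:
    """One transition (hypothetical type, not part of PyTupli)."""
    state: Tuple[float, ...]
    action: int
    next_state: Tuple[float, ...]
    reward: float


def filter_tuples(
    data: Iterable[ExperienceTuple],
    predicate: Callable[[ExperienceTuple], bool],
) -> List[ExperienceTuple]:
    """Keep only the tuples for which the predicate holds."""
    return [t for t in data if predicate(t)]


# Toy dataset of three transitions (values are made up for illustration).
dataset = [
    ExperienceTuple((0.0, 1.0), 0, (0.1, 0.9), 0.0),
    ExperienceTuple((0.1, 0.9), 1, (0.3, 0.7), 1.0),
    ExperienceTuple((0.3, 0.7), 1, (0.6, 0.4), -0.5),
]

# Example filter: keep transitions with strictly positive reward.
rewarding = filter_tuples(dataset, lambda t: t.reward > 0.0)
```

A real tool would presumably apply such filters on the storage backend rather than in client memory; the in-memory list here is only an assumption to keep the sketch self-contained.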
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
Overall, I felt that the PyTupli paper does a good job of articulating its design motivations and potential to standardize how offline RL datasets are shared and benchmarked. PyTupli recognizes and addresses important problems: offline RL datasets need to be stored, accessed, and filtered. The solution seems clear and reasonable, combining a simple client API with a deployable backend that emphasizes reproducibility and collaboration. The system design is modular: researchers can record tuples
I am mixed about this paper. On one hand, it's clear that open-source contributions of this flavor have been crucial in enabling the last decade of progress in AI research. Venues like ICLR also have a track record of publishing technical reports on software; for example PyTorch at NeurIPS 2019 or Einops at ICLR 2022. I do, on the other hand, have some concerns about PyTupli. The main thing is that it's difficult without either (i) existing adoption or (ii) more concrete evaluation to assess ho
The manuscript is clearly structured and well written, making it easy to follow even for readers outside the immediate area of offline RL tools. The inclusion of motivating examples and user stories helps to ground the technical contribution in practical scenarios. These aspects make the paper accessible and give readers a sense of how the proposed tool might be applied in real research contexts.
The paper’s most significant limitation is that it remains a purely descriptive system paper without any empirical or comparative evaluation. While the system architecture is explained in detail, the authors do not provide quantitative evidence that PyTupli improves storage efficiency, scalability, or usability over existing tools. For a paper that positions itself as an engineering contribution, the lack of metrics or experimental benchmarks undermines its credibility. Furthermore, the novelty
To avoid sounding overly harsh, I would like to start by noting that the paper is written in a clear and comprehensive manner. I particularly appreciate the motivating stories that help illustrate the motivation and potential workflows. Unfortunately, and with all due respect, this is where the paper’s main strengths seem to end. It is possible that the issue lies more in the presentation (in terms of the narrative and the experiments) rather than in the underlying work itself. However, as a rev
Firstly, the entire paper consists solely of a dry description of the structure and functioning of the proposed system, completely omitting any experiments or metrics. Since this is purely an engineering contribution and does not offer new scientific knowledge, it must instead demonstrate its practical benefits clearly and understandably, in a context relevant to the original motivation. The value of such work lies not in the quantity and complexity of the work done, but in its ul
- Problem Significance: The paper tackles a real and important problem in applied and collaborative offline RL research: the management of datasets, environments, and artifacts.
- System Design: The architecture is modern, robust, and well-conceived, relying on standard open-source components, which facilitates deployment and maintenance. The inclusion of features like user management, access control, and a production-ready Docker Compose setup is a major strength.
- Usability: The design of t
- No Empirical Evaluation: The primary weakness is the complete lack of quantitative evaluation. The title claims scalability, but no evidence is provided to support this. Benchmarks on ingestion/download speeds, query times on large datasets, or memory/CPU usage under load are necessary to validate such claims.
- Limited Comparison to Alternatives: The paper briefly dismisses tools like Git but fails to compare PyTupli against more relevant data-centric tools like DVC (Data Version Control) or
Taxonomy
MethodsLib
