MuSimA: A Tool with Multi-modal Input for Generating Bespoke ABAC Datasets

Saket Jha (Indian Institute of Technology Kharagpur, India), Karthikeya S. M. Yelisetty (Indian Institute of Technology Kharagpur, India), Singabattu Sathya (Indian Institute of Technology Kharagpur, India), Shamik Sural (Indian Institute of Technology Kharagpur, India)

arXiv:2604.10501 · cs.CR · April 14, 2026

TL;DR

MuSimA is a web-based tool that generates large-scale synthetic ABAC datasets with customizable attribute-value distributions. It accepts multi-modal input (JSON specifications and hand-drawn sketches) to support scalability testing of ABAC systems.

Contribution

We introduce MuSimA, a novel tool enabling flexible, multi-modal generation of synthetic ABAC datasets for research and testing purposes.

Findings

1. Supports multi-modal input, including JSON specifications and sketches
2. Uses an LLM to extract distribution parameters from sketches
3. Enables scalable dataset generation for ABAC research

Abstract

Recent advances in research on Attribute-based Access Control (ABAC) have led to the development of several ingenious methods for representing and enforcing organizational security policies. However, little effort has so far been devoted to building a tool for generating large-scale synthetic datasets that can be used to test the resulting ABAC systems. In this paper, we address this shortcoming by building MuSimA, a web-based tool for generating ABAC datasets with user-specified probability distributions of attribute values. It supports multi-modal input, i.e., users can provide specifications either as a structured JSON file or as a combination of a minimal JSON file and hand-drawn distribution sketches. In the latter case, a Large Language Model is used to automatically extract appropriate distribution parameters from the sketches. The generated synthetic ABAC data matching the…
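The abstract does not give MuSimA's JSON schema. As a rough illustration only, the Python sketch below assumes a hypothetical specification in which each user and object attribute is assigned a categorical, uniform, or normal distribution, and then samples entities from it. The field names (`num_users`, `user_attributes`, `type`, `weights`, and so on) and the functions `sample_value`, `generate_entities`, and `generate_dataset` are illustrative assumptions, not the tool's actual interface. The LLM step that turns sketches into distribution parameters is not modelled.

```python
import json
import random

# Hypothetical specification format (assumption): MuSimA's real JSON schema
# is not described in the abstract.
SPEC = json.loads("""
{
  "num_users": 1000,
  "num_objects": 500,
  "user_attributes": {
    "department": {"type": "categorical",
                   "values": ["HR", "Finance", "Engineering"],
                   "weights": [0.2, 0.3, 0.5]},
    "clearance": {"type": "normal", "mean": 3.0, "std": 1.0,
                  "min": 1, "max": 5}
  },
  "object_attributes": {
    "sensitivity": {"type": "uniform", "min": 1, "max": 5}
  }
}
""")


def sample_value(dist, rng):
    """Draw one attribute value from a hypothetical distribution spec."""
    kind = dist["type"]
    if kind == "categorical":
        # Weighted choice among the listed values.
        return rng.choices(dist["values"], weights=dist["weights"], k=1)[0]
    if kind == "uniform":
        # Integer drawn uniformly from [min, max], both ends inclusive.
        return rng.randint(dist["min"], dist["max"])
    if kind == "normal":
        # Gaussian draw, rounded to an integer and clipped to [min, max].
        x = round(rng.gauss(dist["mean"], dist["std"]))
        return max(dist["min"], min(dist["max"], x))
    raise ValueError(f"unsupported distribution type: {kind}")


def generate_entities(count, attr_specs, prefix, rng):
    """Create `count` entities, each with one sampled value per attribute."""
    return [
        {"id": f"{prefix}{i}",
         **{name: sample_value(d, rng) for name, d in attr_specs.items()}}
        for i in range(count)
    ]


def generate_dataset(spec, seed=0):
    """Generate users and objects; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return {
        "users": generate_entities(spec["num_users"],
                                   spec["user_attributes"], "u", rng),
        "objects": generate_entities(spec["num_objects"],
                                     spec["object_attributes"], "o", rng),
    }


if __name__ == "__main__":
    data = generate_dataset(SPEC, seed=42)
    print(len(data["users"]), len(data["objects"]))
    print(data["users"][0])
```

Seeding a private `random.Random` instance keeps a generated dataset reproducible, which is useful when the same synthetic data is reused to compare ABAC systems at different scales.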
