Nirjas: An open source framework for extracting metadata from the source code

Ayush Bhardwaj; Sahil; Kaushlendra Pratap; Gaurav Mishra

arXiv:2409.14609 · cs.SE · September 24, 2024

TL;DR

Nirjas is an open-source Python framework designed to extract and structure metadata and comments from source code across multiple programming languages, aiding software comprehension.

Contribution

The paper introduces a novel regex-based method that accurately extracts metadata and comments across the varied syntaxes and conventions used in source code.

Findings

01. Effective extraction of metadata across languages

02. Accurate separation of comment types and code

03. Easy to install and integrate in development workflows

Abstract

Metadata and comments are critical elements of any software development process. In this paper, we explain how metadata and comments in source code can play an essential role in comprehending software. We introduce a Python-based open-source framework, Nirjas, which helps in extracting this metadata in a structured manner. Various syntaxes, types, and widely accepted conventions exist for adding comments in source files of different programming languages. Edge cases can create noise in extraction, for which we use Regex to accurately retrieve metadata. Non-Regex methods can give results but often miss accuracy and noise separation. Nirjas also separates different types of comments, source code, and provides details about those comments, such as line number, file name, language used, total SLOC, etc. Nirjas is a standalone Python framework/library and can be easily installed via source…
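
To make the regex-based idea concrete, here is a minimal sketch of how single-line `#` comments could be separated from code while counting SLOC. It is not Nirjas's implementation or API: the function name `extract_hash_comments`, the `TOKEN` pattern, and the keys of the returned dictionary are all hypothetical, chosen only for illustration. The sketch assumes `#`-style comments and single-line string literals, and it does not handle multi-line comments or triple-quoted strings.

```python
import re

# Hypothetical pattern: match a quoted string first so that a '#' inside it
# is consumed as part of the string; otherwise capture the comment in group 1.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'|(#.*)')


def extract_hash_comments(source, filename="<string>"):
    """Return comments with line numbers, plus a count of lines containing code."""
    comments = []
    sloc = 0
    for line_number, line in enumerate(source.splitlines(), start=1):
        comment_start = None
        for match in TOKEN.finditer(line):
            if match.group(1) is not None:
                comment_start = match.start(1)
                comments.append({
                    "line_number": line_number,
                    # Drop the leading '#' and surrounding whitespace.
                    "comment": match.group(1)[1:].strip(),
                })
                break
        # Whatever precedes the comment (or the whole line) is treated as code.
        code_part = line if comment_start is None else line[:comment_start]
        if code_part.strip():
            sloc += 1
    return {"filename": filename, "sloc": sloc, "single_line_comment": comments}


if __name__ == "__main__":
    sample = 'x = 1  # set x\n# full line\nurl = "http://a#b"\n\ny = 2\n'
    print(extract_hash_comments(sample, filename="sample.py"))
```

The `url` line in the sample illustrates the kind of edge case the abstract mentions: a naive search for `#` would report `#b"` as a comment, whereas matching string literals first avoids that noise.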
