# Identifying Algorithm Names in Code Comments

**Authors:** Jakapong Klainongsuang, Yusuf Sulistyo Nugroho, Hideaki Hata, Bundit Manaskasemsak, Arnon Rungsawang, Pattara Leelaprute, Kenichi Matsumoto

arXiv: 1907.04557 · 2019-07-11

## TL;DR

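This paper presents an automatic, rule-based method for identifying algorithm names in code comments. It extracts N-grams that end with the word "algorithm" and filters them with part-of-speech patterns. The rules reach precision and recall above 0.70 in evaluation and are then applied to comments from projects written in seven programming languages.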

## Contribution

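The authors propose a rule-based approach for extracting algorithm names from code comments and apply it to comments written in seven programming languages. Code annotated with the extracted names can serve as a data source for machine-learning tasks such as API sequence generation, comment generation, and document generation.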

## Key findings

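- Rule evaluation yields precision and recall above 0.70 for algorithm name identification
- The rules are applied to comments from active FLOSS projects in seven languages: C, C++, Java, JavaScript, Python, PHP, and Ruby
- The paper reports the algorithm names most commonly mentioned in code comments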
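For readers less familiar with the metrics in the first finding, the following is a minimal sketch of how precision and recall are conventionally computed for an extraction task. The function name `precision_recall` and the example sets are hypothetical illustrations, not data or code from the paper.

```python
def precision_recall(predicted, relevant):
    """Compute precision and recall for a set of extracted names.

    predicted: names the extractor returned (hypothetical input)
    relevant:  names a human labelled as true algorithm names (hypothetical input)
    """
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: 3 of 4 extracted names are correct, 3 of 4 true names are found.
extracted = {"sorting algorithm", "hash algorithm", "same algorithm", "dijkstra algorithm"}
labelled = {"sorting algorithm", "hash algorithm", "dijkstra algorithm", "euclidean algorithm"}
p, r = precision_recall(extracted, labelled)
```

With these made-up sets, both values come out to 0.75, which would clear the 0.70 threshold the paper reports.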

## Abstract

For recent machine-learning-based tasks like API sequence generation, comment generation, and document generation, a large amount of data is needed. When software developers implement algorithms in code, we find that they often mention algorithm names in code comments. Code annotated with such algorithm names can be a valuable data source. In this paper, we propose an automatic method of algorithm name identification. The key idea is to extract important N-grams whose last word is "algorithm". We also consider part-of-speech patterns to derive rules for appropriate algorithm name identification. Our rule evaluation produced high precision and recall values (more than 0.70). We apply our rules to extract algorithm names from a large number of comments in active FLOSS projects written in seven programming languages (C, C++, Java, JavaScript, Python, PHP, and Ruby) and report commonly mentioned algorithm names in code comments.
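The key idea in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `max_n` limit, and the tokenizer regex are hypothetical, and the leading-word stoplist is only a crude stand-in for the part-of-speech rules, which this summary does not spell out.

```python
import re

# Hypothetical stoplist standing in for the paper's part-of-speech rules.
LEADING_STOPWORDS = {"a", "an", "the", "this", "that", "our", "same", "of", "in"}


def candidate_ngrams(comment, max_n=4):
    """Return N-grams (2..max_n words) whose last word is 'algorithm'."""
    tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", comment)
    candidates = []
    for i, token in enumerate(tokens):
        if token.lower() != "algorithm":
            continue
        for n in range(2, max_n + 1):
            start = i - n + 1
            if start < 0:
                break  # not enough preceding words for a longer N-gram
            candidates.append(" ".join(tokens[start:i + 1]))
    return candidates


def filter_candidates(candidates):
    """Drop candidates that start with a determiner-like word (assumed rule)."""
    return [c for c in candidates if c.split()[0].lower() not in LEADING_STOPWORDS]


comment = "Uses Dijkstra's shortest path algorithm to find routes"
names = filter_candidates(candidate_ngrams(comment))
```

For the sample comment, `names` holds the three candidates "path algorithm", "shortest path algorithm", and "Dijkstra's shortest path algorithm". A real pipeline would need a proper part-of-speech tagger and some way to rank which N-grams are important.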

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.04557/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1907.04557/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1907.04557/full.md

---
Source: https://tomesphere.com/paper/1907.04557