Optimizing Sequence Alignment with Scored NFAs

Ryan Karbowniczak; Rasha Karakchi

arXiv:2501.02162·cs.ET·January 7, 2025

Open Access

TL;DR

This paper introduces NAPOLY+, an enhanced pattern-matching accelerator based on NFAs, capable of identifying optimal sequence matches by incorporating scoring mechanisms, and demonstrates its improved performance on FPGA hardware.

Contribution

NAPOLY+ extends the NAPOLY pattern-matching accelerator with scoring capabilities, enabling optimal match detection in sequence alignment tasks.

Findings

01

NAPOLY+ outperforms NAPOLY in identifying best matches.

02

Performance scales with array size on FPGA platforms.

03

Memory usage increases proportionally with array size.

Abstract

The rapid growth of symbolic data has underscored the importance of pattern matching and regular expression processing. While nondeterministic finite automata (NFAs) are commonly used for these tasks, they are limited to detecting matches without determining the optimal one. This research expands on the NAPOLY pattern-matching accelerator by introducing NAPOLY+, which adds registers to each processing element to store variables such as scores, weights, or edge costs. This enhancement allows NAPOLY+ to identify the highest score, corresponding to the best match in sequence alignment tasks, through the newly added arithmetic unit in each processing element. The design was evaluated against the original NAPOLY, with results showing that NAPOLY+ offers superior functionality and improved performance in identifying the best match. The design was implemented and tested on zynq102 and zynq104 FPGA…
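
The idea of a scored NFA described in the abstract can be illustrated in software. The sketch below is not the NAPOLY+ design. It is a minimal model that assumes a linear chain of states (one per pattern symbol), where each state holds a score register and adds a match or mismatch weight as it consumes an input symbol. The function name `best_match` and the weights `match=2` and `mismatch=-1` are hypothetical choices made for illustration only.

```python
def best_match(pattern, text, match=2, mismatch=-1):
    """Return (best_score, end_index) of the highest-scoring ungapped
    alignment of `pattern` against any window of `text`.

    Illustrative model only: scores[i] plays the role of a per-element
    score register; the weights are assumed values, not from the paper.
    """
    n = len(pattern)
    if n == 0:
        raise ValueError("pattern must be non-empty")

    # scores[i] is the register of element i; None means "not active".
    scores = [None] * n
    best_score, best_end = None, None

    for pos, symbol in enumerate(text):
        new_scores = [None] * n
        for i in range(n):
            # Element 0 is fed by an always-active start state with score 0,
            # so a new candidate alignment begins at every input position.
            prev = 0 if i == 0 else scores[i - 1]
            if prev is None:
                continue
            # The "arithmetic unit": add the weight for this symbol.
            weight = match if symbol == pattern[i] else mismatch
            new_scores[i] = prev + weight
        scores = new_scores

        # The last element reports a completed alignment and its score.
        final = scores[n - 1]
        if final is not None and (best_score is None or final > best_score):
            best_score, best_end = final, pos

    return best_score, best_end
```

Under these assumptions, `best_match("ACGT", "TTACGTAAACTT")` returns `(8, 5)`: the exact occurrence ending at index 5 scores four matches. A plain NFA would only report that a match exists, whereas the score registers let the model rank candidate alignments and keep the best one.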

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Machine Learning in Bioinformatics