CrossMPT: Cross-attention Message-Passing Transformer for Error Correcting Codes

Seong-Joon Park; Hee-Youl Kwak; Sang-Hyo Kim; Yongjune Kim; Jong-Seon No

arXiv:2405.01033·cs.LG·May 27, 2025·1 cites

TL;DR

CrossMPT introduces a novel transformer-based decoder for error correcting codes that separates and iteratively updates different input vector types, leading to improved performance and efficiency over existing neural decoders.

Contribution

It proposes a new Cross-attention Message-Passing Transformer that explicitly distinguishes input vector types and leverages code structure, enhancing decoding accuracy and efficiency.

Findings

01 Outperforms existing neural decoders across various code classes.

02 Reduces memory usage, complexity, and inference time.

03 Achieves significant performance improvements.
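The memory and complexity finding can be made concrete with a rough count of attention-score entries. The sketch below assumes a code of length n with n−k parity checks, a baseline that applies self-attention to the concatenated magnitude and syndrome tokens, and a CrossMPT-style layer with two cross-attention blocks between the n magnitude tokens and the n−k syndrome tokens. The function name and the example size are hypothetical, and the counts ignore masking, heads, and all other layers, so they are not measured memory figures.

```python
def attention_entries(n, k):
    """Count attention-score entries per layer under the stated assumptions."""
    m = n - k                      # number of syndrome (parity-check) tokens
    self_attn = (n + m) ** 2       # one self-attention map over n + m tokens
    cross_attn = 2 * n * m         # two cross-attention maps: n x m and m x n
    return self_attn, cross_attn

# Illustrative size only: n = 63, k = 45.
self_entries, cross_entries = attention_entries(63, 45)
ratio = cross_entries / self_entries
```

Under these assumptions the cross-attention count is always smaller, since (n + m)² = n² + 2nm + m² exceeds 2nm for any positive n and m.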

Abstract

Error correcting codes (ECCs) are indispensable for reliable transmission in communication systems. Recent advances in deep learning have catalyzed the exploration of ECC decoders based on neural networks. Among these, transformer-based neural decoders have achieved state-of-the-art decoding performance. In this paper, we propose a novel Cross-attention Message-Passing Transformer (CrossMPT), which shares key operational principles with conventional message-passing decoders. While conventional transformer-based decoders employ a self-attention mechanism without distinguishing between the types of input vectors (i.e., magnitude and syndrome vectors), CrossMPT updates the two types of input vectors separately and iteratively using two masked cross-attention blocks. The mask matrices are determined by the code's parity-check matrix, which explicitly captures the irrelevant…
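As a rough illustration of the mechanism the abstract describes, the following NumPy sketch derives magnitude and syndrome vectors from a received word and runs one pair of masked cross-attention updates. It is not the authors' implementation: the Hamming(7,4) parity-check matrix, the BPSK mapping, the random embeddings `W_mag` and `W_syn`, the function names, and the choice of `H.T` and `H` as masks are all assumptions made for the example, and learned projections, feed-forward layers, and normalization are omitted.

```python
import numpy as np

# Parity-check matrix of a Hamming(7,4) code, chosen only for illustration.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def preprocess(y, H):
    """Split a received real vector into magnitude and syndrome parts."""
    magnitude = np.abs(y)
    hard_bits = (y < 0).astype(int)   # assumes BPSK mapping 0 -> +1, 1 -> -1
    syndrome = (H @ hard_bits) % 2    # binary syndrome of the hard decision
    return magnitude, syndrome

def masked_cross_attention(Q, K, V, allowed):
    """Scaled dot-product attention; allowed[i, j] == 1 lets query i see key j.

    Assumes every row of `allowed` contains at least one 1.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(allowed.astype(bool), scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Hypothetical noisy observation of the all-zero codeword (bit 3 flipped).
y = np.array([0.9, 1.1, -0.2, 0.8, 1.3, 0.7, 1.0])
magnitude, syndrome = preprocess(y, H)

# Random stand-ins for learned per-position embeddings (assumption).
rng = np.random.default_rng(0)
d = 8
W_mag = rng.normal(size=(H.shape[1], d))
W_syn = rng.normal(size=(H.shape[0], d))
mag_tokens = magnitude[:, None] * W_mag
syn_tokens = (1 - 2 * syndrome)[:, None] * W_syn

# Magnitude tokens query the syndrome tokens (mask H.T), then the reverse (mask H).
new_mag, w_mag = masked_cross_attention(mag_tokens, syn_tokens, syn_tokens, H.T)
new_syn, w_syn = masked_cross_attention(syn_tokens, new_mag, new_mag, H)
```

Each bit position attends only to the parity checks it participates in, and each check attends only to its own bits, which mirrors the variable-node and check-node exchange of a message-passing decoder.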

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 6·Confidence 4

Strengths

1. Innovative architecture: Introducing cross-attention mechanisms to handle magnitude and syndrome vectors separately is a notable advancement, bridging concepts from traditional message-passing algorithms and modern transformer architectures. 2. Significant performance gains: The experiments show significant improvements over existing decoders across various code classes, including BCH, polar, LDPC, and turbo codes. 3. Efficiency improvements: By reducing memory usage and computational c

Weaknesses

1. Availability of code or experimental details: The paper does not mention whether the code or experimental records will be made available. Open-sourcing the code or providing detailed experimental records is crucial for reproducibility and for the community to benefit more from this work. 2. Code length: In Appendix H of "A Foundation Model for Error Correction Codes," the authors compare Maximum Likelihood decoders on the BCH(1023,1013) code. In this paper, the authors highlight the si

Reviewer 02·Rating 6·Confidence 5

Strengths

The decoding of error correction codes is an important problem with significant potential impact in practice. Using neural-network-based decoders is a promising approach that has been receiving significant attention recently. The paper improves the state-of-the-art transformer-based architecture with a simple yet effective enhancement. Given that the magnitude and the syndrome provide different types of information on the transmitted symbols, it makes sense to process them se

Weaknesses

The architectural modification proposed in the paper with respect to ECCT is rather trivial. In that sense, it does not have significant technical novelty; on the other hand, given its impact on performance, I believe it is a worthy contribution. The paper presents only BER performance results. It is not clear whether the proposed approach provides any gains in block error probability, which is what would really matter for the code.
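To make the distinction this reviewer draws concrete, the sketch below computes both the bit error rate (BER) and the block error rate from the same decoder output. The function name `ber_and_bler` and the toy data are hypothetical and do not come from the paper. The example shows that a decoder can have a low BER while still failing a large share of blocks.

```python
import numpy as np

def ber_and_bler(sent, decoded):
    """Return (bit error rate, block error rate) for arrays of shape (blocks, n)."""
    sent = np.asarray(sent)
    decoded = np.asarray(decoded)
    errors = sent != decoded
    ber = errors.mean()                  # fraction of wrong bits overall
    bler = errors.any(axis=1).mean()     # fraction of blocks with any wrong bit
    return float(ber), float(bler)

# Toy data: four all-zero blocks of length 7, with three bit errors in two blocks.
sent = np.zeros((4, 7), dtype=int)
decoded = sent.copy()
decoded[0, 2] = 1
decoded[0, 5] = 1
decoded[3, 1] = 1

ber, bler = ber_and_bler(sent, decoded)  # ber = 3/28, bler = 2/4
```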

Reviewer 03·Rating 8·Confidence 5

Strengths

1. The update in the attention layer built upon ECCT is well-motivated and shows meaningful performance improvement. Although cross-attention is a well-known technique in the community, it is placed at the right position to improve the performance and efficiency of the previous self-attention-based approach. 2. The paper is easy to follow and provides sufficient details. 3. The experimental results effectively support the claims. 4. The in-depth analysis of the proposed method helps readers unde

Weaknesses

Although I think the paper is strong, I found a couple of minor weaknesses. 1. The actual inference time reduction from ECCT is not as great as the FLOPs improvement. Also, runtime evaluation and complexity analysis need more details about the environment. See Questions 3-5. 2. Related works on cross attention in other domains are missing, e.g., language (Vaswani et al., 2017), vision (Chen et al., 2021), text-based image generation (Rombach et al., 2022), etc. (Vaswani et al., 2017) Vaswani,

Taxonomy

Topics

Radiation Effects in Electronics · VLSI and Analog Circuit Testing · Embedded Systems Design Techniques

Methods

Attention Is All You Need · Linear Layer · Label Smoothing · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding