ResBit: Residual Bit Vector for Categorical Values
Masane Fuchi, Amar Zanashir, Hiroto Minami, Tomohiro Takagi

TL;DR
ResBit introduces a dense, low-dimensional representation for categorical data that accelerates training and maintains performance, addressing the limitations of one-hot vectors in high-cardinality tabular data generation.
Contribution
The paper proposes Residual Bit Vectors (ResBit), a novel method that overcomes the limitations of one-hot vectors and analog bits for efficient high-cardinality categorical data representation.
Findings
ResBit accelerates training in tabular data generation.
ResBit maintains performance comparable to traditional methods.
High-cardinality data challenges are mitigated by ResBit's low-dimensional encoding.
Abstract
One-hot vectors, a common method for representing discrete/categorical data in machine learning, are widely used because of their simplicity and intuitiveness. However, the dimensionality of one-hot vectors grows linearly with the number of categories, posing computational and memory challenges, especially when dealing with datasets containing numerous categories. In this paper, we focus on tabular data generation and reveal that multinomial diffusion faces the mode collapse phenomenon when the cardinality is high. Moreover, due to the limitations of one-hot vectors, the training phase takes longer in such a situation. To address these issues, we propose Residual Bit Vectors (ResBit), a technique for densely representing categorical data. ResBit is an extension of analog bits and overcomes limitations of analog bits when applied to tabular data generation. Our experiments demonstrate that ResBit not…
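The dimensionality argument in the abstract can be illustrated with a small sketch. This is not code from the paper: the function names are hypothetical, and the bit encoding shown is a plain binary code standing in for the analog-bits idea (the real-valued treatment of bits is omitted). It compares the width of a one-hot vector with the width of a binary code and counts how many bit patterns do not correspond to any category, which is the "out-of-index" situation mentioned in the reviews.

```python
def one_hot_width(num_categories: int) -> int:
    """A one-hot vector needs one dimension per category."""
    return num_categories


def bit_width(num_categories: int) -> int:
    """Bits needed to index the values 0 .. num_categories - 1."""
    return max(1, (num_categories - 1).bit_length())


def unused_codes(num_categories: int) -> int:
    """Bit patterns of that width that map to no valid category index."""
    return 2 ** bit_width(num_categories) - num_categories


def to_bits(index: int, width: int) -> list[int]:
    """Encode an index as a list of 0/1 values, most significant bit first."""
    return [(index >> shift) & 1 for shift in reversed(range(width))]


def from_bits(bits: list[int]) -> int:
    """Decode a most-significant-bit-first list of 0/1 values to an index."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


if __name__ == "__main__":
    n = 50  # illustrative cardinality, matching the example quoted in the reviews
    print(one_hot_width(n), bit_width(n), unused_codes(n))
    print(to_bits(49, bit_width(n)))
```

With 50 categories, the one-hot vector has 50 dimensions while the binary code has 6, but 14 of the 64 possible 6-bit patterns decode to an index outside 0..49. A generative model that emits bits freely can therefore produce invalid categories unless the representation rules them out.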
Peer Reviews
Decision · Submitted to ICLR 2024
1. The paper introduces an interesting extension of Analog Bits. 2. The paper has good theoretical foundations.
1. In the abstract, the authors introduce methods in a different order than in the introduction, which is misleading. Maybe the order can be made consistent. 2. The first figure (Fig. 1) in the paper refers to the reference paper. Maybe at the beginning, the authors can give some illustrations describing the newly proposed method. 3. Some illustrations of the method should be added. 4. The model proposes three elements: ResBit, TRBD, and conditioning GAN. Unfortunately, none of these components is well evalu
I find it really hard to find the strengths of this paper. See the reasons below.
- There are several false claims in the paper. First, ResBit may not fully address the “out-of-index” issue. Since $N=50=32+16+2$, the example given in the paper is free from the issue. A proof for an arbitrary natural number is missing. One can find a counterexample by finding the ResBit representation of $N=51$. Second, ResBit does not really improve on its baselines, or even match them in all cases. In some cases, ResBit performs much worse than the baselines. - Some descriptions i
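The arithmetic the reviewer refers to, $50=32+16+2$, can be reproduced with a short sketch. This is an assumption-laden illustration, not the paper's algorithm: the function names are hypothetical, and the only thing shown is a greedy split of a cardinality into power-of-two blocks and the location of a category index within those blocks.

```python
def power_of_two_blocks(num_categories: int) -> list[int]:
    """Greedily split a cardinality into descending powers of two."""
    blocks = []
    remaining = num_categories
    while remaining > 0:
        largest = 1 << (remaining.bit_length() - 1)  # largest power of two <= remaining
        blocks.append(largest)
        remaining -= largest
    return blocks


def locate(index: int, num_categories: int) -> tuple[int, int]:
    """Return (block number, offset inside that block) for a category index."""
    if not 0 <= index < num_categories:
        raise ValueError("index out of range")
    for block_number, size in enumerate(power_of_two_blocks(num_categories)):
        if index < size:
            return block_number, index
        index -= size
    raise AssertionError("unreachable: block sizes sum to num_categories")


if __name__ == "__main__":
    print(power_of_two_blocks(50))
    print(power_of_two_blocks(51))
    print(locate(50, 51))
```

For $N=50$ the blocks are 32, 16 and 2, so every offset fits exactly in 5, 4 and 1 bits with no unused pattern. For $N=51$ the split gains a final block of size 1, whose offset needs zero bits. Whether and how the method covers that case is the reviewer's open question; the excerpt here does not say.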
The proposed method is very simple to understand and implement.
The paper has two weaknesses: results and presentation. 1. Results. On 5 tabular datasets where such an encoding method would be of most use, the proposed method is clearly better on only 2 of the tasks (CC, AR), whereas on BD and AD performance is on par, and I'm going to discount any results on IS due to the size of the dataset (1338). Similarly, when used for conditioning of GANs, visually speaking the ResBit results seem to be worse (much less diverse) and have no strong edge over one-hot in classificati
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Focus · Diffusion
