MLP-KAN: Unifying Deep Representation and Function Learning

Yunhong He; Yifeng Xie; Zhengqing Yuan; Lichao Sun

arXiv:2410.03027·cs.LG·October 7, 2024

TL;DR

MLP-KAN is a unified deep learning framework that combines representation and function learning within a Mixture-of-Experts architecture, eliminating manual model selection and adapting dynamically to diverse tasks.

Contribution

This paper introduces MLP-KAN, a novel integrated approach that unifies representation and function learning using MoE architecture within a transformer framework.

Findings

01. Achieves superior versatility across multiple datasets
02. Delivers competitive performance in both learning paradigms
03. Simplifies model selection process for diverse tasks

Abstract

Recent advancements in both representation learning and function learning have demonstrated substantial promise across diverse domains of artificial intelligence. However, the effective integration of these paradigms poses a significant challenge, particularly in cases where users must manually decide whether to apply a representation learning or function learning model based on dataset characteristics. To address this issue, we introduce MLP-KAN, a unified method designed to eliminate the need for manual model selection. By integrating Multi-Layer Perceptrons (MLPs) for representation learning and Kolmogorov-Arnold Networks (KANs) for function learning within a Mixture-of-Experts (MoE) architecture, MLP-KAN dynamically adapts to the specific characteristics of the task at hand, ensuring optimal performance. Embedded within a transformer-based framework, our work achieves remarkable…
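The architecture described in the abstract can be pictured with a small sketch: a router scores a token, picks the top-k experts from a pool that mixes MLP experts and KAN-style experts, and blends their outputs. This is an illustration under stated assumptions, not the authors' implementation. The class names (`MLPExpert`, `KANLikeExpert`, `MoELayer`), the layer sizes, and the softmax over the selected router logits are all hypothetical choices. The KAN-style expert uses Gaussian radial basis functions as a simple stand-in for the B-spline edge functions used in the original KAN formulation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols, scale=0.5):
    """rows x cols matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, x):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class MLPExpert:
    """Representation-learning expert: linear -> ReLU -> linear (biases omitted)."""

    def __init__(self, rng, dim, hidden):
        self.w1 = rand_matrix(rng, hidden, dim)
        self.w2 = rand_matrix(rng, dim, hidden)

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.w1, x)]
        return matvec(self.w2, h)


class KANLikeExpert:
    """Function-learning expert: y_j = sum_i phi_ji(x_i).

    Each edge function phi_ji is a learnable combination of fixed Gaussian
    bumps (an assumption made for brevity; KANs typically use B-splines).
    """

    def __init__(self, rng, dim, num_basis=5):
        self.centers = [-2.0 + 4.0 * k / (num_basis - 1) for k in range(num_basis)]
        # coef[j][i][k]: weight of basis k on the edge from input i to output j
        self.coef = [rand_matrix(rng, dim, num_basis) for _ in range(dim)]

    def __call__(self, x):
        basis = [[math.exp(-((xi - c) ** 2)) for c in self.centers] for xi in x]
        return [
            sum(sum(c * b for c, b in zip(edge, basis[i])) for i, edge in enumerate(out_edges))
            for out_edges in self.coef
        ]


class MoELayer:
    """Top-k router over a mixed pool of MLP and KAN-style experts."""

    def __init__(self, dim, num_mlp=2, num_kan=2, top_k=2, seed=0):
        rng = random.Random(seed)
        self.experts = [MLPExpert(rng, dim, 2 * dim) for _ in range(num_mlp)]
        self.experts += [KANLikeExpert(rng, dim) for _ in range(num_kan)]
        self.router = rand_matrix(rng, len(self.experts), dim)
        self.top_k = top_k

    def route(self, x):
        """Return the indices of the top-k experts and their normalized gates."""
        logits = matvec(self.router, x)
        chosen = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[: self.top_k]
        gates = softmax([logits[e] for e in chosen])
        return chosen, gates

    def __call__(self, x):
        chosen, gates = self.route(x)
        out = [0.0] * len(x)
        for e, g in zip(chosen, gates):
            out = [o + g * v for o, v in zip(out, self.experts[e](x))]
        return out


layer = MoELayer(dim=4)
token = [0.3, -1.2, 0.8, 0.0]
chosen, gates = layer.route(token)
output = layer(token)
```

Only the `top_k` selected experts are evaluated for each token, so per-token compute grows with `top_k` rather than with the total number of experts. In a transformer, a layer like this would plausibly take the place of the feed-forward sublayer, though the abstract does not spell out that detail.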

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 1·Confidence 4

Strengths

1) The premise of the paper is potentially reasonable; there is evidence in the literature that KANs and MLPs are complementary, and developing a method that adaptively selects which modeling strategy is best for a given problem is a good idea.

Weaknesses

1) The presentation of the paper is not sufficiently clear to facilitate review. The paper is generally vague, contains many typos, and the methodology is ultimately unclear. I provide examples: (i) Poor presentation. In Line 47, in the 2nd paragraph of the paper, the authors define KAN as Kernel Attention Network, yet the whole paper seems to be about Kolmogorov-Arnold Networks. Lines 51 and 73 in the introduction are nearly identical and repeat the same idea. (ii) Incorrect/unclear metho

Reviewer 02·Rating 8·Confidence 4

Strengths

1. The manuscript is well-written, presenting clear motivations and providing step-by-step derivations of the proposed method.
2. Combining MLPs and KANs within an MoE framework is interesting. Moreover, integrating this block into a transformer architecture develops a robust backbone that effectively extracts and integrates features across various data modalities.
3. The ablation studies demonstrate that the proposed method can scale easily by increasing the number of experts, which enhances

Weaknesses

1. Although the proposed technique is shown to be generalizable to different tasks, its effectiveness in other types of tasks or with different types of data (e.g., time-series, reinforcement learning) remains unexplored.
2. Using multiple experts in an MoE architecture, especially with higher Top-K values, can significantly increase computational resource requirements. I suggest that the authors conduct ablation studies on runtime complexity and compare their proposed method with the standard

Reviewer 03·Rating 6·Confidence 3

Strengths

* The paper is well-written for the most part.
* The motivation of the paper is valid and exciting.
* The idea is simple yet effective.

Weaknesses

* The discussion part of the paper heavily relies on the description of Multi-expert and router gating, which is not the paper's contribution. More discussion or experiments are needed to show why combining MLP and KAN should improve the performance.
* One of the paper's main points is scalability, but experiments about scalability, computation time, and memory are missing.

Minor Weakness:

* The numbers in all Tables don't have confidence intervals, so it is hard to grasp the significance of

Code & Models

Repositories

dlyuangod/mlp-kan·PyTorch·Official

Taxonomy

Topics·Image Processing and 3D Reconstruction