Improving the sample-efficiency of neural architecture search with reinforcement learning
Attila Nagy, Ábel Boros

TL;DR
This paper makes neural architecture search more efficient by training the reinforcement learning controller with PPO instead of the traditional REINFORCE algorithm, while relying on parameter sharing to reduce training time; the change also improves training stability.
Contribution
It introduces the use of PPO for training the controller in ENAS, improving stability and speed over the original REINFORCE method in NAS.
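To make the difference between the two controller objectives concrete, here is a minimal sketch of the standard REINFORCE loss and the standard PPO clipped surrogate loss. It is not the authors' implementation: the function names, the scalar `baseline`, and the default `clip_eps=0.2` are assumptions chosen for illustration, and each sample stands for one sampled architecture with its log-probability and reward.

```python
import math


def reinforce_loss(log_probs, rewards, baseline):
    """REINFORCE loss: negative mean of (reward - baseline) * log pi(a)."""
    terms = [(r - baseline) * lp for lp, r in zip(log_probs, rewards)]
    return -sum(terms) / len(terms)


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (negated, so lower is better).

    The probability ratio between the updated and the sampling policy is
    clipped to [1 - clip_eps, 1 + clip_eps], which bounds how far a single
    update can move the policy.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the clipped and unclipped objectives.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical values: the new policy doubles the probability of an
    # architecture that had a positive advantage.
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
    print(reinforce_loss([-1.0, -2.0], [1.0, 0.0], baseline=0.5))
```

The clipping is the usual explanation for PPO's stability: once the ratio leaves the clip range in the direction favored by the advantage, the objective stops rewarding further movement, so the same batch of sampled architectures can be reused for several update steps.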
Findings
PPO accelerates NAS training compared to REINFORCE.
Parameter sharing reduces computational resources needed.
Controller training is more stable with PPO than with REINFORCE.
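The parameter-sharing finding can be illustrated with a toy sketch. The idea, as used in ENAS, is that sampled child architectures reuse weights from one shared pool instead of each being trained from scratch. Everything below is hypothetical and greatly simplified: `SharedWeightPool`, `build_child`, the operation names, and the four-element weight lists are illustrative stand-ins, not the paper's code.

```python
import random


class SharedWeightPool:
    """Toy pool of weights keyed by (layer index, operation name)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._weights = {}

    def get(self, layer, op):
        # Create the weights the first time a (layer, op) pair is used;
        # every later child that picks the same pair gets the same object.
        key = (layer, op)
        if key not in self._weights:
            self._weights[key] = [self._rng.gauss(0.0, 0.1) for _ in range(4)]
        return self._weights[key]

    def size(self):
        return len(self._weights)


def build_child(pool, architecture):
    """Look up the weights for each layer of a sampled architecture."""
    return [pool.get(layer, op) for layer, op in enumerate(architecture)]


if __name__ == "__main__":
    pool = SharedWeightPool()
    child_a = build_child(pool, ["conv3x3", "maxpool"])
    child_b = build_child(pool, ["conv3x3", "conv5x5"])
    # Both children use the same weights for conv3x3 at layer 0.
    print(child_a[0] is child_b[0])
    print(pool.size())
```

Because the two children share the layer-0 weights, any training applied through one child is inherited by the other, which is where the reduction in compute comes from.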
Abstract
Designing complex architectures has been an essential cogwheel in the revolution deep learning has brought about in the past decade. When solving difficult problems in a data-driven manner, a well-tried approach is to take an architecture discovered by renowned deep learning scientists as a basis (e.g., Inception) and try to apply it to a specific problem. This might be sufficient, but as of now, achieving very high accuracy on a complex or yet unsolved task requires the knowledge of highly trained deep learning experts. In this work, we would like to contribute to the area of Automated Machine Learning (AutoML), specifically Neural Architecture Search (NAS), which intends to make deep learning methods available to a wider range of society by designing neural topologies automatically. Although several different approaches exist (e.g., gradient-based or evolutionary algorithms), our focus…
Taxonomy
Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Reinforcement Learning in Robotics
Methods: Entropy Regularization · Proximal Policy Optimization · REINFORCE
