AlphaNet: Improved Training of Supernets with Alpha-Divergence

Abstract

Weight-sharing neural architecture search (NAS) is an effective technique for automating efficient neural architecture design. Weight-sharing NAS builds a supernet that assembles all the architectures as its sub-networks and jointly trains the supernet with the sub-networks. The success of weight-sharing NAS heavily relies on distilling the knowledge of the supernet to the sub-networks. However, we find that the widely used distillation divergence, i.e., KL divergence, may lead to student sub-networks that over-estimate or under-estimate the uncertainty of the teacher supernet, leading to inferior performance of the sub-networks. In this work, we propose to improve supernet training with a more generalized alpha-divergence. By adaptively selecting the alpha-divergence, we prevent both over-estimation and under-estimation of the uncertainty of the teacher model. We apply the proposed alpha-divergence based supernet training to both slimmable neural networks and weight-sharing NAS, and demonstrate significant improvements. Specifically, our discovered model family, AlphaNet, outperforms prior-art models, including BigNAS, Once-for-All networks, and AttentiveNAS, across a wide range of FLOPs regimes. We achieve ImageNet top-1 accuracy of 80.0% with only 444M FLOPs.
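To make the idea of replacing KL divergence with an alpha-divergence more concrete, the following is a minimal sketch in plain Python. It uses the standard textbook alpha-divergence between a teacher distribution p and a student distribution q, which recovers KL(p || q) as alpha approaches 1 and KL(q || p) as alpha approaches 0. The function names, the default values `alpha_minus=-1.0` and `alpha_plus=1.0`, and the use of a maximum over two alpha values as the "adaptive" selection are assumptions made for illustration; the abstract does not spell out these details, and the sketch also assumes all probabilities are strictly positive.

```python
import math


def alpha_divergence(p, q, alpha):
    """Standard alpha-divergence D_alpha(p || q) for discrete distributions.

    p is the teacher distribution, q the student distribution.
    Assumes every entry of p and q is strictly positive.
    """
    if alpha == 1.0:
        # Limit alpha -> 1: forward KL divergence KL(p || q).
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    if alpha == 0.0:
        # Limit alpha -> 0: reverse KL divergence KL(q || p).
        return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    total = sum(qi * ((pi / qi) ** alpha - 1.0) for pi, qi in zip(p, q))
    return total / (alpha * (alpha - 1.0))


def adaptive_alpha_divergence(p, q, alpha_minus=-1.0, alpha_plus=1.0):
    """Hypothetical adaptive rule: penalize with whichever divergence is larger.

    The max-of-two form and the default alpha values are illustrative
    assumptions, not details taken from the abstract.
    """
    return max(
        alpha_divergence(p, q, alpha_minus),
        alpha_divergence(p, q, alpha_plus),
    )


# Example: a confident teacher and a more uncertain student.
teacher = [0.7, 0.2, 0.1]
student = [0.4, 0.3, 0.3]
loss = adaptive_alpha_divergence(teacher, student)
```

In an actual training loop the two distributions would be the softmax outputs of the supernet and a sampled sub-network, and the divergence would be computed with a differentiable tensor library rather than Python lists.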

Publication
In Conference on Machine Learning
Meng Li (李萌)
Assistant Professor, Researcher, Boya Young Fellow

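Meng Li is a jointly appointed Assistant Professor, Researcher, and Boya Young Fellow at the Institute for Artificial Intelligence and the School of Integrated Circuits, Peking University. His research interests center on efficient and secure multimodal AI acceleration algorithms and chips. Through cross-layer co-design and optimization from algorithms to chips, he aims to build an energy-efficient, reliable, and secure computing foundation for artificial intelligence.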
