Li Meng's Personal Homepage
Deep Learning
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Based on the KTransformers project, develop an adaptive scheduling framework that leverages the heterogeneous compute capabilities of the CPU and GPU for efficient mixture-of-experts (MoE) LLM inference.
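The core scheduling idea can be illustrated with a small cost model: experts whose weights are already cached on the GPU run there, and the remaining experts go to whichever device minimizes estimated latency. The sketch below assumes fixed per-expert compute and transfer costs and an LRU GPU cache; `ExpertCache`, `schedule_experts`, and the millisecond constants are hypothetical illustrations, not HybriMoE's actual interface.

```python
# Hypothetical sketch of hybrid CPU-GPU expert scheduling; the cost
# constants and LRU policy are illustrative assumptions.
from collections import OrderedDict

GPU_COMPUTE_MS = 0.2   # assumed per-expert GPU compute time
CPU_COMPUTE_MS = 1.5   # assumed per-expert CPU compute time
TRANSFER_MS = 1.0      # assumed CPU->GPU weight-transfer time

class ExpertCache:
    """LRU cache of expert weights resident in GPU memory."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()  # expert_id -> None (weights elided)

    def hit(self, expert_id: int) -> bool:
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)
            return True
        return False

    def admit(self, expert_id: int) -> None:
        self.cache[expert_id] = None
        self.cache.move_to_end(expert_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

def schedule_experts(activated: list[int], cache: ExpertCache) -> dict[int, str]:
    """Assign each activated expert to 'gpu' or 'cpu' by estimated latency."""
    plan = {}
    for eid in activated:
        if cache.hit(eid):
            plan[eid] = "gpu"                        # cached: GPU is cheapest
        elif TRANSFER_MS + GPU_COMPUTE_MS < CPU_COMPUTE_MS:
            plan[eid] = "gpu"                        # worth paying the transfer
            cache.admit(eid)
        else:
            plan[eid] = "cpu"                        # compute in place on CPU
    return plan

cache = ExpertCache(capacity=2)
print(schedule_experts([3, 7, 3, 1], cache))
```

In a real system the cost constants would be profiled per hardware platform rather than hard-coded.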
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
Develop an adaptive scheduling framework for efficient mixture-of-experts (MoE) LLM inference on edge devices.
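A hedged sketch of what sensitivity-based expert gating might look like: after the usual top-k routing, experts whose gate weight falls below a per-token threshold scaled by a sensitivity parameter are skipped, and the remaining weights are renormalized. `adaptive_gate` and the threshold rule are illustrative assumptions, not AdapMoE's published algorithm.

```python
# Illustrative sensitivity-based gating; the threshold rule is an assumption.
import torch

def adaptive_gate(logits: torch.Tensor, top_k: int, sensitivity: float):
    """Select up to top_k experts, dropping those whose gate weight is below
    sensitivity * (top-1 weight), then renormalize the kept weights."""
    weights = torch.softmax(logits, dim=-1)
    topw, topi = torch.topk(weights, top_k, dim=-1)   # sorted descending
    keep = topw >= sensitivity * topw[..., :1]        # adaptive per-token cutoff
    topw = torch.where(keep, topw, torch.zeros_like(topw))
    topw = topw / topw.sum(dim=-1, keepdim=True)      # renormalize kept experts
    return topw, topi, keep

logits = torch.randn(4, 8)                 # 4 tokens, 8 experts
w, idx, keep = adaptive_gate(logits, top_k=2, sensitivity=0.5)
print(keep.sum(dim=-1))                    # experts actually computed per token
```

Skipping low-weight experts trades a small accuracy loss for fewer expert loads, which matters most on memory-constrained edge devices.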
NASViT: Neural Architecture Search for Efficient Vision Transformer with Gradient Conflict-Aware Supernet Training
Propose gradient conflict-aware supernet training to improve supernet-based NAS, and develop a family of optimized hybrid CNN/ViT networks that achieve a state-of-the-art performance Pareto front.
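Gradient conflicts between subnets sharing supernet weights can be handled by projection, in the spirit of PCGrad: when two gradients have a negative inner product, the conflicting component is removed. The sketch below shows that idea on flattened gradients; `resolve_conflict` is a hypothetical helper, and NASViT's actual conflict-aware update may differ.

```python
# PCGrad-style conflict resolution between a subnet gradient and the
# supernet gradient; a sketch of the idea, not the exact NASViT procedure.
import torch

def resolve_conflict(g_subnet: torch.Tensor, g_super: torch.Tensor) -> torch.Tensor:
    """Project g_subnet onto the normal plane of g_super if they conflict."""
    dot = torch.dot(g_subnet, g_super)
    if dot < 0:  # negative inner product = conflicting directions
        g_subnet = g_subnet - dot / g_super.norm().pow(2) * g_super
    return g_subnet

g_super = torch.tensor([1.0, 0.0])
g_sub = torch.tensor([-1.0, 1.0])        # conflicts with g_super
print(resolve_conflict(g_sub, g_super))  # conflict removed -> tensor([0., 1.])
```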
AlphaNet: Improved Training of Supernet with Alpha-Divergence
Develop AlphaNet, which improves supernet-based NAS with a more general alpha-divergence-based knowledge distillation and achieves a state-of-the-art performance Pareto front.
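The alpha-divergence generalizes the KL divergence used in standard knowledge distillation: D_alpha(p || q) = (sum_i p_i^alpha q_i^(1-alpha) - 1) / (alpha (alpha - 1)), which recovers KL(p || q) as alpha -> 1. A minimal sketch of a distillation loss built on it follows; the function name and the sample alpha values are illustrative, not AlphaNet's exact training recipe.

```python
# Alpha-divergence between teacher and student distributions; a sketch,
# with alpha values chosen for illustration only.
import torch

def alpha_divergence(p: torch.Tensor, q: torch.Tensor, alpha: float) -> torch.Tensor:
    """D_alpha(p || q) per row; recovers KL(p || q) in the limit alpha -> 1."""
    assert alpha not in (0.0, 1.0), "use the KL limits for alpha in {0, 1}"
    inner = (p.pow(alpha) * q.pow(1.0 - alpha)).sum(dim=-1)
    return (inner - 1.0) / (alpha * (alpha - 1.0))

teacher = torch.softmax(torch.randn(2, 10), dim=-1)
student = torch.softmax(torch.randn(2, 10), dim=-1)
print(alpha_divergence(teacher, student, alpha=-1.0))  # penalizes underestimation
print(alpha_divergence(teacher, student, alpha=0.5))   # milder, mass-covering
```

Varying alpha changes how hard the student is penalized for under- versus over-estimating teacher probabilities, which is the lever the more general divergence provides over plain KL.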
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
Develop AttentiveNAS, which improves the sampling strategy for supernet-based NAS to achieve a state-of-the-art performance Pareto front.
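Attentive sampling replaces uniform subnet sampling with a Pareto-aware choice: at each training step, draw k candidate architectures and train only the best (or worst) according to a cheap proxy score. The sketch below illustrates the idea; `sample_subnet` and `proxy_score` are hypothetical stand-ins for the real search space and performance predictor.

```python
# Illustrative attentive sampling; the search space and proxy are toy stand-ins.
import random

def sample_subnet() -> dict:
    """Draw a random architecture from a toy search space."""
    return {"depth": random.choice([2, 3, 4]),
            "width": random.choice([32, 64, 96])}

def proxy_score(arch: dict) -> float:
    """Cheap stand-in for a predicted-accuracy model (bigger = better here)."""
    return arch["depth"] * 0.1 + arch["width"] * 0.001

def attentive_sample(k: int = 8, mode: str = "best") -> dict:
    """Pareto-attentive choice among k candidates instead of uniform sampling."""
    candidates = [sample_subnet() for _ in range(k)]
    pick = max if mode == "best" else min
    return pick(candidates, key=proxy_score)

print(attentive_sample(mode="best"))   # train the strongest candidate this step
print(attentive_sample(mode="worst"))  # or the weakest, to lift the Pareto floor
```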