Meng Li's Homepage
Deep Learning
CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing
Building on the SecretFlow framework developed by Ant Group, we develop the first model/protocol co-optimization framework for private MoE LLM inference.
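A minimal sketch of why balanced routing matters for private inference, assuming a greedy capacity-constrained top-1 dispatch (an illustration only, not the CryptoMoE protocol): every expert processes a fixed number of tokens, so tensor shapes stay data-independent, which is what MPC-based inference needs to avoid leaking routing decisions.

import torch

def balanced_route(scores: torch.Tensor, capacity: int):
    """scores: [num_tokens, num_experts] gating logits.
    Assigns each token to one expert with at most `capacity` tokens per expert."""
    num_tokens, num_experts = scores.shape
    assign = torch.full((num_tokens,), -1, dtype=torch.long)
    load = [0] * num_experts
    # Greedily hand each expert its highest-scoring unassigned tokens.
    for idx in scores.flatten().argsort(descending=True):
        t, e = divmod(idx.item(), num_experts)
        if assign[t] == -1 and load[e] < capacity:
            assign[t] = e
            load[e] += 1
    return assign

scores = torch.randn(8, 4)
print(balanced_route(scores, capacity=2))  # exactly 2 tokens per expert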
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private LLM Inference
Building on the SecretFlow framework developed by Ant Group, we develop the first model/protocol co-optimization framework for static-dynamic sparse attention in private LLM inference.
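For intuition, a minimal sketch of score-based KV cache eviction (the scoring rule and function names are assumptions, not the MPCache protocol): cache entries that accumulated the least attention mass are dropped first, shrinking the sequence length the expensive private attention has to cover.

import torch

def evict_kv(keys, values, attn_history, keep: int):
    """attn_history: [seq_len] accumulated attention each position received.
    Keeps only the `keep` positions with the highest scores."""
    keep_idx = attn_history.topk(keep).indices.sort().values  # preserve order
    return keys[keep_idx], values[keep_idx], attn_history[keep_idx]

keys = torch.randn(16, 64)
values = torch.randn(16, 64)
attn_history = torch.rand(16)
k, v, h = evict_kv(keys, values, attn_history, keep=8)
print(k.shape)  # torch.Size([8, 64])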
LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design
We developed and open-sourced an end-to-end FPGA implementation for efficient Mamba LLM inference based on high-level synthesis. Compared with GPUs, LightMamba achieves significant latency and power reductions.
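A generic per-channel symmetric INT8 weight quantizer, shown only to illustrate the style of quantization an FPGA datapath can exploit; it is not LightMamba's actual scheme.

import torch

def quantize_per_channel(w: torch.Tensor, bits: int = 8):
    """Symmetric quantization with one scale per output channel (row)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax).to(torch.int8)
    return q, scale

w = torch.randn(128, 256)
q, s = quantize_per_channel(w)
print((w - q.float() * s).abs().max())  # worst-case quantization error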
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Building on the KTransformers project, we develop an adaptive scheduling framework that leverages the heterogeneous compute capabilities of CPUs and GPUs for efficient mixture-of-experts LLM inference.
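A toy placement heuristic to convey the idea (the heuristic and names are assumptions, not HybriMoE's scheduler): frequently activated experts live in GPU memory, cold ones stay on the CPU, so expensive transfers are spent where they pay off.

def place_experts(activation_counts, gpu_slots):
    """activation_counts: dict expert_id -> recent activation count."""
    ranked = sorted(activation_counts, key=activation_counts.get, reverse=True)
    gpu = set(ranked[:gpu_slots])   # hot experts cached on the GPU
    cpu = set(ranked[gpu_slots:])   # cold experts served from the CPU
    return gpu, cpu

counts = {0: 120, 1: 5, 2: 87, 3: 40, 4: 3, 5: 66}
gpu, cpu = place_experts(counts, gpu_slots=3)
print("GPU:", sorted(gpu), "CPU:", sorted(cpu))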
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
We develop an adaptive, sensitivity-based scheduling framework for efficient mixture-of-experts LLM inference on edge devices.
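One way to make gating adaptive, sketched under the assumption of a cumulative-probability threshold (not necessarily AdapMoE's exact rule): each token activates just enough experts to cover a target share of the gate's probability mass, so easy tokens use fewer experts than hard ones.

import torch

def adaptive_topk(logits: torch.Tensor, tau: float = 0.8):
    """Per-token expert sets covering at least `tau` of the gate probability."""
    probs = logits.softmax(-1)
    sorted_p, idx = probs.sort(-1, descending=True)
    csum = sorted_p.cumsum(-1)
    k = (csum < tau).sum(-1) + 1  # experts needed to pass the threshold
    return [idx[t, : int(k[t])].tolist() for t in range(logits.shape[0])]

logits = torch.tensor([[4.0, 0.1, 0.0, -1.0],   # confident token -> 1 expert
                       [1.0, 0.9, 0.8, 0.7]])   # uncertain token -> more experts
print(adaptive_topk(logits))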
NASViT: Neural Architecture Search for Efficient Vision Transformer with Gradient Conflict-Aware Supernet Training
We propose gradient conflict-aware training to improve supernet-based NAS and develop a family of optimized hybrid CNN/ViT networks that achieve a state-of-the-art performance Pareto front.
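The flavor of the conflict issue can be seen in the classic gradient-projection idea (a PCGrad-style sketch; NASViT's actual treatment differs in its details): when two subnets' gradients point in opposing directions, the conflicting component is removed before the shared weights are updated.

import torch

def resolve_conflict(g_a: torch.Tensor, g_b: torch.Tensor):
    """If g_a conflicts with g_b (negative dot product), project g_a onto
    the plane orthogonal to g_b."""
    dot = torch.dot(g_a, g_b)
    if dot < 0:
        g_a = g_a - dot / g_b.norm().pow(2) * g_b
    return g_a

g_small = torch.tensor([1.0, -2.0])  # e.g. smallest subnet's gradient
g_large = torch.tensor([0.0, 1.0])   # e.g. largest subnet's gradient
print(resolve_conflict(g_small, g_large))  # tensor([1., 0.])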
AlphaNet: Improved Training of Supernet with Alpha-Divergence
We develop AlphaNet, which improves supernet-based NAS with a more generalized alpha-divergence-based knowledge distillation and achieves a state-of-the-art performance Pareto front.
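For reference, the alpha-divergence itself in a few lines (AlphaNet's adaptive clipping of the divergence is omitted here): it generalizes the KL divergences used in standard knowledge distillation.

import torch

def alpha_divergence(p: torch.Tensor, q: torch.Tensor, alpha: float):
    """D_alpha(p || q) = (sum p^a * q^(1-a) - 1) / (a * (a - 1)), a != 0, 1.
    Recovers KL(p||q) as alpha -> 1 and KL(q||p) as alpha -> 0."""
    s = (p.pow(alpha) * q.pow(1 - alpha)).sum(-1)
    return (s - 1) / (alpha * (alpha - 1))

p = torch.softmax(torch.randn(5), -1)  # teacher distribution
q = torch.softmax(torch.randn(5), -1)  # student distribution
for a in (-1.0, 0.5, 2.0):
    print(a, alpha_divergence(p, q, a).item())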
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
We develop AttentiveNAS, which improves the sampling strategy for supernet-based NAS to achieve a state-of-the-art performance Pareto front.
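A toy rendition of attentive sampling (the config space and accuracy proxy are made-up stand-ins): draw k random subnet configurations, then train only the best- or worst-scoring one instead of a uniform sample, steering supernet updates toward the Pareto front.

import random

def sample_subnet():
    return {"depth": random.choice([2, 3, 4]), "width": random.choice([32, 64, 96])}

def proxy_score(cfg):
    # Stand-in for an accuracy predictor; bigger subnets score higher here.
    return cfg["depth"] * cfg["width"]

def attentive_sample(k=8, mode="best"):
    candidates = [sample_subnet() for _ in range(k)]
    pick = max if mode == "best" else min
    return pick(candidates, key=proxy_score)

random.seed(0)
print(attentive_sample(mode="best"))
print(attentive_sample(mode="worst"))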