Efficient AI

LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design
SCALES: Boost Binary Neural Network for Image Super-Resolution with Efficient Scalings
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline
MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding