Efficient AI

HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
No Redundancy, No Stall: Lightweight Streaming 3D Gaussian Splatting for Real-time Rendering
SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding
A 28nm 534.6TOPS/W Mixed-Precision Edge Accelerator for Embodied AI Using Stochastic Computing
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance
SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design