Four Papers Accepted by DAC'2025
Four papers on efficient LLMs have been accepted by DAC'2025 as regular papers:
- HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
- SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
- UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference
- ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance