Efficient AI

Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
S2CIM: A Secure-Computation and Secure-Storage Compute-in-Memory Architecture with Circuit-Algorithm Co-Design for Efficient and Trustworthy Edge Inference
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
H2EAL: Hybrid-Bonding Architecture with Hybrid Sparse Attention for Efficient Long-Context LLM Inference
HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
No Redundancy, No Stall: Lightweight Streaming 3D Gaussian Splatting for Real-time Rendering
SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding
A 28nm 534.6TOPS/W Mixed-Precision Edge Accelerator for Embodied AI Using Stochastic Computing
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance