Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
S2CIM: A Secure-Computation and Secure-Storage Compute-in-Memory Architecture with Circuit-Algorithm Co-Design for Efficient and Trustworthy Edge Inference
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private LLM Inference
FENIX: Flexible and Efficient Hybrid HE/MPC Acceleration with Near-Memory Processing
H2EAL: Hybrid-Bonding Architecture with Hybrid Sparse Attention for Efficient Long-Context LLM Inference
HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
No Redundancy, No Stall: Lightweight Streaming 3D Gaussian Splatting for Real-time Rendering
SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding