Efficient AI

FlexHE: A Flexible Kernel Generation Framework for Homomorphic Encryption-Based Private Inference
HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline
MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer
A 16.38TOPS and 4.55POPS/W SRAM Computing-in-Memory Macro for Signed Operands Computation and Batch Normalization Implementation