Meng Li (李萌)

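Assistant Professor, Researcher, Boya Young Fellow

Institute of Artificial Intelligence

School of Integrated Circuits

Peking University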

Biography

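Meng Li (李萌) joined the School of Integrated Circuits and the Institute of Artificial Intelligence at Peking University in 2022 as an assistant professor, Ph.D. advisor, and Boya Young Fellow. Before joining Peking University, he worked at Facebook's virtual and augmented reality lab in the United States, where, as a tech lead, he led research on AI acceleration algorithms and systems for virtual and augmented reality devices. He received his Ph.D. from the University of Texas at Austin in 2018 and his bachelor's degree from Peking University in 2013.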

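His research focuses on efficient and secure acceleration algorithms and chips for multimodal AI. The goal is to build an energy-efficient, reliable, and secure computing foundation for AI through cross-layer co-design and co-optimization from algorithms down to chips. His research has been supported by a series of national-level grants, including a project under the Ministry of Science and Technology's National Key R&D Program, a project under a National Natural Science Foundation of China (NSFC) Major Program, and a cultivation project under an NSFC Major Research Plan.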

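He has published more than 90 papers in top international conferences and journals, with more than 7,000 citations and two best paper awards. He has also received first place in the DAC System Design Contest, first place in the AICAS Grand Challenge on LLM Hardware System Design, the CCF Integrated Circuits Early Career Award, the European Design and Automation Association's best dissertation award, first place in the ACM Student Research Competition Grand Final, the Margarida Jacome Outstanding Dissertation Prize from the University of Texas at Austin, the Best Poster (Presentation) Award at the ASP-DAC Ph.D. research forum, the gold medal in the ACM/SIGDA Ph.D. research competition, and best paper awards at IEEE HOST, a top conference in semiconductor security, and ACM GLSVLSI, a top conference in integrated circuit design automation.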

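The lab recruits undergraduate, master's, and Ph.D. students and postdocs interested in AI algorithms and chips year-round. If you are interested, please email me with the subject line "Prospective Student from [Your Institute]" and include your CV, transcript, or other materials in the email.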

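Interests
  • Efficient and secure acceleration algorithms and chips for multimodal AI
  • Algorithm/chip co-design
Education
  • Ph.D., Computer Engineering, 2018

    University of Texas at Austin, USA

  • Master's, Computer Engineering, 2015

    University of Texas at Austin, USA

  • Bachelor's, Microelectronics, 2013

    Peking University, China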

Research Focus

Efficient AI Algorithm
Multi-Modal AI
AI/HW Co-Design

Experience

Tenure-Track Assistant Professor
July 2022 – Present, Beijing
Institute of Artificial Intelligence

Staff Research Scientist
September 2018 – July 2022, California

Experience:

  • 2018.09 - 2020.01 Research Scientist
  • 2020.01 - 2021.06 Senior Research Scientist
  • 2021.06 - 2022.07 Staff Research Scientist

Responsibilities include:

  • Tech Lead, On-Device AI, Meta Reality Labs
  • Efficient NN for AR Glasses
  • Efficient NN/HW Co-Design/Co-Optimization

Research Intern
May 2017 – August 2017, California
Privacy-preserving neural network training, including federated learning with non-IID data and PrivyNet with split network architectures

Research Intern
May 2016 – August 2016, California
Cross-level Monte Carlo framework for system vulnerability evaluation against fault attack

Research & Design Intern
Cadence Design Systems
May 2014 – August 2014, California
Static timing analysis acceleration

Recent Publications

A more detailed publication list is available through Google Scholar.

(2025). Breaking the Layer Barrier: Remodeling Private Transformer Inference with Hybrid CKKS and MPC. In USENIX Security Symposium 2025.

(2025). Swift: Fast Secure Neural Network Inference with Fully Homomorphic Encryption. In IEEE Transactions on Information Forensics and Security (TIFS) (2025).

(2025). HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference. In Design Automation Conference (DAC) 2025.

(2025). ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance. In Design Automation Conference (DAC) 2025.

(2025). SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding. In Design Automation Conference (DAC) 2025.

(2025). UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference. In Design Automation Conference (DAC) 2025.

(2025). FLASH: An Efficient Hardware Accelerator Leveraging Approximate and Sparse FFT for Homomorphic Encryption. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2025.

(2025). Compact Non-Volatile Lookup Table Architecture based on Ferroelectric FET Array through In-Situ Combinatorial One-Hot Encoding for Reconfigurable Computing. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2025.

(2025). LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2025.

(2025). SCALES: Boost Binary Neural Network for Image Super-Resolution with Efficient Scalings. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2025.

(2025). Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator. In Asia and South Pacific Design Automation Conference (ASP-DAC) 2025.

(2024). ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction. In Conference on Neural Information Processing Systems (NeurIPS) 2024.

(2024). PrivCirNet: Efficient Private Inference via Block Circulant Transformation. In Conference on Neural Information Processing Systems (NeurIPS) 2024.

(2024). ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). OSCA: End-to-end Serial Stochastic Computing Neural Acceleration with Fine-grained Scaling and Piecewise Activation. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). FlexHE: A Flexible Kernel Generation Framework for Homomorphic Encryption-Based Private Inference. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). CASCADE: A Framework for CNN Accelerator Synthesis with Concatenation and Refreshing Dataflow. In IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I) (2024).

(2024). Alchemist: A Unified Accelerator Architecture for Cross-Scheme Fully Homomorphic Encryption. In Design Automation Conference (DAC) 2024.

(2024). FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference. In Design Automation Conference (DAC) 2024.

(2024). MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices. In Design Automation Conference (DAC) 2024.

(2024). ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2024.

(2024). Enhancing 3D Detection Through Feature Aligned Deep Fusion. In International Conference on 3D Vision (3DV) 2024.

(2024). A 16.38TOPS and 4.55POPS/W SRAM Computing-in-Memory Macro for Signed Operands Computation and Batch Normalization Implementation. In IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I) (2024).

(2024). MixCIM: A Hybrid-Cell-Based Computing-in-Memory Macro with Less-Data-Movement and Activation-Memory-Reuse for Depthwise Separable Neural Networks. In IEEE Custom Integrated Circuits Conference (CICC) 2024.

(2023). CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference. In Conference on Neural Information Processing Systems (NeurIPS) 2023.

(2023). READ: Reliability-Enhanced Accelerator Dataflow Optimization using Critical Input Pattern Reduction. In ACM/IEEE International Conference on Computer Aided Design (ICCAD) 2023.

(2023). Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization. In ACM/IEEE International Conference on Computer Aided Design (ICCAD) 2023.

(2023). Falcon: Accelerating Homomorphically Encrypted Convolutions for Efficient Private Mobile Network Inference. In ACM/IEEE International Conference on Computer Aided Design (ICCAD) 2023.

(2023). Not your father’s stochastic computing (SC)! Efficient yet Accurate End-to-End SC Accelerator Design. In International Conference on ASIC (ASICON) 2023.

(2023). MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention. In International Conference on Computer Vision (ICCV) 2023.

(2023). Efficient Non-Linear Adder for Stochastic Computing with Approximate Spatial-Temporal Sorting Network. In Design Automation Conference (DAC) 2023.

(2023). AVATAR: An Aging- and Variation-Aware Dynamic Timing Analyzer for Error-Efficient Computing. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) (2023).

(2023). Accurate yet Efficient Stochastic Computing Neural Acceleration with High Precision Residual Fusion. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2023.

(2023). READ: Reliability-Enhanced Accelerator Dataflow Optimization using Critical Input Pattern Reduction. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2023 (extended abstract).

(2022). BiT: Robustly Binarized Multi-distilled Transformer. In Conference on Neural Information Processing Systems (NeurIPS) 2022.

(2022). Depth Shrink: Empowering Hardware-Friendly Shallow Neural Networks. In International Conference on Machine Learning (ICML) 2022.

(2022). Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet. In International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022.

(2022). SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems. In Conference on Computer Vision and Pattern Recognition (CVPR) 2022.

(2022). Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR) 2022.

(2022). NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training. In International Conference on Learning Representations (ICLR) 2022.

(2021). DNA: Differentiable Network-Accelerator Co-Search. In International Symposium on Low Power Electronics and Design (ISLPED) 2021.

(2021). AlphaNet: Improved Training of Supernets with Alpha-Divergence. In International Conference on Machine Learning (ICML) 2021 (Long Oral).

(2021). AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling. In Conference on Computer Vision and Pattern Recognition (CVPR) 2021.

(2021). Improving efficiency in neural network accelerator using operands Hamming distance optimization. In Asia and South Pacific Design Automation Conference (ASP-DAC) 2021.

(2020). KeepAugment: A Simple Information-Preserving Data Augmentation Approach. In Conference on Computer Vision and Pattern Recognition (CVPR) 2021.

(2020). Co-exploration of neural architectures and heterogeneous ASIC accelerator designs targeting multiple tasks. In ACM/IEEE Design Automation Conference (DAC) 2020.

(2018). TimingSAT: Decamouflaging timing-based logic obfuscation. In IEEE International Test Conference (ITC) 2018.

(2018). A Practical Split Manufacturing Framework for Trojan Prevention via Simultaneous Wire Lifting and Cell Insertion. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) (2018).

(2018). Federated Learning with Non-IID Data. In arXiv:1806.00582 (2018).

(2018). A Practical Split Manufacturing Framework for Trojan Prevention via Simultaneous Wire Lifting and Cell Insertion. In Asia and South Pacific Design Automation Conference (ASP-DAC) 2018.

(2018). PrivyNet: A Flexible Framework for Privacy-Preserving Deep Neural Network Training. In arXiv:1709.06161 (2018).

(2017). Provably secure camouflaging strategy for IC protection. In ACM/IEEE International Conference on Computer Aided Design (ICCAD) 2017.

(2017). Provably secure camouflaging strategy for IC protection. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) (2018).

(2017). Cross-level Monte Carlo framework for system vulnerability evaluation against fault attack. In ACM/IEEE Design Automation Conference (DAC) 2017.

(2017). AppSAT: Approximately Deobfuscating Integrated Circuits. In IEEE International Symposium on Hardware Oriented Security and Trust (HOST) 2017 (Best Paper Award).

(2016). A Monte Carlo simulation flow for SEU analysis of sequential circuits. In ACM/IEEE Design Automation Conference (DAC) 2016.

(2016). Practical public PUF enabled by solving max-flow problem on chip. In ACM/IEEE Design Automation Conference (DAC) 2016.

(2013). Characterization of Random Telegraph Noise in Scaled High-κ/Metal-Gate MOSFETs with SiO2/HfO2 Gate Dielectrics. In ECS Transactions, 52 (1) 941-946 (2013).

Accomplishments

AICAS Grand Challenge on LLM Hardware System Design 1st Place
Young Teachers’ Teaching Skills Competition 1st Place Prize
CCF-Ant Group Research Award on Hardware/Software Co-Design
CCF Integrated Circuits Early Career Award
Secretflow Outstanding Industry-Academic Cooperation Contribution Award
CCF-Ant Group Research Award on Privacy Computing
Margarida Jacome Outstanding Dissertation Prize
Outstanding Dissertations Award
Nominee of ACM Doctoral Dissertation Award
First Place, Student Research Competition Grand Final (Graduate Category)
Best Paper Award
Best Poster (Presentation) Award
Gold Medal
Best Paper Award
Cockrell School Graduate Student Fellowship
Yang Fuqing and Wang Yangyuan Academician Scholarship
Li Yanhong Baidu Scholarship

Contact
