Biography

I am currently a tenure-track assistant professor jointly affiliated with the Institute for Artificial Intelligence and the School of Integrated Circuits at Peking University. Before joining Peking University, I was a staff research scientist and tech lead on the Meta On-Device AI team, where I focused on researching and productizing efficient AI algorithms and hardware for next-generation AR/VR devices. I received my Ph.D. degree from the Department of Electrical and Computer Engineering at the University of Texas at Austin under the supervision of Prof. David Z. Pan, and my bachelor's degree from Peking University under the supervision of Prof. Ru Huang and Prof. Runsheng Wang.

My research interests focus on efficient and secure multi-modality AI acceleration algorithms and hardware.

I am always looking for creative and self-motivated students and postdocs who are interested in co-designing future AI acceleration algorithms and systems for efficiency and privacy. Please contact me via email with the subject line “Prospective Student from [Your Institute]” and attach your CV. (I have finished Ph.D. student recruiting for 2024. If you are interested in applying for 2025, please contact me early.)

Download my resumé.

Interests
  • Efficient and Secure Multi-Modality Artificial Intelligence
  • Algorithm/Hardware Co-Design/Co-Optimization
Education
  • PhD in Computer Engineering, 2018

    University of Texas at Austin, Austin, TX, USA

  • MS in Computer Engineering, 2015

    University of Texas at Austin, Austin, TX, USA

  • BS in Microelectronics, 2013

    Peking University, Beijing, China

Research Focus

Efficient AI Algorithm
Multi-Modal AI
AI/HW Co-Design

Experience

Tenure-Track Assistant Professor
Jul 2022 – Present Beijing
Institute for Artificial Intelligence
 
Staff Research Scientist
Sep 2018 – Jul 2022 California

Experience:

  • 2018.09 - 2020.01 Research Scientist
  • 2020.01 - 2021.06 Senior Research Scientist
  • 2021.06 - 2022.07 Staff Research Scientist

Responsibilities include:

  • Tech Lead, On-Device AI, Meta Reality Labs
  • Efficient NN for AR Glasses
  • Efficient NN/HW Co-Design/Co-Optimization
 
Research Intern
May 2017 – Aug 2017 California
Privacy-preserving neural network training, including federated learning with non-IID data and PrivyNet with split network architectures
 
Research Intern
May 2016 – Aug 2016 California
Cross-level Monte Carlo framework for system vulnerability evaluation against fault attacks
 
Research & Design Intern
Cadence Design Systems
May 2014 – Aug 2014 California
Static timing analysis acceleration

Recent Publications

More detailed publication lists available through Google Scholar

(2025). Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator. In Asia and South Pacific Design Automation Conference (ASP-DAC) 2025.

(2024). ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction. In Conference on Neural Information Processing Systems (NeurIPS) 2024.

(2024). PrivCirNet: Efficient Private Inference via Block Circulant Transformation. In Conference on Neural Information Processing Systems (NeurIPS) 2024.

(2024). MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). FlexHE: A Flexible Kernel Generation Framework for Homomorphic Encryption-Based Private Inference. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). OSCA: End-to-end Serial Stochastic Computing Neural Acceleration with Fine-grained Scaling and Piecewise Activation. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding. In International Conference on Computer-Aided Design (ICCAD) 2024.

(2024). CASCADE: A Framework for CNN Accelerator Synthesis with Concatenation and Refreshing Dataflow. In IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I) (2024).

(2024). Alchemist: A Unified Accelerator Architecture for Cross-Scheme Fully Homomorphic Encryption. In Design Automation Conference (DAC) 2024.

(2024). FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference. In Design Automation Conference (DAC) 2024.

(2024). MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices. In Design Automation Conference (DAC) 2024.

(2024). ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2024.

(2024). Enhancing 3D Detection Through Feature Aligned Deep Fusion. In International Conference on 3D Vision (3DV) 2024.

(2024). A 16.38TOPS and 4.55POPS/W SRAM Computing-in-Memory Macro for Signed Operands Computation and Batch Normalization Implementation. In IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I) (2024).

(2024). MixCIM: A Hybrid-Cell-Based Computing-in-Memory Macro with Less-Data-Movement and Activation-Memory-Reuse for Depthwise Separable Neural Networks. In IEEE Custom Integrated Circuits Conference (CICC) 2024.

(2023). CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference. In Conference on Neural Information Processing Systems (NeurIPS) 2023.

(2023). Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization. In ACM/IEEE International Conference on Computer Aided Design (ICCAD) 2023.

(2023). Falcon: Accelerating Homomorphically Encrypted Convolutions for Efficient Private Mobile Network Inference. In ACM/IEEE International Conference on Computer Aided Design (ICCAD) 2023.

(2023). MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention. In International Conference on Computer Vision (ICCV) 2023.

(2023). Not your father’s stochastic computing (SC)! Efficient yet Accurate End-to-End SC Accelerator Design. In International Conference on ASIC (ASICON) 2023.

(2023). READ: Reliability-Enhanced Accelerator Dataflow Optimization using Critical Input Pattern Reduction. In ACM/IEEE International Conference on Computer Aided Design (ICCAD) 2023.

(2023). AVATAR: An Aging- and Variation-Aware Dynamic Timing Analyzer for Error-Efficient Computing. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) (2023).

(2023). Efficient Non-Linear Adder for Stochastic Computing with Approximate Spatial-Temporal Sorting Network. In Design Automation Conference (DAC) 2023.

(2023). READ: Reliability-Enhanced Accelerator Dataflow Optimization using Critical Input Pattern Reduction. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2023 (extended abstract).

(2023). Accurate yet Efficient Stochastic Computing Neural Acceleration with High Precision Residual Fusion. In Design, Automation and Test in Europe Conference and Exhibition (DATE) 2023.

(2022). BiT: Robustly Binarized Multi-distilled Transformer. In Conference on Neural Information Processing Systems (NeurIPS) 2022.

(2022). Depth Shrink: Empowering Hardware-Friendly Shallow Neural Networks. In International Conference on Machine Learning (ICML) 2022.

(2022). Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet. In International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022.

(2022). Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR) 2022.

(2022). SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems. In Conference on Computer Vision and Pattern Recognition (CVPR) 2022.

(2022). NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training. In International Conference on Learning Representations (ICLR) 2022.

(2021). DNA: Differentiable Network-Accelerator Co-Search. In International Symposium on Low Power Electronics and Design (ISLPED) 2021.

(2021). AlphaNet: Improved Training of Supernets with Alpha-Divergence. In International Conference on Machine Learning (ICML) 2021 (Long Oral).

(2021). AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling. In Conference on Computer Vision and Pattern Recognition (CVPR) 2021.

(2021). Improving efficiency in neural network accelerator using operands hamming distance optimization. In Asia and South Pacific Design Automation Conference (ASP-DAC) 2021.

(2020). KeepAugment: A Simple Information-Preserving Data Augmentation Approach. In Conference on Computer Vision and Pattern Recognition (CVPR) 2021.

(2020). Co-exploration of neural architectures and heterogeneous ASIC accelerator designs targeting multiple tasks. In ACM/IEEE Design Automation Conference (DAC) 2020.

(2018). TimingSAT: Decamouflaging timing-based logic obfuscation. In IEEE International Test Conference (ITC) 2018.

(2018). A Synergistic Framework for Hardware IP Privacy and Integrity Protection. In Springer (2018).

(2018). A Practical Split Manufacturing Framework for Trojan Prevention via Simultaneous Wire Lifting and Cell Insertion. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) (2018).

(2018). Federated Learning with Non-IID Data. In arXiv:1806.00582 (2018).

(2018). A Practical Split Manufacturing Framework for Trojan Prevention via Simultaneous Wire Lifting and Cell Insertion. In Asia and South Pacific Design Automation Conference (ASP-DAC) 2018.

(2018). PrivyNet: A Flexible Framework for Privacy-Preserving Deep Neural Network Training. In arXiv:1709.06161 (2018).

(2017). Provably secure camouflaging strategy for IC protection. In ACM/IEEE International Conference on Computer Aided Design (ICCAD) 2017.

(2017). Provably secure camouflaging strategy for IC protection. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) (2018).

(2017). Cross-level Monte Carlo framework for system vulnerability evaluation against fault attack. In ACM/IEEE Design Automation Conference (DAC) 2017.

(2017). AppSAT: Approximately Deobfuscating Integrated Circuits. In IEEE International Symposium on Hardware Oriented Security and Trust (HOST) 2017 (Best Paper Award).

(2016). A Monte Carlo simulation flow for SEU analysis of sequential circuits. In ACM/IEEE Design Automation Conference (DAC) 2016.

(2016). Practical public PUF enabled by solving max-flow problem on chip. In ACM/IEEE Design Automation Conference (DAC) 2016.

(2013). Characterization of Random Telegraph Noise in Scaled High-κ/Metal-Gate MOSFETs with SiO2/HfO2 Gate Dielectrics. In ECS Transactions, 52 (1) 941-946 (2013).

Accomplishments

Margarida Jacome Outstanding Dissertation Prize
Outstanding Dissertations Award
Nominee of ACM Doctoral Dissertation Award
First Place, Student Research Competition Grand Final (Graduate Category)
Best Paper Award
Best Poster (Presentation) Award
Gold Medal
Best Paper Award
Cockrell School Graduate Student Fellowship
Yang Fuqing and Wang Yangyuan Academician Scholarship
Li Yanhong Baidu Scholarship

Contact
