Efficient Audio-Visual Understanding on AR Devices


Augmented reality (AR) is a set of technologies that will fundamentally change the way we interact with our environment. It represents a merging of the physical and the digital worlds into a rich, context aware user interface delivered through a socially acceptable form factor such as eyeglasses. The majority of these novel experiences in AR systems will be powered by AI because of their superior ability to handle in-the-wild scenarios. A key AR use case is a personalized, proactive and context-aware Assistant that can understand the user’s activity and their environment using audio-visual understanding models. In this presentation, we will discuss the challenges and opportunities in both training and deployment of efficient audio-visual understanding on AR glasses. We will discuss enabling always-on experiences within a constrained power budget using cascaded multimodal models, and co-designing them with the target hardware platforms. We will present our early work to demonstrate the benefits and potential of such a co-design approach and discuss open research areas that are promising for the research community to explore.

Apr 27, 2021 11:00 AM — 12:00 PM
Meng Li
Meng Li
Staff Research Scientist

I am currently a staff research scientist and tech lead in the Meta On-Device AI team with a focus on researching and productizing efficient AI algorithms and hardwares for next generation AR/VR devices. I received my Ph.D. degree in the Department of Electrical and Computer Engineering, University of Texas at Austin under the supervision of Prof. David Z. Pan and my bachelor degree in Peking University under the supervision of Prof. Ru Huang and Prof. Runsheng Wang. My research interests include efficient and secure AI algorithms and systems.

var dimensionValue = 'SOME_DIMENSION_VALUE'; ga('set', 'dimension1', dimensionValue);