Keynote: Efficient Audio-Visual Understanding on AR Devices