Hi, I am a PhD student at the School of Data Science, the Chinese University of Hong Kong, Shenzhen, supervised by Prof. Haizhou Li. Prior to that, I received my bachelor's degree from the Southern University of Science and Technology, supervised by Prof. Tom Ko. My research interests include automatic speech recognition, speech pre-training, and large language models. I have published several papers in top international AI conferences and journals such as TASLP, NeurIPS, ACL, and ICASSP.

📖 Education

2022.09 - now, Ph.D., the Chinese University of Hong Kong, Shenzhen.
2024.01 - 2024.12, Visiting Student, National University of Singapore.
2016.09 - 2020.06, B.Eng, Southern University of Science and Technology.
2018.09 - 2019.05, Visiting Student, the University of Edinburgh.

💻 Internships

2025.05 - now, Research Scientist Intern, Meta GenAI.
2024.03 - 2025.05, Research Intern, ByteDance, mentored by Prof. Zhizheng Wu and Dr. Xiaohai Tian.
2022.06 - 2022.12, Research Intern, ByteDance, mentored by Prof. Tom Ko.
2021.06 - 2022.04, Research Intern, MSRA NLC Group, Beijing, mentored by Dr. Long Zhou and Dr. Shujie Liu.
2019.06 - 2019.08, Machine Learning Intern, Tencent, Shenzhen.

📝 Publications

USED: Universal Speaker Extraction and Diarization, Junyi Ao, Mehmet Sinan Yıldırım, Ruijie Tao, Meng Ge, Shuai Wang, Yanmin Qian, Haizhou Li, TASLP 2024
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words, Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu, NeurIPS 2024
Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks, Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li, IEEE Signal Processing Letters 2024
SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech, Jingru Lin, Meng Ge, Junyi Ao, Liqun Deng, Haizhou Li, INTERSPEECH 2024
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning, Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li, INTERSPEECH 2023
Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder, Jingru Lin, Xianghu Yue, Junyi Ao, Haizhou Li, INTERSPEECH 2023
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li, ICASSP 2023
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data, Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei, INTERSPEECH 2022
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing, Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei, ACL 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training, Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei, EMNLP 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT, Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li, INTERSPEECH 2022
The YiTrans Speech Translation System for IWSLT 2022 Offline Shared Task, Ziqiang Zhang, Junyi Ao, Long Zhou, Shujie Liu, Furu Wei, Jinyu Li, ACL@IWSLT 2022
Multi-View Self-Attention Based Transformer for Speaker Recognition, Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang, ICASSP 2022
Improving Attention-based End-to-end ASR by Incorporating an N-gram Neural Network, Junyi Ao, Tom Ko, ISCSLP 2021

📜 Preprints

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context, Junyi Ao, Dekun Chen, Xiaohai Tian, Wenjie Feng, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu, arXiv preprint arXiv:2503.15338
Overview of the Amphion Toolkit (v0.2), Jiaqi Li, Xueyao Zhang, Yuancheng Wang, Haorui He, Chaoren Wang, Li Wang, Huan Liao, Junyi Ao, Zeyu Xie, Yiqiao Huang, Junan Zhang, Zhizheng Wu, arXiv preprint arXiv:2501.15442

🎖 Others

Reviewer

IEEE Transactions on Multimedia (TMM)
The International Conference on Learning Representations (ICLR)
The Annual Meeting of the Association for Computational Linguistics (ACL)
IEEE Signal Processing Letters (SPL)
Computer Speech and Language
The International Conference on Acoustics, Speech and Signal Processing (ICASSP)
INTERSPEECH
International Joint Conference on Neural Networks (IJCNN)
National Conference on Man-Machine Speech Communication (NCMMSC)

Teaching

Leading TA, DDA3020 Machine Learning, Spring 2023
TA, CSC3100 Data Structures, Fall 2022

Junyi Ao (敖君逸)