Publication LLM can Read Spectrogram: Encoder-free Speech-Language Modeling Ruchao Fan, Yiming Wang, Yuxuan Hu, Bo Ren, Yufei Xia, Xiaofei Wang, Yao Qian, Jinyu Li arXiv | June 2026
Publication Real-time Speech Restoration using Data Prediction Mean Flows Sebastian Braun arXiv | May 2026
Publication A Comprehensive Ecosystem for Open-Domain Customized Video Generation Jingxu Zhang, Yuqian Hong, Daneul Kim, Kai Qiu, Qi Dai, Jianmin Bao, Yifan Yang, Xiaoyan Sun, Chong Luo ICASSP 2026 | May 2026
Publication Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy Shakeel A. Sheikh, Patrick Marmaroli, Md. Sahidullah, Slim Ouni, Fabrice Hirsch, Goncalo Leal, Bjorn W. Schuller arXiv | May 2026
Publication Speech LLMs are Contextual Reasoning Transcribers Keqi Deng, Ruchao Fan, Bo Ren, Yiming Wang, Jinyu Li arXiv | April 2026
Publication RESPOND: Responsive Engagement Strategy for Predictive Orchestration and Dialogue Meng-Chen Lee, Costas Panay, Javier Hernandez, Sean Andrist, Dan Bohus, Anatoly Churikov, Andrew D. Wilson March 2026
Publication Counting Without Numbers & Finding Without Words B. N. Patro arXiv | March 2026
Publication Sirens’ Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs Zijian Ling, Pingyi Hu, Xiuyong Gao, Xiaojing Ma, Man Zhou, Jun Feng, Songfeng Lu, Dongmei Zhang, Bin Benjamin Zhu arXiv | March 2026
Publication Aurelius: Relation Aware Text-to-Audio Generation At Scale Yuhang He, He Liang, Yash Jain, Andrew Markham, Vibhav Vineet ICLR | February 2026
Publication VibeVoice: Expressive Podcast Generation with Next-Token Diffusion Zhiliang Peng, Jianwei Yu, Wenhui Wang, Yaoyao Chang, Yutao Sun, Li Dong, Yi Zhu, Weijiang Xu, Hangbo Bao, Zehua Wang, Shaohan Huang, Yan Xia, Furu Wei ICLR 2026 | February 2026