Google AI Blog: Google at Interspeech 2022


    This week, the 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022) is being held in Incheon, South Korea, representing one of the world’s most extensive conferences on research and technology of spoken language understanding and processing. Over 2,000 experts in speech-related research fields gather to take part in oral presentations and poster sessions and to collaborate with streamed events across the globe.

    We are excited to be a Diamond Sponsor of INTERSPEECH 2022, where we will be showcasing nearly 50 research publications and supporting a number of workshops, special sessions and tutorials. We welcome in-person attendees to drop by the Google booth to meet our researchers and participate in Q&As and demonstrations of some of our latest speech technologies, which help to improve accessibility and provide convenience in communication for billions of users. In addition, online attendees are encouraged to visit our virtual booth in GatherTown, where you can get up-to-date information on research and opportunities at Google. You can also learn more about the Google research being presented at INTERSPEECH 2022 below (Google affiliations in bold).

    Organizing Committee

    Industry Liaisons include: Bhuvana Ramabhadran

    Area Chairs include: John Hershey, Heiga Zen, Shrikanth Narayanan, Bastiaan Kleijn

    ISCA Fellows

    Include: Tara Sainath, Heiga Zen


    Production Federated Keyword Spotting via Distillation, Filtering, and Joint Federated-Centralized Training

    Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays

    Leveraging Unsupervised and Weakly-Supervised Data to Improve Direct Speech-to-Speech Translation

    Ye Jia, Yifan Ding, Ankur Bapna, Colin Cherry, Yu Zhang, Alexis Conneau, Nobu Morioka

    Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition

    W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor Strohman, Shankar Kumar

    UserLibri: A Dataset for ASR Personalization Using Only Text

    Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey

    SNRi Target Training for Joint Speech Enhancement and Recognition

    Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani

    Turn-Taking Prediction for Natural Conversational Speech

    Shuo-Yiin Chang, Bo Li, Tara Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He

    Streaming Intended Query Detection Using E2E Modeling for Continued Conversation

    Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara Sainath, Bo Li, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman

    Improving Distortion Robustness of Self-Supervised Speech Processing Tasks with Domain Adaptation

    Kuan Po Huang, Yu-Kuan Fu, Yu Zhang, Hung-yi Lee

    XLS-R: Self-Supervised Cross-Lingual Speech Representation Learning at Scale

    Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

    Extracting Targeted Training Data from ASR Models, and How to Mitigate It

    Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays

    Detecting Unintended Memorization in Language-Model-Fused ASR

    W. Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews

    AVATAR: Unconstrained Audiovisual Speech Recognition

    Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

    End-to-End Multi-talker Audio-Visual ASR Using an Active Speaker Attention Module

    Richard Rose, Olivier Siohan

    Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-person Video

    Dmitriy Serdyuk, Otavio Braga, Olivier Siohan

    Unsupervised Data Selection via Discrete Speech Representation for ASR

    Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani

    Non-parallel Voice Conversion for ASR Augmentation

    Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Jesse Emond, Yinghui Huang, Pedro J. Moreno

    Ultra-Low-Bitrate Speech Coding with Pre-trained Transformers

    Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund

    Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

    Chao Zhang, Bo Li, Tara Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani

    Improving Deliberation by Text-Only and Semi-supervised Training

    Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang

    E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR

    W. Ronny Huang, Shuo-yiin Chang, David Rybach, Rohit Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Cal Peyser, Zhiyun Lu

    CycleGAN-Based Unpaired Speech Dereverberation

    Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

    TRILLsson: Distilled Universal Paralinguistic Speech Representations (see blog post)

    Joel Shor, Subhashini Venugopalan

    Learning Neural Audio Features Without Supervision

    Sarthak Yadav, Neil Zeghidour

    SpeechPainter: Text-Conditioned Speech Inpainting

    Zalan Borsos, Matthew Sharifi, Marco Tagliasacchi

    SpecGrad: Diffusion Probabilistic Model-Based Neural Vocoder with Adaptive Noise Spectral Shaping

    Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani

    Distance-Based Sound Separation

    Katharine Patterson, Kevin Wilson, Scott Wisdom, John R. Hershey

    Analysis of Self-Attention Head Diversity for Conformer-Based Automatic Speech Recognition

    Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno

    Improving Rare Word Recognition with LM-Aware MWER Training

    Weiran Wang, Tongzhou Chen, Tara Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach

    MAESTRO: Matched Speech Text Representations Through Modality Matching

    Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen

    Pseudo Label is Better Than Human Label

    Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman

    On the Optimal Interpolation Weights for Hybrid Autoregressive Transducer Model

    Ehsan Variani, Michael Riley, David Rybach, Cyril Allauzen, Tongzhou Chen, Bhuvana Ramabhadran

    Streaming Align-Refine for Non-autoregressive Deliberation

    Weiran Wang, Ke Hu, Tara Sainath

    Federated Pruning: Improving Neural Network Efficiency with Federated Learning

    Rongmei Lin*, Yonghui Xiao, Tien-Ju Yang, Ding Zhao, Li Xiong, Giovanni Motta, Françoise Beaufays

    A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

    Shaojin Ding, Weiran Wang, Ding Zhao, Tara N Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman

    4-Bit Conformer with Native Quantization Aware Training for Speech Recognition

    Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov

    Visually-Aware Acoustic Event Detection Using Heterogeneous Graphs

    Amir Shirian, Krishna Somandepalli, Victor Sanchez, Tanaya Guha

    A Conformer-Based Waveform-Domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy

    Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park, James Walker, Alexander Gruenstein

    Reducing Domain Mismatch in Self-Supervised Speech Pre-training

    Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang, Nicolás Serrano

    On-the-Fly ASR Corrections with Audio Exemplars

    Golan Pundak, Tsendsuren Munkhdalai, Khe Chai Sim

    A Language Agnostic Multilingual Streaming On-Device ASR System

    Bo Li, Tara Sainath, Ruoming Pang*, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani

    XTREME-S: Evaluating Cross-Lingual Speech Representations

    Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

    Towards Disentangled Speech Representations

    Cal Peyser, Ronny Huang, Andrew Rosenberg, Tara Sainath, Michael Picheny, Kyunghyun Cho

    Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

    Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O’Malley, Ian McGraw

    A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation

    Tom O’Malley, Arun Narayanan, Quan Wang

    Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

    Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alex Petelin, Jonathan Shen*, Vincent Wan, Yu Zhang, Yonghui Wu, Robert Clark

    A Scalable Model Specialization Framework for Training and Inference Using Submodels and Its Application to Speech Model Personalization

    Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro Moreno

    Text-Driven Separation of Arbitrary Sounds

    Kevin Kilgour, Beat Gfeller, Qingqing Huang, Aren Jansen, Scott Wisdom, Marco Tagliasacchi

    Workshops, Tutorials & Special Sessions

    The VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22)

    Organizers include: Arsha Nagrani

    Self-Supervised Representation Learning for Speech Processing

    Organizers include: Tara Sainath

    Learning from Weak Labels

    Organizers include: Ankit Shah

    RNN Transducers for Named Entity Recognition with Constraints on Alignment for Understanding Medical Conversations

    Authors: Hagen Soltau, Izhak Shafran, Mingqiu Wang, Laurent El Shafey

    Listening with Googlears: Low-Latency Neural Multiframe Beamforming and Equalization for Hearing Aids

    Authors: Samuel Yang, Scott Wisdom, Chet Gnegy, Richard F. Lyon, Sagar Savla

    Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset

    Authors: Michael Chinen, Jan Skoglund, Chandan K. A. Reddy, Alessandro Ragano, Andrew Hines

    Incremental Layer-Wise Self-Supervised Learning for Efficient Unsupervised Speech Domain Adaptation On Device

    Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays

    Trustworthy Speech Processing

    Organizers include: Shrikanth Narayanan

    *Work done while at Google.

