MSc thesis project proposal

[2024] An FPGA Accelerator of OpenAI Whisper Speech Recognizer

Introduction

In the evolving landscape of human-machine interaction, Automatic Speech Recognition (ASR) systems play a pivotal role by enabling machines to understand and process human speech. AI-driven ASR models such as OpenAI's Whisper have significantly advanced the field, offering high accuracy and versatility across languages and contexts. However, their use in real-time applications is often hindered by computational latency. This project proposes the development of a hardware accelerator for the Whisper ASR system on the Zynq Field-Programmable Gate Array (FPGA) platform, with an ambitious target latency of 10 milliseconds to ensure seamless real-time speech recognition.

Project Aims and Objectives

The primary aim of this thesis is to design and implement a hardware accelerator for the Whisper speech recognition system, optimizing for speed and efficiency. The objectives are as follows:

  1. Performance Optimization: Achieving a target latency of 10 milliseconds for speech recognition to enable real-time processing.
  2. Hardware Design: Utilizing the Verilog hardware description language to develop an FPGA-based accelerator tailored for the Whisper ASR system.
  3. Evaluation and Benchmarking: Assessing the accelerator's latency (in milliseconds), throughput (in operations per second, OPS), accuracy (in word error rate, WER), and power consumption, each compared against GPU-based software implementations.
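
The accuracy metric named in objective 3 can be made concrete with a short sketch. The Python function below computes WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for illustration; a real evaluation would normally also normalize case and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()   # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.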

Assignment

Research and Methodology

The project will involve:

  • Literature Review: Conducting an in-depth review of existing ASR systems, with a focus on hardware accelerators and FPGA implementations.
  • Hardware Design: Designing the FPGA accelerator using Verilog, considering the unique computational requirements of the Whisper ASR system.
  • System Integration and Testing: Integrating the hardware accelerator with the Whisper system and conducting tests to evaluate performance metrics.
  • Performance Analysis: Comparing the FPGA accelerator's performance with traditional software implementations, analyzing improvements in latency, accuracy, and efficiency.
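
The performance analysis step could be organized around a few derived quantities. The Python sketch below is one possible way to do this, not a prescribed method: the `Measurement` class, its field names, and the `compare` helper are hypothetical, and throughput is taken as operations divided by latency, which assumes one inference at a time with no pipelining or batching.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One platform's measured figures for a single inference (hypothetical layout)."""
    latency_s: float  # end-to-end latency per inference, in seconds
    power_w: float    # average power draw during inference, in watts
    ops: float        # arithmetic operations executed per inference

    @property
    def throughput_ops(self) -> float:
        # Operations per second, assuming no overlap between inferences.
        return self.ops / self.latency_s

    @property
    def energy_j(self) -> float:
        # Energy per inference in joules: average power times duration.
        return self.power_w * self.latency_s


def compare(baseline: Measurement, candidate: Measurement) -> dict:
    """Ratios greater than 1.0 mean the candidate improves on the baseline."""
    return {
        "speedup": baseline.latency_s / candidate.latency_s,
        "throughput_gain": candidate.throughput_ops / baseline.throughput_ops,
        "energy_reduction": baseline.energy_j / candidate.energy_j,
    }
```

Here `baseline` would hold the GPU software measurements and `candidate` the FPGA measurements. Reporting energy per inference alongside raw power keeps the comparison fair when the two platforms differ in both power draw and latency.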

Expected Outcomes

This thesis is expected to deliver:

  • A fully functional FPGA-based hardware accelerator for the Whisper ASR system, achieving real-time speech recognition with a target latency of 10 milliseconds.
  • A detailed analysis of the system's performance, including comparisons with software-based ASR systems to highlight improvements.

Requirements

  • Background in digital signal processing, and preferably machine learning.
  • Experience with FPGA development, particularly using Verilog for hardware description.

Contact

dr. Chang Gao

Electronic Circuits and Architectures Group

Department of Microelectronics

Last modified: 2024-02-20