DDESE is an efficient end-to-end automatic speech recognition (ASR) engine built on DeePhi's deep learning acceleration solution of algorithm, software and hardware co-design (comprising pruning, quantization, compilation and FPGA inference). We use the Baidu DeepSpeech2 framework with the LibriSpeech 1000-hour dataset for model training and compression. Users can run the test scripts both for performance comparison between CPU and FPGA and for single-sentence recognition.
Innovative full-stack acceleration solution for deep learning in acoustic speech recognition (ESE: Best Paper at FPGA 2017)
Our solution is an algorithm, software and hardware co-design (comprising pruning, quantization, compilation and FPGA inference).
After pruning, the model is reduced to a sparse one (15%~20% density) with little loss of accuracy. The weights and activations are then quantized to 16 bits, so the whole model is compressed by more than 10X; it can then be encoded in CSC (Compressed Sparse Column) format and deployed on the Descartes platform for efficient inference on FPGA.
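The compression pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not DeePhi's actual tooling: the magnitude-pruning rule, the linear int16 quantizer, and the random test matrix are assumptions made for the example; only NumPy is required.

```python
import numpy as np

def prune(weights, density=0.15):
    """Magnitude pruning: keep only the largest `density` fraction of weights."""
    k = int(weights.size * density)
    threshold = np.sort(np.abs(weights), axis=None)[-k]  # k-th largest magnitude
    return weights * (np.abs(weights) >= threshold)

def quantize_16bit(weights):
    """Linear quantization of float weights to int16 plus a scale factor."""
    scale = np.abs(weights).max() / 32767.0
    return np.round(weights / scale).astype(np.int16), scale

def to_csc(matrix):
    """Encode a 2-D matrix in Compressed Sparse Column (CSC) form."""
    values, row_idx, col_ptr = [], [], [0]
    for j in range(matrix.shape[1]):
        col = matrix[:, j]
        nz = np.nonzero(col)[0]          # rows of the nonzeros in this column
        values.extend(col[nz])
        row_idx.extend(nz)
        col_ptr.append(len(values))      # running count closes each column
    return np.array(values), np.array(row_idx), np.array(col_ptr)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)  # stand-in weight matrix
sparse = prune(w, density=0.15)
q, scale = quantize_16bit(sparse)
values, row_idx, col_ptr = to_csc(q)
print(f"density: {np.count_nonzero(sparse) / sparse.size:.2f}")
print(f"stored nonzeros: {len(values)}")
```

Storing only `values`, `row_idx` and `col_ptr` is what yields the >10X compression: at 15% density with 16-bit values, the dense float32 matrix shrinks by roughly that factor.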
Our ASR system and model structure are as follows:
Our achievements of DDESE are as follows:
2.87X and 2.56X speedups over GPU (Tesla P4 + cuDNN) are achieved for the unidirectional and bidirectional LSTM models respectively, considering only the LSTM layers.
A 2.06X speedup over GPU (Tesla P4 + cuDNN) is achieved for the whole end-to-end speech recognition process when both the CNN and bidirectional LSTM layers are accelerated.
The details of the performance comparison for the bidirectional LSTM model are as follows:
We assume you are familiar with AWS F1 instances. Please refer to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html if you are not. You should launch and log in to the DDESE instance before testing.
# sudo bash (make sure you are in a root environment)
# source /opt/Xilinx/SDx/2017.1.rte/setup.sh (set up the SDAccel platform)
# cd ASR_Accelerator/deepspeech2 (where the test tools are placed)
# source activate test_py3 (activate the Python 3.6 environment)
After the above steps are done, you are free to test the ASR process.
The following command deploys a model on CPU and transcribes the same sentence 1000 times.
# python aws_test.py --audio_path data/middle_audio/wav/middle1.wav --single_test
The following command deploys a model on FPGA and transcribes the same sentence 1000 times.
# python aws_test.py --fpga_config deephi/config/fpga_cnnblstm_0.15.json --audio_path data/middle_audio/wav/middle1.wav --no_cpu --single_test
With the help of these tests, you can compare the performance of the same automatic speech recognition task on CPU and FPGA.
In this part, we detail more commands that you can use to test the DeePhi_ASRAcc. Furthermore, you can change some parameters according to the parameter descriptions below.
# python aws_test.py (multi-sentence test to show the performance of FPGA over CPU)
By default, this command will deploy a model on CPU and transcribe all the sentences (“.wav” format) under data/short_audio/wav/ and print the output logs.
# python transcribe.py (single-sentence test to show the accuracy of the model)
By default, this command will deploy the model on CPU and transcribe data/short_audio/wav/short_audio1.wav and print the output logs.
By default, both commands deploy the model only on the CPU; you can add an FPGA configuration to deploy the model on the FPGA, as below:
# python aws_test.py --fpga_config deephi/config/fpga_bilstm_0.15.json
(deploy the model on both CPU and FPGA and run the test)
By running this command, models will be deployed on both CPU and FPGA, and the ASR process will be tested on CPU and FPGA in turn.
# python transcribe.py --fpga_config deephi/config/fpga_bilstm_0.15.json
(deploy the model on FPGA and do the ASR)
By running this command, the model will be deployed on the FPGA instead of on the CPU, and the ASR process will run there.
A. for command aws_test.py:
--no_cpu: set this parameter to skip running the ASR process on the CPU.
: specify ROOTDIR_OF_YOUR_WAV_FILE as the folder where your wav files are saved; the command will then transcribe every .wav file under that folder. This parameter SHOULD NOT be used together with the --single_test parameter.
--audio_path: specify PATH_TO_YOUR_WAV_FILE as the wav file you want to transcribe; the command will then transcribe the specified sentence 1000 times. This parameter SHOULD be used together with the --single_test parameter.
--single_test: set this parameter to run single-test mode, i.e. transcribe the same sentence 1000 times on the specified model(s). Otherwise, every sentence under the specified folder is transcribed once.
B. for command transcribe.py:
--audio_path: specify PATH_TO_YOUR_WAV_FILE as the wav file you want to transcribe.
Note: The folder named “data” consists of short audios, middle audios and long audios.
Please upload your own wav file (it must have a 16kHz sample rate, be recorded in a clean environment, and be shorter than 3 seconds). Then use the following command to transcribe the uploaded sentence:
# python transcribe.py --audio_path PATH_TO_YOUR_WAV_FILE
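Before uploading, you can check that a recording meets the sample-rate and length requirements above with Python's standard-library wave module. This is a small helper sketch, not part of the DDESE tooling; the 16 kHz rate and 3-second limit come from the note above.

```python
import wave

def check_wav(path, required_rate=16000, max_seconds=3.0):
    """Return (ok, message) for the sample-rate and duration requirements."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    if rate != required_rate:
        return False, f"sample rate is {rate} Hz, expected {required_rate} Hz"
    if seconds > max_seconds:
        return False, f"audio is {seconds:.2f} s, longer than {max_seconds} s"
    return True, "ok"
```

For example, `ok, msg = check_wav("my_audio.wav")` returns `(True, "ok")` only when the file passes both checks; otherwise `msg` says which requirement failed.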