Redundancy in the computing operations and weight parameters of deep neural network models creates a bottleneck in computing and memory resources for on-chip system implementation, and also causes considerable energy consumption. To solve this problem, DeePhi Tech has developed Deep Compression technology. The technique combines pruning and quantization: it compresses models by cutting down the number of weight connections and reducing the bitwidth of operands. It is compatible with most mainstream neural networks, and can reduce model size by 5x–100x and shorten running time by 1.5x–10x without loss of model accuracy.
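As a rough illustration of the two ideas, the sketch below applies generic magnitude-based pruning (keep the largest-magnitude weights, zero the rest) followed by symmetric linear quantization to a fixed bitwidth. These are standard textbook variants chosen for illustration only; DeePhi's actual pruning and quantization procedures are not described here, and the function names `magnitude_prune`, `quantize_symmetric`, and `dequantize` are hypothetical.

```python
def magnitude_prune(weights, density):
    """Keep the `density` fraction of weights with the largest magnitude; zero the rest.
    Generic magnitude pruning, used here as an illustrative assumption."""
    k = max(1, round(len(weights) * density))
    # Indices of the k largest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]
    using one shared scale factor (symmetric linear quantization)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [int(round(w / scale)) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08, 0.6, -0.02]
    pruned = magnitude_prune(w, density=0.3)  # keeps 0.9, -0.7 and 0.6
    q, scale = quantize_symmetric(pruned, bits=8)
    print(pruned)
    print(q, scale)
```

For example, pruning a 10-weight vector to 30% density and quantizing the survivors to 8 bits leaves three nonzero 8-bit integers plus one shared floating-point scale in place of ten 32-bit floats (the storage needed to record which positions are nonzero is ignored here).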
DNNDK™ (Deep Neural Network Development Kit), DeePhi™'s deep learning SDK, is an integrated framework that aims to simplify and accelerate the development and deployment of deep learning (DL) applications on the DeePhi DPU™ (Deep Learning Processing Unit) platform. It makes the computing power of the DPU easily accessible through an innovative and productive solution covering the phases of compression, programming, compilation, and runtime enablement.
DeePhi's Aristotle Architecture is designed for Convolutional Neural Networks (CNN).
While currently used for video and image recognition tasks, the architecture is flexible and scalable for both servers and portable devices.
Descartes Architecture is designed for compressed Recurrent Neural Networks (RNNs), including LSTMs, and provides efficient hardware acceleration for inference of the sparse networks produced by Deep Compression. Compared with CPUs and GPUs, Descartes lets users achieve lower latency and power consumption: for example, when a model is pruned to around 20% density, it can deliver more than a 2x speedup over a GPU (Tesla P4), with a performance of 2.5 TOPS. Descartes architecture supports a variety of deep learning applications, such as automatic speech recognition and natural language processing.
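The matrix-vector products inside an LSTM cell are where pruning pays off: with sparse weights, the work per time step scales with the number of nonzero weights rather than with the full matrix size. The following is a minimal, framework-free sketch of one LSTM time step under these assumptions: the four gate weight matrices are stacked into 4H rows over the concatenated [x, h] input (gate order i, f, g, o, a common convention), and each row stores only its nonzero weights in a dict. The names `sparse_matvec` and `lstm_step` are hypothetical and not part of any DeePhi tool; the sketch does not model the Descartes hardware itself.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def sparse_matvec(rows, vec):
    """Multiply a sparse matrix by a vector.
    rows: list of dicts {column_index: weight} holding only nonzero weights.
    Returns the result and the number of multiply-accumulates (MACs) performed."""
    out, macs = [], 0
    for row in rows:
        acc = 0.0
        for j, w in row.items():
            acc += w * vec[j]
            macs += 1  # one MAC per stored (nonzero) weight
        out.append(acc)
    return out, macs


def lstm_step(rows, bias, x, h, c):
    """One LSTM time step with sparse stacked gate weights (assumed layout:
    4*H rows over [x, h], gates ordered i, f, g, o)."""
    H = len(h)
    z, macs = sparse_matvec(rows, x + h)  # x + h concatenates the two lists
    z = [zi + bi for zi, bi in zip(z, bias)]
    i = [sigmoid(v) for v in z[0:H]]          # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]      # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell state
    o = [sigmoid(v) for v in z[3 * H:4 * H]]  # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new, macs
```

With fully dense weights the step costs 4·H·(X + H) MACs; keeping only about 20% of the weights cuts that count to roughly a fifth, which is the kind of saving a sparse-aware accelerator can exploit.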
DDESE is an efficient end-to-end automatic speech recognition (ASR) engine built on Xilinx FPGAs. It is designed for deep neural networks, especially LSTMs, and uses DeePhi's deep learning acceleration solution based on algorithm, software, and hardware co-design (covering pruning, quantization, compilation, and FPGA inference). Pruning reduces the model to a sparse one with 15%–20% density and little loss of accuracy; the weights and activations are then quantized to 16 bits. Together, these steps compress the whole model by more than 10x, after which it can be compiled into CSC (Compressed Sparse Column) format and deployed on the Descartes platform for efficient FPGA-accelerated inference.
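To show what compiling a pruned matrix into CSC format involves, here is a minimal sketch that encodes a dense matrix into CSC arrays and multiplies it by a vector column by column. It assumes the weights have already been pruned and quantized to 16-bit integers (plain Python ints stand in for int16 here); the function names `to_csc` and `csc_matvec` are hypothetical, and the layout is the generic textbook CSC, not necessarily the exact format DDESE's compiler emits.

```python
def to_csc(dense):
    """Encode a dense matrix (list of rows) in Compressed Sparse Column format.
    Returns (values, row_indices, col_ptr); column j's nonzeros occupy
    values[col_ptr[j]:col_ptr[j + 1]]."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    values, row_indices, col_ptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                row_indices.append(i)
        col_ptr.append(len(values))
    return values, row_indices, col_ptr


def csc_matvec(values, row_indices, col_ptr, x, n_rows):
    """Compute y = A @ x from CSC arrays, traversing A column by column."""
    y = [0] * n_rows
    for j, xj in enumerate(x):
        if xj == 0:
            continue  # a zero input contributes nothing, so the whole column is skipped
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_indices[k]] += values[k] * xj
    return y


if __name__ == "__main__":
    A = [[0, 3, 0],
         [5, 0, 0],
         [0, 0, -2],
         [1, 0, 4]]
    vals, rows, ptr = to_csc(A)
    print(vals, rows, ptr)
    print(csc_matvec(vals, rows, ptr, [2, 0, 1], len(A)))
```

Traversing by column means each input element is read once, and a column whose input activation is zero can be skipped entirely; this is one commonly cited reason column-oriented formats suit sparse matrix-vector accelerators.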