Mixed-Method Quantization of Convolutional Neural Networks on an Adaptive FPGA Accelerator


Madadum H., BECERİKLİ Y.

7th International Conference on Computer Science and Engineering, UBMK 2022, Diyarbakır, Türkiye, 14-16 September 2022, pp. 355-359

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/ubmk55850.2022.9919597
  • City of Publication: Diyarbakır
  • Country of Publication: Türkiye
  • Page Numbers: pp. 355-359
  • Keywords: Convolutional Neural Networks, FPGA, Partial reconfiguration, Quantization
  • Kocaeli University Affiliated: Yes

Abstract

© 2022 IEEE. Research on quantization of Convolutional Neural Networks (ConNN) has gained attention recently due to the increasing demand for running ConNN models on embedded devices. Quantization compresses the ConNN model to simplify complex computations and reduce resource requirements. However, naively mapping a 32-bit ConNN model to a lower bit width can degrade accuracy. One limitation of quantization is the diversity of parameters in the ConNN model: different layers have different structures, so applying the same quantization method to all layers can lead to sub-optimal performance. We therefore propose mixed-method quantization, a compression technique that uses different quantization approaches within a single ConNN model. We also propose an adaptive accelerator for quantized ConNN whose architecture is reconfigured at runtime using the FPGA's partial reconfiguration capability. The experimental results show that the proposed design achieves accuracy close to that of the 32-bit models when quantizing ConNN models to 4 bits without retraining. In addition, the adaptive accelerator achieves a peak resource efficiency of 1.11 GOP/s/DSP and 1.49 GOP/s/kLUT.
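The abstract does not detail which quantization schemes are mixed or how a method is assigned to each layer, so the following is only an illustrative sketch of the general idea: two candidate post-training quantizers (symmetric uniform, and power-of-two, which maps multiplications to shifts in hardware) are applied per layer, and each layer keeps whichever reconstructs its weights with lower error. All function names and the MSE-based selection rule are assumptions for illustration, not the paper's method.

```python
import numpy as np

def uniform_quantize(w, bits=4):
    # Symmetric uniform quantization: round weights onto 2**(bits-1)-1
    # positive/negative integer levels, then rescale (dequantize) so the
    # result can be compared against the original float weights.
    qmax = 2 ** (bits - 1) - 1
    wmax = np.max(np.abs(w))
    scale = wmax / qmax if wmax > 0 else 1.0
    q = np.round(w / scale).clip(-qmax, qmax)
    return q * scale

def pow2_quantize(w, bits=4):
    # Power-of-two quantization: snap each weight's magnitude to the nearest
    # power of two, restricted to 2**bits exponent levels below the maximum
    # exponent (range choice is an assumption). Zeros stay zero via sign().
    sign = np.sign(w)
    mag = np.maximum(np.abs(w), 1e-12)   # avoid log2(0)
    exp = np.round(np.log2(mag))
    max_exp = np.max(exp)
    exp = np.clip(exp, max_exp - (2 ** bits - 1), max_exp)
    return sign * (2.0 ** exp)

def mixed_quantize(layers, bits=4):
    # Per-layer method selection: keep whichever candidate quantizer yields
    # the smaller mean-squared reconstruction error for that layer's weights.
    out = {}
    for name, w in layers.items():
        candidates = {
            "uniform": uniform_quantize(w, bits),
            "pow2": pow2_quantize(w, bits),
        }
        best = min(candidates, key=lambda k: np.mean((w - candidates[k]) ** 2))
        out[name] = (best, candidates[best])
    return out
```

Because both quantizers are purely post-training (they only transform the stored weights), this matches the abstract's "without retraining" setting; how the paper actually pairs methods with layers may differ.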