Pal, Chandrajit and Pankaj, Sunil and Akram, Wasim and Biswas, Dwaipayan and Mattela, Govardhan and Acharyya, Amit (2022) Fragmented Huffman-Based Compression Methodology for CNN Targeting Resource-Constrained Edge Devices. Circuits, Systems, and Signal Processing, 41 (7). pp. 3957-3984. ISSN 0278-081X
Full text not available from this repository.
Abstract
In this paper, we introduce a fragmented Huffman compression methodology for compressing convolutional neural networks executing on edge devices. Current applications demand the deployment of deep networks on edge devices, since they must adhere to low latency, enhanced security, and long-term cost effectiveness. However, the primary bottleneck is the expanded memory footprint arising from the large size of neural network models. Existing software implementations of deep compression strategies apply Huffman compression to the quantized weights, reducing the deep neural network model size. However, further compression of the memory footprint is possible from a hardware design perspective on edge devices, where our proposed methodology can be complementary to the existing strategies. With this motivation, we propose a fragmented Huffman coding methodology that can be applied to the binary equivalent of the numeric weights of a neural network model stored in device memory. Subsequently, we also introduce static and dynamic storage methodologies for the device memory space that remains after storing the compressed file; the dynamic storage methodology reduces area and energy consumption by approximately 38% in comparison with the static one. To the best of our knowledge, this is the first study in which the Huffman compression technique has been revisited, from a hardware design perspective, to compress binary files based on multiple bit-pattern sequences, achieving a maximum compression rate of 64%. A compressed hardware memory architecture and a decompression module have also been designed and synthesized at 500 MHz, using a GF 40-nm low-power cell library with a nominal voltage of 1.1 V, achieving a 62% reduction in dynamic power consumption with a decompression time of about 63 microseconds (μs) without trading off accuracy. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
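To illustrate the general idea described in the abstract — Huffman coding applied to fixed-width bit-pattern fragments of a binary weight file rather than to the numeric weights themselves — the minimal Python sketch below builds a Huffman code over such fragments. This is not the authors' hardware architecture or exact algorithm; the fragment width, helper names, and toy data are assumptions for demonstration only.

    # Illustrative sketch only (not the paper's implementation): standard Huffman coding
    # over fixed-width bit-pattern fragments of a binary weight blob. Fragment width,
    # helper names, and the toy input are assumptions.
    import heapq
    from collections import Counter
    from itertools import count

    def to_bitstring(data: bytes) -> str:
        """Concatenate the bytes of a (quantized) weight file into one bit string."""
        return "".join(f"{byte:08b}" for byte in data)

    def fragment(bits: str, width: int = 4) -> list:
        """Split the bit string into fixed-width fragments (the symbols to be coded)."""
        return [bits[i:i + width] for i in range(0, len(bits), width)]

    def huffman_code(symbols: list) -> dict:
        """Build a Huffman code table {fragment: codeword} from symbol frequencies."""
        freq = Counter(symbols)
        tie = count()  # tie-breaker so the heap never compares dicts
        heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
        heapq.heapify(heap)
        if len(heap) == 1:  # degenerate case: only one distinct fragment
            return {sym: "0" for sym in heap[0][2]}
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)
            f2, _, t2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in t1.items()}
            merged.update({s: "1" + c for s, c in t2.items()})
            heapq.heappush(heap, (f1 + f2, next(tie), merged))
        return heap[0][2]

    def compress(data: bytes, width: int = 4):
        bits = to_bitstring(data)
        frags = fragment(bits, width)
        table = huffman_code(frags)
        encoded = "".join(table[f] for f in frags)
        # Fraction of bits saved (ignores the cost of storing the code table).
        ratio = 1 - len(encoded) / len(bits)
        return encoded, table, ratio

    if __name__ == "__main__":
        # Toy stand-in for a quantized-weight binary blob.
        weights = bytes([0x11, 0x11, 0x12, 0x21, 0x11, 0x22, 0x11, 0x11])
        _, table, ratio = compress(weights, width=4)
        print(table, f"saved ~{ratio:.0%} of bits")

Skewed fragment distributions, which quantized weights typically exhibit, are what make short codewords for frequent bit patterns pay off; the hardware-oriented static/dynamic storage and decompression aspects of the paper are not modeled here.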
IITH Creators: Acharyya, Amit (ORCiD: http://orcid.org/0000-0002-5636-0676)
Item Type: Article
Additional Information: Authors would like to acknowledge the support extended by the Defence Research and Development Organization, Ministry of Defence, Government of India with the Grant reference: ERIPR/ER/202009001/M/01/1781 dated 8 February 2021 for the research project entitled “Reconfigurable Machine Learning Accelerator Design and Development for Avionics Applications.” Authors would also like to acknowledge the support received from the Ministry of Electronics and Information Technology (MEITY), Government of India toward the usage of the CAD tools as part of the Special Manpower Development (SMDP) program. The authors would also like to thank Ceremorphic Technologies Private Limited for funding and extending the tool support for carrying out a few experiments.
Uncontrolled Keywords: Common objects in context (COCO); Convolution neural network (CNN); Deep neural network (DNN); Fragmented Huffman encoding (FHE); Hierarchical data format (HDF5); Residual network (ResNet)
Subjects: Electrical Engineering
Divisions: Department of Electrical Engineering
Depositing User: LibTrainee 2021
Date Deposited: 25 Jul 2022 12:06
Last Modified: 25 Jul 2022 12:06
URI: http://raiithold.iith.ac.in/id/eprint/9920
Publisher URL: http://doi.org/10.1007/s00034-022-01968-x
OA policy: https://v2.sherpa.ac.uk/id/publication/15622
Related URLs: