Algorithms, Vol. 16, Issue 1 (2023)

Optimization of Linear Quantization for General and Effective Low Bit-Width Network Compression

Wenxin Yang, Xiaoli Zhi and Weiqin Tong

Abstract

Current edge devices for neural networks, such as FPGAs, CPLDs, and ASICs, support low bit-width computing to reduce execution latency and improve energy efficiency, but traditional linear quantization can only maintain the inference accuracy of neural networks at bit-widths above 6 bits. Unlike previous studies that address this problem by clipping outliers, this paper proposes a two-stage quantization method. Before converting the weights into fixed-point numbers, the method first prunes the network by unstructured pruning and then clusters the weights with the K-means algorithm in order to protect the weight distribution. To address the instability of the K-means results, the PSO (particle swarm optimization) algorithm is used to obtain the initial cluster centroids. Experimental results on baseline deep networks such as ResNet-50, Inception-v3, and DenseNet-121 show that the proposed optimized quantization method can generate a 5-bit network with an accuracy loss of less than 5% and a 4-bit network with only 10% accuracy loss, as compared to 8-bit quantization. Through quantization and pruning, the method reduces the model bit-width from 32 to 4 and the number of neurons by 80%. Additionally, it can be easily integrated into frameworks such as TensorRT and TensorFlow Lite for low bit-width network quantization.
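
The pipeline described in the abstract (unstructured pruning, then K-means clustering of the surviving weights, then conversion to fixed-point) can be illustrated with a minimal sketch. This is not the authors' implementation: all function names are hypothetical, the PSO-based centroid initialization is replaced here by evenly spaced initial centroids, and symmetric linear quantization with a single scale is an assumption, since the abstract does not specify the exact scheme.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured pruning: return a keep-mask that drops the `sparsity`
    fraction of weights with the smallest magnitude."""
    n_drop = int(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_drop])
    return [i not in dropped for i in range(len(weights))]


def kmeans_1d(values, k, n_iter=50):
    """Lloyd's K-means on scalars (requires k >= 2).
    ASSUMPTION: evenly spaced initial centroids stand in for the
    PSO-based initialization mentioned in the abstract."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(n_iter):
        sums, counts = [0.0] * k, [0] * k
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centroids[c]))
            sums[j] += v
            counts[j] += 1
        # Empty clusters keep their previous centroid.
        centroids = [sums[j] / counts[j] if counts[j] else centroids[j]
                     for j in range(k)]
    return centroids


def linear_quantize(values, bits):
    """Symmetric linear quantization to signed `bits`-bit integer codes.
    ASSUMPTION: one scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def prune_cluster_quantize(weights, sparsity=0.8, bits=4):
    """Prune, snap surviving weights to K-means centroids, then quantize."""
    kept = magnitude_prune(weights, sparsity)
    survivors = [w for w, keep in zip(weights, kept) if keep]
    centroids = kmeans_1d(survivors, 2 ** bits)
    clustered = [min(centroids, key=lambda c: abs(w - c)) if keep else 0.0
                 for w, keep in zip(weights, kept)]
    codes, scale = linear_quantize(clustered, bits)
    return codes, scale, kept
```

Calling `prune_cluster_quantize(weights, sparsity=0.8, bits=4)` on a flat list of floats returns the integer codes, the scale needed to dequantize them (`code * scale`), and the keep-mask from pruning. Pruned positions always map to code 0, which is exactly representable on the fixed-point grid.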
