
Principal Component Analysis Variant Initialization in
                               Convolutional Neural Network


                             Nor Sakinah Md Othman a,*, Azizi Abdullah b
           a,b  Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan
                                       Malaysia, 43600 Bangi, Selangor, Malaysia
                                        *Email: p108639@siswa.ukm.edu.my


        Abstract

        Various weight initialization strategies have been proposed for Convolutional Neural Networks (CNNs) to handle
        overfitting and slow convergence. This paper proposes an alternative weight initialization technique that utilizes Gaussian
        Principal Component Analysis (GPCA) initialization and Generalized Gaussian Principal Component Analysis (G-GPCA)
        initialization on LeNet-5 and AlexNet. A set of overlapping Gaussian windows is used to generate GPCA filters that
        simulate the characteristics of orientation and texture receptive fields, which is intended to improve the extraction of
        low-level features. The proposed method is tested on five datasets, namely MNIST, CIFAR-10, SVHN, GTSRB
        and Covid-19. The results show that PCA variant initialization (PCA, GPCA, G-GPCA) obtains consistent accuracy
        across this variety of datasets.

        Keywords: Weight initialization; principal component analysis; convolutional neural network; image classification.


        1. Introduction

           Applications of convolutional neural networks (CNNs) in various domains such as object recognition have
        increased significantly due to greater computing power and larger training datasets. Even though CNN training
        is a well-known research topic and various works have been introduced, the issue of obtaining
        consistent accuracy and faster convergence still persists. Several methods have been proposed to optimize the
        training process of CNNs, and one approach taken by researchers is to focus on the weight
        initialization strategy (Koturwar & Merchant, 2017).
           Popular methods of setting weights in the convolutional layer are Xavier initialization (Glorot & Bengio,
        2010) and He initialization (He et al., 2015). According to Koturwar & Merchant (2017), both methods are believed
        to be able to handle the gradient diffusion problem and the dying neuron problem. Gradient diffusion can be defined as
        the amplification or attenuation of gradient values throughout the backpropagation process, which results in
        exploding or vanishing gradients (Sun, 2020). The standard initializations have several downsides: they are
        independent of data statistics (Koturwar & Merchant, 2017), they are prone to the dying neuron problem (Lu et al.,
        2019), and the filters they produce are often redundant (Luan et al., 2018). These downsides have motivated other
        weight initialization techniques such as PCA initialization and Linear Discriminant Analysis (LDA) initialization
        (Alberti et al., 2017), which have shown promising results in image classification.
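           To make the contrast between data-independent and data-dependent initialization concrete, the following
        is a minimal NumPy sketch, not the implementation used in this paper. The function names (xavier_normal,
        he_normal, pca_filters), the random patch sampling and the patches_per_image parameter are assumptions made
        for illustration only. The Xavier and He variants draw weights from a zero-mean Gaussian whose standard
        deviation depends only on the layer shape, whereas the PCA variant takes the leading eigenvectors of the
        covariance of image patches as filters.

```python
import numpy as np


def xavier_normal(shape, rng):
    """Xavier (Glorot) normal init for a conv weight of shape (out_ch, in_ch, kh, kw)."""
    out_ch, in_ch, kh, kw = shape
    fan_in = in_ch * kh * kw
    fan_out = out_ch * kh * kw
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=shape)


def he_normal(shape, rng):
    """He normal init: the standard deviation depends on fan_in only."""
    _, in_ch, kh, kw = shape
    fan_in = in_ch * kh * kw
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=shape)


def pca_filters(images, num_filters, k, rng, patches_per_image=20):
    """Illustrative data-dependent init (hypothetical helper).

    images: array of shape (N, H, W), grayscale.
    Returns filters of shape (num_filters, 1, k, k) taken from the leading
    principal components of randomly sampled k x k patches.
    Requires num_filters <= k * k.
    """
    _, h, w = images.shape
    patches = []
    for img in images:
        ys = rng.integers(0, h - k + 1, size=patches_per_image)
        xs = rng.integers(0, w - k + 1, size=patches_per_image)
        for y, x in zip(ys, xs):
            patches.append(img[y:y + k, x:x + k].ravel())
    X = np.asarray(patches, dtype=np.float64)
    X -= X.mean(axis=0)                      # center each pixel position
    cov = X.T @ X / (X.shape[0] - 1)         # (k*k, k*k) patch covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_filters]
    return eigvecs[:, order].T.reshape(num_filters, 1, k, k)
```

           For example, pca_filters(images, 6, 5, rng) would produce six 5x5 filters, the size of the first
        convolutional layer of LeNet-5. The resulting filters are orthonormal; any further rescaling applied before
        training is a separate design choice that this sketch does not cover.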
           In this paper, an initialization technique using Principal Component Analysis (PCA) variants, namely PCA,
        Gaussian PCA (GPCA) and Generalized GPCA (G-GPCA), is introduced. LeNet-5 and AlexNet models are then
        used to investigate these initializations in CNNs for image classification tasks. Several papers have introduced
        the usage of PCA filters in CNNs, but there is still insufficient research on GPCA and G-GPCA filters (Brause et al.,
        1999). In this method, PCA filters are generated by utilizing the image data statistics (Koturwar & Merchant,
        2017). This has some advantages, such as the ability to handle the gradient diffusion problem (Ren et al., 2016) and
        robustness against image transformations (Soon et al., 2020). The contribution of this work is to provide an





        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [183]
        Artificial Intelligence in the 4th Industrial Revolution