
applied the LHS technique to minimize the number of training samples and used a recursive random search algorithm to tune the parameter configurations. The results demonstrated that their proposed method reduces execution time by 22.8% to 40% on nine different applications compared to the default settings; this style of configuration sampling is sketched below.
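As a rough illustration, the following is a minimal sketch of Latin hypercube sampling (assuming that is the LHS referred to above) over a Spark parameter space. The parameter names and value ranges are illustrative assumptions, not those used in the original study.

```python
import numpy as np

# Illustrative Spark parameters and value ranges (not the original study's set).
PARAM_RANGES = {
    "spark.executor.cores": (1, 8),
    "spark.executor.memory_gb": (1, 16),
    "spark.sql.shuffle.partitions": (50, 400),
}

def latin_hypercube_sample(n_samples, rng=None):
    """Draw n_samples configurations, one value per stratum of each parameter."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_params = len(PARAM_RANGES)
    # One random point inside each of n_samples equal-width strata, per parameter.
    unit = (rng.random((n_samples, n_params)) + np.arange(n_samples)[:, None]) / n_samples
    # Shuffle strata independently per parameter so dimensions are uncorrelated.
    for j in range(n_params):
        rng.shuffle(unit[:, j])
    configs = []
    for row in unit:
        cfg = {}
        for (name, (lo, hi)), u in zip(PARAM_RANGES.items(), row):
            cfg[name] = int(round(lo + u * (hi - lo)))
        configs.append(cfg)
    return configs

print(latin_hypercube_sample(5))
```

Compared with plain random sampling, each parameter's range is covered evenly, which is what lets LHS get away with fewer training runs.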
In Wang et al. (2017), the idea is to use binary and multi-class classification algorithms to predict the execution time under a given set of parameters. Data from actual executions of each workload are collected using random sampling to train the model. Their proposed method achieved an average of 36% lower running times compared to the default settings. However, this technique needs intensive training for every specific workload to achieve the optimal model; a minimal sketch of such a classification-based predictor follows.
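The sketch below is one possible reading of this approach, using a scikit-learn random forest that labels a configuration as faster or slower than the default run. The features, labels, and binary framing are illustrative assumptions rather than details from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training data: rows are (executor_cores, executor_memory_gb,
# shuffle_partitions), gathered in practice from actual sampled executions.
X = np.array([
    [2, 4, 100], [4, 8, 200], [8, 16, 400],
    [1, 2, 50],  [6, 12, 300], [3, 6, 150],
])
# 1 = measured run was faster than the default, 0 = slower (illustrative labels).
y = np.array([0, 1, 1, 0, 1, 0])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Score unseen candidate configurations and keep the ones predicted "fast".
candidates = np.array([[4, 8, 150], [1, 1, 50]])
print(clf.predict(candidates))
```

A multi-class variant would bucket execution times into several classes instead of two; the workflow is otherwise the same.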
A study by Gu et al. (2018) proposed tuning Spark parameter configurations for streaming applications using neural networks. They generated the training data set randomly and used a random forest algorithm to build a model that predicts the execution time, while a neural network searches for the optimal configuration based on that prediction model. The experimental results show that the proposed approach improves the performance of Spark Streaming by up to 42.8% compared with the default parameter configuration; this predict-then-search pattern is sketched below.
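To illustrate the predict-then-search pattern, the sketch below trains a random forest regressor on randomly generated configurations and then scans fresh random candidates for the lowest predicted execution time. Gu et al. instead drive the search with a neural network; the synthetic runtimes and parameter ranges here are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def random_config(n):
    """Randomly generated configurations: cores, memory (GB), partitions."""
    return np.column_stack([
        rng.integers(1, 9, n),      # executor cores
        rng.integers(1, 17, n),     # executor memory in GB
        rng.integers(50, 401, n),   # shuffle partitions
    ])

# Stand-in for measured execution times of the training runs (seconds).
X_train = random_config(50)
y_train = 300 / X_train[:, 0] + 200 / X_train[:, 1] + 0.1 * X_train[:, 2]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Search step: score many candidate configurations against the prediction
# model and keep the one with the lowest predicted execution time.
candidates = random_config(1000)
best = candidates[np.argmin(model.predict(candidates))]
print("best predicted configuration:", best)
```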
A recent study by Li et al. (2020) proposed the ATCS system, an automated tuning approach using a Generative Adversarial Network (GAN). The GAN is used to build a performance prediction model of lower complexity that needs less training data. They implemented a Random Parameter Generator (RPG) to produce random configurations for each workload as training data for the prediction model; a minimal sketch of such a generator follows. The results show that Spark's performance can be improved by an average of 3.5 to 6.9 times compared to the performance of the default parameters.
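Of the ATCS components, the Random Parameter Generator is the simplest to sketch. The version below draws each Spark property uniformly from an assumed range; the property list and bounds are illustrative, and the GAN-based prediction model it feeds is omitted.

```python
import random

# Illustrative property ranges; Li et al.'s actual tuned set is not reproduced here.
PROPERTY_RANGES = {
    "spark.executor.instances": (2, 20),
    "spark.executor.cores": (1, 8),
    "spark.memory.fraction": (0.4, 0.9),
}

def random_parameter_generator(n_configs, seed=0):
    """Yield random configurations to serve as training data for a predictor."""
    rng = random.Random(seed)
    for _ in range(n_configs):
        yield {
            name: (rng.uniform(lo, hi) if isinstance(lo, float)
                   else rng.randint(lo, hi))
            for name, (lo, hi) in PROPERTY_RANGES.items()
        }

for cfg in random_parameter_generator(3):
    print(cfg)
```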
Based on the previous works above, we can conclude that the experiment-based or trial-and-error approach is less effective because of the high-dimensional parameter space, and it is time-consuming, as it requires intensive repetition of experiments to test each combination of parameters. The simulation approach, on the other hand, is a faster way to test all parameter combinations; however, it is challenging to simulate a real Spark environment, as building one requires in-depth knowledge of Spark's internals. Machine learning methods are gaining much popularity among researchers for obtaining optimal configurations, and more efficient machine learning methods should be explored to tackle this issue. Beyond improving prediction accuracy and time performance, it is also important to estimate the usage of cores, memory, disk, and network before launching the application, to ensure that all resources are fully utilized.

2.2 Workload Characteristics

Workload characteristics refer to the characteristics of a job, i.e., the size of the data, the type of data or task (e.g., SQL or machine learning tasks), and the resource requirements needed to process it. Default Spark schedulers such as FIFO do not consider workload characteristics in their scheduling decisions, favouring an easy and straightforward implementation instead; a minimal example of the relevant configuration is shown below. With this approach, it is hard to achieve efficient utilization of resources. In this section, we examine different methods and techniques that enhance Spark scheduling performance through awareness of workload characteristics.
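For context, Spark's built-in scheduling mode is controlled by a single configuration property. The minimal PySpark snippet below switches from the default FIFO scheduler to the FAIR scheduler; note that neither mode inspects workload characteristics such as data size or task type.

```python
from pyspark.sql import SparkSession

# spark.scheduler.mode accepts FIFO (the default) or FAIR; neither mode
# models workload characteristics, which motivates the approaches below.
spark = (
    SparkSession.builder
    .appName("scheduler-mode-demo")
    .config("spark.scheduler.mode", "FAIR")
    .getOrCreate()
)
print(spark.conf.get("spark.scheduler.mode"))
```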
Mao et al. (2019) proposed Decima, which aims to improve on existing heuristic approaches to task scheduling by considering workload characteristics. It uses reinforcement learning (RL) and neural networks to learn a scheduling policy through experience, representing the scheduler as an agent that can learn from workload and cluster conditions without relying on incorrect assumptions. Decima encodes its scheduling strategy by observing the environment, taking actions, and improving its policy over time to make better decisions. The agent is rewarded after each action, and the reward is defined by the scheduling objective (e.g., minimizing average job completion time); this interaction loop is sketched below. The results show that Decima improves average job completion time by 21% over the default schedulers. However, the authors do not mention whether it supports a multi-tenant framework, which is important for high-performance computing workloads.
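The toy loop below sketches the observe-act-reward interaction pattern described above, assuming a generic environment interface. It is not Decima's actual architecture (which encodes job DAGs with graph neural networks); here the reward is simply the negative completion time of each scheduled job, so maximizing total reward minimizes the sum of job completion times.

```python
import random

class ToyClusterEnv:
    """Hypothetical stand-in for a cluster scheduling environment (not Decima)."""

    def __init__(self, jobs):
        self.pending = list(jobs)  # each job is (job_id, runtime_seconds)
        self.clock = 0.0           # simulated wall-clock time

    def step(self, action):
        """Run the chosen pending job to completion; the reward is the negative
        completion time, matching a minimize-average-completion-time objective."""
        job_id, runtime = self.pending.pop(action)
        self.clock += runtime
        reward = -self.clock
        done = not self.pending
        return reward, done

def run_episode(env, policy):
    """Generic observe-act-reward loop: the agent picks a job, gets a reward,
    and (in a real RL setup) would update its policy from that signal."""
    total_reward = 0.0
    done = False
    while not done:
        action = policy(env.pending)  # observe pending jobs, choose one
        reward, done = env.step(action)
        total_reward += reward
    return total_reward

# A random policy as a baseline; an RL agent improves on this over many episodes.
env = ToyClusterEnv([("jobA", 30.0), ("jobB", 5.0), ("jobC", 12.0)])
print(run_episode(env, lambda pending: random.randrange(len(pending))))
```

In this toy setting, scheduling shorter jobs first yields a higher total reward, which is exactly the behaviour the reward signal is meant to teach.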







