
of parallelism or partition size during execution. They profiled the application to describe its behavior as a
        function of the number of machines used, and derived the dynamic partitioning solutions from those profiles.
        Based on the results, the time performance improved by up to 50% using the estimated dynamic partitions.
        However, obtaining accurate profiles from only a few test runs is a challenging task.
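
        The profiling idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that
        runtime follows t(p) = a + b/p in the degree of parallelism p. This is a simplification and is not the
        model used in the cited work. The profile values and the names fit_profile and pick_parallelism are
        hypothetical.

```python
def fit_profile(runs):
    """Least-squares fit of t(p) = a + b / p from (parallelism, seconds) pairs."""
    xs = [1.0 / p for p, _ in runs]  # regress runtime on 1/p
    ys = [t for _, t in runs]
    n = len(runs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # parallelizable work
    a = mean_y - b * mean_x  # fixed (serial) cost
    return a, b


def pick_parallelism(a, b, candidates, min_gain=0.15):
    """Smallest candidate whose next step up saves less than min_gain of its runtime."""
    candidates = sorted(candidates)
    for cur, nxt in zip(candidates, candidates[1:]):
        t_cur = a + b / cur
        t_nxt = a + b / nxt
        if (t_cur - t_nxt) / t_cur < min_gain:
            return cur
    return candidates[-1]


# Hypothetical profile from three short test runs: (parallelism, runtime in seconds).
profile_runs = [(2, 110.0), (4, 60.0), (8, 35.0)]
a, b = fit_profile(profile_runs)
best = pick_parallelism(a, b, candidates=[2, 4, 8, 16, 32, 64, 128])
```

        With only three runs, the fitted coefficients are sensitive to measurement noise, which is the difficulty
        noted above: a poor profile yields a poor choice of parallelism.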
           In summary, suboptimal partitioning can waste resources. Determining the right partition size is crucial to
        the scheduler because it reduces the overhead incurred. Although repartitioning can resolve a bottleneck, the
        process is costly because it involves reshuffling the data, and it should be avoided as data volumes continue to
        grow at an unprecedented rate. The natural way to avoid it is to use sampled data or application profiling.
        However, both are challenging because sampling results and profiles can be inaccurate. Finding the best
        partitioning solution therefore remains a direction for future work.
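
        A minimal sketch of the sampling approach, and of why its accuracy matters, is given below. It assumes
        a target of 128 MiB per partition and a synthetic dataset in which 1% of the records are much larger
        than the rest. The function estimate_partitions, the dataset and the target size are illustrative
        assumptions and are not taken from the surveyed works.

```python
import math
import random


def estimate_partitions(sample_sizes, total_records, target_bytes=128 * 1024 * 1024):
    """Estimate a partition count from the byte sizes of a sample of records."""
    avg_record_bytes = sum(sample_sizes) / len(sample_sizes)
    estimated_total = avg_record_bytes * total_records  # extrapolate to the full dataset
    return max(1, math.ceil(estimated_total / target_bytes))


# Hypothetical dataset: 10 million records of about 1 KiB, with 1% outliers of 100 KiB.
rng = random.Random(42)
total_records = 10_000_000


def record_size():
    return 100 * 1024 if rng.random() < 0.01 else 1024


small_sample = [record_size() for _ in range(100)]
large_sample = [record_size() for _ in range(100_000)]

from_small_sample = estimate_partitions(small_sample, total_records)
from_large_sample = estimate_partitions(large_sample, total_records)
print(from_small_sample, from_large_sample)
```

        Because the small sample contains only a handful of outliers, or none at all, its estimate can differ
        considerably from the one obtained with the larger sample. A wrong estimate leads either to oversized
        partitions or to a later repartitioning step.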

        Table 1. Summary of scheduling performance issues as well as the solutions available in the literature
        Author (Year)               Issue           Solution approach   Methodology                                 Limitation

        Li et al. (2020)            Parameter       Machine learning    Applied a GAN algorithm to reduce           Model accuracy needs to be
                                    configuration                       complexity by using less training data      improved
                                                                        and implemented RPG to produce random

        Bao et al. (2019)           Parameter       Machine learning    Constructed testbeds that used a Latin      The speedup improvement is only
                                    configuration                       hypercube sampling (LHS) strategy to        6-24% compared with other tuning
                                                                        generate more samples to train the model    methods

        Gu et al. (2018)            Parameter       Machine learning    Implemented a neural network to predict     Supports optimising only a single
                                    configuration                       changes in parameter configurations         job at a time

        Nguyen et al. (2018)        Parameter       Machine learning    Applied the LHS technique to minimize the   Needs to generate more training
                                    configuration                       number of training samples and used a       samples to achieve the optimal
                                                                        recursive random search algorithm           setting

        Gounaris and Torres (2018)  Parameter       Experiment-based    Conducted repeated experiments guided by a  Time-consuming and requires expert
                                    configuration                       systematic methodology                      knowledge

        Perez et al. (2018)         Parameter       Fuzzy               Utilized a metric called bottleneck score   Slower convergence rate
                                    configuration                       with multiple fuzzy engines and a
                                                                        parameter ensemble table

        Petriditis et al. (2017)    Parameter       Trial and error     Conducted a series of experiments for all   Time-consuming and requires expert
                                    configuration                       the possible combinations of parameters     knowledge

        Wang et al. (2017)          Parameter       Machine learning    Binary classification and                   Requires intensive training for
                                    configuration                       multi-classification                        every specific workload

        Bian et al. (2014)          Parameter       Simulation          Created a simulator for the Spark           Challenging to simulate the real
                                    configuration                       environment to test various parameter       environment
                                                                        configurations

        Zaouk et al. (2021)         Workload        Machine learning    Used deep neural networks to develop a      The optimizer’s recommendation is
                                    characteristic                      performance prediction model by embedding   too optimistic due to extrapolation
                                                                        the workload characteristics                in a sparse search space

        Mao et al. (2019)           Workload        Machine learning    Used reinforcement learning (RL) and        Does not support a multi-tenancy
                                    characteristic                      neural networks to learn workload-specific  framework
                                                                        scheduling algorithms

        Liang et al. (2018)         Workload        WSMC                Used the data expansion ratio relative to   Focuses on managing the memory
                                    characteristic                      the input data as the metric for workload   space more efficiently, rather than
                                                                        classification                              managing the

        Wang et al. (2019)          Partition size  Machine learning    Predicted the optimal number of tasks per   Repartitioning data can be very
                                                                        executor and tasks per machine              expensive as it requires reshuffling
                                                                                                                    the data

        Hernandez et al. (2018)     Partition size  Boosted regression  Predicted the possible distribution of      Does not support heterogeneous
                                                    tree                straggler tasks by running with a fraction  machines
                                                                        of the input data
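
        Two of the parameter-configuration approaches in Table 1 rely on Latin hypercube sampling (LHS) to
        cover the configuration space with few training samples. The following is a minimal sketch of plain
        LHS, not the exact procedure of the cited works. The parameter names and ranges are hypothetical and
        serve only as an illustration.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in the unit cube with one point per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)  # pair the strata randomly across dimensions
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def scale(point, ranges):
    """Map a unit-cube point onto integer parameter values within inclusive ranges."""
    config = {}
    for u, (name, (low, high)) in zip(point, ranges.items()):
        # The clamp guards against u rounding up to exactly 1.0.
        config[name] = min(high, low + int(u * (high - low + 1)))
    return config


# Hypothetical tuning ranges (illustrative only, not recommended values).
param_ranges = {
    "executor_cores": (1, 8),
    "executor_memory_gb": (2, 32),
    "shuffle_partitions": (50, 1000),
}

rng = random.Random(0)
unit_points = latin_hypercube(n_samples=5, n_dims=len(param_ranges), rng=rng)
configs = [scale(p, param_ranges) for p in unit_points]
for config in configs:
    print(config)
```

        Each of the five configurations falls in a different fifth of every parameter range, so a small number
        of test runs still spans the whole range of each parameter.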

        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [155]
        Artificial Intelligence in the 4th Industrial Revolution