of parallelism or partition size during execution. They profiled the application to describe its behavior as a
function of the number of machines used, and from these profiles derived dynamic partitioning solutions. Based
on their results, execution time improved by up to 50% with the estimated dynamic partitioning. However, obtaining
accurate profiles from only a few test runs remains a challenging task.
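To make the profiling idea concrete, the following is a minimal sketch and not the method of the cited work. It assumes a hypothetical two-term runtime model, t(p) = a/p + b*p, in which a/p stands for work split across p partitions and b*p for per-task scheduling overhead. The model is fitted to a handful of profiling runs by least squares and then used to pick a partition count. The function names and the model form are assumptions made for illustration only.

```python
import math


def fit_runtime_model(runs):
    """Least-squares fit of t(p) = a / p + b * p.

    `runs` is a list of (partitions, seconds) pairs from profiling runs.
    The model form is an illustrative assumption, not taken from the paper.
    """
    # Normal equations for the basis functions x1 = 1/p and x2 = p.
    s11 = sum((1.0 / p) ** 2 for p, _ in runs)
    s12 = float(len(runs))  # sum of (1/p) * p over all runs
    s22 = sum(float(p) ** 2 for p, _ in runs)
    r1 = sum(t / p for p, t in runs)
    r2 = sum(t * p for p, t in runs)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("need at least two distinct partition counts")
    a = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


def best_partition_count(a, b):
    """Partition count minimising a / p + b * p, i.e. p = sqrt(a / b)."""
    if a <= 0 or b <= 0:
        raise ValueError("fitted model has no interior minimum")
    return max(1, round(math.sqrt(a / b)))


if __name__ == "__main__":
    # Hypothetical profiling runs: (number of partitions, runtime in seconds).
    profile = [(8, 52.0), (16, 29.0), (64, 22.25), (128, 35.125)]
    a, b = fit_runtime_model(profile)
    print("suggested partitions:", best_partition_count(a, b))
```

With only a few runs the fit is sensitive to noise in the measured runtimes, which reflects the difficulty noted above of obtaining accurate profiles from few test runs.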
In summary, suboptimal partitioning can waste resources. Determining the right partition size is crucial to the
scheduler because it reduces the overhead the scheduler incurs. Although repartitioning can resolve a bottleneck,
the process is costly because it involves reshuffling the data. Such reshuffling should be avoided, as data
volumes will continue to grow at an unprecedented rate. The natural way to determine the partition size in advance
is to use data sampling or application profiling. However, this is a particularly challenging task because
sampling and profiling results can be inaccurate. How to achieve the best partitioning solution therefore remains
a direction for future work.
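As an illustration of the sampling route, the sketch below estimates a partition count from a random sample of record sizes and reports the relative standard error of that estimate. It is a simplified example under stated assumptions: the function name, the target partition size, and the idea of sizing partitions by bytes are hypothetical choices, not a procedure described in the surveyed papers.

```python
import math
import random
import statistics


def estimate_partitions(record_sizes, total_records, target_partition_bytes,
                        sample_size, seed=0):
    """Estimate a partition count from a sample of record sizes (in bytes).

    Returns (partitions, relative_standard_error). A large relative error
    signals that the sample is too small or the record sizes are skewed.
    """
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    rng = random.Random(seed)
    sample = rng.sample(record_sizes, sample_size)
    mean_size = statistics.fmean(sample)
    # Standard error of the mean record size, relative to the mean.
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    rel_err = std_err / mean_size
    estimated_total_bytes = mean_size * total_records
    partitions = max(1, math.ceil(estimated_total_bytes / target_partition_bytes))
    return partitions, rel_err


if __name__ == "__main__":
    # Hypothetical skewed dataset: mostly small records and a few very large ones.
    gen = random.Random(42)
    sizes = [gen.choice([200, 300, 400]) for _ in range(9_900)]
    sizes += [50_000] * 100
    parts, err = estimate_partitions(sizes, len(sizes), 1_000_000, sample_size=50)
    print(f"partitions={parts}, relative standard error={err:.2f}")
```

On skewed data a small sample may miss the large records entirely, so the estimate can be well off; this is one way the inaccuracy of sampling mentioned above can arise.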
Table 1. Summary of scheduling performance issues as well as the solutions available in the literature
| Author (Year) | Issue | Solution Approach | Methodology | Limitation |
|---|---|---|---|---|
| Li et al. (2020) | Parameter configuration | Machine learning | Applied a GAN algorithm to reduce complexity by using less training data, and implemented RPG to produce random configurations | Model accuracy needs to be improved |
| Bao et al. (2019) | Parameter configuration | Machine learning | Constructed testbeds that used a sampling strategy (LHS) to generate more samples to train the model | The speedup improvement is only 6-24% when compared to other tuning methods |
| Gu et al. (2018) | Parameter configuration | Machine learning | Implemented a neural network to predict changes in parameter configurations | Only supports optimising a single job at a time |
| Nguyen et al. (2018) | Parameter configuration | Machine learning | Applied the LHS technique to minimize the number of training samples and used a recursive random search algorithm | Needs to generate more samples of training data to achieve the optimal setting |
| Gounaris and Torres (2018) | Parameter configuration | Experiment-based | Conducted repeated experiments guided by a systematic methodology | Time consuming and requires expert knowledge |
| Perez et al. (2018) | Parameter configuration | Fuzzy | Utilized a metric called bottleneck score with multiple fuzzy engines and a parameter ensemble table | Slower convergence rate |
| Petriditis et al. (2017) | Parameter configuration | Trial and error | Conducted a series of experiments for all the possible combinations of parameters | Time consuming and requires expert knowledge |
| Wang et al. (2017) | Parameter configuration | Machine learning | Binary classification and multi-classification | Requires intensive training for every specific workload |
| Bian et al. (2014) | Parameter configuration | Simulation | Created a simulator for the Spark environment to test various parameter configurations | Challenging to simulate the real environment |
| Zaouk et al. (2021) | Workload characteristic | Machine learning | Used deep neural networks to develop a performance prediction model by embedding the workload characteristics | The optimizer's recommendation is too optimistic due to extrapolation in a sparse search space |
| Mao et al. (2019) | Workload characteristic | Machine learning | Used reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms | Does not support multi-tenancy framework |
| Liang et al. (2018) | Workload characteristic | WSMC | Used the metric of data expansion ratio to the input data for the workload classification | Focuses on managing the memory space more efficiently, rather than managing the executor |
| Wang et al. (2019) | Partition size | Machine learning | Predicted the optimal number of tasks per executor and tasks per machine | Repartitioning data can be very expensive as it requires the data to be reshuffled |
| Hernandez et al. (2018) | Partition size | Boosted Regression Tree | Predicted the possible straggler task distribution by running with a fraction of the input data | Does not support heterogeneous machines |
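Two of the entries in Table 1 (Bao et al., 2019; Nguyen et al., 2018) rely on Latin hypercube sampling (LHS) to generate training configurations. The sketch below shows the general LHS idea only and is not the implementation used in those studies: each parameter range is divided into as many equal strata as there are samples, one value is drawn per stratum, and the strata are shuffled independently per parameter. The parameter names in the usage example are real Spark settings, but the ranges are hypothetical.

```python
import random


def latin_hypercube(ranges, n_samples, seed=0):
    """Draw n_samples configurations with Latin hypercube sampling.

    `ranges` maps a parameter name to a (low, high) interval. For every
    parameter, each of the n_samples equal-width strata is hit exactly once.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        width = high - low
        # One uniformly random point inside each stratum of this parameter.
        points = [low + width * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in ranges}
            for i in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical tuning ranges; continuous values would be rounded before use.
    space = {
        "spark.executor.cores": (1.0, 8.0),
        "spark.executor.memory_gb": (2.0, 16.0),
        "spark.sql.shuffle.partitions": (50.0, 800.0),
    }
    for config in latin_hypercube(space, n_samples=5):
        print({k: round(v, 1) for k, v in config.items()})
```

Compared with plain random sampling, this stratification spreads a small number of samples across each parameter's full range, which is why it is attractive when every training sample costs a full job execution.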
E-Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [155]
Artificial Intelligence in the 4th Industrial Revolution