Page 165 - The-5th-MCAIT2021-eProceeding
2. Job Scheduling Performance Issues and Solutions in Spark
The goal of job scheduling in Spark resource management is to plan the execution of tasks across the nodes. It aims to maximize resource utilization while minimizing the total execution time. This section elaborates on the performance issues in Spark job scheduling and the solutions available in the literature. We classify these issues into three categories, as shown in Table 1. The categories can be described as follows:
2.1 Parameter Configuration
Parameter configuration refers to setting Spark parameter values before executing an application (Zaharia et al., 2010). Typically, Spark parameters can be configured in two ways: the user can set them manually, or rely on the default configuration for an easy implementation. One issue here is that improperly configured parameters can cause slowdowns or, even worse, failures of Spark applications. Therefore, given the high significance of the problem, many previous efforts have been made to determine optimal solutions for parameter configuration.
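The two configuration routes above can be illustrated with a minimal sketch. The property names below are real Spark configuration keys with their documented defaults, but the helper function and the chosen override values are hypothetical, for illustration only:

```python
# Illustrative sketch: the configuration Spark sees is its defaults
# overlaid with any values the user sets manually.
# Property names are real Spark keys; the helper is hypothetical.

DEFAULTS = {
    "spark.executor.memory": "1g",          # Spark's documented default
    "spark.sql.shuffle.partitions": "200",  # Spark's documented default
}

def effective_config(user_overrides=None):
    """Return defaults overlaid with any manually set values."""
    conf = dict(DEFAULTS)
    conf.update(user_overrides or {})
    return conf

# Route one: default configuration (easy implementation, possibly suboptimal).
print(effective_config())

# Route two: manual tuning, e.g. for a memory-heavy workload.
print(effective_config({"spark.executor.memory": "8g"}))
```

A poorly chosen override here (e.g. requesting more executor memory than a node offers) is exactly the kind of misconfiguration that can slow down or fail an application.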
Among these, Petridis et al. (2017) applied manual trial-and-error tuning of Spark configuration parameters. They conducted a series of experiments over all possible combinations of parameters, using expert knowledge to search for an optimal configuration. The results showed that their manually tuned method could speed up Spark by as much as 10 times. Gounaris and Torres (2018), on the other hand, provide an alternative to Petridis et al. (2017): a systematic methodology for parameter tuning. This study still involves repeated trial-and-error experiments, but they are guided by the systematic methodology. Its results reveal a speedup of up to 20% over the default settings. However, both of these studies are clearly time-consuming, as they require considerable effort to perform the repeated experiments needed to find the best parameter configuration. Furthermore, they require expert knowledge and researcher experience to determine the parameter values at the initial phase.
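The exhaustive trial-and-error strategy described above can be sketched as a grid search over the parameter space. This is a generic illustration, not the authors' actual experimental setup: the search space is tiny and `run_workload` is a hypothetical toy cost function standing in for a timed Spark run:

```python
# Sketch of exhaustive trial-and-error tuning: run the workload under
# every parameter combination and keep the fastest configuration.
import itertools

SEARCH_SPACE = {
    "spark.executor.cores": [1, 2, 4],
    "spark.sql.shuffle.partitions": [100, 200, 400],
}

def run_workload(conf):
    # Hypothetical cost model standing in for an actual timed Spark run.
    return (100 / conf["spark.executor.cores"]
            + abs(conf["spark.sql.shuffle.partitions"] - 200) * 0.1)

def grid_search(space, evaluate):
    """Evaluate every combination in the space; return the best one."""
    keys = list(space)
    best_conf, best_time = None, float("inf")
    for values in itertools.product(*(space[k] for k in keys)):
        conf = dict(zip(keys, values))
        t = evaluate(conf)
        if t < best_time:
            best_conf, best_time = conf, t
    return best_conf, best_time

best, t = grid_search(SEARCH_SPACE, run_workload)
```

Even this toy space needs 3 × 3 = 9 runs; with the dozens of tunable Spark parameters, the number of real cluster runs explodes, which is precisely why these studies are so time-consuming.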
Bian et al. (2014) proposed CSMethod, a simulator for Spark in which the whole application execution environment is simulated. The aim is to provide a fast and accurate simulator, as well as a reliable way to test parameter combinations until an optimal setting is found. Precisely simulating the environment, however, is challenging given the vast hardware diversity and software complexity. Moreover, when a tuned configuration is applied to an actual cluster, the results may deviate from the simulation because the implementation environment differs. In another technique, Perez et al. (2018) developed a multi-parameter tuning method called PETS (Parameter Ensemble Table for Spark) using a fuzzy approach. It relies on a metric called the bottleneck score together with multiple fuzzy engines and a parameter ensemble table; most of the rules and fuzzy classes require knowledge from researchers or experts. PETS is able to tune 18 parameters simultaneously and outperformed other machine learning techniques, with a speedup of up to 4.78x on six different workloads of the HiBench benchmark. However, there is a trade-off between performance speedup and convergence speed: achieving a higher speedup results in slower convergence than with simpler strategies, because many parameters are changed at once.
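The fuzzy-classification idea underlying approaches like PETS can be illustrated generically: a normalized metric (here standing in for a bottleneck score) is mapped onto overlapping fuzzy classes via membership functions. The triangular memberships and class boundaries below are a textbook construction, not the ones used by PETS:

```python
# Generic illustration of fuzzy classification: a normalized score in
# [0, 1] belongs to "low"/"medium"/"high" with graded membership rather
# than a hard cutoff. Boundaries are hypothetical, not PETS's.

def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(score):
    return {
        "low":    triangular(score, -0.5, 0.0, 0.5),
        "medium": triangular(score,  0.0, 0.5, 1.0),
        "high":   triangular(score,  0.5, 1.0, 1.5),
    }

memberships = fuzzify(0.7)  # partly "medium", partly "high"
```

A rule engine then fires rules weighted by these memberships (e.g. "if shuffle bottleneck is high, increase spark.sql.shuffle.partitions"), which is where the expert knowledge noted above enters: someone must author the rules and class boundaries.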
A more popular approach is machine learning based: building models and predicting performance before the application starts. Bao et al. (2019) proposed an automatic parameter tuning system called Autotune. It implements testbeds that use a sampling strategy called Latin Hypercube Sampling (LHS) to generate samples, within a given time constraint, for training the model; more promising configurations can then be found with the trained prediction model. Autotune improved execution time by 63.7% on average compared to the default parameter configuration, although against other tuning methods the speedup improvement is only 6-24%.
Other research that also utilized the LHS sampling strategy is that of Nguyen et al. (2018). Unlike in Autotune, they