
Table 1. (Continued)

        Author (Year) | Issue | Solution Approach | Methodology | Limitation
        Wang et al. (2018) | Partition size | Simulated annealing | Partition the input data in a fine-grained way and assign the number of threads in the cluster with small-scale data | Time-consuming, as the design and implementation are far more complex
        Gounaris et al. (2017) | Partition size | Greedy and randomized | Perform profiling to modify the partition size during execution | Obtaining accurate profiles from only a few test runs is challenging
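
        As an illustration of the simulated annealing approach listed in the table, the following minimal sketch searches for a partition count that minimizes an estimated runtime. It is not the method of Wang et al. (2018): the cost model `estimated_runtime`, its constants (16 cores, a fixed per-task overhead), and the function `anneal_partitions` with all of its parameters are hypothetical and chosen only to make the example self-contained.

```python
import math
import random


def estimated_runtime(num_partitions: int) -> float:
    """Hypothetical cost model (an assumption, not taken from any cited paper).

    Tasks run in waves across a fixed number of cores, and every task
    adds a small fixed scheduling overhead.
    """
    cores = 16                 # assumed cluster size
    total_work = 1000.0        # assumed total work, arbitrary units
    per_task_overhead = 0.05   # assumed per-task scheduling cost
    waves = math.ceil(num_partitions / cores)
    return waves * (total_work / num_partitions) + per_task_overhead * num_partitions


def anneal_partitions(cost, start, low, high,
                      steps=2000, t0=10.0, cooling=0.995, seed=0):
    """Simulated annealing over an integer partition count in [low, high]."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        # Propose a nearby partition count, clamped to the allowed range.
        candidate = min(high, max(low, current + rng.randint(-8, 8)))
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling
    return best, best_cost


best, best_cost = anneal_partitions(estimated_runtime, start=4, low=1, high=512)
print(best, round(best_cost, 2))
```

        In a real deployment the cost function would come from measured or profiled job runtimes rather than a closed-form model, which is where the limitations noted in the table (implementation complexity, few available test runs) become relevant.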

        3. Conclusion

           Job scheduling is one of the most crucial elements of any data processing framework and plays a vital role
        in achieving efficient resource utilization. Existing scheduling solutions must keep evolving to support the
        new challenges that continue to arise; this evolution is necessary to sustain high performance for
        data-intensive workloads. In this paper, we presented a review of Spark scheduling performance issues and
        compiled the corresponding solutions available in the literature. We provided an analysis of the related
        work to date and suggested research directions. We hope that this effort gives researchers an entry point
        for building a roadmap of future work to improve Spark scheduling performance.

        Acknowledgements

           This work is supported by the Ministry of Higher Education Malaysia under the Fundamental Research
        Grant Scheme (FRGS/1/2018/ICT02/UKM/02/6).

        References

        Bao, L., Liu, X., & Chen, W. (2019). Learning-based Automatic Parameter Tuning for Big Data Analytics
        Frameworks. In Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018.
        https://doi.org/10.1109/BigData.2018.8622018
        Bian, Z., Wang, K., Wang, Z., Munce, G., Cremer, I., Zhou, W., … Xu, G. (2014). Simulating big data clusters
        for system planning, evaluation, and optimization. In Proceedings of the International Conference on Parallel
        Processing. https://doi.org/10.1109/ICPP.2014.48
        Gounaris, A., Kougka, G., Tous, R., Montes, C. T., & Torres, J. (2017). Dynamic configuration of partitioning
        in spark applications. IEEE Transactions on Parallel and Distributed Systems.
        https://doi.org/10.1109/TPDS.2017.2647939
        Gounaris, A., & Torres, J. (2018). A Methodology for Spark Parameter Tuning. Big Data Research.
        https://doi.org/10.1016/j.bdr.2017.05.001
        Gu, J., Li, Y., Tang, H., & Wu, Z. (2018). Auto-Tuning Spark Configurations Based on Neural Network. In
        IEEE International Conference on Communications. https://doi.org/10.1109/ICC.2018.8422658
        Hernández, Á. B., Perez, M. S., Gupta, S., & Muntés-Mulero, V. (2018). Using machine learning to optimize
        parallelism in big data applications. Future Generation Computer Systems.
        https://doi.org/10.1016/j.future.2017.07.003
        Islam, M. T., Srirama, S. N., Karunasekera, S., & Buyya, R. (2020). Cost-efficient dynamic scheduling of big
        data applications in apache spark on cloud. Journal of Systems and Software, 162, 110515.
        https://doi.org/10.1016/j.jss.2019.110515
        Khalil, W. A., Torkey, H., & Attiya, G. (2020). Survey of Apache spark optimized job scheduling in big data.
        International Journal of Industry and Sustainable Development (IJISD) (Vol. 1). Retrieved from
        http://ijisd.journals.ekb.eg39
        Li, M., Liu, Z., Shi, X., & Jin, H. (2020). ATCS: Auto-Tuning Configurations of Big Data Frameworks Based






        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [156]
        Artificial Intelligence in the 4th Industrial Revolution