
        Proposed Method on Phishing Email Classification Using Behavior Features


                    Ahmad Fadhil Naswir a,*, Lailatul Qadri Zakaria b, Saidah Saad c
                              a,b,c  Universiti Kebangsaan Malaysia, Bangi, Selangor, 43600, Malaysia
                                           * Email: afadhilen@gmail.com

        Abstract

        Phishing email, a form of cyber-attack, cannot be separated from its sender: the attacker behind a deceptive phishing
        email composes it based on observation and in an individual writing style. Because the characteristics of spam and
        phishing email are constantly changing and being updated, some features need to be modified to obtain better results.
        Generally, the features used in phishing email classification come from the structure of the email itself, namely the
        email header, email body, and URL. Given the large number of attackers who write email in the same scope on behalf of a
        trusted company, we can observe and analyze the differences in the human/attacker aspect of the email content in terms
        of writing, word choice, and language style, that is, in terms of stylometric features. It remains uncertain how well
        these features perform when combined with email features in deceptive phishing email classification and detection. In
        other words, there are still human behavior features, e.g. word choice, grammar, and emotion of context, that can be
        studied further in deceptive phishing email classification and combined with email behavior features (header, body,
        and URL).

        Keywords: Phishing Email; Phishing; Features; Features Selection; Behavior Features; Email Features; Stylistic; Stylometric; Email
        Classification


        1. Introduction

           Phishing email attacks are email-based attacks sent by a person or group of people for the purpose of
        fooling the victim. Such an email frequently carries a luring message that traps the victim into visiting a fake
        website that looks like a legitimate one. The website, created by the attacker before the phishing attack is
        launched, asks for sensitive input such as a username, password, credit card number, or other confidential
        data (Dakpa & Augustine, 2017). This type of email is sent either to a predetermined target or in bulk to anyone,
        depending on how the phishing is carried out. Victims are tricked into entering their confidential data on the
        fake website because the email tends to be spoofed to look like an email from a trusted company such as Amazon,
        eBay, or Google. The FBI has suggested that phishing attacks could be costing US businesses around
        $5 billion a year (Bagui et al., 2019).
           Because the characteristics of spam and phishing email are constantly changing and being updated, some
        features need to be modified to obtain better results. Generally, the features used in phishing email
        classification come from the structure of the email itself, namely the email header, email body, and URL. These
        features are selected and extracted to improve classifier performance (El Aassal et al., 2020). Phishing email
        cannot be separated from its sender: the attacker behind a deceptive phishing email composes it based on
        observation and in an individual writing style. Attackers use strategies that create a sense of urgency to
        convince the victim to react, for example an account alert or a promised reward (Imaduddin et al., 2019). In the
        writing itself, the attacker tries to make as sure as possible that the victim is trapped into following the flow
        of the email. Because a large number of attackers write email in the same scope on behalf of a trusted company,
        we can observe and analyze the differences in email content in terms of writing, word choice, and language style,
        that is, in terms of stylometric features (Kumar et al., 2018).
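
           To make the distinction between email features (header, body, URL) and human behavior features (word
        choice, urgency, style) more concrete, the following is a minimal Python sketch of how both kinds of features
        might be extracted from one raw email. It is an illustration only and not the method proposed in this paper:
        the function name extract_features, the chosen feature names, the urgency word list, and the sample message
        are all assumptions introduced here.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical urgency lexicon for illustration; not taken from the paper.
URGENCY_WORDS = {"urgent", "immediately", "verify", "suspended", "alert", "reward"}
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
IP_URL_RE = re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}", re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z']+")


def _domain(header_value):
    """Return the lower-cased domain of an address header, or '' if absent."""
    return parseaddr(header_value or "")[1].rpartition("@")[2].lower()


def _plain_body(msg):
    """Return the text/plain content of a parsed message as one string."""
    if msg.is_multipart():
        parts = [p.get_payload() for p in msg.walk()
                 if p.get_content_type() == "text/plain"]
        return "\n".join(parts)
    return msg.get_payload()


def extract_features(raw_email):
    """Combine simple email-structure features with simple stylometric ones."""
    msg = message_from_string(raw_email)
    body = _plain_body(msg)
    urls = URL_RE.findall(body)
    # Remove URLs before tokenizing so they do not count as written words.
    words = WORD_RE.findall(URL_RE.sub(" ", body))
    from_dom = _domain(msg.get("From"))
    reply_dom = _domain(msg.get("Reply-To"))
    return {
        # Email features: header and URL
        "reply_to_mismatch": bool(reply_dom) and reply_dom != from_dom,
        "n_urls": len(urls),
        "n_ip_urls": sum(1 for u in urls if IP_URL_RE.match(u)),
        # Human behavior (stylometric) features: body text
        "n_words": len(words),
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "n_exclamations": body.count("!"),
        "n_urgency_words": sum(1 for w in words if w.lower() in URGENCY_WORDS),
    }


# Made-up sample message, used only to exercise the function.
SAMPLE = (
    'From: "Shop Support" <support@trusted-shop.example>\n'
    "Reply-To: help@trusted-sh0p.example\n"
    "Subject: Account alert\n"
    "\n"
    "URGENT! Verify your account immediately at http://192.168.0.1/login "
    "or it will be suspended!\n"
)

if __name__ == "__main__":
    for name, value in extract_features(SAMPLE).items():
        print(name, value)
```

           The resulting dictionary could be turned into a feature vector and passed to any classifier. Which
        features actually improve deceptive phishing email classification is the open question raised above, so the
        choices in this sketch should be read as placeholders.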









        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [116]
        Artificial Intelligence in the 4th Industrial Revolution