Detect Malware in Portable Document Format Files (PDF) Using Support Vector Machine and Random Decision Forest
DOI: https://doi.org/10.15575/join.v3i2.196
Keywords: portable document format, malware, classification, support vector machine, random forest
Abstract
The Portable Document Format (PDF) is a highly effective vehicle for spreading malware because so many people rely on it, which means PDF malware should not be taken lightly. Malicious content embedded in a PDF file can take the form of JavaScript, URL access, infected media, and so on. Various preventive measures can help limit its spread; this study takes a classification approach, labeling each file as dangerous or not. Based on previous research, the two classification methods with the highest accuracy are Support Vector Machine (SVM) and Random Forest. The dataset contains 500 samples in two classes, malicious and not malicious, each described by 21 features of malicious PDFs that serve as input to the classification process. Comparing the classification results of the two methods with a confusion matrix shows that Random Forest performs better than SVM, although its results are still not perfect.
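The abstract compares the two classifiers through a confusion matrix without spelling out the measures derived from it. The sketch below is a minimal illustration, assuming the malicious class is treated as the positive class and using the standard definitions of accuracy, precision, recall, and F1; the paper's exact metric set is not stated here. The `ConfusionMatrix` class, the `confusion_matrix` function, and the toy labels are hypothetical and are not the authors' code, data, or results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier (assumption: 1 = malicious = positive)."""
    tp: int  # malicious files correctly flagged as malicious
    fp: int  # benign files wrongly flagged as malicious
    fn: int  # malicious files missed by the classifier
    tn: int  # benign files correctly left unflagged

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> ConfusionMatrix:
    """Tally true/false positives/negatives from paired label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


if __name__ == "__main__":
    # Hypothetical toy labels for illustration only (1 = malicious, 0 = not
    # malicious); these are NOT the paper's data or results.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    predictions = {
        "model_a": [1, 0, 1, 0, 1, 0, 0, 0],
        "model_b": [1, 1, 1, 0, 0, 0, 0, 0],
    }
    for name, y_pred in predictions.items():
        cm = confusion_matrix(y_true, y_pred)
        print(f"{name}: accuracy={cm.accuracy:.3f} precision={cm.precision:.3f} "
              f"recall={cm.recall:.3f} f1={cm.f1:.3f}")
```

Reporting precision and recall alongside accuracy matters for malware detection, since a missed malicious file (a false negative) usually costs more than a false alarm, and accuracy alone hides that trade-off.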
License
Copyright (c) 2018 Jurnal Online Informatika
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
You are free to:
- Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NoDerivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
- You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
- No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.