Detect Malware in Portable Document Format Files (PDF) Using Support Vector Machine and Random Decision Forest

Authors

  • Abdachul Charim Universitas Muhammadiyah Malang, Indonesia
  • Setio Basuki Universitas Muhammadiyah Malang, Indonesia
  • Denar Regata Akbi Universitas Muhammadiyah Malang, Indonesia

DOI:

https://doi.org/10.15575/join.v3i2.196

Keywords:

portable document format, malware, classification, support vector machine, random forest

Abstract

The Portable Document Format (PDF) is an effective vehicle for spreading malware because it is so widely used, so PDF malware should not be taken lightly. Malware can be embedded in a PDF file as JavaScript, URL access, infected media, and so on. Various preventive measures can help limit its spread; this study uses classification to separate malicious files from non-malicious ones. According to previous research, the two classification methods with the highest accuracy are Support Vector Machine and Random Forest. The dataset contains 500 samples in two classes, malicious and not malicious, with 21 malicious-PDF features as input to the classification process. Comparing the two methods' classification results by their confusion matrices shows that Random Forest performs better than Support Vector Machine, although its results are still not perfect.
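
As an illustration of the confusion-matrix comparison described in the abstract, the following minimal Python sketch counts true and false positives and negatives for two sets of predictions and derives accuracy, precision, and recall from them. The function names (`confusion_matrix`, `metrics`), the label strings, and the toy predictions are assumptions made up for this example; they are not the study's code, data, or results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive="malicious"):
    """Return (tp, fp, fn, tn) counts for a two-class problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels invented for illustration only; NOT results from the study.
y_true = ["malicious", "malicious", "malicious", "benign", "benign", "benign"]
pred_a = ["malicious", "malicious", "benign", "benign", "benign", "malicious"]
pred_b = ["malicious", "malicious", "malicious", "benign", "benign", "malicious"]

# Compare two hypothetical classifiers on the same ground truth.
for name, pred in (("classifier A", pred_a), ("classifier B", pred_b)):
    print(name, metrics(*confusion_matrix(y_true, pred)))
```

In practice the two prediction lists would come from the trained Support Vector Machine and Random Forest models evaluated on the same held-out files, so that their metrics are directly comparable.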

Author Biographies

Abdachul Charim, Universitas Muhammadiyah Malang

Informatics Engineering, Universitas Muhammadiyah Malang

Setio Basuki, Universitas Muhammadiyah Malang

Informatics Department, Universitas Muhammadiyah Malang, Malang

Denar Regata Akbi, Universitas Muhammadiyah Malang

Informatics Department, Universitas Muhammadiyah Malang, Malang

Published

2019-02-01

Section

Article
