Exploiting Web Scraping for Education News Analysis Using Depth-First Search Algorithm

DOI:

https://doi.org/10.15575/join.v5i1.548

Keywords:

Algorithm, Depth-first search, Education news, Online news, Web scraping

Abstract

Online news is a data source that is continuously updated and provides information and factual data. A search engine lets users quickly enter keywords to find news in the category they expect. Because the development of education in Indonesia is an essential topic, this study uses unstructured data from online news, with the keyword "Education" as a search parameter, and adds a search method from the field of artificial intelligence so that the collected data are more accurate. The data were taken from three online news sites: CNN Indonesia, Detikcom, and Liputan6. Scraping was implemented in Python using the depth-first search (DFS) method, and the results from each site were compared in terms of relevant news. Web scraping with DFS is helpful for searching because the method can check the date on which each article was published and then follow the link to the destination URL. Of the three online media sites, Detikcom produces the most data, averaging 885 education news articles per month. Liputan6 produces the least, averaging 28 articles per month, but the articles it yields are highly relevant compared with those from Detikcom and CNN Indonesia.
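The abstract describes DFS-based scraping only at a high level. The following is a minimal sketch of the idea, not the authors' implementation: it assumes an in-memory link graph in place of real HTTP requests and HTML parsing, and the `Page` type, the `dfs_scrape` and `monthly_average` functions, the `max_depth` limit, and the example URLs and titles are all hypothetical. It checks each article's publication date before collecting it or following its links, filters titles by keyword, and averages the result over the months that contain at least one article.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Page:
    """Hypothetical crawled page; a real scraper would parse these fields from HTML."""
    title: str
    published: Optional[date]  # None for index/listing pages without a date
    links: List[str] = field(default_factory=list)


def dfs_scrape(pages: Dict[str, Page], start_url: str, keyword: str,
               since: date, until: date, max_depth: int = 3) -> List[Tuple[str, Page]]:
    """Depth-first traversal from start_url, collecting pages whose title
    contains keyword and whose publication date lies in [since, until]."""
    results = []
    visited = set()
    stack = [(start_url, 0)]  # explicit LIFO stack gives depth-first order
    while stack:
        url, depth = stack.pop()
        if url in visited or url not in pages:
            continue
        visited.add(url)
        page = pages[url]
        if page.published is not None:
            # Check the publication date first; out-of-range articles are
            # neither collected nor followed (a pruning assumption).
            if not (since <= page.published <= until):
                continue
            if keyword.lower() in page.title.lower():
                results.append((url, page))
        if depth < max_depth:
            # Push links in reverse so the first link on the page is explored first.
            for link in reversed(page.links):
                if link not in visited:
                    stack.append((link, depth + 1))
    return results


def monthly_average(results: List[Tuple[str, Page]]) -> float:
    """Average number of collected articles per calendar month that has any."""
    per_month = Counter((p.published.year, p.published.month) for _, p in results)
    return sum(per_month.values()) / len(per_month) if per_month else 0.0


# Hypothetical example site: an index page linking to dated articles.
site = {
    "/index": Page("Index", None, ["/edu/1", "/sport/1", "/edu/2"]),
    "/edu/1": Page("Education budget rises", date(2020, 1, 5), ["/edu/3"]),
    "/edu/3": Page("New education curriculum", date(2020, 1, 20)),
    "/sport/1": Page("Football results", date(2020, 2, 1)),
    "/edu/2": Page("Education minister visits school", date(2020, 2, 10)),
}

found = dfs_scrape(site, "/index", "education", date(2020, 1, 1), date(2020, 3, 31))
print([url for url, _ in found])  # ['/edu/1', '/edu/3', '/edu/2']
print(monthly_average(found))     # 1.5
```

Note that `/edu/3` is visited before `/edu/2`: the traversal goes deep along the first branch before returning to the index, which is what distinguishes DFS from the breadth-first search used in some earlier crawling studies. The date check doubles as a pruning step, so an article reachable only through an out-of-range page is skipped; whether that trade-off suits a given news site is a design choice, not something the abstract specifies.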

Published

2020-07-16

Section

Article