Discovering Computer Science Research Topic Trends using Latent Dirichlet Allocation

Authors

  • Kartika Rizqi Nastiti Department of Informatics, Universitas Islam Indonesia, Indonesia
  • Ahmad Fathan Hidayatullah Department of Informatics, Universitas Islam Indonesia, Indonesia
  • Ahmad Rafie Pratama Department of Informatics, Universitas Islam Indonesia, Indonesia

DOI:

https://doi.org/10.15575/join.v6i1.636

Keywords:

Topic modeling, Latent Dirichlet Allocation, Topic discovery, computer science, coherence value

Abstract

Before conducting a research project, researchers need to identify the trends and state of the art in their field. This is not always easy, partly because few tools let them filter the relevant literature by time range. This study addresses the problem by applying topic modeling to data scraped from Google Scholar for the years 2010 to 2019. We used Latent Dirichlet Allocation (LDA) combined with Term Frequency-Inverse Document Frequency (TF-IDF) to build the topic models, and used the coherence score to determine the optimal number of topics for each year's data. We also visualized the topic interpretation, the word distribution of each topic, and term relevance using word clouds and PyLDAvis. In future work, we plan to add features that show the relevance of and interconnections between topics, making the tool even easier for researchers to use in their research projects.
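The two scoring steps named in the abstract can be sketched in plain Python: TF-IDF weighting of tokenized documents, and choosing the number of topics by mean coherence. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which coherence measure was used, so the UMass measure is swapped in here because it needs only document co-occurrence counts. LDA training itself is omitted, and how the TF-IDF weights feed into LDA is not described in the abstract; `topics_by_k` holds hypothetical top-word lists standing in for the output of LDA models trained with each candidate number of topics. All function names are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term of each tokenized document by tf * idf.

    tf is the raw count of the term in the document; idf = log(N / df),
    where N is the number of documents and df the number containing the term.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def umass_coherence(top_words, docs):
    """UMass coherence of one topic, given its top words ordered by rank.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs with j < i, where
    D(...) counts the documents containing every listed word. Higher scores
    mean the top words co-occur more often.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_count(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_count(top_words[j])
            if d_j == 0:
                raise ValueError(f"top word {top_words[j]!r} never occurs in docs")
            score += math.log((doc_count(top_words[i], top_words[j]) + 1) / d_j)
    return score


def pick_num_topics(topics_by_k, docs):
    """Return the candidate k whose topics have the highest mean coherence.

    topics_by_k maps each candidate k to the top-word lists of the k topics
    produced by an LDA model trained with k topics (hypothetical input here).
    """
    def mean_coherence(topics):
        return sum(umass_coherence(t, docs) for t in topics) / len(topics)

    return max(topics_by_k, key=lambda k: mean_coherence(topics_by_k[k]))


# Toy corpus of tokenized abstracts (made-up data for illustration only).
docs = [
    ["deep", "learning", "image", "classification"],
    ["deep", "learning", "network", "training"],
    ["sentiment", "analysis", "twitter", "text"],
    ["text", "classification", "sentiment", "review"],
]

# Hypothetical top words from LDA runs with 2 and 3 topics.
topics_by_k = {
    2: [["deep", "learning"], ["sentiment", "text"]],
    3: [["deep", "twitter"], ["image", "review"], ["network", "analysis"]],
}

weights = tf_idf(docs)
print(round(weights[0]["image"], 3))        # 1.386, i.e. log(4 / 1)
print(pick_num_topics(topics_by_k, docs))   # 2
```

In this toy run, the two-topic split groups words that actually co-occur, so it scores higher than the three-topic split and is selected. With a real pipeline, the same selection loop would be run separately on each year's corpus.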

Published

2021-06-17

Section

Article
