Implementation of Dynamic Topic Modeling to Discover Topic Evolution on Customer Reviews

Valentinus Roby Hananto

doi:10.15575/join.v8i2.963

Authors

Valentinus Roby Hananto, Department of Information Systems, Faculty of Technology & Informatics, Universitas Dinamika, Indonesia. ORCID: https://orcid.org/0000-0003-1988-3168

Keywords:

Dynamic topic modeling, BERTopic, Customer reviews, Topic evolution

Abstract

Annotating and analyzing online customer reviews is a significant problem in many domains, including business intelligence, marketing, and e-governance. Over the last decade, various approaches based on topic modeling have been developed to address it. Existing solutions, however, often work well only on content with static topics. This makes it difficult to analyze customer reviews, which form dynamic and constantly expanding collections of short and noisy texts. This study proposes a method for handling such dynamic content. The proposed system applies a dynamic topic model built with BERTopic to monitor how topics and their words evolve over time, which helps decide when the topic model needs to be retrained to capture emerging topics. Several experiments were conducted to test the practicality and effectiveness of the proposed framework. They show how a dynamic topic model can handle the emergence of new topics, and of topics that are correlated over time, in customer review data. Performance improved over the baseline static topic model, with 25% of new segmented texts discovered using the dynamic topic model. The experimental results therefore demonstrate that the proposed framework can be used in practice to develop automatic review annotation tools.
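The abstract does not say how the retraining decision is made, so the following is only a minimal sketch of one plausible signal, not the author's method. It assumes each review already has a time period and a topic id from a previously fitted model, and that unassigned reviews carry the id -1 (the outlier convention BERTopic uses). The function names and the `max_outlier_share` threshold are hypothetical. A period is flagged when the share of unassigned reviews grows too large, which may hint that new topics have emerged.

```python
from collections import Counter, defaultdict

OUTLIER = -1  # assumed id for reviews that fit no existing topic


def topic_counts_by_period(periods, topics):
    """Count how many reviews fall into each topic within each period."""
    counts = defaultdict(Counter)
    for period, topic in zip(periods, topics):
        counts[period][topic] += 1
    return dict(counts)


def periods_needing_retrain(counts, max_outlier_share=0.3):
    """Return (period, outlier share) pairs above the hypothetical threshold."""
    flagged = []
    for period in sorted(counts):
        total = sum(counts[period].values())
        share = counts[period][OUTLIER] / total
        if share > max_outlier_share:
            flagged.append((period, round(share, 2)))
    return flagged


# Toy data: review month and the topic id assigned by an already fitted model.
periods = ["2021-01", "2021-01", "2021-01", "2021-01",
           "2021-02", "2021-02", "2021-02", "2021-02"]
topics = [0, 1, 0, -1, -1, -1, 0, -1]

counts = topic_counts_by_period(periods, topics)
flagged = periods_needing_retrain(counts)
print(flagged)
```

In this toy data only the second month is flagged, because three of its four reviews are unassigned. In practice the threshold would need to be tuned on the review collection at hand.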

Author Biography

Valentinus Roby Hananto, Department of Information Systems, Faculty of Technology & Informatics, Universitas Dinamika

I am a lecturer at Universitas Dinamika in the Information Systems department. My research focuses on Natural Language Processing and Text Mining. I deliver lectures to undergraduate students on topics such as Decision Support Systems, Business Intelligence, and Data Science. I obtained my Doctoral Degree in 2022 from Ritsumeikan University, my Master's Degree in Engineering Management from the University of South Florida, and my Bachelor's Degree in Information Systems from Stikom Surabaya.
