Proceedings of International Conference on Applied Innovation in IT
2023/11/30, Volume 11, Issue 2, pp.81-89

Dynamic Topic Modelling of Online Discussions on the Russian War in Ukraine

Taras Ustyianovych, Nadiia Kasianchuk, Halina Falfushynska, Solomiia Fedushko and Eduard Siemens

Abstract: The availability of robust end-to-end ML processes plays a crucial role in delivering an accurate and reliable system for real-time text data inference. In this paper, we present an approach to building machine learning operations (MLOps) and an observability application for topic modelling of online discussions in social media, demonstrated here on topics and threads related to the Russian war in Ukraine. Splunk Enterprise, with its knowledge discovery, dashboarding, and alerting capabilities, is the main tool and platform used throughout this research. The dataset comprises 30 GB of social media text from the Russian social network VKontakte, covering January 2022 to May 2023. The main tasks were text mining and topic modelling, which we performed over the observation period using Python frameworks, mainly gensim for text processing and MLflow for experiment management and logging. The Splunk architecture allowed us to ingest and analyse the results and predictions of ML experiments for dynamic topic modelling, and served as an MLOps solution. The designed set of five dashboards played a crucial role in determining the optimal model hyperparameters (number of topics, a priori belief on document-topic distribution, number of total corpus passes) and in detecting drift, which occurred roughly every two to three weeks depending on the phase of the war. Our application assisted us with text analysis, revealing how events on the battlefield influenced social media discussions and which post attributes contributed to high user engagement. With our setup, we were able to find out how antiwar hashtags have been used to promote misleading content that actually supports the war against Ukraine.
The analysis of the researched discussions shows that the use of adjectives decreased over time after the war started, whereas the use of nouns and verbs increased. Information distortion has been steadily present in the content, leading to bias and misleading data in social media discussions.

Keywords: Machine Learning Operations (MLOps), Social Media Discussions, Russian War in Ukraine, Splunk Enterprise, Latent Dirichlet Allocation (LDA).

DOI: 10.25673/112997
DOI: http://dx.doi.org/10.25673/115729
Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0
ISSN 2199-8876
Publisher: Edition Hochschule Anhalt
Location: Anhalt University of Applied Sciences
Email: leiterin.hsb@hs-anhalt.de
Phone: +49 (0) 3496 67 5611
Address: Building 01 - Red Building, Top floor, Room 425, Bernburger Str. 55, D-06366 Köthen, Germany
Except where otherwise noted, all works and proceedings on this site are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.