An Innovative Approach to Topic Clustering for Social Media and Web Data Using AI
Abstract
The wealth of data available on social media and the web can be valuable for many purposes, including brand reputation management, topic research, competitive analysis, product development, public opinion surveys and more. However, analyzing this data to identify patterns and draw useful conclusions can be challenging because of the volume of collected posts, which can reach several thousand even for a single day. One useful approach is topic clustering, which groups mentions that refer to the same topic. This yields a manageable number of clusters, each of which can contain hundreds or thousands of posts. With these clusters, one gains a much more meaningful overview of the topics being discussed, instead of having to go through each post and manually assign it to a topic. Several topic detection algorithms can cluster posts, such as LDA, NMF and BERTopic. However, these existing algorithms have important drawbacks, including language constraints and slow or resource-intensive processing. More importantly, each cluster's label usually consists of a few keywords that make little sense unless someone browses through the mentions inside the cluster. The recent introduction of Large Language Models (LLMs), such as GPT-4, makes possible new topic clustering techniques that address these issues. Our novel approach (AI Mention Clustering) employs LLMs at its core to produce an algorithm for efficient and accurate topic clustering of web and social data. Our solution was tested on actual social and web data and compared to the popular existing algorithm BERTopic, demonstrating superior resource efficiency and absolute accuracy of clustered documents. Furthermore, it produces cluster summaries that are easily understood by humans, instead of just representative keywords.
This makes the work of social and web data researchers faster, easier and more productive.
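To illustrate the labeling drawback described above, the following is a minimal sketch, assuming posts have already been assigned to a cluster, of how a keyword-style label is typically derived. It uses plain term frequency over a tiny hypothetical stop-word list, a deliberate simplification of the weighting schemes used by algorithms such as BERTopic; the function name `keyword_label`, the stop words and the sample posts are all hypothetical and not taken from the authors' method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "for",
              "on", "it", "this", "with", "my", "was"}


def keyword_label(posts: list[str], k: int = 3) -> list[str]:
    """Return the k most frequent non-stop-word tokens across a cluster's posts.

    Plain term frequency is used here as a simplified stand-in for the
    keyword weighting of real topic models (an assumption, not their exact method).
    """
    counts: Counter[str] = Counter()
    for post in posts:
        # Lowercase and keep only alphabetic tokens (apostrophes allowed).
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    # most_common orders by count; equal counts keep first-seen order.
    return [word for word, _ in counts.most_common(k)]


# Hypothetical cluster of social media posts about the same issue.
cluster = [
    "The battery on this phone drains fast",
    "Battery life is terrible after the update",
    "Update killed my battery",
]
print(keyword_label(cluster))
```

Here the label comes out as `battery`, `update`, `phone`: it hints at the topic, but a reader still has to open the posts to learn that users are complaining about battery drain after an update. A human-readable cluster summary, as the abstract proposes, would state that directly.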