Neural Network Methods for Analyzing Message Texts in User Discussions on the Social Network YouTube
With the rapid development of the web, news usually needs only a few seconds to spread widely. Sensitive or highly popular events in particular are likely to attract the attention of large numbers of people, who create a storm of public opinion across society. Whether the influence of such a storm is positive or negative, once a huge audience is involved it must be monitored from all sides, given the serious consequences it may lead to. At the same time, journalists, organizations, and other stakeholders need to analyze user activity flexibly and understand public opinion on specific topics. They are then able to respond appropriately, identify the real problems, and even detect the risk of misleading manipulation.
In this work, the author develops a general solution for analyzing user discussions and the user network. YouTube, one of the best-known and most representative social platforms, is chosen as the source of crawled data. The solution combines analytical methods and various approaches, including modern language models. In our experiment, a case of misconduct by American police toward a Black man illustrates how an actual analysis is carried out. For now, our model supports only English-language text.
Introduction
    The Relevance of Research Topic
    The Aim and Objectives of Work
    Practical Values of Work
    Structure of Work
1 Overview
    1.1 Overview of Existing Methods
        1.1.1 Embeddings from Language Models
        1.1.2 Long Short-Term Memory
        1.1.3 YAKE
    1.2 Application and Solution
2 Solution for User Discussion Analysis in YouTube
    2.1 Architecture of Solution
        2.1.1 Data Crawling
        2.1.2 Data Preprocessing
    2.2 Architecture of Technique Stack
    2.3 Neural Network Approach for User Message Analysis
        2.3.1 Word2vec
        2.3.2 Transfer Learning
        2.3.3 Universal Sentence Encoder
        2.3.4 Bidirectional Encoder Representations from Transformers
    2.4 Deep Learning in Sentiment Analysis
        2.4.1 Text-To-Text Transfer Transformer
3 Experiment
    3.1 Description of Dataset
    3.2 Training
        3.2.1 Model Evaluation
        3.2.2 Application of Trained Model on Real Case
    3.3 Visualization of Social Graph by Real Case
Conclusion
Acknowledgements
References
The Relevance of Research Topic
Nowadays, videos and news items with millions of comments are commonplace, thanks to sustained investment in large-scale Internet infrastructure around the world. According to Statista, roughly 4.66 billion people were using the Internet at the start of 2021. That is close to 60 percent of the world's population, and the number is still climbing. When news breaks, it takes only a few minutes, or even seconds, to spread among this enormous user base. This is exciting to witness, and it demonstrates how far modern society has developed. However, the obvious benefits should not blind us. The storm of public opinion attached to such news is a double-edged sword that can also severely damage the stability, prosperity, and safety of society. Governments, journalists, companies, and other affected parties in particular can easily be pushed to the centre of discussion, as many real cases show. In these circumstances they are eager to understand users' feedback [17] through various methods, and we share the widely held view that a robust capability for sentiment analysis of public opinion is urgently needed.
Sentiment analysis, also known as opinion mining, is closely related to natural language processing, text analysis, and computational linguistics [21]. When sentiment analysis was first introduced for public opinion analysis at the beginning of the 20th century, it was applied to written paper documents; 99% of the papers on computer-based sentiment analysis were published only after 2004 [15]. Recently, natural language processing techniques have developed rapidly, and the focus of applied sentiment analysis has shifted to Facebook, Twitter, and other social platforms. Despite these burgeoning innovations, we found that this research is not yet adequately put to practical use. This thesis aims to close that gap by applying the latest models to detect users' sentiments, in support of a better understanding of messages on YouTube.
The Aim and Objectives of Work
The primary aim of our work is to propose a general solution for analyzing user messages and the user network around different events on a social network. Such a solution makes it possible to detect hidden dependencies among users and to uncover detailed information. Admittedly, plenty of mature sentiment analysis systems are already on the market, such as Brand24 and Mediatoolkit. In this thesis we do not try to propose a brand-new model; instead, we focus on integrating up-to-date language models, running on a high-performance distributed computing platform, in real cases.
To achieve the desired outcome, the following steps have been taken. Step A: Investigation of relevant techniques and papers. With the rapid development of computer science in recent years, many outstanding language modeling techniques have been proposed, such as long short-term memory, the bidirectional language model, embeddings from language models, the universal sentence encoder, and bidirectional encoder representations from Transformers. Step B: These approaches are thoroughly compared in order to decide which to use. Step C: We adopt the common end-to-end workflow for sentiment analysis from [11]. As described in our solution, the architecture covers data crawling, data preprocessing, a combination of modern language models, and summarization. The most important difference from existing solutions is that several modern language models are combined. On the one hand, these models emphasize different aspects, speed and accuracy, and can be chosen according to actual need; on the other hand, using two models gives the analysis an additional safeguard, because the result no longer rests on a single model. In addition, compared with the common analytical approach, we introduce an optimization called the separation mechanism, which makes the presented analysis results more acceptable and more accurate. Step D: Finally, we select suitable tools for implementing our solution. The tools are described in the next section.
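The dual-model safeguard mentioned in Step C can be illustrated with a minimal sketch. The text does not specify how the two models' outputs are combined, so the agreement rule below is an assumption: a label is accepted only when both models agree, and the comment is otherwise marked as uncertain. The names `cross_check`, `keyword_model`, and `negation_model` are hypothetical, and the two toy classifiers merely stand in for a faster and a more accurate language model.

```python
from dataclasses import dataclass
from typing import Callable, List

# A classifier maps a comment to a sentiment label (hypothetical interface).
Classifier = Callable[[str], str]


@dataclass
class Verdict:
    text: str
    label: str    # the agreed label, or "uncertain" when the models disagree
    agreed: bool


def cross_check(texts: List[str], fast_model: Classifier,
                accurate_model: Classifier) -> List[Verdict]:
    """Accept a label only when both models produce the same one (assumed rule)."""
    verdicts = []
    for text in texts:
        fast_label = fast_model(text)
        accurate_label = accurate_model(text)
        agreed = fast_label == accurate_label
        verdicts.append(Verdict(text, fast_label if agreed else "uncertain", agreed))
    return verdicts


# Toy stand-ins for the two language models; NOT the models used in the thesis.
def keyword_model(text: str) -> str:
    negative_words = {"bad", "awful", "unfair"}
    words = text.lower().split()
    return "negative" if any(w in negative_words for w in words) else "positive"


def negation_model(text: str) -> str:
    lowered = text.lower()
    if "not" in lowered.split() or "awful" in lowered:
        return "negative"
    return "positive"


results = cross_check(["this is awful", "great video", "not good"],
                      keyword_model, negation_model)
```

Comments flagged as uncertain could then be reviewed manually or passed to a third model; that follow-up step is likewise an assumption and not part of the described solution.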
Practical Values of Work
Increasingly, the ability to analyze public opinion has become a prerequisite in many areas. Companies want to detect the general feedback on, or the principal concerns about, a new product. Consultancies can use the analysis to anticipate trends in prominent issues. For journalists and social media, public opinion analysis is a great help when preparing a solid and thorough report on a specific news item. Governments can design precise policies or actions for the problems that sentiment analysis reveals. For instance, a nationwide public opinion detection platform has been widely applied in China. The platform can detect, in a timely manner, the primary topics of public discussion across different social networks. Once the topic and the general sentiment are determined, the person in charge is informed and can react appropriately. That is why we believe it is beneficial to apply state-of-the-art techniques to understand public opinion and to provide evidence for the right reaction. In addition, social network analysis allows us to search for pivotal figures among users.
Our work inherits the basic characteristics of the cases above, which indicates that our solution can be used in a variety of scenarios as well. We are also keen to extend it to other fields where it could better serve people.
Structure of Work
Our solution consists of the following five steps: data crawling, data preprocessing, an analytical module for user message analysis, summarization analysis, and social network analysis. The thesis itself is divided into six parts. Part I, Introduction: we introduce the background, aim, and general application of sentiment analysis. Part II, Overview: this part describes current methods for sentiment analysis and the solution we propose. Part III, Solution for User Discussion Analysis in YouTube: it covers the actual models and the solution used in our project. Part IV, Experiment: we present several comprehensive experiments based on the theoretical part. Part V, Conclusion. Part VI, References.
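The five steps listed above can be pictured as a chain of small functions. The skeleton below is only an illustrative sketch under stated assumptions: the comment schema (`author`, `reply_to`, `text`), all function names, and the toy rules inside each stage are hypothetical. In particular, `crawl` returns hard-coded samples instead of calling YouTube, `analyze` is a keyword rule rather than a language model, and `summarize` is a simple label tally standing in for the summarization step.

```python
from collections import Counter
from typing import Dict, List, Tuple

Comment = Dict[str, str]  # hypothetical schema: "author", "reply_to", "text"


def crawl() -> List[Comment]:
    """Step 1, data crawling: stand-in that returns hard-coded sample comments."""
    return [
        {"author": "ann", "reply_to": "", "text": "  This is unfair!  "},
        {"author": "bob", "reply_to": "ann", "text": "I agree, thanks for sharing"},
        {"author": "cid", "reply_to": "ann", "text": ""},
    ]


def preprocess(comments: List[Comment]) -> List[Comment]:
    """Step 2, preprocessing: lowercase, collapse whitespace, drop empty comments."""
    cleaned = []
    for comment in comments:
        text = " ".join(comment["text"].lower().split())
        if text:
            cleaned.append({**comment, "text": text})
    return cleaned


def analyze(comments: List[Comment]) -> List[str]:
    """Step 3, message analysis: toy keyword rule in place of a language model."""
    return ["negative" if "unfair" in c["text"] else "positive" for c in comments]


def summarize(labels: List[str]) -> Counter:
    """Step 4, summarization: here only a tally of the predicted labels."""
    return Counter(labels)


def reply_edges(comments: List[Comment]) -> List[Tuple[str, str]]:
    """Step 5, social network analysis: directed (replier, replied-to) edges."""
    return [(c["author"], c["reply_to"]) for c in comments if c["reply_to"]]


comments = preprocess(crawl())
labels = analyze(comments)
summary = summarize(labels)
edges = reply_edges(comments)
```

Keeping each step as a separate function means a stage can be replaced, for example swapping the keyword rule for a trained model, without touching the others.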