The Social Media App, developed by CERTH, provides the INGENIOUS system with crowdsourced information. It collects social media data from Twitter (tweets) in real time, analyses their textual and visual content to enrich them with additional information, and lets end users view the collected and analysed tweets in a user-friendly interface.

The core component of the Social Media App is the Twitter Crawler, which connects to the Twitter Streaming API and continuously collects new posts directly from Twitter. The search criteria are defined by the INGENIOUS end users in relation to the examined use cases, i.e. terror attacks, fires and earthquakes. Three different analyses are performed on every newly received tweet:

  1. To estimate the probability that the tweet contains misinformation, given the spread of fake news on social media, Machine Learning is applied to classify posts as “real” or “fake” and assign a reliability score.
  2. To enhance the tweets with geoinformation, a combination of Named Entity Recognition and Neural Networks detects the locations mentioned in the tweet text and associates them with coordinates through the OpenStreetMap API.
  3. To identify whether a collected tweet is related to the INGENIOUS use cases (e.g. a tweet might include the word “earthquake” but refer to a movie and not a real earthquake incident), a Machine Learning classification model predicts its relevance (under development).
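
The three analyses above can be read as an enrichment step applied to each incoming tweet. The following is a minimal sketch of that idea and not the actual implementation of the Social Media App: the `EnrichedTweet` record, the `enrich` function and all four callables are hypothetical names, the models themselves are passed in as placeholders, and the reliability score is assumed to lie in [0, 1].

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


@dataclass
class EnrichedTweet:
    """Hypothetical record holding a tweet and the results of its analyses."""
    tweet_id: str
    text: str
    reliability: Optional[float] = None  # assumed range [0, 1]
    locations: List[Tuple[str, float, float]] = field(default_factory=list)
    relevant: Optional[bool] = None  # None while no relevance model is available


def enrich(
    tweet_id: str,
    text: str,
    score_reliability: Callable[[str], float],
    detect_locations: Callable[[str], List[str]],
    geocode: Callable[[str], Optional[Coordinates]],
    predict_relevance: Optional[Callable[[str], bool]] = None,
) -> EnrichedTweet:
    """Run the three analyses on one tweet; every model is an injected placeholder."""
    tweet = EnrichedTweet(tweet_id=tweet_id, text=text)

    # 1. Reliability: score produced by a "real" vs "fake" classifier.
    tweet.reliability = score_reliability(text)

    # 2. Geoinformation: detect place names, then look up their coordinates.
    for name in detect_locations(text):
        coords = geocode(name)
        if coords is not None:  # skip names the geocoder cannot resolve
            tweet.locations.append((name, coords[0], coords[1]))

    # 3. Relevance: optional, since this model is still under development.
    if predict_relevance is not None:
        tweet.relevant = predict_relevance(text)

    return tweet
```

Passing the models in as functions keeps the sketch independent of any particular classifier or geocoding service, so each analysis can be replaced without touching the others.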

The collected and analysed tweets are displayed on a dedicated user interface (Figure 1), complementary to the INGENIOUS COP, where the end users can view and filter all the tweets.

Figure 1. Visualisation of collected and analysed tweets

The design of the interface is straightforward and user-friendly. On the left is a dashboard where the user can select the use case, the language of the tweets, their creation date range and the reliability score range, and can also choose to hide retweets and quote tweets. The retrieved tweets are displayed to the right of the dashboard as a list in descending order of creation date, in pages of 50 tweets. For every tweet, the user can view the text and the attached image, the creation date and time, the detected locations and the reliability score.
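
The dashboard behaviour described above (filter, sort by creation date in descending order, page in blocks of 50) can be sketched as follows. This is an illustrative sketch only: the `Tweet` fields, the function names and the [0, 1] default range for the reliability score are assumptions and do not come from the actual application.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

PAGE_SIZE = 50  # the interface shows tweets in pages of 50


@dataclass
class Tweet:
    """Hypothetical tweet record; field names are illustrative."""
    text: str
    use_case: str          # e.g. "fires", "earthquakes"
    language: str          # e.g. "en"
    created_at: datetime
    reliability: float     # assumed to lie in [0, 1]
    is_retweet: bool = False
    is_quote: bool = False


def filter_tweets(
    tweets: List[Tweet],
    use_case: str,
    language: Optional[str] = None,
    created_from: Optional[datetime] = None,
    created_to: Optional[datetime] = None,
    min_reliability: float = 0.0,
    max_reliability: float = 1.0,
    hide_retweets_and_quotes: bool = False,
) -> List[Tweet]:
    """Apply the dashboard filters and return the matches, newest first."""
    selected = []
    for t in tweets:
        if t.use_case != use_case:
            continue
        if language is not None and t.language != language:
            continue
        if created_from is not None and t.created_at < created_from:
            continue
        if created_to is not None and t.created_at > created_to:
            continue
        if not (min_reliability <= t.reliability <= max_reliability):
            continue
        if hide_retweets_and_quotes and (t.is_retweet or t.is_quote):
            continue
        selected.append(t)
    # Descending order by creation date, as in the tweet list.
    selected.sort(key=lambda t: t.created_at, reverse=True)
    return selected


def get_page(tweets: List[Tweet], page: int) -> List[Tweet]:
    """Return the 1-based page of at most PAGE_SIZE tweets."""
    start = (page - 1) * PAGE_SIZE
    return tweets[start:start + PAGE_SIZE]
```

For example, `get_page(filter_tweets(tweets, "fires", language="en", min_reliability=0.7), 1)` would return the 50 most recent English tweets about fires with a reliability score of at least 0.7.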

Beyond standalone visualisation of the collections, this interface serves an additional purpose. With a single click (buttons on the right side of each tweet), users can label tweets as relevant or irrelevant to the examined use cases. This human annotation is valuable for training the relevance classification model mentioned above, which classifies new tweets automatically and in this way improves the quality of the information coming into the system.

Figure 2 shows an alternative visualisation provided by the Social Media App. With the same filtering options as before, a heat map indicates areas of high activity based on the number of retrieved tweets.
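
One simple way to derive such activity levels from geolocated tweets is to count them per cell of a regular latitude/longitude grid. The sketch below shows that idea under stated assumptions: the function names and the 0.5 degree cell size are hypothetical, and the actual heat map may well rely on a mapping library that smooths the points rather than on plain grid counts.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int]  # (row, col) index of a grid cell


def heatmap_counts(
    points: Iterable[Tuple[float, float]],
    cell_size_deg: float = 0.5,
) -> Counter:
    """Count (latitude, longitude) points per grid cell of the given size."""
    counts: Counter = Counter()
    for lat, lon in points:
        # floor() keeps the binning consistent for negative coordinates too.
        row = math.floor(lat / cell_size_deg)
        col = math.floor(lon / cell_size_deg)
        counts[(row, col)] += 1
    return counts


def cell_center(cell: Cell, cell_size_deg: float = 0.5) -> Tuple[float, float]:
    """Return the (latitude, longitude) at the centre of a grid cell."""
    row, col = cell
    return ((row + 0.5) * cell_size_deg, (col + 0.5) * cell_size_deg)
```

Calling `heatmap_counts(points).most_common(5)` would then list the five most active cells, each of which can be mapped back to a position with `cell_center`.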

Figure 2. Heat map visualisation, indicating areas with high activity on Twitter