The short version
The goal of my PhD project is to
- Develop a method to detect events of a certain type in multiple countries using microblog data.
- Evaluate this method using labour strike events in multiple countries as an example.
The long version
“Can we detect labour strikes using Tweets?”
Everything we use daily has been manufactured at some point, most of the time in distant places. We often do not know exactly where things are produced or under which conditions. Although companies often publicly commit to social and environmental standards, they typically have widely spread supply chains, which makes it hard to obtain timely information about what is happening in and around their manufacturing sites and to ensure compliance with these standards.
At the same time, more and more people in both developing and developed countries use social media to share their experiences and local happenings with their followers online. Recently, several researchers have shown that this online data can be used to detect events in the offline world. For example, Sakaki et al. [1] detected earthquakes in Japan based on Twitter data, and Signorini et al. [2] monitored the spread of a disease.
In my PhD project, I study whether the same principle could be used to monitor the social and environmental impacts of companies using social media data.
My research focuses specifically on labour strike events, as these occur as a symptom of a variety of social or environmental problems, e.g. low pay, unsatisfactory contracts or insufficient health standards.
Labour Strikes on Twitter
Labour strikes typically involve and affect many people, and are therefore also reflected on social media platforms like Twitter. As Tweets are public by default and samples can easily be retrieved using Twitter’s public API, Twitter has become a popular data source for academic studies.
For example, here is a diagram showing the number of Tweets that include the words “Batam” and “Adidas”. The line peaks on a day on which several hundred workers at a factory on the island of Batam, which manufactures Adidas sportswear, were on strike.
In this case, we already knew that a strike had occurred, so we could find strike-related Tweets using the related keywords “Batam” and “Adidas”. To detect future strike events, we need a method that effectively differentiates between event-related and unrelated Tweets. This becomes particularly challenging if we want to detect events in several countries where different languages are spoken.
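To make the idea of a peak in keyword counts concrete, here is a minimal sketch in Python. It is not the method developed in the thesis. It assumes Tweets are available as (date, text) pairs and flags a day as a peak when its count exceeds the mean by a chosen number of standard deviations. The function names, the threshold rule, the sample Tweets and their dates are all invented for illustration.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def daily_counts(tweets, keywords):
    """Count Tweets per day whose text contains every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    counts = Counter()
    for day, text in tweets:
        lowered = text.lower()
        if all(k in lowered for k in wanted):
            counts[day] += 1
    return counts


def find_peaks(counts, threshold=2.0):
    """Return the days whose count exceeds mean + threshold * standard deviation."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    cutoff = mean(values) + threshold * pstdev(values)
    return sorted(day for day, n in counts.items() if n > cutoff)


# Invented sample data: one matching Tweet on most days, ten on 4 January.
sample = [(date(2020, 1, d), "Adidas store in Batam") for d in (1, 2, 3, 5, 6, 7)]
sample += [(date(2020, 1, 4), "Workers at the Adidas factory in Batam are on strike")] * 10
sample.append((date(2020, 1, 2), "Sunny day in Batam"))  # lacks "Adidas", so it is ignored

counts = daily_counts(sample, ["Batam", "Adidas"])
peaks = find_peaks(counts)
print(peaks)
```

Note that days without any matching Tweet do not appear in the counts at all, so a real time series would need those days filled in with zeros before computing the baseline.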
Multi-lingual Event Detection on Twitter
The technical problem tackled in my thesis is to detect events of a certain type (the type in my case is “Labour Strike”) given a stream of multi-lingual Tweets.
Existing event-detection approaches often rely on large amounts of manually labelled training data or on the ability of researchers to define keywords and rules. Both approaches become increasingly complex when applied in a multi-lingual context.
In contrast, I am developing a method which
- learns classification rules from Tweets relating to past events of the same type
- learns key terms across languages based on events with international impact.
Let me explain this using an example. In April 2014, a strike of Lufthansa employees in Germany caused the cancellation of about 4,000 flights around the world.
People published Tweets about the event in over 30 different languages (six of them are shown here):
If we look, for example, at Greek Tweets published on the day of the strike that include the word “Lufthansa”, we can identify the words that are often mentioned together with “Lufthansa” on that day and are therefore probably event-related keywords:
Indeed, the word “απεργία” means “strike”. Using this approach, we can identify Greek event-related keywords without any knowledge of the language. Obviously, this method will also pick up other frequently used words like “Pilot” or “Flight”. Therefore, Tweets have to be collected repeatedly over time during several events in order to abstract from the single instance and yield reliable results.
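The co-occurrence idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the actual method of the thesis: words are split with a basic regular expression, a term's score is the number of Tweets in which it appears together with the anchor word, and a term counts as reliable if it ranks among the top terms in at least two events. The function names, the sample Tweets and the second company name “ExampleCorp” are invented.

```python
import re
from collections import Counter

TOKEN = re.compile(r"\w+")  # in Python 3, \w also matches non-Latin letters


def cooccurring_terms(tweets, anchor, top_n=5):
    """Rank words by the number of Tweets in which they appear with the anchor."""
    anchor = anchor.lower()
    doc_freq = Counter()
    for text in tweets:
        tokens = set(TOKEN.findall(text.lower()))
        if anchor in tokens:
            doc_freq.update(tokens - {anchor})
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


def recurring_terms(events, top_n=5, min_events=2):
    """Keep terms that are among the top co-occurring terms in several events."""
    seen = Counter()
    for tweets, anchor in events:
        seen.update(set(cooccurring_terms(tweets, anchor, top_n)))
    return {term for term, n in seen.items() if n >= min_events}


# Invented sample Tweets for two events with different anchor words.
lufthansa_tweets = [
    "απεργία στη Lufthansa σήμερα",
    "Lufthansa απεργία πιλότων",
    "Lufthansa: απεργία και ακυρώσεις πτήσεων",
    "καλημέρα σε όλους",  # does not mention the anchor, so it is ignored
]
factory_tweets = [
    "απεργία στο εργοστάσιο ExampleCorp",
    "ExampleCorp απεργία εργαζομένων",
]

top_term = cooccurring_terms(lufthansa_tweets, "Lufthansa", top_n=1)
shared = recurring_terms(
    [(lufthansa_tweets, "Lufthansa"), (factory_tweets, "ExampleCorp")], top_n=3
)
print(top_term, shared)
```

Within a single event, airline-specific words such as “πιλότων” show up next to “απεργία”. Requiring a term to recur across events with different anchor words is one simple way to filter them out.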