In September, I presented our latest study at the International Conference on Knowledge Technologies and Data-Driven Business (iKnow 2014) in Graz.
In this study, we explored how social and semantic data can be used to monitor risks around supplier factories. We focused on Indonesia because it is an important outsourcing country for several major brands and also has high social media usage.
Data sample
We compiled a sample of 139 factories in Indonesia that supply 4 popular companies in the textile, sports and electronics industries. Each factory is described by its name and address. All data was retrieved from the respective company websites.
Main research questions
- Can user-generated data help to determine the physical location (GPS-coordinates) of supplier factories?
- How could we link semantic data to attain risk information about supplier factories?
The most interesting facts and results
1. Mapping services could map only a few factory addresses
We used Google Maps, Nokia Here Maps, Bing Maps and OpenStreetMap (Nominatim) to transform the address information into GPS coordinates, but could retrieve accurate coordinates for only a few factories (20/139). The services differed considerably in how many addresses they could transform into GPS coordinates and in the precision of the results.
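To give an idea of what such a geocoding lookup involves, here is a minimal sketch for the Nominatim search endpoint. It is not the code used in the study: the function names, the sample address and the choice of query parameters are assumptions made for illustration. The sketch only builds the request URL and parses a response, so it runs without network access.

```python
import json
from urllib.parse import urlencode

# Public Nominatim search endpoint (OpenStreetMap).
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_query_url(address, country_code="id"):
    """Build a search URL for one free-text address.

    Restricting results to Indonesia ("id") is an assumption for this
    sketch, not something stated in the study.
    """
    params = {
        "q": address,
        "format": "json",
        "limit": 1,
        "countrycodes": country_code,
    }
    return NOMINATIM_SEARCH + "?" + urlencode(params)


def parse_first_hit(response_text):
    """Return (lat, lon) of the first result, or None if there is none."""
    hits = json.loads(response_text)
    if not hits:
        return None
    # Nominatim returns coordinates as strings in its JSON output.
    return float(hits[0]["lat"]), float(hits[0]["lon"])


# Hypothetical address, used only to show the URL encoding.
print(build_query_url("Jl. Contoh 1, Bandung"))
```

An empty result list is the interesting case here: it corresponds to an address the service could not map at all.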
2. Most of the factories in our sample have a Foursquare profile
For most of the factories (122/139) we could find a profile on the geo-social network “Foursquare”. Foursquare profiles are created by users, who in this case might be workers or people living around the production site.
Typically, users register a location with its name and purpose using mobile devices. In this way, maps are created collectively.
3. Most of the factories were tagged on Wikimapia
Most of the factories (94/139) were tagged by users on the crowdsourced map “Wikimapia”. On Wikimapia, users tag buildings on satellite pictures with their names or purpose, and thereby create maps.
4. Factory locations in Indonesia can be best determined using user-generated information
By comparing the returned GPS coordinates with a manually compiled ground-truth dataset, we found that user-generated information (Foursquare, Wikimapia) can provide more accurate location results than geocoding services (Google, Nokia). The first diagram below shows for how many factories we retrieved a result, whereas the second one shows how many of these were correct. Correct means within 1 km of the location in our ground-truth dataset.
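The 1 km criterion can be checked with a great-circle distance between a returned coordinate and the ground-truth coordinate. The following is a minimal sketch using the haversine formula. The post does not say which distance formula was used in the study, so the haversine choice, the function names and the example coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_correct(candidate, ground_truth, threshold_km=1.0):
    """True if the candidate (lat, lon) lies within the threshold."""
    return haversine_km(*candidate, *ground_truth) <= threshold_km


# Hypothetical coordinates, not taken from the study's dataset.
truth = (-6.2000, 106.8000)
print(is_correct((-6.2050, 106.8000), truth))  # roughly 0.56 km away
print(is_correct((-6.2200, 106.8000), truth))  # roughly 2.2 km away
```

With a threshold this small, a flat approximation would give nearly the same answer; haversine is used here only because it needs no extra libraries and stays valid at any distance.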
5. User-generated information is not 100% reliable
The biggest advantage of user-generated information is also its biggest disadvantage: everyone can contribute information. Although collective data collection can produce huge resources, the data is hardly validated, which lowers its quality. During our research we often found multiple profiles for the same factory, or multiple venues at one location. Information credibility is therefore certainly an issue here.
Further Information
If you want to read more about the study, please take a look at the full paper or send me an email.
The publication has been developed jointly with my colleagues at the Information & Software Engineering Group @ Vienna University of Technology:
Madlberger, L., Hobel, H., Thöni, A., Tjoa, A.M.: Analysing Supplier Locations Using Social and Semantic Data: A Case Study Based on Indonesian Factories. 12th International Conference on Knowledge Management and Knowledge Technologies, 2014. Download (pdf)