This experiment is part of the Cartography and Mapping Challenge Grant.

Where on Earth do you tweet and map?

$114
Raised of $2,000 Goal
6%
Ended on 2/11/17

Methods

Summary

The project relies on successfully building a database that contains data contributions (e.g. social media images or map data) for the same individuals. To achieve this, two different methods are proposed.

The first method obtains user names from one platform and then looks for contributions under the same user name on another platform. Though easy to apply, this method finds only a subset of the individuals who actually contribute to several platforms, since the same person may use different names on different platforms. However, because it is automatic, the number of sample users identified with this method is high.
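As a minimal sketch of the first method, the matching step could be written as a set intersection over normalized user names. The function names, the normalization rules (case folding, stripping whitespace and a leading "@") and the sample names are illustrative assumptions, not part of the project's tooling. A shared name only marks a candidate match.

```python
def normalize(name: str) -> str:
    """Strip whitespace and a leading '@', then case-fold (assumed rules)."""
    return name.strip().lstrip("@").casefold()


def shared_usernames(platform_a, platform_b):
    """Return the normalized user names that occur in both iterables."""
    names_a = {normalize(n) for n in platform_a}
    names_b = {normalize(n) for n in platform_b}
    return names_a & names_b


# Hypothetical sample data for two platforms.
osm_users = ["MapFan42", "alice_w", "Bob"]
twitter_users = ["@mapfan42", "carol", "bob "]

candidates = shared_usernames(osm_users, twitter_users)
```

Each candidate would still need its contributions fetched from both platforms before it enters the database.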

The second method relies on contacting users of VGI (volunteered geographic information) or social media platforms through posts and asking them to voluntarily share their user names on different platforms, or even to grant access to their private contribution history (e.g. FourSquare/Swarm check-ins). Though user numbers are not expected to be as high as with the first method, the advantage here is that the complete set of platforms an individual contributes to will be known.

Challenges

There are two challenges to overcome in this project. First, building the initial database will require developer time to implement data collection and data mining tools, especially for the second method mentioned above. This step involves developing a separate data scraper for each data platform included in this research. As APIs and data formats are not standardized across applications, adequate planning, including database design and normalization methods, is essential.
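The normalization step could look roughly like the sketch below, where each platform gets its own small adapter that maps a raw record onto one common schema. The two record layouts ("platform_a" with a `coordinates` pair and an ISO 8601 `created` string, "platform_b" with separate `lat`/`lon` fields and a Unix `time`) are invented for illustration and do not describe any real API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Contribution:
    """Common schema shared by all platforms (an assumed design)."""
    platform: str
    user: str
    lon: float
    lat: float
    timestamp: datetime  # always stored in UTC


def from_platform_a(rec: dict) -> Contribution:
    # Hypothetical layout: [lon, lat] pair and an ISO 8601 string with offset.
    lon, lat = rec["coordinates"]
    ts = datetime.fromisoformat(rec["created"]).astimezone(timezone.utc)
    return Contribution("platform_a", rec["user"], lon, lat, ts)


def from_platform_b(rec: dict) -> Contribution:
    # Hypothetical layout: separate lat/lon fields and Unix epoch seconds.
    ts = datetime.fromtimestamp(rec["time"], tz=timezone.utc)
    return Contribution("platform_b", rec["username"], rec["lon"], rec["lat"], ts)


a = from_platform_a({"user": "alice", "coordinates": [13.4, 52.5],
                     "created": "2016-05-01T12:00:00+02:00"})
b = from_platform_b({"username": "alice", "lat": 52.5, "lon": 13.4,
                     "time": 1462096800})
```

Keeping all time stamps in UTC and all coordinates in one order lets later analysis steps ignore which platform a record came from.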

The second challenge is to conduct the analysis (described below) on a worldwide scale. Traditional GIS tools and software usually perform poorly at worldwide scale and with high data volumes. In addition, the built-in tools of most GIS software do not support such analyses out of the box, so developing new tools is essential. We aim to create a scalable, optimized workflow in a cloud computing environment to overcome this challenge.

Pre Analysis Plan

The research can be separated into overlap analysis of activity spaces and change analysis. Overlap analysis describes a user’s behavior by showing where his/her contributions can be found in space. This type of analysis can be conducted within the time range in which a user was active on multiple platforms. Since individual contributions can be very limited in spatial extent (e.g. a point) or temporal extent (e.g. a given second), contributions need to be aggregated in accordance with the type of contribution analyzed. A suitable means of spatially aggregating data edits or contributions for comparison between data sources is a regular grid. Other types of aggregation that reflect activity spaces are circles, confidence (error) ellipses, or characteristic hull polygons, which, however, result in irregular shapes and abstract the contribution patterns. Activity spaces can also be extracted from a probabilistic surface generated by a kernel function, e.g. by taking the minimum area in which 90% of the activities can be found. This also results in an irregular shape.
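Aggregation by regular grid cells can be sketched as follows. The sketch assumes that contributions have already been projected to planar coordinates in metres; the choice of projection is left open here, and the function name and sample points are illustrative.

```python
import math
from collections import Counter


def grid_counts(points_m, cell_size_m=1000.0):
    """Count contributions per square grid cell.

    points_m: iterable of (x, y) in metres in some projected CRS (assumption).
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y in points_m:
        cell = (math.floor(x / cell_size_m), math.floor(y / cell_size_m))
        counts[cell] += 1
    return counts


# Hypothetical contributions of one user on one platform.
points = [(120.0, 80.0), (950.0, 10.0), (1500.0, 200.0), (-20.0, 5.0)]
cells = grid_counts(points)  # 1 km x 1 km cells by default
```

Using `math.floor` rather than integer truncation keeps cells consistent across the origin, so a point at x = -20 m falls into column -1 and not column 0.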

The finest resolution of overlap analysis in the spatial and spatio-temporal realm will depend on the coarsest resolution of the data sources compared, since this determines the common denominator for spatial analysis. For example, geo-tagged tweets with place locations at the city level allow a meaningful grid cell size of 5 km x 5 km for all analyses, whereas more precise contributions, such as individual OSM edits or Mapillary image contributions, will allow a finer analysis grid of 1 km x 1 km. The spatio-temporal comparison requires a temporal aggregation of contribution counts as well. When analyzing years of data contributions from two platforms, aggregating contribution counts over days or weeks will be appropriate for comparison.

Spatial overlap of activity spaces (regardless of how they were extracted) can be quantified as the percentage of co-located area. The geometry of the co-located areas can also be extracted with geometry operations. This overlap can be calculated for platform pairs and broken down for more than two platforms as well. In addition to the overlap measures above, once a user has been identified as contributing to different platforms, the radius of gyration can be computed from the individual contributions in each platform and then compared between platforms to quantify the spatial spread of the user’s activity in each.
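Both measures can be sketched in a few lines. Two assumptions are made here that the text does not fix: the overlap percentage is taken relative to the union of both activity spaces, and the radius of gyration is computed on planar coordinates in metres using the usual root-mean-square distance from the centroid. Because all grid cells have the same area, cell counts stand in for areas.

```python
import math


def overlap_percent(cells_a, cells_b):
    """Share of co-located grid cells, in percent of the union (assumption)."""
    a, b = set(cells_a), set(cells_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def radius_of_gyration(points):
    """Root-mean-square distance of (x, y) points from their centroid."""
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


# Hypothetical activity spaces of one user on two platforms.
platform_a_cells = {(0, 0), (1, 0), (2, 0)}
platform_b_cells = {(1, 0), (2, 0), (3, 0)}
overlap = overlap_percent(platform_a_cells, platform_b_cells)

spread = radius_of_gyration([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
```

Other denominators, such as the area of the smaller activity space, are equally possible and would answer a slightly different question.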

Change analysis describes how a user’s focus shifts from one platform to another over time due to the introduction of new platforms or a shift in the user’s preference or motivation. This analysis has to be conducted over a longer time frame.

The number of individuals whose contributions will be compared in any of these analyses depends on the number of individuals identified as contributing to multiple platforms. Ideally, we would be able to perform overlap and change analysis for at least 100 individuals participating in each chosen platform pair in order to have a representative sample. Not all possible combinations of platforms will be targeted. In general, we aim to analyze pairs where at least one source has a massive number of users, in order to increase the chance of finding enough overlapping participants. Overlap analysis can also be conducted if the complete history of an individual’s contributions is not available (e.g., from tweets), as long as the contributions to both data sources share a large enough time range (e.g. a year). For change analysis, ideally the entire history of a participant’s contributions to two platforms is available. However, change analysis is also possible if contributions to both platforms are only available within some time range before and after the start point in the second platform.
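One possible way to make a shift between platforms visible is to track, per time bin, the share of a user's contributions that go to the newer platform. The sketch below is only an illustration: the monthly bin, the function name and the sample data are assumptions, and the project may well choose a different measure.

```python
from collections import defaultdict
from datetime import date


def platform_share_by_month(contributions, platform):
    """Fraction of a user's monthly contributions made on `platform`.

    contributions: iterable of (platform_name, date) pairs for one user.
    Returns {(year, month): share}; months without contributions are omitted.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for name, day in contributions:
        key = (day.year, day.month)
        totals[key] += 1
        if name == platform:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in sorted(totals)}


# Hypothetical history of one user who moves from OSM to Mapillary.
history = [
    ("osm", date(2016, 1, 5)),
    ("osm", date(2016, 1, 20)),
    ("mapillary", date(2016, 2, 3)),
    ("osm", date(2016, 2, 10)),
    ("mapillary", date(2016, 3, 1)),
]
share = platform_share_by_month(history, "mapillary")
```

A share that rises from 0 towards 1 after the start point on the second platform would indicate that the user's focus has moved there.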

