Employment Dataset


This dataset consists of employment opportunities posted on http://www.computrabajo.com affiliate sites, which primarily serve Mexico and South American countries. Postings are temporary and may be taken down at any time for a number of reasons, so this dataset is an attempt to preserve them for analysis over a long period of time. Employment analysis can help determine how populations, economies, and cultures are changing, and it can easily complement social media analysis.

Background and Formats

The dataset consists of more than 119 million job records and is about 40 GB in size. Many records are duplicates, so there are approximately 2.1 million unique jobs in the set.

Understanding the dataset requires a description of how the data is scraped. Every page of job postings from each affiliate site is parsed once per day. Each employment record has a first-seen and a last-seen date. When a posting is seen for the first time, both dates are set to that day. On every later day that the posting still appears, its last-seen date is updated. If a posting stops appearing, its last-seen date stops being updated. When entries are missing at scale for a given day, one can assume the scraper was not running and the entries were missed.

As expected, many of the textual entries are in Spanish. The Translated Location field is parsed out of the data and run through a geocoding service to estimate a rough latitude and longitude.
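The first-seen and last-seen bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the update rule, not the scraper's actual code: the `record_sighting` and `days_visible` helpers, the `postings` dictionary, and the `job-1` style identifiers are all hypothetical names chosen for the example.

```python
from datetime import date


def record_sighting(postings, posting_id, seen_on):
    """Update the first-seen/last-seen dates for one posting seen on one day.

    `postings` is a hypothetical in-memory store keyed by posting id.
    """
    entry = postings.get(posting_id)
    if entry is None:
        # First time the posting appears: both dates start on the same day.
        postings[posting_id] = {"first_seen": seen_on, "last_seen": seen_on}
    else:
        # Posting is still listed: only the last-seen date moves forward.
        entry["last_seen"] = max(entry["last_seen"], seen_on)
    return postings[posting_id]


def days_visible(entry):
    """Number of days between first and last sighting, counting both ends."""
    return (entry["last_seen"] - entry["first_seen"]).days + 1


# Simulate three daily scrapes (made-up ids and dates).
postings = {}
record_sighting(postings, "job-1", date(2016, 1, 1))
record_sighting(postings, "job-1", date(2016, 1, 2))
record_sighting(postings, "job-2", date(2016, 1, 2))
record_sighting(postings, "job-1", date(2016, 1, 3))
```

Under this rule a posting that disappears simply keeps its last recorded date, so the span between the two dates gives a rough estimate of how long the posting stayed online. Days when the scraper did not run can make that estimate too short or too long.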


Displaying all unique job posting data points on a map.



Providing a query menu to visualize results on the map.



Providing search capability and displaying the results on the map.



Tracking the movement of search results over time.


