A few months ago, NSW Transport announced that it would release Opal card travel data to the public through its open data portal.
My first thought was to try and build something like Uber’s viz, but for train and bus commutes. I assumed that the data would be de-identified and would contain the tap on/off location for each Opal card, so I was disappointed once I downloaded it: the data only contains tap on/off counts per station at 15-minute intervals. For those who don’t know how the Opal card works, it’s similar to the Oyster card in London.
With the data already downloaded, the best I could think of was to visualise it on a map showing how people travel during the day. I had seen a few d3.js examples and was keen to replicate them. I started with the tutorials on maps and shapefiles, but got entangled and confused, and I didn’t find the bare shapes pretty. So I started looking for a mapping API and came across Leaflet and Mapbox. I went with Mapbox for the sole reason that it allows map customisation.
A sample of the raw data downloaded from the open data portal is shown below.
mode | date | tap | time | loc | count |
---|---|---|---|---|---|
bus | 20160730 | on | 02:30 | 2000 | 415 |
bus | 20160730 | on | 02:30 | 2135 | 18 |
train | 20160730 | on | 02:30 | Jannali Station | 31 |
bus | 20160730 | on | 13:30 | 2095 | 64 |
As the bus tap on/off data is at postcode level, there isn’t much I could do with it, so I decided to concentrate on trains instead, since I can get the locations of the stations. First things first: I filtered the data and geocoded it in R to get the station locations. The geocoding function that I used can be found here.
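The filtering step can be sketched roughly as follows. This is only an illustration in Python, not the R code I actually used; the `train_rows` helper is a made-up name, and the rows are just the sample from the table above. The geocoding itself is left out, since it depends on whichever geocoding service you call.

```python
# Sample rows in the same shape as the table above:
# mode, date, tap, time, loc, count.
rows = [
    {"mode": "bus", "date": "20160730", "tap": "on", "time": "02:30", "loc": "2000", "count": 415},
    {"mode": "bus", "date": "20160730", "tap": "on", "time": "02:30", "loc": "2135", "count": 18},
    {"mode": "train", "date": "20160730", "tap": "on", "time": "02:30", "loc": "Jannali Station", "count": 31},
    {"mode": "bus", "date": "20160730", "tap": "on", "time": "13:30", "loc": "2095", "count": 64},
]


def train_rows(records):
    """Keep only train records, whose loc is a station name rather than a postcode."""
    return [r for r in records if r["mode"] == "train"]


# Unique station names: this is the list that would be sent off for geocoding.
stations = sorted({r["loc"] for r in train_rows(rows)})
print(stations)
```

Geocoding only the unique station names, rather than every row, keeps the number of lookups small.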
Once I had the geocoded data, I loaded it with d3 and started creating the viz. As I spend a lot of time visualising data in Tableau, I started using d3 the same way I would use Tableau. With the data loaded, I started drawing circles, without realising that if a station had no tap-ons in an interval, that interval is simply missing from the original file. So I had to go back and prepare my data for d3 to use.
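The preparation boils down to filling the gaps with zeros. A minimal sketch of that idea, again in Python rather than what I actually ran: the helper names `time_slots` and `fill_missing` are made up for illustration, and I am assuming the 15-minute slots run from 00:00 to 23:45.

```python
from itertools import product


def time_slots(step_minutes=15):
    """All HH:MM labels in a day at the given step (assumed to start at 00:00)."""
    return [f"{m // 60:02d}:{m % 60:02d}" for m in range(0, 24 * 60, step_minutes)]


def fill_missing(records, stations, dates, taps=("on", "off")):
    """Return one record per station, date, tap type and time slot.

    Combinations that are absent from the raw data get a count of 0.
    """
    observed = {
        (r["loc"], r["date"], r["tap"], r["time"]): r["count"] for r in records
    }
    return [
        {"loc": s, "date": d, "tap": t, "time": ts,
         "count": observed.get((s, d, t, ts), 0)}
        for s, d, t, ts in product(stations, dates, taps, time_slots())
    ]


# One observed row, taken from the sample table.
sample = [
    {"loc": "Jannali Station", "date": "20160730", "tap": "on", "time": "02:30", "count": 31},
]
full = fill_missing(sample, ["Jannali Station"], ["20160730"])
print(len(full))
```

With every slot present, each circle on the map always has a value to bind to, even when nobody tapped on.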
Now I have a record for each station for each 15-minute interval, and I use this dataset to build the visualisation. The complete visualisation of the week’s data is available here.
There is still some work to be done, but I think of this as the initial phase. The JavaScript code is here.