Keeping up to date with technical advances
Python versus Tableau for data visualisations
Python is great, but...
Since taking up Python several years ago, I've often thought of it as the Swiss Army knife of programming languages. After successfully using it for web scraping, software development, number-crunching and data visualisation, one achieves a certain level of comfort in knowing that, if you need to do it in code, you can probably do it in Python.
This mentality comes with a downside, though. Whilst it's great to have command of a language that can accomplish so much, there are times when one needs to realise that there are simply better tools available for a specific task. If you are writing high-frequency trading algorithms, for example, you would probably find that a compiled language such as C++ gets you to the front of the queue for short-lived market opportunities ahead of a Python bot. (OK, there is an argument that the Python bot is much quicker to write in the first place, but you get my point!)
Recreating a Python plot using Tableau
One of the great things about working at The Data Lab is the ability to try out different technologies and keep on the path of constant learning. Having played around with Tableau here for some time, I was impressed by its ability to cope with geographical data. Recently I was wondering what data set I could throw at it next, and then remembered that my first post on the main website blog, concerning the globe-trotting of James Bond, would be perfect.
To recap the details: as a labour of love I collated information from various sources on the locations of James Bond in each of the 24 canon films to date. From this I worked out the routes and therefore the total distances travelled. After presenting the information in tabular form, I plotted visualisations by actor and film, as well as a map of the global locations our hero has visited.
All of these graphs were created with the matplotlib and Basemap packages within Python. The graph of locations visited is shown again below:
The code to achieve this is relatively short. All that is required is to read in an Excel file containing the country latitudes and longitudes together with the film count, with this latter value used to scale the data points. However, when one writes a new block of code for the first time there is always a certain amount of time taken to ensure one is proceeding correctly, and (in my case anyway) looking at Stack Overflow for any exemplars to follow. Trying to remember back 18 months, I would say that this one plot probably took me a couple of hours' worth of effort.
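To give a flavour of what that block looked like, here is a minimal sketch rather than the original script: the file name (bond_locations.xlsx) and the column names (Country, Latitude, Longitude, Films) are assumptions on my part, with Basemap being the mapping toolkit mentioned above.

```python
# Minimal sketch of the kind of plot described above.
# File name and column names are assumptions, not the original code.
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# Read the country coordinates and the number of films each country appears in
locations = pd.read_excel("bond_locations.xlsx")

fig, ax = plt.subplots(figsize=(12, 7))
world = Basemap(projection="robin", lon_0=0, resolution="c", ax=ax)
world.drawcoastlines(linewidth=0.4)
world.drawcountries(linewidth=0.3)

# Project latitude/longitude onto map coordinates; scale each marker by film count
x, y = world(locations["Longitude"].values, locations["Latitude"].values)
world.scatter(x, y, s=locations["Films"] * 40, alpha=0.6, zorder=5)

# Label each data point with its country name
for xi, yi, name in zip(x, y, locations["Country"]):
    ax.annotate(name, (xi, yi), fontsize=7, ha="center", va="bottom")

ax.set_title("Countries visited by James Bond (sized by film count)")
plt.tight_layout()
plt.show()
```

Nothing here is difficult, but each step (choosing a projection, projecting the coordinates, scaling and labelling the points) is a small decision that takes time to get right the first time around.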
Fast forward to the present day when I have access to Tableau. The beauty of using this software is that I can now generate the following plot in under two minutes:
The first thing we can see is that the plot looks more professional, with a cleaner underlying world map. Admittedly we don't yet have the full labelling on the data points as in the Python plot, but with Tableau we get many more options to try out different views in a quicker and less cumbersome way.
For example, suppose that instead of using sized data points we wanted to colour the countries in proportion to their film appearance count. No problem! With literally one change, switching the mark type from "Circle" to "Map", we get this visualisation:
There are many, many more things we can do here, but we'll leave an exploration of Tableau for another time. What I'd prefer to stress is that, just as Python removes much of the pain of programming in a lower-level language, Tableau provides the same kind of abstraction when it comes to developing visualisations. The time saved can be used to perform many more iterations, or to try out several different ideas before presenting the information back to the end user.
What can we take from this?
The main purpose of this article is not to sell Tableau on its data visualisation abilities, nor is it to disparage those of Python. What I wanted to highlight through this one simple example is how important it is to keep an eye on the tools we use as data scientists and to be forever on the lookout for new and better ways of doing things. Sometimes it is difficult to know whether a new tool or technique is simply a flash in the pan, particularly in the open source world.
So how can we do this? Personally I tend not to be an early adopter, opting instead to wait for a critical mass of users so that I know stability and support have reached a certain level. To this end I find it crucial to keep one's finger on the pulse by attending conferences, reading blogs and newsletters, and, critically, by just trying things out. All of the data scientists I have spoken to about this in my role at The Data Lab are fortunate in that the companies they work for encourage this type of exploration. The experimental mindset that the data expert brings to their daily tasks is exactly the same as is required to ensure that their employers benefit from any new developments in software or hardware. If you own or work for a company that is not so forward-thinking, then I would encourage you to take steps to address this. If the company down the road is taking advantage of opportunities that you are not, there is a real danger of missing out on the competitive edge that comes with such a fast-moving technical environment.