The Data Team @ The Data Lab

Unlocking value from data

Running R remotely

Some options and tips

Why would you need to do this? Say, for instance, you are dealing with sensitive data that should not leave a specific system. Or, quite simply, you are away on a work retreat with a laptop far less powerful than the work desktop you left behind, and you want to keep using that desktop from a distance. For such reasons, . . .

Posted in: caterina constantinescu

November 24, 2017

Excel-like functionality with Python pandas

The Data Lab takes the Pepsi Challenge!

Happy Birthday Excel!

I would posit that the world's most used data science software is the ubiquitous Microsoft Excel. Released for Windows in November 1987, it marks its 30th anniversary this month. In that time, I'd imagine it has been employed by all manner of people across nearly all industries: from the fund manager . . .

Posted in: richard carter

November 17, 2017

A Simple Search Application for the Edinburgh Fringe

Motivation

In 2017, the Edinburgh Festival Fringe was host to 3,398 shows selling over 2.5 million tickets, numbers that are increasing year on year. With this abundance of shows it can be difficult to find something that one wants to see. I describe here how I used data to create an application that will find shows similar . . .

Posted in: rachel kilburn

November 15, 2017

Analysis of Gaelic Station Names

An exploration of inter-language similarity measures for place-names and the design of rural scores.

Motivation

Most of modern Scotland was once Gaelic-speaking, and a policy change in 2010 means Gaelic names appear alongside English names on almost all station signs across Scotland's railway. I live in Glasgow and often travel out into the Highlands, and over time I hypothesised:

H1: The Gaelic and English names of . . .

Posted in: matthew higgs

November 13, 2017

Snakes and Ladders (Part 3 of 3)

Analysing the classic children's game

To recap the analysis from our previous article, we have now shown that the advantage to Player 1 in snakes and ladders is minimal (amounting to fewer than 6 extra wins out of every 1,000 games). In this post we look at visualising some results, focussing in particular on the distribution of game lengths and the frequency with which . . .

Posted in: richard carter snakes_and_ladders

November 07, 2017

Dealing with many dimensions in historical data

Tracking cooperation & conflict patterns over space and time in R

For this post, I've managed to find some extremely interesting historical event data offered by the Cline Center on this page. As you will see, this dataset can be quite challenging because of the sheer number of dimensions you could look at. With so many options, it becomes tricky to create visualisations with the 'right' level . . .

Posted in: caterina constantinescu

November 03, 2017

Snakes and Ladders (Part 2 of 3)

Analysing the classic children's game

In the previous post in this series we set out the basic Python code required to simulate a single game of snakes and ladders. To analyse the game in more detail, we need to simulate multiple random games so that we can look at certain properties, such as expected game lengths, the occupancy of squares, and the . . .

Posted in: richard carter snakes_and_ladders

October 31, 2017
