The Data Team @ The Data Lab

Analysis of Gaelic Station Names

An exploration of inter-language similarity measures for place-names and the design of rural scores.


Most of modern Scotland was once Gaelic-speaking, and since a policy change in 2010 Gaelic names have appeared alongside English names on almost all station signs across Scotland's railway. I live in Glasgow and often travel out into the Highlands, and over time I hypothesised:

H1: The Gaelic and English names of . . .

Posted in: matthew higgs

November 13, 2017

Snakes and Ladders (Part 3 of 3)

Analysing the classic children's game

To recap the analysis from our previous article, we have now shown that the advantage to Player 1 in snakes and ladders is minimal (amounting to less than 6 extra wins out of every 1,000 games). In this post we look at visualising some results, focussing in particular on the distribution of game lengths and the frequency with which . . .

November 07, 2017

Dealing with many dimensions in historical data

Tracking cooperation & conflict patterns over space and time in R

For this post, I've managed to find some extremely interesting historical event data offered by the Cline Center on this page. As you will see, this dataset can be quite challenging because of the sheer number of dimensions you could look at. With so many options, it becomes tricky to create visualisations with the 'right' level . . .

November 03, 2017

Snakes and Ladders (Part 2 of 3)

Analysing the classic children's game

In the previous post in this series we set out the basic Python code required to simulate a single game of snakes and ladders. To analyse the game in more detail we need to simulate many random games so that we can look at certain properties, such as expected game lengths, the occupancy of squares, and the . . .

October 31, 2017

Snakes and Ladders (Part 1 of 3)

Analysing the classic children's game

In this short series of three blog posts we show how easy it can be to take an everyday activity and analyse it using Python, gaining insights that might illuminate or in some cases even surprise...

Anyone who has ever played games against young children knows that they absolutely must go first, and my daughter Eva is no exception. . . .

October 24, 2017

Data guidelines

A set of recommendations for clean and usable data

The extent to which a dataset follows a set of commonly expected guidelines will often determine how much time you have left to spend thinking about your analysis. Ideally, you might intend to spend 20% of your time cleaning the data for a project, and 80% planning and carrying out your actual analysis. But often, it might turn out to be the . . .

October 17, 2017

LA maps of crime

Using R to map criminal activity in LA since 2010

I’ve recently come across a huge resource for open data. At the time of writing, there are close to 17,000 freely available datasets stored there, including this one offered by the LAPD. Interestingly, this dataset includes almost 1.6M records of criminal activity occurring in LA since 2010, all of them described according to a . . .

October 12, 2017