Tuesday, 13 October 2015

Thirty five – London Business Analytics

The event had the title ‘exploring exotic patterns in data’; it was a lecture by a retired academic from the University of Dundee. I was getting a bit fed up of going to technology events that I knew nothing about, but I had to go: rules were rules.

I looked at the map: it was close to Liverpool Street station. I walked up an escalator and onto the concourse. It was rush hour; commuters were fighting their way home towards Essex and the surrounding areas. The familiar atmosphere was one of frantic determination; the day’s work was done.

When I finally made it to the street, it was twilight and it had started to rain. Buses rammed the streets and cyclists squeezed past waiting taxis. I looked around, and recognised a walkway from the street photography Meetup.

Two minutes later, I was where I needed to be. After a further minute of hanging about in the reception area, I was told to go to the second floor and soon saw where I needed to go: a conference room at the end of a corridor that was overflowing with people. Luckily enough, I was just in time to avail myself of a bottle of free beer; the sign of a good event. I introduced myself to a chap called Gary, who worked as an IT contractor, then settled down for the lecture.

Our speaker for the night, Professor Mark Whitehorn, gave a very clear introduction to what he intended to talk about: the Monte Carlo simulation or method (which I had heard about), a programming language called ‘R’ (which I had never heard of before), and Benford’s law (which was also unfamiliar to me).

Professor Whitehorn introduced the Monte Carlo method with a modest dose of history. The technique was invented by Stanislaw Ulam who worked at the Los Alamos Laboratory, New Mexico, where the first nuclear weapons were designed. Legend has it that when he was ill, Ulam played the card game Patience. As he played, he wondered about the probability of certain cards coming up.

There are two ways to figure this out: you could either do some calculations, or you could just go ahead and play the game and see what happens. If you play the game you can use the inherent randomness of the cards (providing that you shuffle them properly) to understand the behaviour of your system. The Monte Carlo method got its name because playing these simulations is a bit like playing a game at a casino. Of course, you don’t really play cards or go to a casino: you write a computer program and get the computer to do all your experiments for you.

Professor Whitehorn introduced us to another idea: the random walk. A random walk is when you start off at a location and then go in a randomly chosen direction, a step at a time. You might take one step north, one step south, or you might go east or west. A random walk might take you many steps away from your original location, or you might end up back where you started. We were asked a question: how might we find out, on average, how far we are from the original location if we take a certain number of steps? The answer was: we can create a Monte Carlo simulation and use the ‘R’ programming language.

I was thinking about my own random walk. I had randomly chosen to attend a lecture that was about randomness. Would I end up in the same place that I started? How far would I ‘move’ from the person I was at the start of this quest? Does each Meetup nudge me in a different direction?

Professor Whitehorn had written a computer program in ‘R’ to depict a random walk. The first screen showed a walk of ten steps. It then had one hundred steps. It then had one thousand steps. In an instant, a new picture of one hundred thousand steps was presented: this new walk seemed to look like the coast of a continent; it appeared organic. Each walk was different. Each ended in a different place. Some graphical walks appeared almost circular, going nowhere. Other walks extended from one side of the screen to the other.

But why would you write a computer program that simulates walking to try to solve an artificial problem? All these ideas were connected to a paper by Einstein that was about understanding Brownian motion: the way that particles move through a substance. When we’re talking about particles, it was argued, we need not limit random walks to two dimensions; we can think about them in three dimensions, since particles can move up and down as well as side to side.

In an instant, computing, physics, and randomness were all combined. Professor Whitehorn ran his computer program to simulate a random walk of one hundred steps, and repeated it fifty thousand times. The answer was that the walker ended up, on average, 8.86 steps away from the starting point.
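For anyone who wants to try the experiment, here is a minimal sketch of the same idea. It is written in Python rather than ‘R’, and it is not Professor Whitehorn’s program: the function names, the seed, and the assumption that every step is exactly one unit north, south, east or west are all mine.

```python
import math
import random

# Assumption: each step is one unit north, south, east or west,
# and all four directions are equally likely.
DIRECTIONS = ((0, 1), (0, -1), (1, 0), (-1, 0))


def walk_distance(steps, rng):
    """Take `steps` random steps and return the straight-line distance from the start."""
    x = y = 0
    for _ in range(steps):
        dx, dy = rng.choice(DIRECTIONS)
        x += dx
        y += dy
    return math.hypot(x, y)


def average_distance(steps=100, walks=50_000, seed=2015):
    """Monte Carlo estimate: repeat the walk many times and average the final distances."""
    rng = random.Random(seed)  # a fixed (arbitrary) seed makes the run repeatable
    return sum(walk_distance(steps, rng) for _ in range(walks)) / walks


# A quick run with fewer walks; raise `walks` to 50_000 to mirror the run in the talk.
print(round(average_distance(walks=2_000), 2))
```

Under these assumptions, the average should settle somewhere close to the figure quoted in the talk as the number of walks grows; a short run like the one above will wobble around it.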

His talk took a slight diversion away from the theoretical and onto the practical: he spoke about generating data from a Monte Carlo simulation and using modern database tools to analyse huge data sets.

The final part of his talk returned to the topic of randomness, introducing Benford’s Law. ‘Close your eyes.’ Our lecturer was now turning into a magician. ‘Imagine an infinitely long street; any street. Choose any house on that street. Do you have a house in mind?’ There was silence. Data scientists were a tough crowd. ‘Take the first number from the number of the house that you’ve chosen. Okay, who has chosen the number nine…?’ One person put their hand up. The process continued until a graph of all the numbers had been plotted. This was, apparently, a Benford distribution.

The Benford distribution is a distribution of leading digits that can be derived from different sources of real-life data. It is apparently really useful for accounting and fraud detection: if there’s non-random weird stuff going on in a set of numbers, the results don’t adhere to Benford’s Law. What this told me was that if I decided to cheat and just go to events that I wanted to go to rather than choosing events at random, then I’d be found out.
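Benford’s Law says that the leading digit d should turn up with probability log10(1 + 1/d), so a one appears about 30% of the time and a nine less than 5%. Here is a small Python sketch that compares that prediction with some real numbers. The helper names are my own, and the choice of data (the first thousand powers of two, a sequence known to follow the law) is just an illustration, not something from the talk.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Share of numbers that Benford's Law predicts will start with `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)


def leading_digit_shares(values):
    """Observed share of each leading digit among a collection of positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Illustrative data set (an assumption of this sketch): 2, 4, 8, ... up to 2**1000.
powers_of_two = [2 ** n for n in range(1, 1001)]
observed = leading_digit_shares(powers_of_two)

for d in range(1, 10):
    print(d, f"expected {benford_expected(d):.3f}", f"observed {observed[d]:.3f}")
```

A fraud check works the same way round: count the leading digits in a set of figures and see how far the observed shares drift from the expected ones.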

Time was up and the packed room was stiflingly hot. After he had answered a few questions, everyone gave Professor Whitehorn a big clap. We all staggered out into a kitchen area for some fresh air.

‘Hello!’ I recognised someone, a friend of mine; a friend who knew about my quest.

‘Chris! What are you doing here?’ I told Anna that this was step number thirty five in a random walk of one hundred steps. I had met Anna at the comedy night I went to after the Write Together Meetup. I didn’t know that Anna was a member of this group, or that she was a data scientist. I then recognised a guy called Riccardo, who was doing the rounds of different technology Meetup groups.

I tried to find time to chat to Professor Whitehorn, but he was too busy answering questions. Instead, I managed to chat with the organiser for a while. The group had only been set up six or seven months ago, and had around one thousand members: an impressive number. There was a talk every month, and the next one was on the subject of ‘data analysis teams’; a talk that was less about technology and more about people.

When it got to eight o’clock, we were all turfed out onto the street. Security had to lock up the building. I never got to chat with Professor Whitehorn, but Anna and I travelled together to London Bridge. As we travelled she told me that she was planning to go out drinking with the Greek Friends Meetup.

‘Have you been to that one?’ Anna asked.

‘Not yet!’
