Internet dropouts

29-Jul-2017

A few months ago, a friend of mine was having problems with the internet connection at her office and asked me for help. The problem was that the internet seemed to drop out at random times and for random periods of time. I swung by her office and checked all the cables; they were all fine. Then I started tinkering with the router until I found some odd configuration parameters, made some changes and told my friend:

"I made some changes in the router that could improve the connection. Let's keep an eye on it. Just write it down whenever there is a dropout to see if we can identify a pattern. If the problem persists we'll call the internet company."

Walking back home, I kept thinking about two things:

1. How do I know that I've fixed the problem?
2. Making people write stuff down is not a very user-friendly experience.

There has to be a better way... I thought.

I digress; this is not about solving internet connection problems but about using data visualisations to see patterns and gain insight.

Long story short... I went back, wrote a little program that would check every minute whether the internet was up or down and record the result, and ran it on the reception computer.

About four weeks later, I went back to analyse the data, which "looked" like this:

date,day_of_year,time,minute_of_day,online
09/02/16,246,10:32:11 AM,633,yes
09/02/16,246,10:33:11 AM,634,yes
09/02/16,246,10:34:11 AM,635,yes
09/02/16,246,10:35:11 AM,636,yes
09/02/16,246,10:36:12 AM,637,yes
... (thousands more rows)

I was expecting to see a total of about 47250 data points (rows), one for each minute of the roughly 34 days of collection. But there were only 9203. There was a problem with the experiment itself.

Just by looking at the numbers, some questions came up:

Why are we missing information?
Is there a bug in the program that collects the data?
Is there something external causing the program not to work?

My default answer to this sort of question is: there is a bug in the program.
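For illustration, a logger along those lines could look like the Python sketch below. It is not the original program. The function names (is_online, make_row, log_forever), the output file name, the "no" value for an offline reading and the choice of probing a public DNS server over TCP are all assumptions; only the column layout comes from the sample rows above.

```python
import csv
import socket
import time
from datetime import datetime


def is_online(host="8.8.8.8", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    The probe target is an assumption; any reliably reachable host would do.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def make_row(now, online):
    """Build one CSV row in the same column layout as the sample data."""
    return [
        now.strftime("%m/%d/%y"),        # date, e.g. 09/02/16
        now.timetuple().tm_yday,         # day_of_year, e.g. 246
        now.strftime("%I:%M:%S %p"),     # time, e.g. 10:32:11 AM
        # The sample shows 10:32 -> 633, so minute_of_day looks 1-based
        # (inferred from the data, not confirmed).
        now.hour * 60 + now.minute + 1,
        "yes" if online else "no",       # "no" is an assumed offline value
    ]


def log_forever(path="connection_log.csv", interval=60):
    """Append one reading to the CSV file every `interval` seconds."""
    while True:
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow(make_row(datetime.now(), is_online()))
        time.sleep(interval)


# To start logging, call log_forever() and leave the script running.
```

Opening and closing the file on every reading means each row is on disk straight away, so a computer that goes to sleep or is switched off loses at most the reading in progress.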
But after looking at the program again and running some tests, I couldn't find any problem. The data in raw form wasn't helpful either. It was at this point that I realised I needed to paint a picture of the data. I came up with this:

My initial reaction when I saw the picture was that there were a lot more gaps than I thought.

When I looked more closely, I noticed that data was being collected mostly from Monday to Saturday, between morning and afternoon... Duh, the computer is off when the office is closed!

Hold on, there is some data showing up outside working hours and on Sundays too. Wait, there are a few data points every 2 hours during the big gaps. Ahh, I know: the computer goes to sleep when not in use and wakes up every 2 hours or so. That explains all the holes in the data. Phew, not such a big problem, great.

But then, had the internet dropout problem been resolved? The visualisation would imply that it had improved. Whereas previously there were dropouts of about 2 hours every other day or so, now the dropouts were infrequent and lasted only very short periods of time.

When I showed the visualisation to others, I realised it didn't mean anything without context. What was the grey, white and red? What did any of it mean? So I added a legend to aid understanding:
Each line of the graph represents a minute of the day with the status of the connection:

⎯⎯⎯ Offline
⎯⎯⎯ Online

and the white space represents times where no data was recorded.

I kind of like the graph without a legend though. Even though it is less clear, to me it looks like art 😊.

Another interesting thing: besides confirming that the internet connection had improved (the original problem), I was able to find out a bunch of other things just by looking at the graph, things like:

• Macs, when sleeping, wake up for a minute and go back to sleep every 2 hours
• Working hours are from Monday to Saturday, from 7am to 8pm
• Fridays and Saturdays seem to be the slowest days

This new information is unrelated to the original problem but still interesting.

So data visualisations are a great way to answer questions that are normally not even asked. Questions and answers pop up just by looking at the data, when it is presented appropriately. And sometimes you end up with pretty pictures as well.

About the data:

• collected from: 02-09-2016 to 06-10-2016 (34 days)
• number of data points: 9204
• values of each point: date, time and internet status
• total data values: 27612 (9204 x 3)
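For anyone who wants to try the same idea, here is a minimal Python sketch of how the logged rows could be turned into a day-by-minute grid and checked for missing minutes. It is not the code that produced the picture above. The function names (load_grid, coverage, render_day), the text rendering and the treatment of any value other than "yes" as offline are assumptions, and it assumes all rows fall within a single calendar year.

```python
import csv
import io
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60


def load_grid(csv_file):
    """Map day_of_year -> {minute_of_day: status} from the logged CSV."""
    grid = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        grid[int(row["day_of_year"])][int(row["minute_of_day"])] = row["online"]
    return grid


def coverage(grid):
    """Return (recorded, expected) minute counts over the days spanned."""
    days = max(grid) - min(grid) + 1
    recorded = sum(len(minutes) for minutes in grid.values())
    return recorded, days * MINUTES_PER_DAY


def render_day(minutes, width=48):
    """Draw one day as text: '#' online, 'x' offline, ' ' no data.

    Each character covers MINUTES_PER_DAY // width minutes (30 by default).
    """
    bucket = MINUTES_PER_DAY // width
    cells = []
    for i in range(width):
        # minute_of_day looks 1-based in the sample, so buckets start at 1.
        span = range(i * bucket + 1, (i + 1) * bucket + 1)
        seen = {minutes[m] for m in span if m in minutes}
        if not seen:
            cells.append(" ")
        elif seen == {"yes"}:
            cells.append("#")
        else:
            cells.append("x")  # anything other than "yes" counts as offline
    return "".join(cells)


# The five sample rows from the post, used as a tiny stand-in data set.
sample = """date,day_of_year,time,minute_of_day,online
09/02/16,246,10:32:11 AM,633,yes
09/02/16,246,10:33:11 AM,634,yes
09/02/16,246,10:34:11 AM,635,yes
09/02/16,246,10:35:11 AM,636,yes
09/02/16,246,10:36:12 AM,637,yes
"""

grid = load_grid(io.StringIO(sample))
recorded, expected = coverage(grid)
print(f"recorded {recorded} of {expected} expected minutes")
for day in sorted(grid):
    print(day, "|" + render_day(grid[day]) + "|")
```

Comparing recorded against expected minutes gives the same "why are we missing information?" signal as the row count did, and the per-day strings make the empty stretches visible even in a terminal.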