Tuesday, 14 August 2018

U.K. Traffic Accident Analysis (2012-14)

During my 6-week summer training at C-DAC, Mohali this June, I learned the Hadoop infrastructure and the basics of Big Data and Machine Learning. As the final assignment, I was given the 'Traffic Accident Analysis in the United Kingdom' data set.
Here is a detailed write-up of the entire project, which I completed individually.

Data Set Acquisition 
The data set was procured from Kaggle.com. The data available on Kaggle covered the years 2006 to 2014 and ran to about 1.6 million rows. Limited by the processing power and memory of my system (Intel i5 6th gen, 8GB DDR3 RAM), I had to settle for just the three years from 2012 to 2014.
Reference Link : https://www.kaggle.com/daveianhickey/2000-16-traffic-flow-england-scotland-wales/version/10#_=_
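The year restriction described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the project: the column names `Accident_Index` and `Year` and the inline sample rows are assumptions made for the example.

```python
import csv
import io

def filter_years(csv_text, first_year, last_year, year_column="Year"):
    """Keep only the rows whose year falls in [first_year, last_year]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if first_year <= int(row[year_column]) <= last_year]

# Toy stand-in for the Kaggle file; column names are hypothetical.
sample_csv = """Accident_Index,Year
A1,2006
A2,2011
A3,2012
A4,2013
A5,2014
"""

kept = filter_years(sample_csv, 2012, 2014)
print([row["Accident_Index"] for row in kept])
```

For the real file, the same function could be fed the file contents instead of the inline string, though a file of that size is better read row by row than loaded whole.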

Procedure
After the data was gathered from Kaggle, its sheer volume meant my system could not process it and return a usable result in reasonable time, so the data was randomly sampled down to nearly 16,000 rows using the R programming language.
The data was then ingested into a Hadoop installation running locally on my system so that it could be queried with Pig and Hive.
Because the .csv output of the queries is hard to interpret directly, Tableau and WEKA (Waikato Environment for Knowledge Analysis) were used to visualize it.
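The random down-sampling step in the procedure was done in R. As a rough equivalent, here is a minimal Python sketch using reservoir sampling, which picks a fixed number of rows uniformly at random in a single pass without holding the whole file in memory. The function name, the seed, and the toy `Accident_Index` rows are assumptions for illustration only.

```python
import random

def sample_rows(rows, k, seed=42):
    """Return k rows chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir

# Toy stand-in for the accident records (hypothetical column name).
rows = ({"Accident_Index": n} for n in range(100_000))
subset = sample_rows(rows, 16_000)
print(len(subset))
```

Because the input is consumed as an iterator, the same approach works on a file that is too large for memory, which was the constraint here.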

Tools Used
  1. The R Programming Language
  2. WEKA
  3. Hadoop Infrastructure
    • Pig
    • Hive
  4. Tableau

Columns available in the Data Set
The image above shows the distribution of every column in the data set, as visualized in WEKA.
  1. Longitude
  2. Latitude
  3. The Severity of the accident
  4. Number of Vehicles involved in the accident
  5. Number of Casualties
  6. Date of Accident
  7. Day of the week
  8. Time
  9. Road 1 (where accident took place)
  10. Type of road
  11. Speed Limit in the area
  12. Junction Control near accident site
  13. Road 2 (in case accident took place on a junction)
  14. Lighting during the time of the accident
  15. Weather condition in the area during the time of the accident
  16. Road surface on which the accident happened
  17. The year the accident took place

The Road Map of the United Kingdom (as present in the data set)

RESULTS OBTAINED : 
Casualty vs Weather
The picture above shows circular charts of the number of casualties recorded under the different weather conditions.
Most accidents took place when the weather was fine, with no snow or rainfall. Among the adverse conditions, rain (green) claimed the most lives in the U.K.
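The comparison above rests on two separate aggregates per weather condition: how many accidents occurred and how many casualties they caused. A minimal Python sketch of that grouping follows; the field names and the four toy records are made up for illustration and are not taken from the data set.

```python
from collections import defaultdict

def casualties_by_weather(records):
    """Count accidents and sum casualties for each weather condition."""
    totals = defaultdict(lambda: {"accidents": 0, "casualties": 0})
    for rec in records:
        bucket = totals[rec["weather"]]
        bucket["accidents"] += 1
        bucket["casualties"] += rec["casualties"]
    return dict(totals)

# Toy records with hypothetical field names and values.
records = [
    {"weather": "Fine", "casualties": 1},
    {"weather": "Fine", "casualties": 2},
    {"weather": "Rain", "casualties": 4},
    {"weather": "Snow", "casualties": 1},
]

summary = casualties_by_weather(records)
for weather, stats in summary.items():
    print(weather, stats["accidents"], stats["casualties"])
```

Keeping the two numbers side by side matters because the condition with the most accidents is not necessarily the one with the most casualties.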

Road Surface vs Casualties
Here, we see that most accidents took place on dry surfaces due to speeding (explained in the following graphs). Wet or damp and frosty road surfaces claimed the most lives over the three recorded years from 2012 to 2014.

Road Type vs Casualties
Most of the accidents took place on single carriageway roads, followed by dual carriageway roads.
Slip roads proved to be the safest, likely because of their lower speed limits and one-way traffic.

Speed Limit vs Hour of the Day
As expected, roads with no junction control saw the most road accidents in the U.K. Roads controlled by automatic traffic systems or traffic lights proved to be the safest. Though the most severe accident, which claimed 11 lives, occurred in the early hours of the day, most accidents observably happen in the darker hours of the day.

Day of the week vs number of accidents
To show the audience how important it is to wear seat belts and take proper precautions on the road, I came up with this jitter plot in WEKA. It shows that no matter the day of the week or the year, road accidents are just around the corner if one does not take the necessary precautions.
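The jitter plot is a visual version of a simple tally of accidents per day of the week. A minimal Python sketch of that tally is below; the `dd/mm/yyyy` date format and the sample dates are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def accidents_per_weekday(date_strings, fmt="%d/%m/%Y"):
    """Count how many accident dates fall on each day of the week."""
    counts = Counter()
    for text in date_strings:
        day = datetime.strptime(text, fmt).date()
        counts[DAYS[day.weekday()]] += 1
    return counts

# Toy dates (assumed dd/mm/yyyy): two Sundays and one Monday.
dates = ["01/01/2012", "02/01/2012", "08/01/2012"]
per_day = accidents_per_weekday(dates)
print(per_day)
```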

Weather with Road Surface - MAP
The distribution of weather with road surface on the map of the U.K. shows that areas like Scotland and Wales experience a higher percentage of accidents due to snow and fog on frosty roads, which I attribute to colder conditions (Scotland in particular lies at a higher latitude). Southern areas like London experience more accidents due to fog or mist on dry roads.

Light variation with Hour of the day - MAP
The map above shows how light conditions vary across the hours of the day. This might sound clichéd, but looking closely we observe that the Scotland and Wales areas record darkness even during daytime hours, which I attribute to latitude and the tilt of the Earth.
Most accidents, as expected, occur after sunset, in darkness, in the areas surrounding London.

How was this useful?
Data analysis of this kind can help the government recognize the faults in its current road traffic management and help save many lives in the future.
Further, we can infer points like:
  1. Scotland and Wales require better lighting and cold-weather safety measures.
  2. People in England and the surrounding areas need to drive slower.
  3. No day of the week is safer on the road; always protect yourself.
For viewing the entire presentation, go to prezi.com and search for "UK Traffic Accident Analysis - CDAC".

4 comments:

  2. Well written..well analysed and well represented. Great. Keep it up!
