Crime Doesn't Pay (Relational Model)
So what are relational databases?
The concept is pretty simple: a relational database holds two or more tables that can be "joined" on a common field. This common field can then be used to query information from both tables. If you know SQL, then you are familiar with the image to the left. The power of relational databases comes from asking a question (a query, if you will) of the data and receiving an answer that draws on two or more tables. What better way to understand this connection than with an example! The exhibit below takes three separate tables and joins them to create a story.
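As a minimal sketch of what "joined on a common field" means, here is a tiny in-memory example using Python's built-in sqlite3 module. The table names, column names, and values are made up for illustration and are not the real data.ny.gov data.

```python
import sqlite3

# Hypothetical tables and values, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crime  (county TEXT, year INTEGER, incidents INTEGER);
    CREATE TABLE income (county TEXT, year INTEGER, gross_income REAL);
    INSERT INTO crime  VALUES ('Albany', 2014, 120), ('Erie', 2014, 340);
    INSERT INTO income VALUES ('Albany', 2014, 5.5), ('Erie', 2014, 9.1);
""")

# The ON clause names the common fields shared by both tables.
rows = conn.execute("""
    SELECT c.county, c.year, c.incidents, i.gross_income
    FROM crime AS c
    JOIN income AS i
      ON c.county = i.county AND c.year = i.year
    ORDER BY c.county
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Each returned row combines columns from both tables, matched on county and year.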
The data below is taken from the New York data repository (data.ny.gov) and covers crime rates, gross annual income, and each county's geographical location. The exhibit analyzes (if superficially) crime in New York in 2014 and possible reasons for the apparent decrease over previous years:
(below is a Tableau Story; these work best when you click the navigation boxes from left to right)
This exhibit doesn't involve much Python magic beyond sending API requests to the endpoints of the NY government data repository, building a pandas pivot table, and using a couple of GroupBy methods to simplify the historical data.
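To give a rough idea of the GroupBy and pivot table steps, here is a small sketch. The column names (county, year, incidents) and the numbers are assumptions made up for illustration; the actual columns in the data.ny.gov extracts may differ, and the data is typed in by hand here instead of being requested from the API.

```python
import pandas as pd

# Made-up records; the real data comes from data.ny.gov and its columns may differ.
records = pd.DataFrame({
    "county": ["Albany", "Albany", "Albany", "Erie", "Erie", "Erie"],
    "year": [2013, 2013, 2014, 2013, 2014, 2014],
    "incidents": [70, 60, 120, 360, 200, 140],
})

# GroupBy: collapse multiple rows into one total per county per year.
totals = records.groupby(["county", "year"], as_index=False)["incidents"].sum()

# Pivot table: one row per county, one column per year.
by_year = totals.pivot_table(
    index="county", columns="year", values="incidents", aggfunc="sum"
)
print(by_year)
```

The pivoted layout makes year-over-year comparisons per county easy to read off.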
The real relational work came from joining the crime per county per year with the annual gross income per county per year. Then I related the crime per county to the geolocation per county to plot the information on the NY map.
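In pandas, those two relations could look something like the sketch below. This is not the code from the repo: the DataFrame names, column names, and all values (including the coordinates) are placeholders chosen for illustration.

```python
import pandas as pd

# Placeholder tables; names, columns, and values are illustrative, not real data.
crime = pd.DataFrame({
    "county": ["Albany", "Erie"],
    "year": [2014, 2014],
    "incidents": [120, 340],
})
income = pd.DataFrame({
    "county": ["Albany", "Erie"],
    "year": [2014, 2014],
    "gross_income": [5.5, 9.1],
})
locations = pd.DataFrame({
    "county": ["Albany", "Erie"],
    "latitude": [42.65, 42.75],    # placeholder coordinates
    "longitude": [-73.75, -78.78],  # placeholder coordinates
})

# Join crime with income on the two common fields: county and year.
crime_income = crime.merge(income, on=["county", "year"], how="inner")

# Relate crime to geolocation on county alone, keeping every crime row.
crime_geo = crime.merge(locations, on="county", how="left")

print(crime_income)
print(crime_geo)
```

The first merge uses an inner join so only county-year pairs present in both tables survive; the second uses a left join so no crime row is dropped if a county is missing coordinates.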
Overall, gathering the information was quite simple; cleaning and relating the data was more challenging. You can find the Python code and the extracted CSVs in this GitHub repo: https://github.com/vertex-live/NYCrime