Evictions and Code Violations in Philadelphia

Xiong Zheng

1 Introduction

In this project, we'll explore spatial trends in evictions in Philadelphia using data from the Eviction Lab, and in building code violations using data from OpenDataPhilly. We'll be exploring the idea that evictions can occur as retaliation against renters for reporting code violations. Spatial correlations between evictions and code violations recorded by the City's Licenses and Inspections department can offer some insight into this question.

Background readings:

HuffPost article
PlanPhilly article

2 Evictions

In [7]:
import matplotlib
import pandas as pd
import geopandas as gpd
import hvplot.pandas
import geoviews as gv
import geoviews.tile_sources as gvts
import holoviews as hv
import numpy as np
import rasterio as rio
import contextily as cx
hv.extension('bokeh', 'matplotlib')
from matplotlib import pyplot as plt
from holoviews import opts
from rasterio.mask import mask
from rasterstats import zonal_stats
from mpl_toolkits.axes_grid1 import make_axes_locatable
%matplotlib inline
pd.options.display.max_columns = 999
pd.options.display.max_rows = 999

2.1 Data Wrangling for evictions

We first need to trim the data to Philadelphia only. Take a look at the data dictionary for descriptions of the various columns in the file: eviction_lab_data_dictionary.txt. The column names are shortened; see the end of that file for the abbreviations. The numbers at the end of the column names indicate the year. For example, e-16 is the number of evictions in 2016.

We are interested in the number of evictions by census tract for various years. Right now, each year has its own column, so it will be easiest to transform the data to a tidy format. The tidy data frame will have four columns: GEOID, geometry, a column holding the number of evictions, and a column holding the name of the original column that each value came from.
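
As a minimal sketch of this reshaping on made-up numbers (the two tracts and their counts below are hypothetical, not taken from the Eviction Lab file), `melt` turns one column per year into one row per tract and year. The geometry column is left out to keep the sketch pandas-only. The step that converts labels such as e-16 into a numeric year is an optional addition that the notebook itself does not perform.

```python
import pandas as pd

# Hypothetical wide table: one row per tract, one eviction column per year
wide = pd.DataFrame({
    "GEOID": ["001", "002"],
    "e-15": [10, 4],
    "e-16": [12, 3],
})

# Same naming pattern as the real columns: 'e-' plus a two-digit year
value_vars = ["e-{:02d}".format(x) for x in range(15, 17)]

# Wide to tidy: one row per (tract, year) pair
tidy = wide.melt(
    id_vars=["GEOID"],
    value_vars=value_vars,
    var_name="year",
    value_name="evictions",
)

# Optional extra (not in the notebook): turn 'e-16' into the integer 2016
tidy["year_num"] = 2000 + tidy["year"].str[2:].astype(int)

# Total evictions per year across all tracts
totals = tidy.groupby("year")["evictions"].sum()

print(tidy)
print(totals)
```

Keeping the original column label in `year` mirrors what the notebook does, which is why later cells filter on strings such as "e-03" and "e-16".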

In [8]:
evPA = gpd.read_file("./data/PA-tracts.geojson")
evphilly = evPA.loc[(evPA['pl']=="Philadelphia County, Pennsylvania")]

# Columns to keep: tract ID, geometry, and eviction counts for 2003-2016 ('e-03' ... 'e-16')
value_vars = ['e-{:02d}'.format(x) for x in range(3, 17)]
needcolumns = ['GEOID', 'geometry'] + value_vars

# Melt from wide (one column per year) to tidy (one row per tract and year)
evphilly = evphilly[needcolumns]
mlt_evphilly = evphilly.melt(
    id_vars=['GEOID','geometry'],
    value_vars=value_vars,
    var_name='year', 
    value_name='evictions')

2.2 Plot the total number of evictions per year from 2003 to 2016

In [9]:
group1_evphilly = mlt_evphilly
group1_evphilly = group1_evphilly.groupby('year',as_index=True)['evictions'].sum()

# Plot
bar1=group1_evphilly.hvplot(
    kind='bar',
    color='#FE5C5A',
    hover_color = '#FFBA48',
    line_color = '#FE5C5A',
    rot=90, 
    width=800,
    height=300,
    title='The total number of evictions per year from 2003 to 2016',
    xlabel='Year',
    ylabel ='Total Number',
    ylim=(0, 16000),
    attr_labels=True)

dot1=group1_evphilly.hvplot(
    kind='scatter',
    color='#31FFCE',
    size = 50,
    hover_color = '#FFBA48',
    rot=90, 
    width=800,
    height=300,
    attr_labels=True)

line1=group1_evphilly.hvplot(
    kind='line', 
    color='#27CCA4',
    width=800,
    height=300)

final_chart = bar1 * line1 * dot1
final_chart.opts(bgcolor='#2C2C5C')

2.3 The number of evictions across Philadelphia

In [10]:
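# Tract geometries: one row per GEOID (taken from a single year of the tidy data)
geo_evphilly = mlt_evphilly.loc[mlt_evphilly['year'] == "e-03"]
geo_evphilly = geo_evphilly.drop(['year', 'evictions'], axis=1)

# Evictions per tract and year. Merging onto the geometries (rather than the
# other way round) keeps the GeoDataFrame type that the map below relies on.
group_evphilly = mlt_evphilly.groupby(['year', 'GEOID'], as_index=False)['evictions'].sum()
group_evphilly = geo_evphilly.merge(group_evphilly, on='GEOID')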

# Plot
hv.output(widget_location='bottom')
choro = group_evphilly.hvplot(c='evictions', 
                     frame_width=600, 
                     frame_height=600, 
                     groupby = 'year',
                     alpha=0.65,
                     geo=True, 
                     cmap='viridis', 
                     hover_cols=['GEOID'],
                     dynamic=False,
                     )
total = gvts.Wikipedia * choro
total.opts(
    opts.WMTS(width=100, height=100, xaxis=None, yaxis=None),
    opts.Overlay(title="The number of evictions across Philadelphia"))

3 Code Violations in Philadelphia

3.1 Data Wrangling for Code Violations in Philadelphia

In this section, we'll explore data for code violations from the Licenses and Inspections Department of Philadelphia to look for potential correlations with the number of evictions.

There are many different types of code violations. More information on different types of violations can be found on the City's website.

Below, we've selected 15 types of violations that deal with property maintenance and licensing issues. We'll focus on these violations. The goal is to see if these kinds of violations are correlated spatially with the number of evictions in a given area.

In [11]:
LViolation = pd.read_csv("./data/li_violations.csv")
LViolation['geometry'] = gpd.points_from_xy(LViolation['lng'], LViolation['lat'])
LViolation = gpd.GeoDataFrame(LViolation, geometry='geometry', crs="EPSG:4326")

violation_types = [
    "INT-PLMBG MAINT FIXTURES-RES",
    "INT S-CEILING REPAIR/MAINT SAN",
    "PLUMBING SYSTEMS-GENERAL",
    "CO DETECTOR NEEDED",
    "INTERIOR SURFACES",
    "EXT S-ROOF REPAIR",
    "ELEC-RECEPTABLE DEFECTIVE-RES",
    "INT S-FLOOR REPAIR",
    "DRAINAGE-MAIN DRAIN REPAIR-RES",
    "DRAINAGE-DOWNSPOUT REPR/REPLC",
    "LIGHT FIXTURE DEFECTIVE-RES",
    "LICENSE-RES SFD/2FD",
    "ELECTRICAL -HAZARD",
    "VACANT PROPERTIES-GENERAL",
    "INT-PLMBG FIXTURES-RES",
]

slc_LViolation = LViolation.loc[LViolation['violationdescription'].isin(violation_types)]

3.2 A hex bin map

The code violation data is point data. We can get a quick look at its geographic distribution using a hex bin map.

In [16]:
hv.output(widget_location='bottom')

hex1 = slc_LViolation.hvplot.hexbin(
                      frame_width=600, 
                      frame_height=600, 
                      x='lng', 
                      y='lat', 
                      groupby = "violationdescription", 
                      logz=True, 
                      geo=True, 
                      gridsize=40, 
                      cmap='viridis',
                      dynamic=False)

boundary1 = geo_evphilly.hvplot.polygons(
                      geo=True,
                      alpha=0,
                      line_alpha=0.5,
                      line_width =1,
                      line_color="black",
                      hover=False,
                      width=600,
                      height=600,)

combination = gvts.Wikipedia*hex1 * boundary1
combination.opts(
    opts.WMTS(width=100, height=100, xaxis=None, yaxis=None),
    opts.Overlay(title="Geographic distribution of code violations"))

3.3 The number of violations by type per census tract

To compare with our eviction data at the census tract level, we need to find which census tract each code violation falls into. We then want the number of violations (of each kind) per census tract. The result of this step should be a data frame with three columns: violationdescription, GEOID, and N, where N is the number of violations of that kind in the specified census tract.
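
A minimal sketch of the point-in-polygon step, assuming two made-up square "tracts" and three made-up violation points (none of this is real data). It relies on the default `intersects` test of `gpd.sjoin`, as the notebook does, but uses `how='left'` for brevity, so unlike the notebook's `how='right'` join it keeps the point geometry and does not add rows for tracts without violations.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical square tracts side by side
tracts = gpd.GeoDataFrame(
    {"GEOID": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Three hypothetical violation points: two fall in tract A, one in tract B
points = gpd.GeoDataFrame(
    {"violationdescription": ["ROOF", "ROOF", "PLUMBING"]},
    geometry=[Point(0.2, 0.5), Point(0.7, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Attach the GEOID of the containing tract to every point
joined = gpd.sjoin(points, tracts, how="left")

# Count violations of each kind per tract
counts = (
    joined.groupby(["GEOID", "violationdescription"])
    .size()
    .reset_index(name="N")
)
print(counts)
```

Both layers must share a coordinate reference system before joining; here both are created in EPSG:4326, matching the CRS the notebook assigns to the violation points.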

In [13]:
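# Spatial join: how='right' keeps every tract and its polygon geometry
tracts_slc_LViolation = gpd.sjoin(slc_LViolation, geo_evphilly, how='right')

# Count violations of each type per tract
numtract = tracts_slc_LViolation
numtract["N"] = 1
numtract = numtract.groupby(['GEOID','violationdescription'])['N'].sum()
# Add explicit zero counts for tract/type combinations with no violations
numtract = numtract.unstack(fill_value=0).stack().reset_index(name='N')
# Merge the counts back onto the tract polygons for mapping
numtract = geo_evphilly.merge(numtract, on='GEOID')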

# Plot
hv.output(widget_location='bottom')

pol1 = numtract.hvplot.polygons(c='N',
                      frame_width=600, 
                      frame_height=600,  
                      groupby = "violationdescription", 
                      geo=True, 
                      cmap='viridis',
                      alpha=0.7,
                      hover_cols='all',
                      dynamic=False)

combination2 = gvts.Wikipedia * pol1
combination2.opts(
    opts.WMTS(width=600, height=600, xaxis=None, yaxis=None),
    opts.Overlay(title="Geographic distribution of code violations"))
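
The `unstack(fill_value=0).stack()` step in the cell above can look opaque, so here is a minimal pandas-only sketch on hypothetical records. A plain `groupby` only produces rows for combinations that actually occur. Pivoting the violation type into columns and stacking it back adds an explicit zero for every tract and type pair that is missing.

```python
import pandas as pd

# Hypothetical joined records: tract A has two ROOF violations, tract B one PLUMBING
records = pd.DataFrame({
    "GEOID": ["A", "A", "B"],
    "violationdescription": ["ROOF", "ROOF", "PLUMBING"],
})
records["N"] = 1

# Only combinations that occur in the data appear here (two rows)
observed = records.groupby(["GEOID", "violationdescription"])["N"].sum()

# Pivot types into columns (missing cells become 0), then stack back to long form
complete = observed.unstack(fill_value=0).stack().reset_index(name="N")

print(complete)
```

Without the zero rows, a tract with no violations of the selected type would simply be absent from that frame of the map instead of being drawn with a count of zero.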

4 A side-by-side comparison

As a final step, we'll make a side-by-side comparison to better show the spatial correlation between evictions and code violations.

In [14]:
evphilly_16 = group_evphilly.loc[(group_evphilly['year']=='e-16')]
specific_numtract = numtract.loc[(numtract['violationdescription']=='VACANT PROPERTIES-GENERAL')]

choro2 = evphilly_16.hvplot(c='evictions', 
                     frame_width=390, 
                     frame_height=600, 
                     alpha=0.7,
                     geo=True, 
                     cmap='viridis', 
                     hover_cols='all',
                     dynamic=False,
                     title="Geo-distribution of evictions in 2016",
                     fontsize=9
                     )

pol2 = specific_numtract.hvplot(c='N',
                      frame_width=390, 
                      frame_height=600,  
                      geo=True, 
                      cmap='Plasma',
                      alpha=0.7,
                      hover_cols='all',
                      dynamic=False,
                      title="Geo-distribution of the violation - VACANT PROPERTIES-GENERAL",
                      fontsize=9
                      )

combination3 = (gvts.Wikipedia * choro2 + gvts.Wikipedia * pol2).cols(2)
combination3
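
The two maps above are compared by eye. One rough way to put a number on that comparison, offered only as a sketch on hypothetical values and not something the notebook computes, is a per-tract correlation between eviction counts and violation counts. The data frames and their values below are made up; in the notebook the analogous inputs would be `evphilly_16` and `specific_numtract`.

```python
import pandas as pd

# Hypothetical per-tract eviction counts
evictions = pd.DataFrame({
    "GEOID": ["A", "B", "C", "D"],
    "evictions": [10, 25, 5, 40],
})

# Hypothetical per-tract violation counts; tract C has no recorded violations
violations = pd.DataFrame({
    "GEOID": ["A", "B", "D"],
    "N": [3, 8, 12],
})

# Keep every tract and treat a missing violation count as zero
both = evictions.merge(violations, on="GEOID", how="left")
both["N"] = both["N"].fillna(0)

# Pearson correlation of the raw counts
pearson_r = both["evictions"].corr(both["N"])

# Spearman rank correlation, computed as the Pearson correlation of the ranks
spearman_rho = both["evictions"].rank().corr(both["N"].rank())

print(pearson_r, spearman_rho)
```

A coefficient like this ignores where tracts sit relative to one another and says nothing about causation, so it complements the maps rather than testing the retaliation idea directly.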

Next, we identify the 20 most common types of violations in the period from 2012 to 2016. We then use the resulting set of maps to pick out three types of violations that don't seem to have much spatial overlap with the number of evictions in the City.
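
A minimal sketch of the ranking step on made-up records (the type names below are placeholders, not real L&I descriptions). It uses `value_counts`, an alternative to counting with `groupby` and `sum`, and keeps only the records whose type is among the most common ones.

```python
import pandas as pd

# Hypothetical violation records, one row per violation
records = pd.DataFrame({
    "violationdescription": ["TYPE A", "TYPE B", "TYPE A", "TYPE C", "TYPE A", "TYPE B"],
})

top_n = 2  # the notebook uses 20

# Count records per type, sorted from most to least common
counts = records["violationdescription"].value_counts()

# Names of the top_n most common types
top_types = counts.head(top_n).index.tolist()

# Keep only the records belonging to those types
subset = records.loc[records["violationdescription"].isin(top_types)]

print(top_types)
print(len(subset))
```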

In [16]:
# Count violations by type and keep the 20 most common.
# The helper column N is also used by the per-tract counts below.
LViolation['N'] = 1
top20_vilation = (
    LViolation.groupby('violationdescription', as_index=False)['N']
    .sum()
    .sort_values('N', ascending=False)
    .iloc[0:20]
    .reset_index(drop=True)
)
top20 = top20_vilation['violationdescription']

top_2_LViolation = LViolation.loc[LViolation['violationdescription'].isin(top20)]
top_2_LViolation = gpd.sjoin(top_2_LViolation, geo_evphilly, how='right')
top20_numtract = top_2_LViolation
top20_numtract = top20_numtract.groupby(['GEOID','violationdescription'])['N'].sum()
top20_numtract = top20_numtract.unstack(fill_value=0).stack().reset_index(name='N')
top20_numtract = geo_evphilly.merge(top20_numtract, on='GEOID')

# Total evictions per tract over all years, merged onto the tract geometries
total_evphilly = mlt_evphilly.groupby('GEOID', as_index=False)['evictions'].sum()
total_evphilly = geo_evphilly.merge(total_evphilly, on='GEOID')

# Represent each tract by its centroid; marker size scales with total evictions
points_evphilly = total_evphilly.copy()
points_evphilly['geometry'] = points_evphilly['geometry'].centroid
points_evphilly['eviction2'] = points_evphilly['evictions'] / 3

# Plot
hv.output(widget_location='bottom')

pol2 = top20_numtract.hvplot.polygons(c='N',
                      frame_width=600, 
                      frame_height=600,  
                      groupby = "violationdescription", 
                      geo=True, 
                      cmap='viridis',
                      alpha=1,
                      hover_cols='all',
                      dynamic=False)


point3 = points_evphilly.hvplot(
                      size='eviction2',
                      line_color = '#38CBE7',
                      frame_width=600, 
                      frame_height=600,  
                      geo=True, 
                      c='#38CBE7',
                      alpha=0.4,
                      dynamic=False)

combination3 = gvts.Wikipedia * pol2 * point3
combination3.opts(
    opts.WMTS(width=600, height=600, xaxis=None, yaxis=None),
    opts.Overlay(title="Geographic distribution of the top 20 code violations"))

Conclusion: 'EXTA-VACANT LOT CLEAN/MAINTAI', 'LICENSE - RENTAL PROPERTY', and 'RUBBISH/GARBAGE EXTERIOR-OWNER' are three types of violations that don't seem to have much spatial overlap with the number of evictions in the City.