The Philadelphia Parks and Recreation department is responsible for hosting all sorts of programming in their facilities. Currently their best understanding of demand for programming comes from existing programs data (like attendance) and some rudimentary data about park visits (eg. total trash collected).
Only recently, thanks to data collected by SafeGraph, it is now possible to analyze large data sets of cell phone location activity, including where people are traveling and how long they stay.
By incorporating this novel dataset, our goal for this project is to help the PPR better understand the usage of their facilities and provide new recommendations about how programming resources should be allocated within their service areas and what programs they should hold.
#windowsFonts(font = windowsFont('Helvetica'))
crs = 4326
library(vroom)
library(grid)
library(gridExtra)
library(sf)
library(terra)
library(dplyr)
library(spData)
library(mapview)
library(geosphere)
library(sp)
library(rgeos)
library(ggplot2)
library(ggmap)
library(kableExtra)
library(tidyverse)
library(data.table)
#remotes::install_github("CityOfPhiladelphia/rphl")
library(rphl)
library(lubridate)
library(furrr)
library(tidycensus)
library(rgdal)
library(furrr)
library(mapview)
library(NbClust)
library(cluster)
library(psych)
ll <- function(dat, proj4 = 4326){st_transform(dat, proj4)}
census_api_key("b33ec1cb4da108659efd12b3c15412988646cbd8", overwrite = TRUE)
root.dir = "https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/DATA/"
source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")
plotTheme <- function(base_size = 9, title_size = 10){
theme(
text = element_text( color = "black"),
plot.title = element_text(size = title_size, colour = "black", hjust = 0.5),
plot.subtitle = element_text( face = 'italic',
size = base_size, colour = "black", hjust = 0.5),
plot.caption = element_text( hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_line("grey80", size = 0.01),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=.5),
strip.background = element_blank(),
strip.text = element_text( size=9),
axis.title = element_text( size=9),
axis.text = element_text( size=7),
axis.text.y = element_text( size=7),
plot.background = element_blank(),
legend.background = element_blank(),
legend.title = element_text( colour = "black", face = "italic", size = 9),
legend.text = element_text( colour = "black", face = "italic", size = 7),
strip.text.x = element_text( size = 9),
legend.key.size = unit(.5, 'line')
)
}
mapTheme <- function(base_size = 9, title_size = 10){
theme(
text = element_text( color = "black"),
plot.title = element_text(size = title_size, colour = "black", hjust = 0.5),
plot.subtitle = element_text( face = 'italic',
size = base_size, colour = "black", hjust = 0.5),
plot.caption = element_text( hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
strip.background = element_blank(),
strip.text = element_text(size=base_size),
axis.title = element_text( size=9),
axis.text = element_blank(),
axis.text.y = element_blank(),
plot.background = element_blank(),
legend.background = element_blank(),
legend.title = element_text( colour = "black", face = "italic", size = 9),
legend.text = element_text( colour = "black", face = "italic", size = 7),
strip.text.x = element_text(size=base_size),
legend.key.size = unit(.5, 'line')
)
}
palette10 <- c("#315d7f","#4f5d7f","#6d5c7e","#a36681","#d96f83","#f2727f","#f6928a","#f8a28f","#f9b294","#fcdc97")
palette9 <- c('#315d7f', '#4f5d7f', '#6d5c7e', '#a36681', '#d96f83', '#f2727f', '#f6928a', '#f8a28f', '#f9b294')
palette7 <- c('#315d7f', '#4f5d7f', '#6d5c7e', '#d96f83', '#f2727f', '#f6928a', '#f9b294')
palette5 <- c("#f9b294","#f2727f","#c06c86","#6d5c7e","#315d7f")
palette4 <- c("#f9b294","#f2727f","#c06c86","#6d5c7e")
palette2 <- c("#f9b294","#f2727f")
palette1_main <- "#F2727F"
palette1_assist <- '#F9B294'
pprDistrict <- st_read('https://opendata.arcgis.com/datasets/0cdc4a1e86c6463b9600f9d9fca39875_0.geojson') %>%
st_transform(crs)
pprServiceArea <- read_sf(dsn="data/FromPPR/PPR_Service_Areas_2021/PPR_Service_Areas_2021.shp")%>%
st_transform(crs = crs)
base_map <- get_map(location = unname(st_bbox(ll(st_buffer(st_union(pprDistrict),5000)))),maptype = "terrian")
ggmap(base_map) +
geom_sf(data=ll(st_union(pprDistrict)),color="black",size=1,fill = "transparent",inherit.aes = FALSE)+
geom_sf(data=ll(pprDistrict),color='black',size=1,fill = "transparent",inherit.aes = FALSE)+
geom_sf(data=ll(pprServiceArea),color='black',size=0.6,fill = "transparent",inherit.aes = FALSE)+
geom_sf(data=ll(pprDistrict %>% filter(DISTRICTID %in% c(7,8,9))),color=palette1_main,size=1,fill = "transparent",inherit.aes = FALSE)+
geom_sf(data=ll(pprServiceArea %>% filter(PPR_DIST %in% c(7,8,9))),color=palette1_main,size=0.6,fill = "transparent",inherit.aes = FALSE)+
labs(title = "",
subtitle = "",
x="",y="")+
mapTheme()
Philadelphia Parks and Recreation divides the city into 10 districts. Each district contains service areas. Each service area contains facilities like parks and rec centers. Each service area also has staff members who can run programs at specific facilities. Besides macro analyses, this practicum mainly focuses on the District 7、8、9, The areas bounded by pink lines in the Figure1.1.1 are the services areas in District 7,8,9.
pprProperties <- st_read('https://opendata.arcgis.com/datasets/d52445160ab14380a673e5849203eb64_0.geojson')%>%
st_transform(crs)
property <- st_join(st_centroid(pprProperties),pprServiceArea,left=TRUE) %>%
st_drop_geometry() %>%
left_join(pprProperties %>% dplyr::select(OBJECTID,geometry),by='OBJECTID') %>%
st_sf() %>%
st_transform(crs = crs) %>%
dplyr::select(-Shape__Length,-Shape__Area,-Shape_Leng,-Shape_Area) %>%
rename('ServiceAreaID' = 'INFO')
ggplot() +
geom_sf(data=property,color=palette1_main,fill = palette1_main)+
geom_sf(data=st_union(pprDistrict),color="black",size=1,fill = "transparent")+
geom_sf(data=pprDistrict,color="black",size=0.7,linetype ="dashed",fill = "transparent")+
geom_sf(data=pprDistrict %>% filter(DISTRICTID %in% c(7,8,9)),color="black",size=1,fill = "transparent")+
labs(title = "",
subtitle = "",
x="",y="")+
mapTheme()
The Figure1.2.1 is the map of their respective locations.PPR administrates more than 500 facilities within Philadelphia boundary. Notably, for some huge parks, PPR divides them into several sub-facilities. For example, ‘Wissahickon Valley Park’ is divided into 16 sub-facilities (‘Wissahickon Valley Park’, ‘Wissahickon Environmental Center’, ‘Salvatore Pachella Memorial Field’, ‘David P Montgomery Field’, ‘John F Boyce Memorial Field’, ‘Arrow Field’, ‘Walnut Lane Golf Club’, ‘Samuel F Houston Playground’, ‘Carpenters Woods’, ‘Dodge Tract’, ‘Historic Rittenhouse Town’, ‘Clifford Park’, ‘Blue Bell Park’, ‘Saul High School Farm’, ‘Andorra Natural Area’,and ‘Saylors Grove’).
The activities hold in PPR properties are recorded in two systems: the program schedules and the permit. With the former, the PPR staffing arrange many activities seasonally in different PPR facilities. With the latter, people outside the PPR apply for conducting activities and reserve space in Parks & Recreation areas.
In 2021, there are 4376 times of activities recorded in program schedules, covering 7 categories of After School, Athletic, Camp, Cultural, Educational, Environmental/Outdoor, and other activities, while 3861 times of activities are recorded in the permit system. In the following analyses, events refer to the combination of the programs and the permits.
permit2021 <- vroom("data/FromPPR/PPR-recreation-permits-2021.csv")
program2021 <- vroom("data/FromPPR/PPR-programs-attended-2021-with-schedules.csv")
facilityID <- read.csv("data/FromPPR/tblFacility_to_PPR_Properties.csv")
# define date, filter by attendance date
program2021.clean <- program2021 %>%
mutate(AttendanceWeekDate = mdy(AttendanceWeekDate),
DateFrom = mdy(DateFrom),
DateTo = mdy(DateTo)) %>%
filter(AttendanceWeekDate > DateFrom & AttendanceWeekDate < DateTo)
# original data is recorded by week, here we change it into being recorded by occurence
program2021.clean <- separate(program2021.clean, Days,into= c("1","2","3","4","5","6","7"))
program2021.clean <- program2021.clean %>%
gather(colNames, weekday, 15:21) %>%
select(-colNames) %>%
na.omit(cols='weekday')%>%
mutate(AttendenceRealDate = case_when(
weekday == "Monday" ~ AttendanceWeekDate,
weekday == "Tuesday" ~ AttendanceWeekDate+1,
weekday == "Wednesday" ~ AttendanceWeekDate+2,
weekday == "Thursday" ~ AttendanceWeekDate+3,
weekday == "Friday" ~ AttendanceWeekDate+4,
weekday == "Saturday" ~ AttendanceWeekDate+5,
weekday == "Sunday" ~ AttendanceWeekDate+6,
))
# join property,permit and program data
program2021.join <- left_join(program2021.clean, facilityID, by =c("FacilityID" = "FacilityID")) %>%
left_join(., property, by =c("PPR_Properties_ObjectID"="OBJECTID"))
permit2021.join <- left_join(permit2021, facilityID, by =c("FacilityID")) %>%
left_join(., property, by =c("PPR_Properties_ObjectID"="OBJECTID"))
# filter the failed joining items
program2021.otherDIstrict <- program2021.join %>% filter(is.na(PPR_Properties_ObjectID))
program2021.join <- program2021.join %>% drop_na(PPR_Properties_ObjectID)
permit2021.otherDIstrict <- permit2021.join %>% filter(is.na(PPR_Properties_ObjectID))
permit2021.join <- permit2021.join %>% drop_na(PPR_Properties_ObjectID)
# Wrangle "program2021.join", and extract month attendance
Facility_Program <- program2021.join %>%
select(Facility,ActvityTypeCategory,ActivityType,
AttendanceWeekDate,
RegisteredIndividualsCount,
PPR_DISTRICT, PUBLIC_NAME, PARENT_NAME,geometry) %>%
mutate(month = case_when(month(AttendanceWeekDate)==1 ~ "Jan",
month(AttendanceWeekDate)==2 ~ "Feb",
month(AttendanceWeekDate)==3 ~ "Mar",
month(AttendanceWeekDate)==4 ~ "Apr",
month(AttendanceWeekDate)==5 ~ "May",
month(AttendanceWeekDate)==6 ~ "Jun",
month(AttendanceWeekDate)==7 ~ "Jul",
month(AttendanceWeekDate)==8 ~ "Aug",
month(AttendanceWeekDate)==9 ~ "Sep",
month(AttendanceWeekDate)==10 ~ "Oct",
month(AttendanceWeekDate)==11 ~ "Nov",
month(AttendanceWeekDate)==12 ~ "Dec")) %>%
distinct(.keep_all = FALSE)
Facility_Program_otherDistricts <- program2021.otherDIstrict %>%
select(Facility,ActvityTypeCategory,ActivityType,
AttendanceWeekDate,
RegisteredIndividualsCount,
PUBLIC_NAME, PARENT_NAME,geometry) %>%
mutate(month = case_when(month(AttendanceWeekDate)==1 ~ "Jan",
month(AttendanceWeekDate)==2 ~ "Feb",
month(AttendanceWeekDate)==3 ~ "Mar",
month(AttendanceWeekDate)==4 ~ "Apr",
month(AttendanceWeekDate)==5 ~ "May",
month(AttendanceWeekDate)==6 ~ "Jun",
month(AttendanceWeekDate)==7 ~ "Jul",
month(AttendanceWeekDate)==8 ~ "Aug",
month(AttendanceWeekDate)==9 ~ "Sep",
month(AttendanceWeekDate)==10 ~ "Oct",
month(AttendanceWeekDate)==11 ~ "Nov",
month(AttendanceWeekDate)==12 ~ "Dec")) %>%
distinct(.keep_all = FALSE)
# Aggregate weekly visites
WeekVisit <- aggregate(
RegisteredIndividualsCount ~ AttendanceWeekDate + ActvityTypeCategory + PPR_DISTRICT, data = Facility_Program, FUN = sum)
Through above wrangling, we obtain information like the duration of the events, the attendance of the events etc. Furthermore, we link program and permit datasets to their based properties,their belonged Service Area, which is shown in the following map.
ggplot()+
geom_sf(data=pprServiceArea %>% filter(PPR_DIST %in% c(7,8,9)),color='black',size=0.25,linetype ="dashed", fill= "transparent")+
geom_sf(data=st_union(pprServiceArea %>% filter(PPR_DIST ==7)),color="black",size=1,fill = "transparent")+
geom_sf(data=st_union(pprServiceArea %>% filter(PPR_DIST ==8)),color="black",size=1,fill = "transparent")+
geom_sf(data=st_union(pprServiceArea %>% filter(PPR_DIST ==9)),color="black",size=1,fill = "transparent")+
geom_sf(data=permit2021.join,aes(geometry = geometry),color =palette1_assist,fill = palette1_assist, alpha = 0.7) +
geom_sf(data=program2021.join,aes(geometry = geometry),color = palette1_main,fill = palette1_main,alpha = 0.7)+
labs(title="")+
mapTheme()
In the figure2.1.1 above, red legends show the locations of facilities with program schedules while orange legends show the distributions of facilities with permit records. From the map, we can see that permits cover larger area than programs which indicates the demand exceeds PPR programming scope and the ways to improve PPR service.
One limitation should be pointed out here is that some facilities didn’t spatial joined the geographic information successfully, so, there are more facilities with program and permit compared to the number showing on the figure.
There are three facilities with program scheduled in District 7: Athletic Recreation Center, Mander Playground, and Marian Anderson Recreation Center.
ggplot()+
geom_sf(data=st_union(pprServiceArea %>% filter(PPR_DIST ==7)),color="black",size=1,fill = "transparent")+
geom_sf(data=pprServiceArea %>% filter(PPR_DIST ==7),color="black",linetype ="dashed",size=1,fill = "transparent")+
geom_sf(data=permit2021.join%>% filter(PPR_DIST ==7),aes(geometry = geometry),color =palette1_assist,fill = palette1_assist, alpha = 0.7) +
geom_sf(data=program2021.join%>% filter(PPR_DIST ==7),aes(geometry = geometry),color = palette1_main,fill = palette1_main,alpha = 0.7)+
labs(title="")+
mapTheme()
In the Figure2.2.3, we can see that Marian Anderson Recreation Center hold more activities in athletic category, like soccer and baseball. The other two facilities hold more cultural activities, like art and music.
plot1 <- ggplot(Facility_Program %>%filter(PPR_DISTRICT == 7)) +
geom_bar(aes(x= Facility,fill = ActvityTypeCategory),position="stack")+
scale_fill_manual(values = palette5)+
labs(y = "Program Frequency", fill="Category", title = "")+
plotTheme()+
theme(legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1),
legend.text = element_text( colour = "black", face = "italic", size = 4))
plot2 <- ggplot(Facility_Program %>%filter(PPR_DISTRICT == 7)) +
geom_bar(aes(x= Facility, fill = ActivityType),position="stack")+
scale_fill_manual(values = palette7)+
labs(y = "Program Frequency", fill="sub-Category", title = "")+
plotTheme()+
theme(legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1),
legend.text = element_text( colour = "black", face = "italic", size = 4))
grid.arrange(plot1, plot2,ncol=2,top = "")
ggplot(WeekVisit %>%filter(PPR_DISTRICT == 7)) +
geom_line(size=0.5,aes(x = AttendanceWeekDate, y = RegisteredIndividualsCount, group = ActvityTypeCategory, colour = ActvityTypeCategory)) +
geom_point(aes(x = AttendanceWeekDate, y = RegisteredIndividualsCount, group = ActvityTypeCategory, colour = ActvityTypeCategory, size = RegisteredIndividualsCount)) +
scale_color_manual(values = palette5)+
scale_size_continuous(range = c(2, 4))+
labs(title = "",
color = "Program Category",
size="Visitor Counts",
x = "Week of the Year",
y = "Visitor Counts")+
plotTheme()
In the Figure2.2.4, we can see that Athletic activities were hold from March to November while cultural and after school activities mostly took place in fall.
There are five facilities with program scheduled in District 8: 48th & Woodland Playground, Christy Recreation Center, Howards S. Morris Recreation Center, Laura Sims Skate House, and Shepard Recreation Center.
ggplot()+
geom_sf(data=st_union(pprServiceArea %>% filter(PPR_DIST ==8)),color="black",size=1,fill = "transparent")+
geom_sf(data=pprServiceArea %>% filter(PPR_DIST ==8),color="black",linetype ="dashed",size=1,fill = "transparent")+
geom_sf(data=permit2021.join%>% filter(PPR_DIST ==8),aes(geometry = geometry),color =palette1_assist,fill = palette1_assist, alpha = 0.7) +
geom_sf(data=program2021.join%>% filter(PPR_DIST ==8),aes(geometry = geometry),color = palette1_main,fill = palette1_main,alpha = 0.7)+
labs(title="")+
mapTheme()
In Figure2.2.6 we can see that Laura Sims Skate House hold hockey and ice skating activities from October to May the next year, while Morris Recreation Center hosted more cultural activities like dance as well as athletic activities of gymnastics, tumbling and basketball from October to December.
plot1 <- ggplot(Facility_Program %>%filter(PPR_DISTRICT == 8)) +
geom_bar(aes(x= Facility,fill = ActvityTypeCategory),position="stack")+
scale_fill_manual(values = palette5)+
labs(y = "Program Frequency", fill="Category", title = "")+
plotTheme()+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom",
legend.text = element_text( colour = "black", face = "italic", size = 4))
plot2 <- ggplot(Facility_Program %>%filter(PPR_DISTRICT == 8)) +
geom_bar(aes(x= Facility, fill = ActivityType),position="stack")+
scale_fill_manual(values = palette9)+
labs(y = "Program Frequency", fill="sub-Category", title = "")+
plotTheme()+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom",
legend.text = element_text( colour = "black", face = "italic", size = 4))
grid.arrange(plot1, plot2,ncol=2,top = "")
ggplot(WeekVisit %>%filter(PPR_DISTRICT == 8)) +
geom_point(aes(x = AttendanceWeekDate, y = RegisteredIndividualsCount, group = ActvityTypeCategory, colour = ActvityTypeCategory, size = RegisteredIndividualsCount)) +
geom_line(size=0.5,aes(x = AttendanceWeekDate, y = RegisteredIndividualsCount, group = ActvityTypeCategory, colour = ActvityTypeCategory)) +
scale_color_manual(values = palette5)+
scale_size_continuous(range = c(2, 4))+
labs(title = "",
color = "Program Category",
size="Visitor Counts",
x = "Week of the Year",
y = "Visitor Counts")+
plotTheme()
In Figure2.2.7 we can see that Hockey and ice skating activities take place in fall, winter and spring, while cultural and after school are mainly hold in fall.
There are seven facilities with program scheduled in District 9: Barry Playground and Pool, Cibotti Recreation Center, DiSilvestro Playground, East Passyunk Community Center, Eastwick Regional Playground, and Thomas B. Smith Recreation Center.
ggplot()+
geom_sf(data=st_union(pprServiceArea %>% filter(PPR_DIST ==9)),color="black",size=1,fill = "transparent")+
geom_sf(data=pprServiceArea %>% filter(PPR_DIST ==9),color="black",linetype ="dashed",size=1,fill = "transparent")+
geom_sf(data=permit2021.join%>% filter(PPR_DIST ==9),aes(geometry = geometry),color =palette1_assist,fill = palette1_assist, alpha = 0.7) +
geom_sf(data=program2021.join%>% filter(PPR_DIST ==9),aes(geometry = geometry),color = palette1_main,fill = palette1_main,alpha = 0.7)+
labs(title="")+
mapTheme()
In Figure2.2.9 we can see that Athletic activities of basketball and aquatics mostly took place in Barry Playground and Pool and East Passyunk Community Center. Eastwick Regional Playground, and Thomas B. Smith Recreation Center are the most two popular facilities with activities of athletic, after school, cultural, educational, and other categories.
ggplot(Facility_Program %>%filter(PPR_DISTRICT == 9)) +
geom_bar(aes(x= Facility,fill = ActvityTypeCategory),position="stack")+
scale_fill_manual(values = palette7[2:7])+
labs(y = "Program Frequency", fill="Category", title = "")+
plotTheme()+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom")
ggplot(WeekVisit %>%filter(PPR_DISTRICT == 9)) +
geom_point(aes(x = AttendanceWeekDate, y = RegisteredIndividualsCount, group = ActvityTypeCategory, colour = ActvityTypeCategory, size = RegisteredIndividualsCount)) +
geom_line(size=0.5,aes(x = AttendanceWeekDate, y = RegisteredIndividualsCount, group = ActvityTypeCategory, colour = ActvityTypeCategory)) +
scale_color_manual(values = palette7[2:7])+
scale_size_continuous(range = c(2, 4))+
labs(title = "",
color = "Program Category",
size="Visitor Counts",
x = "Week of the Year",
y = "Visitor Counts")+
plotTheme()
In Figure2.2.9 we can see that Athletic activities took place throughout the whole year, while other category of activities, like older adults and mentoring, were hold in the 2nd half of the year. Cultural activities, like art, dance, music, usually suspended in summer.
ggplot(Facility_Program_otherDistricts) +
geom_bar(aes(x= Facility,fill = ActvityTypeCategory),position="stack")+
scale_fill_manual(values = palette7)+
labs(y = "Program Frequency", fill="Program Category", title = "")+
plotTheme()+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom")
In Figure2.3.1 we can see that There are 27 facilities with program scheduled in other districts of PPR serving areas. We will have further analysis after PPR provide specific sorting information.
This practicum aims to use SafeGraph’s Pattern dataset to analyze if PPR programmings achieve their goals of meeting citizens’ needs and serving them well. Further, SafeGraph data can be used to suggest PPR’s future programming strategies in Philadelphia.
SafeGraph’s Patterns dataset includes visitor and demographic aggregations for points of interest (POIs) in the US. This contains aggregated raw counts of visits to POIs from a panel of mobile devices, answering how often people visit, how long they stay, where they came from, where else they go, and more. More Information
# brand_info <- vroom("data/safegraph/Philadelphia-Camden-WilmingtonPA-NJ-DE-MDMSA-CORE_POI-2021_11-2021-12-17/brand_info.csv")
# core_poi <- vroom("data/safegraph/Philadelphia-Camden-WilmingtonPA-NJ-DE-MDMSA-CORE_POI-2021_11-2021-12-17/core_poi.csv")
#
# monthList = c("01","02","03","04","05","06","07","08","09","10","11")
#
# patternAllMonth = data.frame()
# for (i in monthList){
# currentMonth = vroom(paste("data/safegraph/SafeGraph Data Purchase Dec-16-2021/Philadelphia-Camden-WilmingtonPA-NJ-DE-MDMSA-PATTERNS-2021_",
# i,
# "-2021-12-17/patterns.csv.gz",sep = ""))%>%
# filter(region=="PA")%>%
# mutate(month=paste(i,sep = ""))
# patternAllMonth = rbind(patternAllMonth,currentMonth)
# }
#
# # filter into philly, join with POI data
# safeGraph <- patternAllMonth %>%
# filter(city == "Philadelphia")%>%
# left_join(core_poi %>% dplyr::select(placekey,location_name,top_category,sub_category,naics_code,latitude,longitude),
# by=c("placekey"="placekey","location_name" = "location_name"),keep=FALSE)
#
# # create geometry from lat & lng
# safeGraph.geo <-
# safeGraph %>%
# st_as_sf(coords = c("longitude", "latitude"), crs = crs, agr = "constant", na.fail=FALSE)
# patternAllMonth <- st_read("data/output/patternAllMonth.csv")
#st_write(safeGraph.geo,"data/output/safeGraph.geo.GeoJSON")
safeGraph.geo <- st_read("data/output/safeGraph.geo.GeoJSON",crs = crs)
# change workers number by yourself
plan(multiprocess, workers = 10)
# keep congeneric bussiness
congenericMoves <-
safeGraph.geo %>%
filter(top_category %in% c("Promoters of Performing Arts, Sports, and Similar Events","Other Amusement and Recreation Industries","Museums, Historical Sites, and Similar Institutions") | str_detect(location_name, "Park") | str_detect(location_name, "Playground") | str_detect(location_name, "Recreation Center")) %>%
filter(str_detect(location_name, "Parking", negate = TRUE))
# Keep only ppr sites
parks <- safeGraph.geo %>%
dplyr::select(placekey, naics_code, location_name) %>%
distinct() %>%
filter(naics_code %in% c(712190, 713990, 713940, 711310) | str_detect(location_name, "Park") | str_detect(location_name, "Playground") | str_detect(location_name, "Recreation Center")) %>%
filter(str_detect(location_name, "Parking", negate = TRUE)) %>%
st_transform(crs = 4326)
PPRmoves <- safeGraph.geo %>%
filter(placekey %in% as.list(parks$placekey))
In the above wrangling, we first combine all 11 month pattern data from SafeGraph, attaching them with geometry information. Further, we filter the data into Pennsylvania region and into Practicum-related POIs using NAICS code. The NAICS codes we chose are as follow.
712190: Nature Parks and Other Similar Institutions;
713990: All Other Amusement and Recreation Industries;
713940: Fitness and Recreational Sports Centers;
711310:Promoters of Performing Arts, Sports, and Similar Events
# join filtered safeGraph place with ppr property
propertyWithPlaceKey <- st_join(property %>% st_buffer(10),parks %>% dplyr::select(placekey, geometry),left=FALSE) %>%
st_drop_geometry() %>%
left_join(property %>% dplyr::select(OBJECTID),by=('OBJECTID'='OBJECTID')) %>%
st_sf() %>%
st_transform(crs=crs) %>%
drop_na(placekey)
# join filtered safeGraph place with ppr programs
program2021.joinWithPlaceKey <-
st_join(program2021.join %>%
st_sf() %>%
st_transform(crs=crs) %>%
st_buffer(10),
parks %>% dplyr::select(placekey, geometry),left=FALSE) %>%
st_drop_geometry() %>%
merge.data.frame(program2021.join %>%
dplyr::select(geometry),
by='row.names')%>%
dplyr::select(-Row.names) %>%
st_sf() %>%
st_transform(crs=crs)
Through above wrangling, the filtered SafeGraph POIs were linked with PPR facilities and PPR programmings&permits, using spatial join. So far we have successfully bridged the channel of communicating among these three main datasets. In the above map, the purple polygons are the all properties of PPR, the green dots are the successfully joined properties with placekey. From the map we can believe the join is sucessfully, it covers all most all PPR properties.
# unnest visit Count data
# visitCount <-
# PPRmoves %>%
# select(placekey, date_range_start, date_range_end, visits_by_day) %>%
# mutate(date_range_start = as_date(date_range_start),
# date_range_end = as_date(date_range_end)) %>%
# dplyr::select(-date_range_end) %>%
# mutate(visits_by_day = future_map(visits_by_day, function(x){
# unlist(x) %>%
# as_tibble() %>%
# rowid_to_column(var = "day") %>%
# mutate(visits = value) %>%
# dplyr::select(-value)
# })) %>%
# unnest(cols = c("visits_by_day"))
# visitCount <- visitCount %>%
# rename(visitDay = date_range_start) %>%
# mutate(visitDay = day+visitDay-1) %>%
# mutate(month = month(visitDay))
# st_write(visitCount,"data/output/visitCount.GeoJSON")
visitCount <- st_read("data/output/visitCount.GeoJSON",crs=crs)
# unnest popularity_by_hour data
# visitHour <-
# PPRmoves %>%
# select(placekey, popularity_by_hour, date_range_start) %>%
# mutate(date_range_start = as_date(date_range_start),
# month = month(date_range_start)) %>%
# dplyr::select(-date_range_start) %>%
# mutate(popularity_by_hour = future_map(popularity_by_hour, function(x){
# unlist(x) %>%
# as_tibble() %>%
# rowid_to_column(var = "hour") %>%
# rename(visit = value)
# }))%>%
# unnest(popularity_by_hour)
#
# st_write(visitHour,"data/output/visitHour.GeoJSON")
visitHour <- st_read("data/output/visitHour.GeoJSON",crs=crs)
# unnest the origin-destination data
# originCount <-
# PPRmoves %>%
# select(placekey, visitor_home_cbgs, date_range_start) %>%
# mutate(date_range_start = as_date(date_range_start),
# month = month(date_range_start)) %>%
# dplyr::select(-date_range_start) %>%
# mutate(visitor_home_cbgs = future_map(visitor_home_cbgs, function(x){
# jsonlite::fromJSON(x) %>%
# as_tibble() %>%
# dplyr::select(starts_with("4")) %>%
# gather()
# })) %>%
# unnest(visitor_home_cbgs) %>%
# rename(origin =key ,visitors= value)
#
# st_write(originCount,"data/output/originCount.GeoJSON")
originCount <- st_read("data/output/originCount.GeoJSON",crs=crs)
#unnest the departure - destination data
# departCount <-
# PPRmoves %>%
# select(placekey, visitor_daytime_cbgs, date_range_start) %>%
# mutate(date_range_start = as_date(date_range_start),
# month = month(date_range_start)) %>%
# dplyr::select(-date_range_start) %>%
# mutate(visitor_daytime_cbgs = future_map(visitor_daytime_cbgs, function(x){
# jsonlite::fromJSON(x) %>%
# as_tibble() %>%
# dplyr::select(starts_with("4")) %>%
# gather()
# }))%>%
# unnest(visitor_daytime_cbgs) %>%
# rename(departure =key ,visitors= value)
#
# st_write(departCount,"data/output/departCount.GeoJSON")
departCount <- st_read("data/output/departCount.GeoJSON",crs=crs)
# unnest the bucketed_dwell_times data
# dwellTime <-
# PPRmoves %>%
# select(placekey, bucketed_dwell_times, date_range_start) %>%
# mutate(date_range_start = as_date(date_range_start),
# month = month(date_range_start)) %>%
# dplyr::select(-date_range_start) %>%
# mutate(bucketed_dwell_times = future_map(bucketed_dwell_times, function(x){
# jsonlite::fromJSON(x) %>%
# as_tibble() %>%
# gather()
# }))%>%
# unnest(bucketed_dwell_times) %>%
# rename(dwellTimes =key ,visitors= value)
#
# st_write(dwellTime,"data/output/dwellTime.GeoJSON")
dwellTime <- st_read("data/output/dwellTime.GeoJSON",crs=crs)
# unnest the related_same_day_brand
# relatedBrand <-
# PPRmoves %>%
# select(placekey, related_same_day_brand, date_range_start) %>%
# mutate(date_range_start = as_date(date_range_start),
# month = month(date_range_start)) %>%
# dplyr::select(-date_range_start) %>%
# mutate(related_same_day_brand = future_map(related_same_day_brand, function(x){
# jsonlite::fromJSON(x) %>%
# as_tibble() %>%
# gather()
# }))
#
# relatedBrand <- relatedBrand %>%
# unnest(related_same_day_brand) %>%
# rename(relatedBrand =key ,visitors= value)
#
# st_write(relatedBrand,"data/output/relatedBrand.GeoJSON")
relatedBrand <- st_read("data/output/relatedBrand.GeoJSON",crs=crs)
# unnest the popularity_by_day data
# visitWeekday <-
# PPRmoves %>%
# select(placekey, popularity_by_day, date_range_start) %>%
# mutate(date_range_start = as_date(date_range_start),
# month = month(date_range_start)) %>%
# dplyr::select(-date_range_start) %>%
# mutate(popularity_by_day = future_map(popularity_by_day, function(x){
# jsonlite::fromJSON(x) %>%
# as_tibble() %>%
# gather()
# }))%>%
# unnest(popularity_by_day) %>%
# rename(visitWeekday =key ,visits= value)
#
# st_write(visitWeekday,"data/output/visitWeekday.GeoJSON")
visitWeekday <- st_read("data/output/visitWeekday.GeoJSON",crs=crs)
Because SafeGraph data is nested within the dataframe. To further use this data into analyses, we have to unsent them into respective dataframes. Through above wrangling, we obtain information like number of visits to these facilities by day, the number of visits to these facilities by hour, the number of visits to these facilities by weekday, the census block groups of these visitors’ home, the census block groups of these visitors’ departure places, the distribution of these visits’ dwelling time etc.
Notably, the visits here refer to the visits in SafeGraph Mobile Device Panel instead of the true visits to these facilities. Even though there might be some biases in the representatives, we still believe this data can give us insights about the pattern of actual visits. However, some data rectifications are still deployed in the following to make it more reliable.
Sometimes, SafeGraph will count “Residents” or “passer-bys” of the position in “visits”, which will result in huge bias and mislead the analysis. So, to conclude typical position types of SafeGraph data bias, we choose midnight visitsand dwelling time as factors for cluster analysis. We also apply principal component analysis here to assist in understanding characteristics of clusters.
Taking results of cluster analysis , most locations, 1094 out of 1117, belong to cluster 3 with reliable data. Cluster 2 includes 14 places with extremely more night visits and long visits that are more than 2 hours. Cluster 1 consists of 9 locations with many transitory visits that are less than 10 minutes.
# calculate midnight visit
visitHourNight <- visitHour %>%
filter(hour >= 1 | hour <=5) %>%
group_by(placekey) %>%
summarize(nightVisit = sum(visit))
# integrate sg data by midnight visit and dwell time
visitIntegrated <- full_join(visitHourNight %>%
st_drop_geometry(),
dwellTime %>%
group_by(placekey, dwellTimes) %>%
summarize(visitors = sum(visitors)),
by=c('placekey'))
# wide format
visitIntegrated <- visitIntegrated %>%
spread(key = dwellTimes, value = visitors)
# export
# st_write(visitIntegrated, "data/output/visitIntegrated.GeoJSON")
# import
visitIntegrated <- st_read("data/output/visitIntegrated.GeoJSON") %>%
rename(c("Dwell<5"="X.5","Dwell>240"="X.240","Dwell11-20"="X11.20","Dwell121-240"="X121.240","Dwell21-60"="X21.60","Dwell5-10"="X5.10","Dwell61-120"="X61.120" ))
visitIntegrated <- visitIntegrated[,c(1,2,3,8,5,7,9,6,4)]
# normalize
visitIntegrated.scale <- scale(visitIntegrated %>%
dplyr::select(-placekey) %>%
st_drop_geometry())
set.seed(1234)
# decide cluster number (only run once)
#nc <- NbClust(visitIntegrated.scale, min.nc=2, max.nc=15, method="kmeans", index="all")
#table(nc$Best.n[1,])
#
# barplot(table(nc$Best.n[1,]),
# xlab="Numer of Clusters", ylab="Number of Criteria",
# main="Number of Clusters Chosen by 26 Criteria")
# Run K-Means cluster
cluster1 <- kmeans(visitIntegrated.scale, 3)
#summary(cluster1)
# add cluster number back
visitIntegrated <- visitIntegrated %>%
mutate(cluster = cluster1$cluster)
# mean by cluster
cluster1_mean <- aggregate(visitIntegrated %>%
dplyr::select(-placekey, -cluster) %>%
st_drop_geometry(),
by=list(cluster=visitIntegrated$cluster),
FUN=mean) %>%
left_join(visitIntegrated %>%
st_drop_geometry() %>%
group_by(cluster) %>%
summarize(size = n()),
by="cluster")
kable(cluster1_mean,align = 'c',caption = '<center>Table 1. Mean values of clusters for SafeGraph data <center/>') %>%
kable_classic(full_width = F)%>%
kable_styling(position = "center")%>%
scroll_box(width = "100%", height = "400px")
## put histograms on the diagonal
panel.hist <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, ...)
}
#Color points by groups
my_cols <- c(palette5[5],palette5[4],palette5[1])
pairs(visitIntegrated %>%
st_drop_geometry() %>%
mutate(`Dwel<10`=`Dwell<5`+`Dwell5-10`,
`Dwell11-120` = `Dwell11-20` + `Dwell21-60` + `Dwell61-120`,
`Dwell>120` = `Dwell121-240` + `Dwell>240`) %>%
dplyr::select(nightVisit, `Dwel<10`, `Dwell11-120`, `Dwell>120`),
pch = 19, cex = 0.5, cex.labels=1, diag.panel = panel.hist,
col = my_cols[visitIntegrated$cluster],
lower.panel=NULL, panel = panel.smooth)
We can visualize these clusters more simply with the help of principal component analysis. It can be seen that cluster 3, normal data, has a large number and is concentrated, while the problematic clusters 2 and 3 have a smaller number and a larger distribution range.
x <- clusplot(visitIntegrated.scale,
cluster1$cluster, color=TRUE, shade=TRUE, main = "",
labels=5, lines=0, stand=TRUE, col.txt=palette5[1:3], col.clus=palette5[1:3], col.p=palette5[5])
Here’s how two components are made up. Basically, the visit amount during the midnight or last more than 20 minutes heavily contribute to component 1 (overestimate short visits), while short visits that are less than 10 minutes more contribute to component 2 (miscount resident as visitors). So many analysis in this report has grouped dwell time into 3 span, <10min 11-120 min, and >120min.
# decide component number - 2 (only run once)
# fa.parallel(as.data.frame(visitIntegrated.scale), fa = 'pc', n.iter = 100, show.legend = FALSE)
# principal component analysis
set.seed(1234)
cluster1_pca <- principal(visitIntegrated.scale, nfactors = 2, rotate = "none", scores = TRUE)
options(digits = 2)
kable(t(cluster1_pca$Structure[1:8,1:2]),align = 'c',caption = '<center>Table 3.1.1 Correlations between variables and components <center/>') %>%
kable_classic(full_width = F)%>%
kable_styling(position = "center")
nightVisit | Dwell<5 | Dwell5-10 | Dwell11-20 | Dwell21-60 | Dwell61-120 | Dwell121-240 | Dwell>240 | |
---|---|---|---|---|---|---|---|---|
PC1 | 0.89 | 0.64 | 0.64 | 0.79 | 0.99 | 0.95 | 0.92 | 0.88 |
PC2 | -0.44 | 0.75 | 0.75 | 0.57 | 0.01 | -0.29 | -0.38 | -0.47 |
Based on the bias analysis above, we rectify SafeGraph data depending on its cluster / type. First, all visit counts with dwell time that is less than 5 minutes are removed. Also, for cluster 1, visit counts with dwell time that is between 5 and 10 minutes will be reduced referring to the related proportion in the normal cluster 3. On top of that, long visits that are longer than 2 hours in Cluster 2 will be trimmed into the appropriate ratio according to midnight visitors which can be treated as “residents”.
# cluster info
cluster <- visitIntegrated %>%
st_drop_geometry() %>%
dplyr::select(placekey, cluster)
# caculate the ratio to trim "Dwell5-10" for cluster 1
Sum <- visitIntegrated %>%
st_drop_geometry() %>%
mutate(Total = `Dwell<5`+`Dwell5-10`+`Dwell11-20`+`Dwell21-60`+`Dwell61-120`+`Dwell121-240`+`Dwell>240`) %>%
group_by(cluster) %>%
summarize(`Dwell5-10` = sum(`Dwell5-10`),
`Dwell<5` = sum(`Dwell<5`),
Total = sum(Total))
ratio5To10 <- Sum[3,]$`Dwell5-10`/Sum[3,]$Total
group1.dwell <- dwellTime %>%
st_drop_geometry() %>%
spread(key=dwellTimes, value = visitors) %>%
left_join(visitIntegrated %>% dplyr::select(placekey, cluster) %>% st_drop_geometry(), by="placekey") %>%
filter(cluster==1) %>%
mutate(Total = `<5`+`5-10`+`11-20`+`21-60`+`61-120`+`121-240`+`>240`,
`rectified5-10`=ifelse((`5-10`-ratio5To10*Total/(ratio5To10-1)/`5-10`)<=`5-10`,
`5-10`-ratio5To10*Total/(ratio5To10-1)/`5-10`,
`5-10`),
`rectified5-10rate`=`rectified5-10`/Total,
`<5rate`=`<5`/Total)
group23.dwell <- dwellTime %>%
st_drop_geometry() %>%
spread(key=dwellTimes, value = visitors) %>%
mutate(Total = `<5`+`5-10`+`11-20`+`21-60`+`61-120`+`121-240`+`>240`,
`<5rate`=`<5`/Total)
# caculate the ratio to trim "Dwell121-240" and "Dwell>240" for cluster 2
ratio2h <- visitHour %>%
st_drop_geometry() %>%
spread(key = hour, value = visit) %>%
left_join(visitIntegrated %>%
st_drop_geometry() %>%
dplyr::select(placekey, cluster)) %>%
mutate(residents = (`1`+`2`+`3`+`4`+`5`)/5,
hourlySum = (`1`+`2`+`3`+`4`+`5`+`6`+`7`+`8`+`9`+`10`+`11`+`12`+`13`+`14`+`15`+`16`+`17`+`18`+`19`+`20`+`21`+`22`+`23`+`24`),
`ratio>2h` = 16.25*residents/hourlySum) %>%
dplyr::select(placekey, month, `ratio>2h`, cluster, residents)
# rectify 'visitHour'
visitHour <- visitHour %>%
left_join(ratio2h, by=c("placekey","month"))
visitHour <- visitHour %>%
filter(cluster==2) %>%
filter(hour<=5) %>%
mutate(visit = ifelse(visit-residents*0.9<0,
visit*0.05,
visit-residents*0.9)) %>%
rbind(
visitHour %>%
filter(cluster==2) %>%
filter(hour>=6 & hour<=10) %>%
mutate(visit=ifelse(visit-residents*(0.9-0.115*(hour-5))<0,
visit*0.05,
visit-residents*(0.9-0.115*(hour-5))))
) %>%
rbind(
visitHour %>%
filter(cluster==2) %>%
filter(hour>=11 & hour<=18) %>%
mutate(visit=ifelse(visit-residents*0.25*20/24<0,
visit*0.05,
visit-residents*0.25*20/24))
) %>%
rbind(
visitHour %>%
filter(cluster==2) %>%
filter(hour>=19) %>%
mutate(visit=ifelse(visit-residents*(0.25*20/24+0.115*(hour-18))<0,
visit*0.05,
visit-residents*(0.25*20/24+0.115*(hour-18))))
) %>%
rbind(
visitHour %>%
filter(cluster!=2)
) %>%
dplyr::select(-`ratio>2h`, -cluster,-residents)
# rectify 'dwellTime'
dwellTime <- dwellTime %>%
left_join(ratio2h, by=c("placekey","month"))
dwellTime <- dwellTime %>%
filter(cluster==2) %>%
filter(dwellTimes=="121-240" | dwellTimes==">240") %>%
mutate(visitors = visitors*(1-`ratio>2h`)) %>%
rbind(dwellTime %>%
filter(cluster==2) %>%
filter(dwellTimes!="121-240" & dwellTimes!=">240")) %>%
rbind(dwellTime %>%
filter(cluster==1) %>%
filter(dwellTimes=="5-10") %>%
left_join(group1.dwell %>% dplyr::select(placekey,month,`rectified5-10`),by=c("placekey","month")) %>%
mutate(visitors = `rectified5-10`) %>%
dplyr::select(-`rectified5-10`)) %>%
rbind(dwellTime %>%
filter(cluster==1) %>%
filter(dwellTimes!="5-10")) %>%
rbind(dwellTime %>%
filter(cluster==3)) %>%
dplyr::select(-`ratio>2h`, -cluster)
dwellTime <- dwellTime %>%
filter(dwellTimes!='<5')
# rectify 'visitCount'
visitCount <- visitCount %>%
left_join(ratio2h, by=c("placekey","month"))
visitCount <- visitCount %>%
filter(cluster==1) %>%
left_join(group1.dwell,by=c("placekey","month")) %>%
mutate(visits = visits*(1-`rectified5-10rate`-`<5rate`)) %>%
dplyr::select(placekey,visitDay,day,visits,month,geometry) %>%
rbind(visitCount %>%
filter(cluster==2)%>%
left_join(group23.dwell,by=c("placekey","month")) %>%
left_join(ratio2h, by=c("placekey","month")) %>%
mutate(visits = visits*(1-`<5rate`-`ratio>2h.y`)) %>%
dplyr::select(placekey,visitDay,day,visits,month,geometry)) %>%
rbind(visitCount %>%
filter(cluster==3) %>%
left_join(group23.dwell,by=c("placekey","month")) %>%
mutate(visits = visits*(1-`<5rate`)) %>%
dplyr::select(placekey,visitDay,day,visits,month,geometry)) %>%
mutate(visits = round(visits,0))
Through above complicated analysis and rectification, we finally obtain more reliable SafeGraph data. In the following section, we attempt to draw insights from the SafeGraph data in 5 aspects: visit counts in device panel, origins of visits in device panel, departure of visits in device panel, weekday pattern of device panel visits, dwelling time of device panel visits, and related brands of device panel visits at the same day.
sumVisit <- visitCount %>%
dplyr::select(-visitDay,-day,-month) %>%
group_by(placekey) %>%
summarise(visits=sum(visits))
ggplot(sumVisit)+
geom_sf(data=pprServiceArea,color='black',size=0.35,fill= "transparent")+
geom_sf(data=pprServiceArea %>% filter(PPR_DIST %in% c(7,8,9)),color='black',size=0.5,fill= "transparent")+
geom_sf(aes(size = visits),color = palette1_main,fill = palette1_main,alpha = 0.3) +
scale_size_continuous(range = c(1, 3),name = "Visits")+
mapTheme()+
theme(legend.position = "bottom",
legend.key.width = unit(0.5, 'cm'),
legend.key.height = unit(0.2, 'cm'))
From the Figure3.2.1 we can see that people frequently visit facilities in the city center where have dense PPR facilities. For those facilities, visit counts are small because citizens are spread into other facilities. However, the large visit counts happen at the outskirt of Philadelphia where have less facilities
sumVisitServiceArea <- pprServiceArea %>%
st_join(sumVisit,left =TRUE) %>%
drop_na(INFO) %>%
dplyr::select(INFO,visits,geometry) %>%
group_by(INFO) %>%
mutate(totalVisits=sum(visits)) %>%
dplyr::select(-visits) %>%
distinct() %>%
st_sf() %>%
st_transform(crs=crs)
ggplot(sumVisitServiceArea)+
geom_sf(aes(fill = q5(totalVisits)),color = 'white',size=0.5) +
scale_fill_manual(values = palette5,labels = qBr(sumVisitServiceArea,'totalVisits'),name = "Total Device Visits") +
mapTheme()+
theme(legend.position = "bottom",
legend.key.width = unit(0.5, 'cm'),
legend.key.height = unit(0.2, 'cm'))
If we aggregate the visits into service area level, we can correspond the findings above based on Figure3.2.2 Overall, service areas in the city center where has dense facilities will have larger visit counts. Notably, ‘Junod Area’ service area also has high total device visits which may attribute to the Walton Run Park and the Poquessing Valley Park.
# CensusData <-
# get_acs(geography = "block group",
# variables = c("B01003_001E"),
# year=2019, state="PA", county="Philadelphia", geometry=T, output="wide") %>%
# st_transform(crs=4326) %>%
# dplyr::select(-NAME,-starts_with('B')) %>%
# st_centroid() %>%
# as.data.frame() %>%
# rename("originGeometry" = "geometry")
#
# placekeyGeometry <-
# originCount %>%
# dplyr::select(placekey) %>%
# group_by(placekey) %>% summarise()
#
# orgCountPlot <- originCount %>%
# st_drop_geometry() %>%
# group_by(origin,placekey) %>%
# summarise(visitors=sum(visitors))%>%
# left_join(placekeyGeometry,by=c("placekey" = "placekey")) %>%
# rename("parkGeometry" = "geometry")%>%
# filter(startsWith(origin,"42101"))%>%
# left_join(CensusData,by=c("origin" = "GEOID"))%>%
# mutate(park.y=as.numeric(map(parkGeometry,function(x){return(x[2])})),
# park.x=as.numeric(map(parkGeometry,function(x){return(x[1])})),
# origin.y=as.numeric(map(originGeometry,function(x){return(x[2])})),
# origin.x=as.numeric(map(originGeometry,function(x){return(x[1])})))
#
# st_write(orgCountPlot,"data/output/orgCountPlot.GEOJSON")
orgCountPlot <- st_read("data/output/orgCountPlot.GEOJSON")
orgCountPlot2 <- orgCountPlot%>%
filter(visitors>200)
orgCountPlotHF <- orgCountPlot%>%
filter(visitors>5000)
ggplot(data = orgCountPlot2) +
geom_sf(data = pprServiceArea %>% st_transform(crs=4326),fill ="transparent", color="black",size=0.5)+
geom_sf(data=pprDistrict %>% st_transform(crs=4326),fill ="transparent", color="black",size=1)+
geom_sf(data=propertyWithPlaceKey %>% dplyr::select(OBJECTID) %>%distinct() %>% st_transform(crs=crs), fill ='#C5C5C5', color='#C5C5C5',alpha=0.1,size=0)+
geom_curve(aes(x = origin.x, y = origin.y,xend = park.x,yend = park.y,color = q5(visitors)),
size = 0.5,
curvature = -0.2,
alpha = 0.4,)+
scale_color_manual(values = palette5,
labels = qBr(orgCountPlot2,"visitors"),
name = "Device Visits (Quintile Breaks)") +
labs(x="",y="")+
mapTheme()+
theme(legend.position = "bottom",panel.spacing = unit(6, 'lines'),
legend.key.width = unit(0.5, 'cm'),
legend.key.height = unit(0.2, 'cm'))
ggplot(data = orgCountPlotHF) +
geom_sf(data = pprServiceArea %>% st_transform(crs=4326),fill ="transparent", color="black",size=0.5)+
geom_sf(data=pprDistrict %>% st_transform(crs=4326),fill ="transparent", color="black",size=1)+
geom_sf(data=propertyWithPlaceKey %>% dplyr::select(OBJECTID) %>%distinct() %>% st_transform(crs=crs), fill ='#C5C5C5', color='#C5C5C5',alpha=0.3,size=0)+
geom_curve(aes(x = origin.x,y = origin.y,xend = park.x,yend = park.y),
arrow = arrow(length = unit(0.01, "npc"), type="closed"),
size = 0.75,color = palette1_main,curvature = -0.2, alpha = 0.7)+
labs(x="",y="")+
mapTheme()+
theme(legend.position = "bottom",panel.spacing = unit(6, 'lines'),
legend.key.width = unit(0.5, 'cm'),
legend.key.height = unit(0.2, 'cm'))
In the Figure3.2.3, we plot the origin of visits in device panel while in the Figure3.2.4, we filter routes have more than 5000 device visits in 2021 and plot it. The map is interesting because people from this census block group (421010177011), which are located at the Kensington Ave are more likely to go to PPR facilities compared to the people in other census block group.
One possible cause maybe that vulnerable groups have a stronger need for public resources. These public resources may make up their unsatisfying living environment. They can have social connections, taken part in the training program and even have free food there.
# depaCountPlot <- departCount %>%
# st_drop_geometry() %>%
# group_by(departure,placekey) %>%
# summarise(visitors=sum(visitors))%>%
# left_join(placekeyGeometry,by=c("placekey" = "placekey")) %>%
# rename("parkGeometry" = "geometry")%>%
# filter(startsWith(departure,"42101"))%>%
# left_join(CensusData,by=c("departure" = "GEOID"))%>%
# mutate(park.y=as.numeric(map(parkGeometry,function(x){return(x[2])})),
# park.x=as.numeric(map(parkGeometry,function(x){return(x[1])})),
# departure.y=as.numeric(map(originGeometry,function(x){return(x[2])})),
# departure.x=as.numeric(map(originGeometry,function(x){return(x[1])})))
#
# st_write(depaCountPlot,"data/output/depaCountPlot.GEOJSON")
depaCountPlot <- st_read("data/output/depaCountPlot.GEOJSON")
depaCountPlot2 <- depaCountPlot%>%
filter(visitors>200)
ggplot(data = depaCountPlot2) +
geom_sf(data = pprServiceArea %>% st_transform(crs=4326),fill ="transparent", color="black",size=0.5)+
geom_sf(data=pprDistrict %>% st_transform(crs=4326),fill ="transparent", color="black",size=1)+
geom_sf(data=propertyWithPlaceKey %>% dplyr::select(OBJECTID) %>%distinct() %>% st_transform(crs=crs), fill ='#C5C5C5', color='#C5C5C5',alpha=0.1,size=0)+
geom_curve(aes(x = departure.x, y = departure.y, xend = park.x, yend = park.y,color = q5(visitors)),
size = 0.5,curvature = -0.2, alpha = 0.4)+
scale_color_manual(values = palette5,
labels = qBr(depaCountPlot2,"visitors"),
name = "Device Visits (Quintile Breaks)") +
labs(x="",y="")+
mapTheme()+
theme(legend.position = "bottom",panel.spacing = unit(6, 'lines'),
legend.key.width = unit(0.5, 'cm'),
legend.key.height = unit(0.2, 'cm'))
depaCountPlotMostPopular <- depaCountPlot%>%
st_drop_geometry() %>%
dplyr::select(placekey,visitors) %>%
group_by(placekey) %>%
summarise(totalvisits = sum(visitors)) %>%
filter(totalvisits==max(totalvisits))
depaCountPlotHF <- depaCountPlot%>%
filter(placekey==depaCountPlotMostPopular$placekey[1] & visitors>=1000)
ggplot(data = depaCountPlotHF) +
geom_sf(data = pprServiceArea %>% st_transform(crs=4326),fill ="transparent", color="black",size=0.5)+
geom_sf(data=pprDistrict %>% st_transform(crs=4326),fill ="transparent", color="black",size=1)+
geom_sf(data=propertyWithPlaceKey %>% dplyr::select(OBJECTID) %>%distinct() %>% st_transform(crs=crs), fill ='#C5C5C5', color='#C5C5C5',alpha=0.3,size=0)+
geom_curve(aes(x = departure.x, y = departure.y, xend = park.x, yend = park.y,color = q5(visitors)),
arrow = arrow(length = unit(0.01, "npc"), type="closed"),
size = 1,curvature = -0.2, alpha = 0.7)+
scale_color_manual(values = palette5,
labels = qBr(depaCountPlot2,"visitors"),
name = "Device Visits (Quintile Breaks)") +
labs(x="",y="")+
mapTheme()+
theme(legend.position = "bottom",panel.spacing = unit(6, 'lines'),
legend.key.width = unit(0.5, 'cm'),
legend.key.height = unit(0.2, 'cm'))
Figure3.2.5 and Figure3.2.6 are the maps of the departure of visits in device panel. In Figure3.2.6 we can see that the PPR facility has largest visits based on safegraph device panel is the charles playgroud. And people are mainly coming from these three census block groups (421010241001, 421010177011, 421010160005).
VisitWeekdaybyServiceArea <- pprServiceArea %>%
st_join(visitWeekday) %>%
st_drop_geometry() %>%
dplyr::select(-placekey,-month,-PPR_DIST,-Shape_Leng,-Shape_Area)%>%
group_by(NAME,INFO,visitWeekday) %>%
summarise(totalVisit = sum(visits, na.rm = T)) %>%
ungroup()
VisitWeekdaybyServiceArea$visitWeekday <- factor(VisitWeekdaybyServiceArea$visitWeekday, levels= c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday","Sunday"))
VisitWeekdaybyServiceArea%>%
ggplot(aes(visitWeekday,totalVisit)) +
geom_bar(position ="dodge",fill = palette1_main,stat='identity') +
labs(x="Weekday", y="Aggregated Visits in device panel") +
facet_wrap(~NAME, ncol = 8, scales = "fixed")+
plotTheme(5,5)+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text( size=5),
axis.text = element_text( size=5),
strip.text.x = element_text( size = 5))
In Figure3.2.7, we group the data by weekday to see their weekday distribution pattern. Apart from their own difference in total visit counts, there are three types of distribution. One is that all seven days in a week have similar total visit counts in device panel. Second is that the peak appears in the weekend. And the third is that the peak happens during the week.
VisitHourbyServiceArea <- pprServiceArea %>%
st_join(visitHour) %>%
st_drop_geometry() %>%
dplyr::select(-placekey,-month,-PPR_DIST,-Shape_Leng,-Shape_Area)%>%
group_by(NAME,INFO,hour) %>%
summarise(totalVisit = sum(visit, na.rm = T)) %>%
ungroup()
VisitHourbyServiceArea%>%
ggplot(aes(hour,totalVisit)) +
geom_bar(position ="dodge",fill = palette1_main,stat='identity') +
labs(x="hour", y="Aggregated Visits in device panel") +
facet_wrap(~NAME, ncol = 8, scales = "fixed")+
plotTheme(5,5)+
theme(axis.text = element_text( size=5),strip.text = element_text( size=5),strip.text.x = element_text( size = 5))
After the rectification above, we finally obtain the commonsensible hour pattern. In the Figure3.2.8, we group the data by hour to see their hour distribution pattern. Apart from their own difference in total visit counts, there are two types of distribution. One is that there is no obvious hourly pattern during the day. Another one has obvious peak time happening at the mid-day.
dwellTimeForPlot <- dwellTime %>%
mutate(dwellTimes = recode(dwellTimes,
"<5" = 2.5,
"5-10" = 7.5,
"11-20" = 15,
"21-60" = 40,
"61-120" = 90,
"121-240" = 180,
">240" = 240)) %>%
mutate(sepTotalDwellTime = (visitors*dwellTimes)) %>%
group_by(placekey) %>%
mutate(totalVisitors=sum(visitors) )%>%
filter(totalVisitors>50) %>%
mutate(avgDwellTime=sum(sepTotalDwellTime)/totalVisitors) %>%
dplyr::select(placekey,avgDwellTime) %>%
distinct()
ggplot(dwellTimeForPlot)+
geom_sf(data=pprServiceArea,color='black',size=0.25,fill= "transparent")+
geom_sf(data=pprDistrict,color='black',size=0.75,fill='transparent')+
geom_sf(aes(size = avgDwellTime,color = avgDwellTime),alpha = 0.5) +
scale_size_continuous(range = c(0,2),name = "avgDwellTime")+
scale_color_continuous(low = '#FFDEDB',high ='#FF2903',
name = "avgDwellTime") +
mapTheme()+
theme(legend.position = "bottom",
legend.key.width = unit(0.5, 'cm'),
legend.key.height = unit(0.2, 'cm'))
In Figure3.2.9, we calculate the average dwelling time for every facility. From the map, we can find out that the longest dwelling time happes at the placekey(223-222@628-pm9-wkz) Independent Square Region.
dwellTImePlotServiceArea <- pprServiceArea %>%
st_join(dwellTimeForPlot,left =TRUE) %>%
drop_na(INFO) %>%
dplyr::select(INFO,avgDwellTime,geometry) %>%
group_by(INFO) %>%
mutate(avgDwellTime=mean(avgDwellTime)) %>%
distinct() %>%
st_sf() %>%
st_transform(crs=crs)
ggplot(dwellTImePlotServiceArea)+
geom_sf(aes(fill = q5(avgDwellTime)),color = 'white',size=0.5) +
scale_fill_manual(values = palette5,labels = qBr(dwellTImePlotServiceArea,'avgDwellTime'),name = "Average Device Dwelling Time") +
mapTheme()+
theme(legend.position = "bottom",
legend.key.width = unit(0.5, 'cm'),
legend.key.height = unit(0.2, 'cm'))
If we aggregate them into the service area level, individual facility’s characteristics are erased, we can see that long dwelling time happens are more likely happening at the residential regions, which have more higher population densities. And facilities in this area are more likely visited by chance.
To think over how might we help to program and allocate staff differently, we divided all locations into 3 categories based on their visit data and activity applications (permits).
Supportive facilities, pink points, have huge amount of visitors who will stay longer than normal, but only a few activity permits here, which means they have great potential, PPR may set more programs in the future.
Normal facilities, orange points, are normal positions with appropriate foot traffic and activity numbers.
Potential facilities, purple points, have many group activity but few foot traffic, since they are usually function center who support the community events.
# calculate mean dwell time
dwellMean <- dwellTime %>%
st_drop_geometry() %>%
mutate(dwell = case_when(dwellTimes=="5-10" ~ 7.5,
dwellTimes=="11-20" ~ 15.5,
dwellTimes=="21-60" ~ 40.5,
dwellTimes=="61-120" ~ 90.5,
dwellTimes=="121-240" ~ 180.5,
dwellTimes==">240" ~ 270),
dwellMuti = dwell*visitors) %>%
group_by(placekey, month) %>%
summarize(dwellMuti = sum(dwellMuti),
visitors = sum(visitors)) %>%
mutate(meanDwell = dwellMuti/visitors)
# integrate dwell, total counts and permit
sgIntegrated_byMonth <- full_join(dwellMean %>% dplyr::select(placekey,month,meanDwell),
visitCount %>%
st_drop_geometry() %>%
group_by(placekey, month) %>%
summarize(visits = sum(visits)),
by=c('placekey','month')) %>%
na.omit() %>%
ungroup() %>%
left_join(propertyWithPlaceKey %>% dplyr::select(placekey, OBJECTID)) %>%
left_join(permit2021.join %>%
group_by(PPR_Properties_ObjectID) %>%
summarize(permits = n()),
by=c("OBJECTID"="PPR_Properties_ObjectID")) %>%
na.omit() %>%
dplyr::select(-OBJECTID) %>%
group_by(placekey, month) %>%
summarize(permits = sum(permits),
visits = mean(visits),
meanDwell = mean(meanDwell)) %>%
ungroup() %>%
left_join(parks %>% dplyr::select(placekey), by="placekey") %>%
st_sf()
# normalize
sgIntegrated_byMonth.scale <- scale(sgIntegrated_byMonth %>%
dplyr::select(-placekey,-month) %>%
st_drop_geometry())
set.seed(1234)
# decide cluster number (only run once)
# nc <- NbClust(sgIntegrated_byMonth.scale, min.nc=2, max.nc=15, method="kmeans", index="all")
# table(nc$Best.n[1,])
#
# barplot(table(nc$Best.n[1,]),
# xlab="Numer of Clusters", ylab="Number of Criteria",
# main="Number of Clusters Chosen by 26 Criteria")
# Run K-Means cluster
cluster2 <- kmeans(sgIntegrated_byMonth.scale, 3)
# summary(cluster2)
# add cluster number back
sgIntegrated_byMonth <- sgIntegrated_byMonth %>%
mutate(cluster = cluster2$cluster)
# mean by cluster
cluster2_mean <- aggregate(sgIntegrated_byMonth %>%
dplyr::select(-placekey, -cluster,-month) %>%
st_drop_geometry(),
by=list(cluster=sgIntegrated_byMonth$cluster),
FUN=mean) %>%
left_join(sgIntegrated_byMonth %>%
st_drop_geometry() %>%
group_by(cluster) %>%
summarize(size = n()),
by="cluster")
kable(cluster2_mean,align = 'c',caption = '<center>Table 3. Mean values of clusters for conflicts <center/>') %>%
kable_classic(full_width = F)%>%
kable_styling(position = "center")%>%
scroll_box(width = "100%", height = "400px")
## put histograms on the diagonal
panel.hist <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, ...)
}
#Color points by groups
my_cols <- c(palette1_main,palette5[1],palette5[5])
pairs(sgIntegrated_byMonth %>%
st_drop_geometry() %>%
dplyr::select(meanDwell,visits,permits),
pch = 19, cex = 0.5, cex.labels=1, diag.panel = panel.hist,
size=0.5,
col = my_cols[sgIntegrated_byMonth$cluster],
lower.panel=NULL, panel = panel.smooth)
Here are how these three types of facilities are spatially distributed in District 7,8,9. There are many supportive facilities in south Philadelphia.
ggplot() +
geom_sf(data=sgIntegrated_byMonth,aes(color=as.factor(cluster)))+
geom_sf(data=pprDistrict,color="black",size=0.7,linetype ="dashed",fill = "transparent")+
geom_sf(data=pprDistrict %>% filter(DISTRICTID %in% c(7,8,9)),color="black",size=1,fill = "transparent")+
labs(title = "",
subtitle = "",
x="",y="")+
scale_color_manual(values = my_cols,name = "Facility Clusters", label=c("Supportive (few visits many permits)","Nromal","Potetial (many visits few permits)")) +
mapTheme()+
theme(legend.position = "bottom")
cluster2 : Comparatively normal positions.
cluster 1 : Normal performance on sg data (dwell and visits), but with many permit plans
cluster3: Perform well in sg data (dwell and visits), but not more permit plans
Thomas B. Smith Recreation Center, located in District 9, is a representative facility with certain number of permits, but as low as 100-300 times of visits per month and short average dwell time around one hour. There are several sport courts, the Smith playground, and a spray park surrounding the B. Smith Recreation Center. According to scheduled program, more activities of athletic, education, and after school take place here.
Much feedback from the Google review represent it is a great place for the community residents to use, but a three-year old kid gets bored in 10 minutes due to the limited equipment in playground. More equipment, renovations and better management and maintenance are needed to improve this facility.
Smith <- program2021.join %>%
filter(FacilityID == "{13600B8B-AB62-4A39-95CF-5F31C6942010}")
SmithSum <- Smith %>%
mutate(totalCount = as.numeric(RegisteredIndividualsCount))%>%
dplyr::select(totalCount,ActivityType, AttendanceWeekDate) %>%
drop_na() %>%
group_by(ActivityType, AttendanceWeekDate)%>%
summarise(totalCount=mean(totalCount)) %>%
ungroup() %>%
group_by(ActivityType)%>%
summarise(totalCount=sum(totalCount))
SmithSum %>%
ggplot(aes(ActivityType, totalCount, fill=ActivityType)) +
geom_bar(position ="dodge",stat="identity",fill=palette1_main) +
labs(y = "Visitor Count", fill=palette1_main, title = " ")+
plotTheme()+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom")
Thomas B. Smith Recreation Center is a case of these supportive center with low demand but high supply. Record from the PPR shows various types of activities are hold in this place. The multi-purpose interior space services for after school, cultural, and educational activities, with a computer lab, a kitchen and several tables and seats. There are several sport courts (tennis courts, basketball courts and a football field) surrounded to hold athletic activities.
sumVisit <- dwellTime %>%
filter(placekey=='zzw-222@628-pm7-rtv') %>%
dplyr::select(-month) %>%
group_by(dwellTimes) %>%
summarise(visitors=sum(visitors)) %>%
st_drop_geometry()
sumVisit$dwellTimes <- factor(sumVisit$dwellTimes, levels= c("<5","5-10","11-20","21-60","61-120","121-240",">240" ))
plot1 <- sumVisit%>%
ggplot(aes(dwellTimes,visitors)) +
geom_bar(position ="dodge",fill = palette1_main,stat='identity') +
labs(x="Dwell Time", y="Aggregated Visitors",
title ='') +
plotTheme()
sumVisit <- visitHour %>%
filter(placekey=='zzw-222@628-pm7-rtv') %>%
dplyr::select(-month) %>%
group_by(hour) %>%
summarise(visits=sum(visit)) %>%
st_drop_geometry()
plot2 <- sumVisit%>%
ggplot(aes(hour,visits)) +
geom_bar(position ="dodge",fill = palette1_main,stat='identity') +
labs(x="hour", y="Aggregated Visits",
title ='') +
plotTheme()
grid.arrange(plot1, plot2,ncol=2,top = "")
visitCount %>%
filter(placekey=='zzw-222@628-pm7-rtv') %>%
st_drop_geometry()%>%
na.omit()%>%
ggplot(aes(visitDay,visits)) +
geom_line(color=palette1_main,size=0.75)+
labs(title = '',x="Visit Date",y="Safegraph Visit")+
plotTheme()+
theme(panel.border = element_rect(colour = "black", fill=NA, size=1))
According to the time trend of visits. all the visit peak here respond to some big sport events, like basketball events with group size of 50. Most visits come and go from morning to evening of a day, but average dwell time is less than an hour. It presents more people stop by this recreation center but leave quickly, which means the facilities or activities are not attractive to keep them stay. So, the PPR should consider to improve the equipment and enrich the functions of this place to incorporate activities schedule.
Girard Estates Park is a representative case of non-normal facility with low permits, high visits and long dwell time. According to permit records, yoga is the only activity applied for permit 13:00-14:00 every Sunday from April to September. No other programs are scheduled by the PPR. However, it is a super active facility and important open space for local community to use. More Information
The Girard Estate Neighbors Association, a volunteer neighborhood organization, engage residents and plan social, recreational, and educational events. They meet once a month and as needed for each event.
visitCount %>%
filter(placekey=='zzz-222@628-pm9-jn5') %>%
st_drop_geometry()%>%
na.omit()%>%
ggplot(aes(visitDay,visits)) +
geom_line(color=palette1_main,size=0.75)+
labs(title = '',x="Visit Date",y="Safegraph Visit")+
plotTheme(10,20)+
theme(panel.border = element_rect(colour = "black", fill=NA, size=1))
The three peaks of visits on the figure above may respond to events for celebrating major festivals, like Easter Egg Hunt, Halloween and Christmas carriage rides. The spring, summer and fall children concerts also take place in May, June and September. Compared to limited PPR program/ permit records, Safegraph data can reflect better using conditions of high visits in the park.
sumVisit <- visitHour %>%
filter(placekey=='zzz-222@63s-dvq-5fz') %>%
dplyr::select(-month) %>%
group_by(hour) %>%
summarise(visits=sum(visit)) %>%
st_drop_geometry()
plot1 <- ggplot(data=sumVisit, aes(hour,visits)) +
geom_bar(position ="dodge",fill = palette1_main,stat='identity') +
labs(x="hour", y="Aggregated Visits",
title ='') +
plotTheme()
sumVisit3 <- dwellTime %>%
filter(placekey=='zzz-222@628-pm9-jn5') %>%
dplyr::select(-month) %>%
group_by(dwellTimes) %>%
summarise(visitors=sum(visitors)) %>%
st_drop_geometry()
sumVisit3$dwellTimes <- factor(sumVisit3$dwellTimes, levels= c("<5","5-10","11-20","21-60","61-120","121-240",">240" ))
plot2 <- ggplot(data=sumVisit3, aes(dwellTimes,visitors)) +
geom_bar(position ="dodge",fill = palette1_main,stat='identity') +
labs(x="Dwell Time", y="Aggregated Visitors",
title ='') +
plotTheme()
grid.arrange(plot1,plot2,ncol=2,top="")
According to the figures above, most activities, like live concert, educational class take place during the day with long dwell time. It seems 11 am to 3 pm is the most time period with the most aggregated visits.
Our goal for this project is to help the PPR better understand the usage of their facilities and provide new recommendations about how staff members should be allocated within their service areas and what programs they should hold.
The eventually userinterface will be the Dashboard. That will help to convey the PPR related information, useful exploratory analysis, and the outcome of the huff model in the end. This dashboard will help PPR better understand the situation of their programs and permits, each facility, service area, and district. Doing so to help PPR make better decision in the future programming.
In this stage, we have successfully understand the data from PPR and SafeGraph. And we also try to draw exploratory analysis by PPR service area level based on SafeGraph pattern data. For the outcome, we have gained many interesting insights shown above. These insights will not be obtianed without SafeGraph pattern data. Besides, we have the outline of the dashboard, which in the end will help us communicate the information of this Practicum with PPR and help them do better programming.
In the next stage of the Practicum, we will focus on the modeling with Huff model in District 7,8,9, and try to define and construct the attractiveness metric of facility. This attractiveness metric will be used in the huff model to obtain the market area for each facility. Finally we will compare the market area with SafeGraph pattern area to see whether PPR current programmings have achieve their goals, serving the target residents.