Putting Visual Analytics into Practical Use
The task of Take-home Exercise 3 is to reveal the economic of the
city of Engagement, Ohio USA by using appropriate static and interactive
statistical graphics methods.
There are three questions will be answered in this blog.
The data file used for this exercise is FinancialJournal.csv. This table contains information about financial transactions. Following are the definitions of 4 columns of data:
For this exercise, I used three libraries. They are tidyverse, lubridate and patchwork. The R code in the following code chunk is used to install the required packages and load them into RStudio environment.
packages <- c('tidyverse', 'lubridate', 'patchwork')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
Data import was completed by using read_csv() which is a function in readr package. This function is useful for reading delimited files into a tibble. Here, head() function is used to display the first 6 rows of data in the dataset.
raw_data <- read_csv("data/FinancialJournal.csv")
head(raw_data)
# A tibble: 6 x 4
participantId timestamp category amount
<dbl> <dttm> <chr> <dbl>
1 0 2022-03-01 00:00:00 Wage 2473.
2 0 2022-03-01 00:00:00 Shelter -555.
3 0 2022-03-01 00:00:00 Education -38.0
4 1 2022-03-01 00:00:00 Wage 2047.
5 1 2022-03-01 00:00:00 Shelter -555.
6 1 2022-03-01 00:00:00 Education -38.0
When I used filter function in Excel to observe the data, I found that there are some identical data rows in the table. Therefore, I need to remove these duplicate data before visualizing the data. raw_data %>% distinct() in the following R code chunk is used to complete this step.
[1] 1113
We can see that there are 1113 duplicate rows in the raw dataset.
Through analyzing the data in the Participants.csv
table, I have already found that the number of participants in this
study is fairly evenly distributed in five aspects: age, family size,
whether there are children, education level, and interests.
Therefore, the trend in participants’ average monthly wages is a good indicator of financial health in Ohio state.
num_participant <- max(data$participantId) - min(data$participantId) + 1
Month <- c()
Avg_wage <- c()
avg_food_list <- c()
Education_Expense <- c()
Recreation_Expense <- c()
Rental_Income <- c()
Rent_Cost <- c()
for (y in min(year(data$timestamp)):max(year(data$timestamp))){
tmp <- data[year(data$timestamp) == y,]
tmp_month <- month(tmp$timestamp)
for (m in min(tmp_month):max(tmp_month)){
y <- as.character(y)
m <- as.character(m)
Month <- append(Month, paste(y, m, sep = "-"))
tmp_data <- tmp[month(tmp$timestamp) == m,]
tmp_wage <- tmp_data[tmp_data$category == "Wage",]
tmp_food <- tmp_data[tmp_data$category == "Food",]
tmp_edu <- tmp_data[tmp_data$category == "Education",]
tmp_rec <- tmp_data[tmp_data$category == "Recreation",]
tmp_rent <- tmp_data[tmp_data$category == "RentAdjustment",]
tmp_shel <- tmp_data[tmp_data$category == "Shelter",]
Avg_wage <- append(Avg_wage, sum(tmp_wage$amount) / num_participant)
avg_food_list <- append(avg_food_list, sum(tmp_food$amount) / num_participant)
Education_Expense <- append(Education_Expense, sum(tmp_edu$amount) / num_participant)
Recreation_Expense <- append(Recreation_Expense, sum(tmp_rec$amount) / num_participant)
Rental_Income <- append(Rental_Income, sum(tmp_rent$amount) / num_participant)
Rent_Cost <- append(Rent_Cost, sum(tmp_shel$amount) / num_participant)
}
}
Engel_Coefficient <- avg_food_list / (avg_food_list + Education_Expense + Recreation_Expense + Rent_Cost)
month_fee_table <- data.frame(Month, Avg_wage, Engel_Coefficient, Education_Expense, Recreation_Expense, Rental_Income, Rent_Cost)
month_fee_table$Month<- factor(month_fee_table$Month,
levels = Month, ordered = TRUE)
head(month_fee_table)
Month Avg_wage Engel_Coefficient Education_Expense
1 2022-3 6333.056 0.2019471 -14.19767
2 2022-4 3431.015 0.2404876 -11.29915
3 2022-5 3583.647 0.2570449 -11.29915
4 2022-6 3569.617 0.2550617 -11.29915
5 2022-7 3447.872 0.2586480 -11.29915
6 2022-8 3722.856 0.2629630 -11.29915
Recreation_Expense Rental_Income Rent_Cost
1 -642.5230 52.921353 -624.7509
2 -385.4549 1.413694 -553.8265
3 -332.7578 0.000000 -552.3746
4 -311.3831 0.000000 -552.3746
5 -326.0269 0.000000 -552.3746
6 -303.6227 0.000000 -552.3746
Through observing the following bar chart, it is obvious that the per
capita wage in March 2022 is nearly twice as large as the per capita
wages in April 2022. I think the reason may be the outbreak of Omicron
in Ohio in March 2022. The pandemic created a huge impact on economy of
Ohio state, which led to a sharp drop in per capita income.
In 2023, although per capita wages will rebound in March, the overall trend will be still downward. Thus, I can initially infer that the Ohio residents’ financial health is not very good.
p1 <- ggplot(data = month_fee_table, aes(x = Month, y = Avg_wage)) +
geom_col(width = 0.5, color = "skyblue", fill = "skyblue") +
ylim(0, 7000) +
coord_flip() +
scale_x_discrete(limits=rev(Month)) +
ggtitle("Monthly average wages")
p1
According to the Engel coefficient standard designated by the United Nations, Ohio is still a wealthy area, but increasing 5 percentage in the Engel coefficient has already proved that the financial health in Ohio state is very unsatisfactory.
Generally, people’s income is inversely proportional to the Engel coefficient. However, the participants’ predicted Engel coefficient will show a downward trend in March, 2023. This trend will keep same as the monthly average salary, which indicates that the US government may have implement some economic regulations or other policies to prevente further economic deterioration.
p2 <- ggplot(data = month_fee_table, aes(x = Month, y = Engel_Coefficient)) +
geom_point(colour=alpha('blue', 1), size=5) +
geom_line(aes(group=1), colour=alpha('blue', 1), size=2) +
ylim(0.1, 0.3) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10, face = "bold")) +
ggtitle("Engel_Coefficient")
p2
Through observing the following two charts, we can see that both average rental income and average rental cost fell down in April 2022. And both of them will not have obvious changing trend in the subsequent months. This is different from my guess that participants who have vacant homes may replenish their income by raising rents, which may cause renters to pay more money for the rent bills.
I guess the decline in rents may also be related to some economic policies published by the Ohio government. Obviously, such a situation is very beneficial to Ohio’s economic health, because it avoids to widen the gap between the rich and the poor. Moreover, reducing people’s housing pressure can also guarantee social stability.
p3 <- ggplot(data = month_fee_table, aes(x = Month, y = Rental_Income)) +
geom_col(width = 0.5, color = "skyblue", fill = "skyblue") +
ylim(0, 60) +
coord_flip() +
scale_x_discrete(limits=rev(Month)) +
ggtitle("Rental_Income")
p4 <- ggplot(data = month_fee_table, aes(x = Month, y = abs(Rent_Cost))) +
geom_col(width = 0.5, color = "pink", fill = "pink") +
ylab("Rent_Cost") +
ylim(0, 650) +
coord_flip() +
scale_x_discrete(limits=rev(Month)) +
ggtitle("Rent_Cost")
p3 | p4
By observing the following chart, it can be seen that the participants’ recreation expenses displays a downward trend over time. Even if we only look at the two data in April 2022 and May 2023, the difference is still close to 200 dollars.
Therefore, from the phenomenon that participants begin to reduce recreation cost, it is not difficult to find that the overall financial health in Ohio state has indeed got a certain impact.
p5 <- ggplot(data = month_fee_table, aes(x = Month, y = abs(Recreation_Expense))) +
geom_point(colour=alpha('blue', 1), size=5) +
geom_line(aes(group=1), colour=alpha('blue', 1), size=2) +
ylab("Recreation_Expense") +
ylim(200, 700) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10, face = "bold")) +
ggtitle("Recreation_Expense")
p5
By analyzing the data in the FinancialJournal.csv
file,
we can draw the following three conclusions: