Take-home Exercise 4

Visualising and Analysing Daily Routines

Published: May 21, 2022

1 Introduction

The task of Take-home Exercise 4 is to reveal the daily routines of two selected participants from the city of Engagement, Ohio, USA, using appropriate static and interactive statistical graphics methods.

In this exercise, I use the ViSiElse package to display the daily routines of the two selected participants.

2 Data Description

The data file used for this exercise is ParticipantStatusLogs1.csv. This table contains information about each participant’s daily routine. The columns are defined as follows:

timestamp (datetime): the time when the status was logged
currentLocation (point): the location of the participant within the city at the time the status was logged
participantId (integer): unique ID assigned to each participant
currentMode (string): a string representing the mode the participant is in at the time the status was logged, one of {“AtHome”, “Transport”, “AtRecreation”, “AtRestaurant”, “AtWork”}.
hungerStatus (string): a string representing the participant’s hunger status at the time the status was logged
sleepStatus (string): a string representing the participant’s sleep status at the time the status was logged
apartmentId (integer): the integer ID corresponding to the apartment in which the participant resides at the time the status was logged
availableBalance (float): the balance in the participant’s financial account (negative if in debt)
jobId (integer): the integer ID corresponding to the job the participant holds at the time the status was logged, N/A if unemployed
financialStatus (string): a string representing the participant’s financial status at the time the status was logged
dailyFoodBudget (double): the amount of money the participant has budgeted for food that day
weeklyExtraBudget (double): the amount of money the participant has budgeted for miscellaneous expenses that week

3 Data Preparation

3.1 Installing and launching R packages

For this exercise, I used nine packages: sf, tmap, ViSiElse, lubridate, clock, sftime, rmarkdown, tidyverse and qdapTools. The following code chunk installs any that are missing and loads them all into the R session.

packages <- c("sf", "tmap", "ViSiElse",
              "lubridate", "clock", "sftime",
              "rmarkdown", "tidyverse", "qdapTools")

# Install each package if it is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

3.2 Importing the dataset

The data was imported with read_sf(), a function in the sf package. For this CSV file it returns the records as a tibble.

raw_data <- read_sf("data/ParticipantStatusLogs1.csv", options="GEOM_POSSIBLE_NAMES=location")

Since the original CSV file is larger than 200 MB, it cannot be committed to the GitHub repository. I therefore keep only the records for day 2 of the month and save them in RDS format, which gives a much smaller source data file. Saving to RDS does not lose any of the retained data, and the smaller file is also faster to read.

The following code chunk saves the source data as an RDS file.

logs_data <- raw_data %>%
  # Parse the character timestamp into a date-time (clock package)
  mutate(Timestamp = date_time_parse(timestamp,
                                     zone = "",
                                     format = "%Y-%m-%dT%H:%M:%S")) %>%
  # Extract the day of the month and keep only records from day 2
  mutate(day = get_day(Timestamp)) %>%
  filter(day == 2)
write_rds(logs_data, "data/rds/ParticipantStatusLogs1.rds")

Once the data has been saved in RDS format, it can be read back directly with read_rds() in later sessions.

# A tibble: 291,168 x 14
   timestamp    currentLocation participantId currentMode hungerStatus
   <chr>        <chr>           <chr>         <chr>       <chr>       
 1 2022-03-02T~ POINT (-2724.6~ 0             AtHome      BecomingHun~
 2 2022-03-02T~ POINT (-2619.0~ 1             Transport   JustAte     
 3 2022-03-02T~ POINT (-1360.9~ 2             AtHome      Hungry      
 4 2022-03-02T~ POINT (-1558.5~ 3             AtHome      BecomingHun~
 5 2022-03-02T~ POINT (976.240~ 4             AtHome      BecomingHun~
 6 2022-03-02T~ POINT (-927.27~ 5             Transport   BecameFull  
 7 2022-03-02T~ POINT (1795.12~ 6             AtHome      BecameFull  
 8 2022-03-02T~ POINT (-93.629~ 7             Transport   Hungry      
 9 2022-03-02T~ POINT (616.295~ 8             AtHome      Starving    
10 2022-03-02T~ POINT (-2034.6~ 9             AtHome      BecomingHun~
# ... with 291,158 more rows, and 9 more variables:
#   sleepStatus <chr>, apartmentId <chr>, availableBalance <chr>,
#   jobId <chr>, financialStatus <chr>, dailyFoodBudget <chr>,
#   weeklyExtraBudget <chr>, Timestamp <dttm>, day <int>
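The round trip through RDS can be checked on a small scale. The sketch below is illustrative only: `toy_logs` is a made-up stand-in for `logs_data`, and it uses base R's `saveRDS()` and `readRDS()`, which readr's `write_rds()` and `read_rds()` wrap, so that it runs without extra packages.

```r
# Toy stand-in for logs_data; values are illustrative, not taken from the real file.
toy_logs <- data.frame(
  participantId = c("0", "1", "2"),
  Timestamp = as.POSIXct(c("2022-03-02 00:00:00",
                           "2022-03-02 00:05:00",
                           "2022-03-02 00:10:00"), tz = "UTC"),
  day = c(2L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Write to a temporary .rds file and read it back.
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_logs, rds_path)
restored <- readRDS(rds_path)

# Compare the restored object with the original and list the column classes.
print(isTRUE(all.equal(toy_logs, restored)))
print(sapply(restored, function(col) class(col)[1]))
```

Unlike a CSV file, an RDS file keeps column classes such as date-times and integers, so no re-parsing is needed after reading it back.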

3.3 Data Preprocessing

This exercise covers only participants 1 and 2 and displays their daily life from 00:00:00 to 23:59:59 on March 2, 2022, so their records need to be selected first.
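The preprocessing code itself is not shown here. As a rough sketch of one way to derive start and end times for each mode, the example below collapses consecutive records with the same `currentMode` into intervals using base R's `rle()`. It assumes one status record every 5 minutes, and the `toy` log is made up for illustration rather than taken from the real data.

```r
# Toy status log: one record every 5 minutes (an assumption for illustration).
toy <- data.frame(
  minute = seq(0, 55, by = 5),
  currentMode = c("Transport", "Transport", "Transport",
                  "AtHome", "AtHome", "AtHome", "AtHome",
                  "Transport", "Transport",
                  "AtWork", "AtWork", "AtWork"),
  stringsAsFactors = FALSE
)

# Run-length encode the mode column: each run is one uninterrupted stay in a mode.
runs <- rle(toy$currentMode)
last_idx  <- cumsum(runs$lengths)          # index of the last record in each run
first_idx <- last_idx - runs$lengths + 1   # index of the first record in each run

# A run starts at its first record and ends one logging step after its last record,
# which is also when the next run starts.
intervals <- data.frame(
  mode  = runs$values,
  start = toy$minute[first_idx],
  end   = toy$minute[last_idx] + 5,
  stringsAsFactors = FALSE
)
print(intervals)
```

With the real data, the same idea would be applied per participant after filtering on `participantId`, with `minute` computed as minutes since midnight from `Timestamp`.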

After data preprocessing, the mode data for participant 1 are shown below. Times are in minutes since midnight.

  id start_home_list end_home_list start_recr_list end_recr_list
1  1              15           455            1085          1130
2  1               0             0               0             0
3  1               0             0               0             0
4  1               0             0               0             0
5  1               0             0               0             0
6  1               0             0               0             0
  start_rest_list end_rest_list start_tran_list end_tran_list
1             630           650               0            15
2               0             0             455           490
3               0             0             620           630
4               0             0             650           660
5               0             0            1010          1045
6               0             0            1050          1085
  start_work_list end_work_list
1             490           620
2               0             0
3               0             0
4               0             0
5               0             0
6               0             0

After data preprocessing, the mode data for participant 2 are shown below.

  id start_home_list end_home_list start_recr_list end_recr_list
1  2               0           360            1080          1195
2  2               0             0               0             0
3  2             980           985               0             0
4  2            1290          1440               0             0
  start_rest_list end_rest_list start_tran_list end_tran_list
1               0             0             360           430
2               0             0             910           980
3               0             0             985          1080
4               0             0            1195          1290
  start_work_list end_work_list
1             430           910
2               0             0
3               0             0
4               0             0
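The tables above store each mode's intervals in a pair of start and end columns, padded with zeros so that all columns have the same number of rows. The sketch below shows one possible way to build such a layout. It is not the code used for this exercise: `pad_zero()` is a hypothetical helper, the intervals are copied from the first rows of participant 1's table, and the zeros are simply appended at the end of each column.

```r
# Intervals for participant 1, copied from the first rows of that participant's table.
intervals <- data.frame(
  mode  = c("tran", "home", "tran", "work"),
  start = c(0, 15, 455, 490),
  end   = c(15, 455, 490, 620),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pad a vector with zeros up to length n.
pad_zero <- function(x, n) c(x, rep(0, n - length(x)))

# Group the intervals by mode and find the longest group.
by_mode <- split(intervals, intervals$mode)
n_rows <- max(vapply(by_mode, nrow, integer(1)))

# One start/end column pair per mode, zero-padded to n_rows.
wide <- data.frame(id = rep(1, n_rows))
for (m in names(by_mode)) {
  wide[[paste0("start_", m, "_list")]] <- pad_zero(by_mode[[m]]$start, n_rows)
  wide[[paste0("end_", m, "_list")]]   <- pad_zero(by_mode[[m]]$end, n_rows)
}
print(wide)
```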

4 Plotting daily routines

The following two charts show the start and end times of each mode in the routines of participant 1 and participant 2 over the whole day of March 2.

ViSiElse distinguishes two types of actions: punctual and long. A punctual action happens at a single point in time, while a long action has a duration bounded by two punctual actions, its start and its end.

Start and end moments alone make it hard to grasp a participant’s daily routine, so each pair of punctual actions is combined into a long action. This shows the duration of each mode and makes the two participants’ routines easier to compare.
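To illustrate what a long action adds over its two punctual endpoints, the sketch below computes the duration of each interval as end minus start and totals the minutes per mode. The values are copied from participant 2's table in Section 3.3; the data frame `p2` and the short mode labels are illustrative names, not objects from the original code.

```r
# Start/end minutes for participant 2, copied from the table in Section 3.3.
# Rows that are 0/0 padding in that table are omitted because they have zero length.
p2 <- data.frame(
  mode  = c("home", "home", "home", "recreation",
            "transport", "transport", "transport", "transport", "work"),
  start = c(0, 980, 1290, 1080, 360, 910, 985, 1195, 430),
  end   = c(360, 985, 1440, 1195, 430, 980, 1080, 1290, 910),
  stringsAsFactors = FALSE
)

# Duration of each long action, then total minutes per mode.
p2$duration <- p2$end - p2$start
minutes_per_mode <- tapply(p2$duration, p2$mode, sum)
print(minutes_per_mode)
print(sum(minutes_per_mode))
```

The totals add up to 1440 minutes, which confirms that the intervals in that table cover the full day without gaps or overlaps.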

book1[11,] <- c("At home", "At home", "l", 11, "start_home_list", "end_home_list")
book1[12,] <- c("Enjoyment", "Enjoyment", "l", 12, "start_recr_list", "end_recr_list")
book1[13,] <- c("At Restaurant", "At Restaurant", "l", 13, "start_rest_list", "end_rest_list")
book1[14,] <- c("Transportation", "Transportation", "l", 14, "start_tran_list", "end_tran_list")
book1[15,] <- c("At work", "At work", "l", 15, "start_work_list", "end_work_list")
book1$showorder <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 11, 12, 13, 14, 15)
book1 <- book1[order(as.numeric(book1$showorder)), ]
book1

              vars           label typeA showorder             deb
11         At home         At home     l        11 start_home_list
12       Enjoyment       Enjoyment     l        12 start_recr_list
13   At Restaurant   At Restaurant     l        13 start_rest_list
14  Transportation  Transportation     l        14 start_tran_list
15         At work         At work     l        15 start_work_list
1  start_home_list start_home_list     p        NA            <NA>
3    end_home_list   end_home_list     p        NA            <NA>
4  start_recr_list start_recr_list     p        NA            <NA>
5    end_recr_list   end_recr_list     p        NA            <NA>
6  start_rest_list start_rest_list     p        NA            <NA>
7    end_rest_list   end_rest_list     p        NA            <NA>
8  start_tran_list start_tran_list     p        NA            <NA>
9    end_tran_list   end_tran_list     p        NA            <NA>
10 start_work_list start_work_list     p        NA            <NA>
2    end_work_list   end_work_list     p        NA            <NA>
             fin
11 end_home_list
12 end_recr_list
13 end_rest_list
14 end_tran_list
15 end_work_list
1           <NA>
3           <NA>
4           <NA>
5           <NA>
6           <NA>
7           <NA>
8           <NA>
9           <NA>
10          <NA>
2           <NA>

book2[11,] <- c("At home", "At home", "l", 11, "start_home_list", "end_home_list")
book2[12,] <- c("Enjoyment", "Enjoyment", "l", 12, "start_recr_list", "end_recr_list")
book2[13,] <- c("At Restaurant", "At Restaurant", "l", 13, "start_rest_list", "end_rest_list")
book2[14,] <- c("Transportation", "Transportation", "l", 14, "start_tran_list", "end_tran_list")
book2[15,] <- c("At work", "At work", "l", 15, "start_work_list", "end_work_list")
book2$showorder <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 11, 12, 13, 14, 15)
book2 <- book2[order(as.numeric(book2$showorder)), ]
book2

              vars           label typeA showorder             deb
11         At home         At home     l        11 start_home_list
12       Enjoyment       Enjoyment     l        12 start_recr_list
13   At Restaurant   At Restaurant     l        13 start_rest_list
14  Transportation  Transportation     l        14 start_tran_list
15         At work         At work     l        15 start_work_list
1  start_home_list start_home_list     p        NA            <NA>
3    end_home_list   end_home_list     p        NA            <NA>
4  start_recr_list start_recr_list     p        NA            <NA>
5    end_recr_list   end_recr_list     p        NA            <NA>
6  start_rest_list start_rest_list     p        NA            <NA>
7    end_rest_list   end_rest_list     p        NA            <NA>
8  start_tran_list start_tran_list     p        NA            <NA>
9    end_tran_list   end_tran_list     p        NA            <NA>
10 start_work_list start_work_list     p        NA            <NA>
2    end_work_list   end_work_list     p        NA            <NA>
             fin
11 end_home_list
12 end_recr_list
13 end_rest_list
14 end_tran_list
15 end_work_list
1           <NA>
3           <NA>
4           <NA>
5           <NA>
6           <NA>
7           <NA>
8           <NA>
9           <NA>
10          <NA>
2           <NA>

The following two charts show how the long actions of participant 1 and participant 2 are distributed across modes over the whole day of March 2.

visi1 <- visielse(X = X1,  book = book1, informer = NULL)

visi2 <- visielse(X = X2,  book = book2, informer = NULL)

For a more intuitive comparison, I combined the journeys of participant 1 and participant 2 on March 2 into one graph.

X <- rbind(X1, X2)
# 12 rows belong to participant 1 (group1) and 4 rows to participant 2 (group2)
group <- rep(c("group1", "group2"), times = c(12, 4))

visi <- visielse(X, group=group, book=book1 ,informer = NULL, method = "cut")
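The group vector must have one label per row of the combined data. As a small sketch, the labels can be derived from the row counts instead of being typed out, which keeps them in step with the data if the number of rows changes. `X1_toy` and `X2_toy` are made-up stand-ins, assumed to have 12 and 4 rows as the group vector above implies.

```r
# Toy stand-ins for X1 and X2; only the row counts matter here.
X1_toy <- data.frame(id = rep(1, 12))
X2_toy <- data.frame(id = rep(2, 4))
X_toy <- rbind(X1_toy, X2_toy)

# One label per row, built from the number of rows in each input.
group_toy <- rep(c("group1", "group2"),
                 times = c(nrow(X1_toy), nrow(X2_toy)))

# The label vector must line up with the combined data.
stopifnot(length(group_toy) == nrow(X_toy))
print(table(group_toy))
```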

5 Summary

Comparing the two charts, we can conclude that on March 2, participant 2’s schedule was more compact than participant 1’s.