Course Overview

SDS 192

Schwab

Introduction to Programming Language for Data Science (R) and Computing Environment (RStudio)

Data Science Life Cycle

DS life cycle

Grammar of Graphics

Hadley Wickham

Data Collection

Collect student data here

Web Scraping and Importing Data (and a tiny bit of cleaning)

library(googlesheets4)
library(janitor)

student_data <- read_sheet('https://docs.google.com/spreadsheets/d/1XOAFQvBcFvhKwEl5xH6l3nRJK2Wk0hWZ18S9kAZclQ0/edit?usp=sharing')

student_data <- clean_names(student_data)

Data Visualization

# Make a simple plot of the student data.
library(tidyverse)

student_data |>
  ggplot(mapping = aes(x = height)) +
  geom_histogram() +
  labs(
    title = "Students' height in inches"
  )

Data Wrangling

# Wrangle the student data, then edit the plot.
student_data |>
  select(briefly_describe_any_previous_experience_you_have_with_r) |>
  rename(r_experience = briefly_describe_any_previous_experience_you_have_with_r) |>
  group_by(r_experience) |>
  summarize(
    # sum() with no arguments is 0, so this is 0 for every group.
    # A group share would need something like n() / nrow(student_data).
    sum()/n()
  )
# A tibble: 2 × 2
  r_experience   `sum()/n()`
  <chr>                <dbl>
1 More than none           0
2 None                     0

Data Management

-   Security
-   Keeping yourself organized!
-   How data is housed

Data Ethics

How we use data without doing harm.

Iteration

# Have R repeat many steps for us.

print(student_data$one_word_summer_highlight)
[1] "Beach"        "Concerts"     "vacation"     "Europe"       "Books"       
[6] "videogames"   "Pinebarrens"  "hot"          "Study Abroad"
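
As a minimal sketch of what "repeat steps" can look like, the example below counts the characters in each summer highlight twice: once with an explicit for loop and once with vapply(). The highlights vector is typed in by hand from the printed output above so the example runs without the Google Sheet; the name highlight_lengths is made up for illustration.

```r
# Values copied from the printed output above, so no sheet access is needed.
highlights <- c("Beach", "Concerts", "vacation", "Europe", "Books",
                "videogames", "Pinebarrens", "hot", "Study Abroad")

# A for loop visits each element in turn and runs the same step on it.
for (word in highlights) {
  message(word, " has ", nchar(word), " characters")
}

# vapply() applies the same step to every element and collects the results
# in an integer vector named after the input words.
highlight_lengths <- vapply(highlights, nchar, integer(1))
```

Both versions do the same work; the vapply() form keeps the results in an object that can be reused later, while the loop only prints messages.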

Version Control

With Git and GitHub.

Introduction to Database Querying with SQL

Wrangle some data with classic SQL.

Simple Query?
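
One possible simple query, sketched from R. This assumes the DBI and RSQLite packages are installed; neither is used elsewhere in this overview, and the table name highlights and its column word are hypothetical, built from the summer highlights printed in the Iteration section.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for a real course database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical one-column table holding the summer highlights.
highlights <- data.frame(
  word = c("Beach", "Concerts", "vacation", "Europe", "Books",
           "videogames", "Pinebarrens", "hot", "Study Abroad")
)
dbWriteTable(con, "highlights", highlights)

# Keep only the words longer than six characters, longest first.
long_words <- dbGetQuery(con, "
  SELECT word, LENGTH(word) AS n_chars
  FROM highlights
  WHERE LENGTH(word) > 6
  ORDER BY n_chars DESC
")

dbDisconnect(con)
```

The SELECT, WHERE, and ORDER BY clauses play roughly the same roles as select(), filter(), and arrange() in the wrangling code earlier.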

Further Topics based on time and student interest may include:

-   Dynamic and Customized Data Graphics

-   Geospatial Data and Maps

-   Learning with AI