library(tidyverse)
library(RMySQL)
<- dbConnect(
db MySQL(),
host = "scidb.smith.edu",
user = "sds192",
password = "DSismfc@S",
dbname = "imdb"
)::opts_chunk$set(connection = db, max.print = 20) knitr
Project 3
Learning Goal
- To write database queries in SQL
Readings
Mini-Project
You will craft a data-driven article of roughly 100–200 words. Your deliverable will be a readable blog post written in R Markdown. The final submission should be production-quality, replete with hyperlinks, images, data tables and/or graphics, and correct spelling and grammar.
You project must use SQL to query data from scidb
.
The argument you make in your article must be supported by data stored on scidb
. You may do some of your data wrangling work in R, but you should use SQL first to retrieve data as efficiently as possible. Your article must include at least one data table or data graphic.
Your piece should:
- Employ language that is readable and clear without jargon, and with any technical terms either rephrased or explained.
- Keep paragraphs short.
- Include a counterargument or other/opposite perspective, often about 2/3rds of the way through.
- Have a strong headline or title that will pull people in.
- Include hyperlinks, if applicable, to any relevant recent pieces on the web.
Your project will use data stored on scidb
in at least one of the seven databases listed below. You will write one or more queries in SQL that will pull in data relevant to your question. Examination of that data will inform your response to the question, and you will then formulate your arguments accordingly. Recall that (as always) communication is a critical component of data science, so details like axis labels, figure captions, spelling, and grammar, are just as important as writing your queries correctly and making a logical argument.
Warning
Be extra careful when writing your queries! Just because the query executes without an error does not mean that it will return exactly what you want. The computer is dumb—it just carries out instructions. You are smart—it’s your job to translate your ideas into a syntax that the computer can understand. Know your data!!!
Data
Data are available on scidb.smith.edu
.
SHOW DATABASES;
Database |
---|
airlines |
citibike |
fec |
imdb |
information_schema |
lahman |
yelp |
Databases
Your project should focus on one of the following databases:
airlines
: on-time flight data from the Bureau of Transportation Statistics (raw data documentation)citibike
: trip-level data from New York City’s municipal bike rental system (raw data documentation)fec
: campaign finance data from the Federal Election Commission (raw data documentation)imdb
: a copy of the Internet Movie Database (raw data documentation)lahman
: historical season-level baseball statistics (raw data documentation)nyctaxi
: ride-level data from New York City’s Taxi & Limousine Commission (raw data documentation)yelp
: restaurant reviews in the Phoenix, AZ area (raw data documentation)
Scope r
Due to the compressed time during the Interterm, you may limit the scope of your analysis to results that can be achieved via a single query. In many cases, a simple table of results will be sufficient—you should not feel obligated to create a fancy data graphic.
Submission
- Push all commits to the appropriate repository in the MTH 190 GitHub organization.
- Submit rendered
.html
file to Moodle