# Assignments

• Project

## Final Portfolio

In lieu of a written final exam, you will construct a course portfolio of three RMarkdown files containing annotated R functions, topic suggestions for a follow-up course to CDS-101, and a comparative discussion of two simulations.

• Homework

## Homework 5

For this homework assignment, you will be guided through the process of building a regression model that predicts the market value of condominiums in New York City using a dataset published by the New York City Department of Finance.

• Homework

## Homework 4

For this homework assignment, you will use statistical inference to answer a question about the National Survey of Family Growth, Cycle 6 dataset published by the National Center for Health Statistics.

R for Data Science

Discussion hashtag
#reading16

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, April 28th.

Introductory Statistics with Randomization and Simulation

• From chapter 5: from the beginning through the end of section 5.1.4, plus section 5.4.1

R for Data Science

Discussion hashtag
#reading15

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, April 26th.

Nature News Feature article

Instead of posting a question as we’ve done for the other readings, please respond to the following prompts:

1. Had you ever heard of this situation concerning p-values before this class?

• If this is the first time you’ve heard of it, did you find it surprising, and does it affect how you feel about science? Explain.

2. Based on the article, what practical things can we do to make sure our claims are accurate and transparent? Mention any quantities that we should compute and what kinds of details we should try to include in our RMarkdown notebooks.

Students who write a full and thoughtful response that addresses both prompts will receive both a question credit and an answer credit. A full response consists of at least two paragraphs, one for the first prompt and one for the second. Each paragraph must have at least three full sentences, and the content must be substantive. Posts that don’t meet these criteria will be eligible only for a question credit.

Discussion hashtag
#reading14

Introductory Statistics with Randomization and Simulation

• From chapter 2: section 2.3 through to the end of section 2.5

• From chapter 4: section 4.5 (skip 4.5.3)

Discussion hashtag
#reading13

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, April 19th.

• Homework

## Homework 3

For this homework assignment, you will practice using the SelectorGadget Chrome extension to find the CSS selectors needed to scrape information from a webpage, and then use the rvest package to scrape data from the official Mason Patriots sports website.

Introductory Statistics with Randomization and Simulation

• From chapter 1: sections 1.3 (skip 1.3.4), 1.4.1, and 1.5

Writeups

Discussion hashtag
#reading12

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Sunday, April 15th.

Introductory Statistics with Randomization and Simulation

• From chapter 2: from the beginning through to the end of section 2.2

Discussion hashtag
#reading11

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, April 14th.

Writeups

Read the following writeups on the probability mass function and cumulative distribution function:

Discussion hashtag
#reading10

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, April 12th.

Tutorials

Read the following tutorials on the rvest package and SelectorGadget Chrome extension.

Beginner’s Guide on Web Scraping in R (using rvest) with hands-on example

Vignette

Discussion hashtag
#reading9

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, April 7th.

• Project

## Midterm Project

For the midterm, you will work in teams to conduct an exploratory data analysis of the U.S. Department of Education’s College Scorecard dataset.

• Homework

## Homework 2

For your second homework assignment, you will explore a dataset about the passengers on the Titanic, the British passenger liner that crashed into an iceberg during its maiden voyage and sank early in the morning on April 15, 1912.

R for Data Science

Discussion hashtag
#reading8

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, March 3rd.

R for Data Science

Discussion hashtag
#reading7

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, March 1st.

• Homework

## Homework 1

Your first major assignment is a set of exercises based on a single dataset called rail_trail, which will give you practice creating visualizations using R and ggplot2.

R for Data Science

Discussion hashtag
#reading6

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, February 24th.

R for Data Science

Discussion hashtag
#reading5

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, February 22nd.

R for Data Science

Discussion hashtag
#reading4

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, February 15th.

• Mini-Assignment

## Visualization mini-assignment

Mini-assignment to practice using RStudio to run code blocks in RMarkdown files and to create visualizations using ggplot2.

R for Data Science

Discussion hashtag
#reading3

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, February 10th.

Introductory Statistics with Randomization and Simulation

• All of chapter 1 except sections 1.3, 1.4, and 1.5 (within section 1.3, do read subsection 1.3.4)

Discussion hashtag
#reading2

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, February 8th.

• Mini-Assignment

## RMarkdown mini-assignment

Mini-assignment to practice editing RMarkdown files and saving them to GitHub.

R for Data Science

Discussion hashtag
#reading1

Remember to post your question about the reading to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, February 3rd.

• Mini-Assignment

## Try R Tutorial

Instructions

Complete all the levels of the Try R tutorial on codeschool.com before class begins on Tuesday, January 30th. After you complete the interactive tutorial, you will receive a certificate of completion.

It is recommended that you sign up for an account before starting, as this will let you save your progress.

Submission

Take a desktop screenshot of the certificate (Print Screen button) and send it to Dr. Glasbrenner as a Slack Direct Message. The screenshot should show some identifying information. For example, open a small Notepad window next to the certificate and type your name in it.

• Mini-Assignment

## Introduce yourself; Twitter Data Science Study; GitHub signup

Introduce yourself

Write an introduction about yourself in the #3-members channel on Slack. Include your name, your major, and one thing you knew or had heard about data science before starting this class (for example, from the news or in your major).

Twitter Data Science Study

Finish reading the editorial and skimming the white paper from the "Can Twitter predict election results?" activity we started during class on January 23rd. Then, post your answer to question 1 in the #5-discussion channel on Slack, using the hashtag #class01 somewhere in your message.

GitHub signup

Sign up for an account on GitHub (http://github.com) using your @gmu.edu email address. After you sign up, send me your username in a Direct Message.