In lieu of a written final exam, you will construct a course portfolio of three RMarkdown files containing annotated R functions, topic suggestions for a follow-up course to CDS-101, and a comparative discussion of two simulations.
For this homework assignment, you will be guided through the process of building a regression model that predicts the market value of condominiums in New York City using a dataset published by the New York City Department of Finance.
For this homework assignment, you will use statistical inference to answer a question about the National Survey of Family Growth, Cycle 6 dataset published by the National Center for Health Statistics.
Read the following:
Reading discussion
Discussion hashtag
#reading16
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, April 28th.
Posting guidelines can be found in the Readings section of the syllabus.
Introductory Statistics with Randomization and Simulation
Read the following:
Reading discussion
Discussion hashtag
#reading15
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, April 26th.
Posting guidelines can be found in the Readings section of the syllabus.
Nature News Feature article
Read the following article about p values:
Reading discussion
Instead of posting a question as we’ve done for the other readings, please respond to the following prompts:
Had you ever heard of this situation concerning p-values before this class?
If this is the first time you’ve heard this, did you find this surprising, and does it affect how you feel about science? Explain.
If you have heard about this situation before, did the article change your perspective in any way? Explain.
Based on the article, what practical things can we do to make sure our claims are accurate and transparent? Mention any quantities that we should compute and what kinds of details we should try to include in our RMarkdown notebooks.
Students who write a full and thoughtful response that addresses both prompts will receive both a question credit and an answer credit. A full response consists of at least two paragraphs, one for each prompt. Each paragraph must contain at least three full sentences, and the content must be substantive. Posts that don’t meet these criteria will only be eligible for a question credit.
Discussion hashtag
#reading14
Posting guidelines can be found in the Readings section of the syllabus.
Introductory Statistics with Randomization and Simulation
Read the following:
From chapter 2: section 2.3 through to the end of section 2.5
From chapter 4: section 4.5 (skip 4.5.3)
Reading discussion
Discussion hashtag
#reading13
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, April 19th.
Posting guidelines can be found in the Readings section of the syllabus.
For this homework assignment, you will practice using the SelectorGadget Chrome extension to find the CSS selectors needed to scrape information from a webpage and use the rvest package to scrape data from the official Mason Patriots sports website.
Introductory Statistics with Randomization and Simulation
Read the following:
Writeups
Read the following writeups that supplement the content from reading 10:
Reading discussion
Discussion hashtag
#reading12
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Sunday, April 15th.
Posting guidelines can be found in the Readings section of the syllabus.
Introductory Statistics with Randomization and Simulation
Read the following:
Reading discussion
Discussion hashtag
#reading11
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, April 14th.
Posting guidelines can be found in the Readings section of the syllabus.
Writeups
Read the following writeups on the probability mass function and cumulative distribution function:
Reading discussion
Discussion hashtag
#reading10
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, April 12th.
Posting guidelines can be found in the Readings section of the syllabus.
Tutorials
Read the following tutorials on the rvest package and the SelectorGadget Chrome extension:
Beginner’s Guide on Web Scraping in R (using rvest) with hands-on example
SelectorGadget Vignette
Reading discussion
Discussion hashtag
#reading9
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, April 7th.
Posting guidelines can be found in the Readings section of the syllabus.
For the midterm, you will conduct an exploratory data analysis of the U.S. Department of Education’s
For your second homework assignment, you will explore a dataset about the passengers on the Titanic, the British passenger liner that crashed into an iceberg during its maiden voyage and sank early in the morning on April 15, 1912.
Read the following:
Reading discussion
Discussion hashtag
#reading8
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, March 3rd.
Posting guidelines can be found in the Readings section of the syllabus.
Read the following:
Reading discussion
Discussion hashtag
#reading7
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, March 1st.
Posting guidelines can be found in the Readings section of the syllabus.
Your first major assignment is a set of exercises based around a single dataset called rail_trail, which will give you practice creating visualizations using R and ggplot2.
Read the following:
Reading discussion
Discussion hashtag
#reading6
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, February 24th.
Posting guidelines can be found in the Readings section of the syllabus.
Read the following:
Reading discussion
Discussion hashtag
#reading5
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, February 22nd.
Posting guidelines can be found in the Readings section of the syllabus.
Read the following:
Reading discussion
Discussion hashtag
#reading4
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, February 15th.
Posting guidelines can be found in the Readings section of the syllabus.
Mini-assignment to practice using RStudio to run code blocks in RMarkdown files and to create visualizations using ggplot2.
Read the following:
Reading discussion
Discussion hashtag
#reading3
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, February 10th.
Posting guidelines can be found in the Readings section of the syllabus.
Introductory Statistics with Randomization and Simulation
Read the following:
Reading discussion
Discussion hashtag
#reading2
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Thursday, February 8th.
Posting guidelines can be found in the Readings section of the syllabus.
Mini-assignment to practice editing RMarkdown files and saving to Github.
Read the following:
Reading discussion
Discussion hashtag
#reading1
Remember to post your question about it to the #5-discussion channel in Slack by the due date. To receive an answer credit, reply to a posted question no later than 11:59pm on Saturday, February 3rd.
Posting guidelines can be found in the Readings section of the syllabus.
Instructions
Complete all the levels of the Try R tutorial on codeschool.com before class begins on Tuesday, January 30th. After you complete the interactive tutorial, you will receive a certificate of completion.
It is recommended that you sign up for an account before starting, as this will let you save your progress.
Submission
Take a desktop screenshot of the certificate (Print Screen button) and send it to Dr. Glasbrenner as a Slack Direct Message. The screenshot should show some identifying information. For example, open a small notepad window and type your name there, like this:
Introduce yourself
Write an introduction about yourself in the #3-members channel on Slack. Include your name, your major, and one thing you know or have heard about data science before starting this class (this can be from the news, from your major, etc.).
Can Twitter predict election results?
Finish reading the editorial and skimming the white paper from the Can Twitter predict election results? activity we started during class on January 23rd. Then, post your answer to question 1 in the #5-discussion channel on Slack, using the hashtag #class01 somewhere in your message.
Github account
Sign up for an account on Github at http://github.com using your @gmu.edu email address. After you sign up, send me your username in a Direct Message.