In this assignment, you’ll have the opportunity to work with the New York Times COVID data to explore health trends in the U.S. You’ll use the skills you’ve acquired thus far in the course to assess how COVID has had different impacts across the nation at the county, state, and national levels. In doing so, you’ll achieve the following learning objectives:

  • Answer questions using data by filtering it down and selecting specific features
  • Compute summary statistics on the datasets
  • Aggregate and join the data
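
As a rough illustration of the three objectives above, here is a minimal sketch on a made-up toy dataset. The data frames `counties` and `states` and all of their column names are hypothetical and are not the schema of the New York Times data; the sketch only assumes that the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data: these names and numbers are invented for illustration.
counties <- tibble(
  state  = c("Washington", "Washington", "Oregon", "Oregon"),
  county = c("King", "Pierce", "Multnomah", "Lane"),
  cases  = c(100, 40, 60, 20)
)
states <- tibble(
  state          = c("Washington", "Oregon"),
  reported_cases = c(150, 90)
)

# Filter rows down, then select a specific feature:
# which Washington county has the most cases?
top_wa_county <- counties %>%
  filter(state == "Washington") %>%
  filter(cases == max(cases)) %>%
  pull(county)

# Compute a summary statistic: mean cases across all counties.
mean_cases <- counties %>%
  summarize(mean_cases = mean(cases)) %>%
  pull(mean_cases)

# Aggregate county rows up to the state level, then join with the state table
# so the two totals can be compared side by side.
state_totals <- counties %>%
  group_by(state) %>%
  summarize(county_cases = sum(cases), .groups = "drop")

joined <- left_join(state_totals, states, by = "state")
```

The same verbs (`filter()`, `summarize()`, `group_by()`, `left_join()`) should carry over to the real data, though the actual column names and the questions asked in the assignment files will differ.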

In essence, you’ll be demonstrating mastery of data wrangling using the dplyr package. This type of data manipulation and analysis is one of the largest components of any data science project, so mastering these skills is key.


As with previous assignments, follow this link to create your own private repository for this assignment. This will automatically create a private repository, which you will submit to Canvas as your assignment. Then, complete the steps in the and analysis.R files (note: you’ll need to follow instructions from both files!).


Between the and analysis.R files, there are roughly 50 prompts/instructions, each worth 2 points. Where appropriate, partial credit will be awarded, and in alignment with our course grading policy, any previously attempted (but incorrect) work can be corrected for full credit within one week of getting your grade. Skipped steps cannot be re-attempted, so make sure to give everything a genuine try!


Once you’ve finished editing your analysis.R and files, use git to add and commit the changes you’ve made, and push those changes to your repository on GitHub. Please submit the URL of your GitHub repository as your assignment submission on Canvas.

Use RStudio and Atom to complete the assignment.