CS109 Data Science – Course by Harvard University

Level: Beginner
Duration: N/A
Delivery: Online
Certification: N/A
Cost: Free
Course Provider: Harvard University


As more data sets become available, statisticians and computer scientists have growing opportunities to help gather, analyse, and interpret the data collected. Harvard University, through the John A. Paulson School of Engineering and Applied Sciences (SEAS), offers free access to its CS109 Data Science course through its GitHub page.

Training Course Content

By taking the course, students will learn how to use data to make useful predictions and gain insights. Specifically, they will learn data wrangling, cleaning, and sampling; data management; exploratory data analysis; prediction-based statistical methods; and communication of results. They will also learn Python and other tools for handling data, and how to apply statistics and computational analysis to make predictions.

The course is divided into three major modules, each focusing on an area in which data plays an important role: prediction and elections, recommendation and business analytics, and clustering and text analysis.

Harvard University’s data science course has several components: lectures, sections, homework, and a project. All of these are available on the course website and are spread out over 13 weeks. The following is an overview of the topics covered each week:

  • Week 1: Course overview
  • Week 2: Pandas, Python, and GitHub; web scraping, regular expressions, data reshaping, data cleanup; exploratory data analysis
  • Week 3: Scraping, Pandas, Python, and viz; SQL; statistical models
  • Week 4: Probability, distributions, and frequentist statistics; storytelling and effective communication; bias and regression
  • Week 5: Regression, logistic regression; classification, cross-validation, dimensionality reduction
  • Week 6: Machine learning; decision trees and random forests
  • Week 7: Machine learning 2; ensemble methods, best practices
  • Week 8: Ensembles; best practices, recommendations, and MapReduce; MapReduce combiners and Spark
  • Week 9: Vagrant and VirtualBox, AWS, and Spark; Bayes theorem and Bayesian methods
  • Week 10: Bayes, interactive visualisation
  • Week 11: Text and clustering; effective presentations
  • Week 12: Projects, experimental design, deep networks
  • Week 13: Wrap-up

Who Is It For?

The course is open to undergraduate and graduate students who have programming knowledge at the level of CS 50 or higher and statistics knowledge at the level of Stat 100 or higher.


FREE via Harvard University’s GitHub page.

About the Provider

Founded in 1847 as the Lawrence Scientific School, SEAS currently boasts over 7,000 affiliated graduates. Its primary research interests are applied mathematics, applied physics, bioengineering, computer science, electrical engineering, environmental science and engineering, materials and mechanical engineering, and science, technology, and public policy. As a premier institution, Harvard is ranked #1 in mechanical engineering, #1 in biotechnology and applied microbiology, #5 in mathematics, and #3 in artificial intelligence, robotics, and auto control.

User Reviews

· November 1, 2018

The high quality content you'd expect from Harvard. Obviously just trying to get you in their sales funnel, but worth it nonetheless.

· November 8, 2018

Pretty strong course. Gave me a good intro to data science. Can recommend to any beginners looking to get started.
