This comprehensive ten-course Specialization can be followed in its entirety, or you can study each course separately. It focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner.
Reproducible research is the idea that data analyses, and more generally scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows people to focus on the actual content of a data analysis rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code used to conduct the analysis are available.
You can access the course for free via www.coursera.org/jhu. Free access lets you explore the course, watch lectures, and participate in discussions. To be eligible to earn a certificate, you must either pay for enrollment or qualify for financial aid.
Each course in the Specialization is offered monthly.
Explore the data science process – An Introduction
Understand data science thinking
Probability and statistics in data science
Understand and apply confidence intervals and hypothesis testing
Working with data – Ingestion and preparation
Know the basics of data ingestion and selection
Data Exploration and Visualization
Know how to create and interpret basic plot types
Introduction to Supervised Machine Learning
Understand the basic concepts of supervised learning
The Data Scientist’s Toolbox
In this course you will get an introduction to the main tools and ideas in the data scientist’s toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with.
R Programming
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language.
Getting and Cleaning Data
Before you can work with data, you have to get some. This course covers the basic ways to obtain data: from the web, from APIs, from databases, and from colleagues in various formats.
Exploratory Data Analysis
This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models.
Reproducible Research
This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. It covers literate statistical analysis tools, which let you publish a data analysis in a single document so that others can easily run the same analysis and obtain the same results.
Statistical Inference
Statistical inference is the process of drawing conclusions about populations or scientific truths from data. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information to make informed choices in analyzing data.
Regression Models
Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares, and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, will be covered as well.
Practical Machine Learning
The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.
Developing Data Products
This course covers the basics of creating data products using Shiny, R packages, and interactive graphics. The course will focus on the statistical fundamentals of creating a data product that can be used to tell a story about data to a mass audience.
Data Science Capstone
The capstone project class will allow students to create a usable, public data product that they can use to show their skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners.
Coursera provides financial aid to learners who cannot afford the fee. You’ll need to apply for financial aid separately for each course in the Specialization, including the Capstone Project.
Graeme Malcolm, Senior Content Developer, Microsoft Learning Experiences; trainer, consultant, and author, specializing in SQL Server and the Microsoft data platform. He is a Microsoft Certified Solutions Expert for the SQL Server Data Platform and Business Intelligence. After years of working with Microsoft as a partner and vendor, he now works in the Microsoft Learning Experiences team as a senior content developer, where he plans and creates content for developers and data professionals who want to get the best out of Microsoft technologies.
Steve Elston, Managing Director, Quantia Analytics, LLC; big data geek and data scientist, with over two decades of experience using R and S/SPLUS for predictive analytics and machine learning. He holds a PhD degree in Geophysics from Princeton University, and has led multi-national data science teams across various companies.
Cynthia Rudin, Associate Professor, MIT and Duke; leads the Prediction Analysis Lab at MIT, and is associated with the Computer Science and Artificial Intelligence Laboratory and the Sloan School of Management. She holds a PhD in applied and computational mathematics from Princeton University, and was previously an associate research scientist at the Center for Computational Learning Systems at Columbia University.