Level: Beginner

Duration: 12 hours

Delivery: Online

Certification: Not Provided

Cost: 160

Course Provider: Frank Kane / Sundog

This comprehensive specialisation course includes over 80 lectures spanning 12 hours of video; most topics include Python code examples you can use for reference and for practice. You will need some programming or scripting experience.

This course teaches the techniques used by real data scientists and machine learning practitioners in the tech industry, and prepares you for a move into this in-demand career path. The instructor draws on his 9 years of experience at Amazon and IMDb to guide you through what matters and what doesn’t.

Topics include data distributions, probability mass functions, and probability density functions; visualising data with matplotlib; using covariance and correlation metrics; using Bayes’ Theorem to identify false positives; making predictions using linear regression, polynomial regression, and multivariate regression; using train/test and K-Fold cross-validation to choose the right model; using decision trees to predict hiring decisions; clustering data using K-Means clustering; classifying data with Support Vector Machines (SVM); building recommender systems using item-based and user-based collaborative filtering; predicting classifications using K-Nearest-Neighbors (KNN); applying dimensionality reduction with Principal Component Analysis (PCA); implementing machine learning, clustering, and search using TF/IDF at massive scale with Apache Spark’s MLlib; and more.

- Introduction
- Python Basics, Part 1
- Running Python Scripts
- Introducing the Pandas Library

**Statistics and Probability Refresher, and Python Practice**

- Types of Data
- Mean, Median, Mode
- Probability Density Function; Probability Mass Function
- Common Data Distributions
- [Activity] Percentiles and Moments
- [Activity] A Crash Course in matplotlib
- [Activity] Covariance and Correlation
- [Exercise] Conditional Probability
- Exercise Solution: Conditional Probability of Purchase by Age
- Bayes’ Theorem

**Predictive Models**

- [Activity] Linear Regression
- [Activity] Polynomial Regression
- [Activity] Multivariate Regression, and Predicting Car Prices
- Multi-Level Models

**Machine Learning with Python**

- Supervised vs. Unsupervised Learning, and Train/Test
- [Activity] Using Train/Test to Prevent Overfitting a Polynomial Regression
- Bayesian Methods: Concepts
- [Activity] Implementing a Spam Classifier with Naive Bayes
- K-Means Clustering
- Measuring Entropy
- [Activity] Install GraphViz
- Decision Trees: Concepts
- [Activity] Decision Trees: Predicting Hiring Decisions
- Ensemble Learning
- Support Vector Machines (SVM) Overview
- [Activity] Using SVM to cluster people using scikit-learn

**Recommender Systems**

- User-Based Collaborative Filtering
- Item-Based Collaborative Filtering
- [Activity] Finding Movie Similarities
- [Activity] Improving the Results of Movie Similarities
- [Activity] Making Movie Recommendations to People
- [Exercise] Improve the recommender’s results

**More Data Mining and Machine Learning Techniques**

- K-Nearest-Neighbors: Concepts
- [Activity] Using KNN to predict a rating for a movie
- Dimensionality Reduction; Principal Component Analysis
- [Activity] PCA Example with the Iris data set
- Data Warehousing Overview: ETL and ELT
- Reinforcement Learning

**Dealing with Real-World Data**

- Bias/Variance Tradeoff
- [Activity] K-Fold Cross-Validation to avoid overfitting
- Data Cleaning and Normalization
- [Activity] Cleaning web log data
- Normalizing numerical data
- [Activity] Detecting outliers

**Apache Spark: Machine Learning on Big Data**

- Warning about Java 9!
- [Activity] Installing Spark – Part 1
- [Activity] Installing Spark – Part 2
- Spark Introduction
- Spark and the Resilient Distributed Dataset (RDD)
- Introducing MLLib
- [Activity] Decision Trees in Spark
- [Activity] K-Means Clustering in Spark
- TF / IDF
- [Activity] Searching Wikipedia with Spark
- [Activity] Using the Spark 2.0 DataFrame API for MLLib

**Experimental Design**

- A/B Testing Concepts
- T-Tests and P-Values
- [Activity] Hands-on With T-Tests
- Determining How Long to Run an Experiment
- A/B Test Gotchas

**Deep Learning and Neural Networks**

- Deep Learning Prerequisites
- The History of Artificial Neural Networks
- [Activity] Deep Learning in the TensorFlow Playground
- Deep Learning Details
- Introducing TensorFlow
- [Activity] Using TensorFlow, Part 1
- [Activity] Using TensorFlow, Part 2
- [Activity] Introducing Keras
- [Activity] Using Keras to Predict Political Affiliations
- Convolutional Neural Networks (CNNs)
- [Activity] Using CNNs for handwriting recognition
- Recurrent Neural Networks (RNNs)
- [Activity] Using an RNN for sentiment analysis
- The Ethics of Deep Learning
- Learning More about Deep Learning

**Final Project**

- Final project review

Some prior coding or scripting experience is required, as are at least high-school-level math skills. The course is geared toward two groups: software developers or programmers who want to move into the lucrative data science career path, and data analysts in finance or other non-tech industries who want to move into the tech industry.


Frank Kane spent 9 years at Amazon and IMDb developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology and on teaching others about big data analysis.
