Cleaning MLB Statcast Data Using pandas DataFrames and seaborn Visualization
A 64-page instructional case study on cleaning MLB Statcast data in Python, covering exploratory visual analysis, missing-data imputation, and outlier theory.
Unsupervised Learning
Introduction to Unsupervised Learning
K-Means Clustering on the Iris Dataset
In this first module of unsupervised learning, we cover the basics of using scikit-learn.
t-SNE on the Iris Dataset with scikit-learn
Learn about one of the most fundamental concepts in data modeling: dimension reduction.
Non-Negative Matrix Factorization (NMF)
NMF Decomposition on LCD Digits Data with scikit-learn
Learn how to decompose images with NMF.
NMF Image Decomposer
Read about the function I wrote to automatically perform NMF image decomposition and display it.
NMF Decomposition and K-Means Clustering on tf-idf Wikipedia Text
Venture into Natural Language Processing and create a simple recommendation algorithm for Wikipedia articles.
Principal Component Analysis (PCA)
Principal Component Analysis on the Iris Dataset
An introduction to PCA for dimension reduction.
Principal Component Analysis on the Statcast Homeruns Data
A real example of PCA in action.
TruncatedSVD Decomposition and K-Means Clustering on tf-idf Data
Use TruncatedSVD to perform a PCA-like decomposition on tf-idf text data.
Deep Learning
MLP Deep Learning on the MNIST Dataset
In this introduction to deep learning, we’ll walk step by step through the most fundamental type of neural network, the Multilayer Perceptron (MLP).
CNN Deep Learning on the MNIST Dataset
We build on the previous guide by walking through the Convolutional Neural Network (CNN), one of the most successful models for image recognition.
Supervised Learning
KNN Classification on the Iris Dataset with scikit-learn
For those who have no experience with machine learning, this is the place to start. We walk through the basics and learn how to classify Iris flower species with the KNN algorithm.
Logistic Regression on the Diabetes Dataset with Pipelines and Cross-Validation
Learn how to use logistic regression to model data with binary outcomes, then optimize your models with pipelines and cross-validation.
Linear Regression Analysis on Statcast Data with scikit-learn
Learn the basics of linear regression and test them out on Statcast Homeruns data.
Data Science in R
Linear Modeling in R
Learn how to use ANOVA, Box-Cox transformations, and model assumption tests to model data.