Cleaning MLB Statcast Data using pandas DataFrames and seaborn Visualization

A 64-page instructional case study on cleaning MLB Statcast data in Python, covering exploratory visual analysis, missing-data imputation, and outlier theory.
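
A minimal sketch of two of these cleaning steps in pandas, assuming a tiny hypothetical DataFrame: the column names `launch_speed` and `launch_angle` are borrowed from Statcast for illustration, but the values are made up. It fills missing values with each column's median and flags outliers with the 1.5 * IQR rule (Tukey's fences), one common convention; the case study's own choices may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Statcast-style column names, made-up values.
df = pd.DataFrame({
    "launch_speed": [101.2, 98.7, np.nan, 102.0, 60.0, 99.1],
    "launch_angle": [27.0, np.nan, 31.0, 24.0, 28.0, 30.0],
})

# Count missing values per column before imputing.
missing_before = df.isna().sum()

# Median imputation: fill each column's NaNs with that column's median.
df_imputed = df.fillna(df.median())

# Tukey's fences: flag values more than 1.5 * IQR beyond the quartiles.
q1 = df_imputed.quantile(0.25)
q3 = df_imputed.quantile(0.75)
iqr = q3 - q1
is_outlier = (df_imputed < q1 - 1.5 * iqr) | (df_imputed > q3 + 1.5 * iqr)

print(missing_before.to_dict())
print(df_imputed[is_outlier.any(axis=1)])
```

The median is used for imputation here because, unlike the mean, it is not pulled toward the very outliers that are flagged afterwards.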

Unsupervised Learning

Introduction to Unsupervised Learning

K-Means Clustering on the Iris Dataset

In this first unsupervised learning module, we cover the basics of using scikit-learn.
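
A minimal sketch of k-means on Iris with scikit-learn's `KMeans`, assuming k = 3 (one cluster per species) and ignoring the species labels, since clustering is unsupervised. The inertia values for k = 1 to 6 support the usual "elbow" check for choosing k.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# 150 samples x 4 measurements; the species labels are not used for fitting.
X = load_iris().data

# Assume k = 3 clusters; n_init and random_state make the result reproducible.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(X)

print(model.cluster_centers_)  # one 4-D centroid per cluster
print(model.inertia_)          # sum of squared distances to the nearest centroid

# Elbow check: inertia generally falls as k grows, so look for where the drop levels off.
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
    for k in range(1, 7)
]
print(inertias)
```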

t-SNE on the Iris Dataset with scikit-learn

Learn about one of the most fundamental concepts in data modeling: dimensionality reduction.
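
A minimal sketch with scikit-learn's `TSNE`, assuming common settings (`perplexity=30`, a fixed `random_state`). It maps the four Iris measurements to two dimensions for plotting; unlike PCA, scikit-learn's t-SNE has no `transform` method for new samples, so only `fit_transform` is used.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

iris = load_iris()
X, y = iris.data, iris.target

# Embed the 4-D measurements in 2-D. perplexity must be below the sample count (150).
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)  # (150, 2)
```

To visualize the result, scatter-plot `embedding[:, 0]` against `embedding[:, 1]` and color the points by `y`.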

t-SNE on the Statcast Home Run Data

We apply our t-SNE techniques to the Statcast home run data.

Non-Negative Matrix Factorization (NMF)

NMF Decomposition on LCD Digits Data with scikit-learn

Learn how to decompose images with NMF.
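
A minimal sketch of NMF image decomposition with scikit-learn's `NMF`. The LCD digits data is not bundled with scikit-learn, so this assumes the built-in 8x8 handwritten digits (`load_digits`) as a stand-in; the choice of 10 components is also an assumption. Each row of `components_` reshapes into an 8x8 "part" image, and every image is approximated as a non-negative weighted sum of those parts.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF

# Stand-in data: 1797 flattened 8x8 digit images with pixel values 0-16 (all non-negative).
X = load_digits().data

# Factorize X ~ W @ H with 10 components; "nndsvd" gives a deterministic start.
nmf = NMF(n_components=10, init="nndsvd", max_iter=500, random_state=0)
W = nmf.fit_transform(X)  # (1797, 10): weight of each part in each image
H = nmf.components_       # (10, 64): each row is one "part"

# View each component as an 8x8 image (e.g. with matplotlib's imshow).
parts = H.reshape(-1, 8, 8)

# Approximate the first image as a weighted sum of the parts.
first_approx = (W[0] @ H).reshape(8, 8)
```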

NMF Image Decomposer

Read about a function I wrote that automatically performs NMF image decomposition and displays the results.

Photo credit: Wikipedia

NMF Decomposition and K-Means Clustering on tf-idf Wikipedia Text

Venture into natural language processing (NLP) and build a simple recommendation algorithm for Wikipedia articles.
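
A minimal sketch of the recommendation idea, assuming four hypothetical toy "articles" in place of real Wikipedia text and two NMF topics. Each article is represented by its NMF topic weights, L2-normalized so that a dot product equals cosine similarity, and the other articles are ranked by that similarity. `recommend` is a hypothetical helper, not a library function.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical toy "articles"; a real run would use Wikipedia article text.
titles = ["Baseball", "Softball", "Neural network", "Deep learning"]
docs = [
    "baseball pitcher batter inning home run bat ball",
    "softball pitcher batter inning bat ball field",
    "neural network layers neurons weights training",
    "deep learning neural network layers training data",
]

# Sparse tf-idf matrix, then 2 non-negative topic weights per article.
tfidf = TfidfVectorizer().fit_transform(docs)
nmf_features = NMF(n_components=2, init="nndsvd", random_state=0).fit_transform(tfidf)

# L2-normalize each row so that a dot product equals cosine similarity.
norm_features = normalize(nmf_features)


def recommend(title):
    """Hypothetical helper: other titles ranked by cosine similarity to `title`."""
    i = titles.index(title)
    sims = norm_features @ norm_features[i]
    order = np.argsort(-sims)
    return [titles[j] for j in order if j != i]


print(recommend("Baseball"))
```

With real data, the same normalized features can also be passed to `KMeans` to group articles by topic.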

Principal Component Analysis (PCA)

Principal Component Analysis on the Iris Dataset

An introduction to PCA for dimensionality reduction.
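
A minimal sketch with scikit-learn's `PCA`, assuming the features are standardized first (a common choice, since PCA is sensitive to feature scale) and that two components are kept. `explained_variance_ratio_` reports the share of total variance each component retains.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Standardize each feature to mean 0, variance 1 before PCA.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4-D data onto its first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # share of total variance kept by each component
```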

Principal Component Analysis on the Statcast Home Run Data

A real-world example of PCA in action.

TruncatedSVD Decomposition and K-Means Clustering on tf-idf Data

Use TruncatedSVD to perform PCA-like dimensionality reduction on sparse tf-idf text data.
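
A minimal sketch, assuming a handful of hypothetical toy documents. `TruncatedSVD` accepts the sparse tf-idf matrix directly because, unlike classic PCA, it does not center the data (centering would turn a sparse matrix dense). `make_pipeline` chains the reduction with `KMeans` so a single `fit_predict` call does both.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy documents: two about baseball, two about neural networks.
docs = [
    "baseball pitcher batter inning home run bat ball",
    "softball pitcher batter inning bat ball field",
    "neural network layers neurons weights training",
    "deep learning neural network layers training data",
]

# Sparse tf-idf matrix (documents x terms); it is never densified below.
tfidf = TfidfVectorizer().fit_transform(docs)

# Reduce to 2 dimensions, then cluster the documents into 2 groups.
pipeline = make_pipeline(
    TruncatedSVD(n_components=2, random_state=0),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(tfidf)
print(labels)
```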

Deep Learning

MLP Deep Learning on the MNIST Dataset

In this introduction to deep learning, we walk step by step through the most fundamental type of neural network, the multilayer perceptron (MLP).
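
A minimal sketch of training an MLP classifier, with two flagged substitutions: it uses scikit-learn's `MLPClassifier` rather than a dedicated deep learning framework, and the bundled 8x8 `load_digits` images rather than the 28x28 MNIST set. The hidden layer sizes (64, 32) and the 75/25 split are arbitrary illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for MNIST: 1797 flattened 8x8 digit images with labels 0-9.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two fully connected hidden layers with ReLU activations; inputs are standardized first.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                  max_iter=300, random_state=0),
)
mlp.fit(X_train, y_train)

accuracy = mlp.score(X_test, y_test)  # fraction of test digits classified correctly
print(accuracy)
```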

Photo credit: James Le

CNN Deep Learning on the MNIST Dataset

We build on the previous guide by walking through convolutional neural networks (CNNs), one of the most successful model architectures for image recognition.

Supervised Learning

KNN Classification on the Iris Dataset with scikit-learn

If you have no experience with machine learning, start here. We walk through the basics and learn how to classify Iris flower species with the k-nearest neighbors (KNN) algorithm.
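
A minimal sketch of KNN classification on Iris with scikit-learn's `KNeighborsClassifier`, assuming k = 6 and a stratified 70/30 train/test split; both are illustrative choices. The `new_flower` measurements are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=21
)

# Each prediction is a majority vote among the 6 nearest training flowers.
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm).
new_flower = [[5.0, 3.4, 1.5, 0.2]]
predicted = iris.target_names[knn.predict(new_flower)[0]]
print(accuracy, predicted)
```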

Logistic Regression on the Diabetes Dataset with Pipelines and Cross-Validation

Learn how to use logistic regression to model data with binary outcomes, then optimize your models with pipelines and cross-validation.
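
A minimal sketch of the pipeline-plus-cross-validation workflow. The diabetes data used in the guide is not bundled with scikit-learn (its `load_diabetes` has a continuous target), so this assumes the binary `load_breast_cancer` dataset as a stand-in; the grid of `C` values is also an assumption. Putting `StandardScaler` inside the `Pipeline` means the scaler is refit on each training fold, so no information leaks in from the validation fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in binary dataset (0/1 target).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validated accuracy of the untuned pipeline.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)

# Tune the inverse regularization strength C; "logreg__C" targets the named step.
grid = GridSearchCV(pipe, param_grid={"logreg__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

test_accuracy = grid.score(X_test, y_test)
probabilities = grid.predict_proba(X_test)[:, 1]  # P(class 1) for each test sample
print(cv_scores.mean(), grid.best_params_, test_accuracy)
```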

Linear Regression Analysis on Statcast Data with scikit-learn

Learn the basics of linear regression and apply them to Statcast home run data.
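
A minimal sketch of fitting and evaluating `LinearRegression`, assuming synthetic data generated as y = 3x + 5 plus Gaussian noise in place of Statcast columns. Because the true slope and intercept are known, the fitted `coef_` and `intercept_` can be checked against them, and R² on a held-out split measures how well the line fits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: y = 3.0 * x + 5.0 plus noise (not Statcast values).
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(0, 1.0, size=200)
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
reg = LinearRegression().fit(X_train, y_train)

slope, intercept = reg.coef_[0], reg.intercept_
r2 = r2_score(y_test, reg.predict(X_test))
print(slope, intercept, r2)
```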

Data Science in R

Photo credit: Tidyverse.org

Linear Modeling in R

Learn how to use ANOVA, Box-Cox transformations, and tests of model assumptions to model data.
