XGBoost applied to Fashion MNIST

Now let’s consider applying XGBoost to the Fashion MNIST dataset. As in the two previous posts about XGBoost, the data are ready to use and do not require any additional preprocessing to reach accuracy near 90%. This makes this case similar to the previous two (Iris and MNIST).
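Fashion-MNIST is distributed in the same IDX binary format as MNIST, so "ready to use" mostly means reading the files and flattening each 28x28 image into 784 features. The snippet below is a minimal sketch of that step. It assumes NumPy and locally downloaded gzip files, and `parse_idx`, `load_idx_gz` and `to_features` are hypothetical helper names, not part of any library.

```python
import gzip
import struct

import numpy as np


def parse_idx(data: bytes) -> np.ndarray:
    """Parse raw IDX bytes (the MNIST / Fashion-MNIST container format) into an array."""
    # Header: two zero bytes, a type code (0x08 = unsigned byte) and the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("expected an unsigned-byte IDX file")
    # Each dimension size follows as a big-endian unsigned 32-bit integer.
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    # The remaining bytes are the pixel (or label) values, stored row-major.
    return np.frombuffer(data, dtype=np.uint8, offset=4 + 4 * ndim).reshape(dims)


def load_idx_gz(path: str) -> np.ndarray:
    """Read a gzipped IDX file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


def to_features(images: np.ndarray) -> np.ndarray:
    """Flatten (n, 28, 28) images into an (n, 784) feature matrix.

    No scaling is applied: tree ensembles such as XGBoost split on thresholds,
    so rescaling pixel values is not needed.
    """
    return images.reshape(images.shape[0], -1)
```

With the files downloaded, `X_train = to_features(load_idx_gz("train-images-idx3-ubyte.gz"))` and `y_train = load_idx_gz("train-labels-idx1-ubyte.gz")` give a matrix and label vector that can be passed straight to XGBoost.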

XGBoost applied to MNIST

Let’s continue playing with XGBoost. Today let’s apply it to the MNIST dataset. In fact, there is very little difference between applying XGBoost to Iris and to MNIST. I will not comment much here and will just give a Python script with the solution. If something is unclear, see my previous post about XGBoost and Iris.

XGBoost applied to Iris dataset

MNIST digit recognition with CNN and Keras

In this post I will briefly go through the application of a CNN (Convolutional Neural Network) to the well-known MNIST dataset. I will use Keras for this. There is a well-known example in the Keras repo, mnist_cnn.py, and I will use its code for this blog post. So there is nothing new here; rather, it is an attempt to fix some basics in my head for further use.
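The layer stack of mnist_cnn.py, as I recall it (two 3x3 convolutions with 32 and 64 filters, 2x2 max pooling, a 128-unit dense layer and a 10-way softmax, with dropout in between; check the repo for the exact version), can be sanity-checked by hand. The sketch below is plain Python rather than Keras and assumes Keras' default 'valid' padding and stride 1; the helper names are hypothetical. It computes the output shape and parameter count of each layer.

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    # 'valid' padding (the Keras default): the kernel never hangs over the edge.
    return (size - kernel) // stride + 1


def conv_params(kernel: int, in_channels: int, filters: int) -> int:
    # One kernel x kernel x in_channels weight tensor per filter, plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters


def dense_params(n_in: int, n_out: int) -> int:
    # Full weight matrix plus one bias per output unit.
    return n_in * n_out + n_out


def summarize(img: int = 28, channels: int = 1, num_classes: int = 10):
    """Return (layer, output shape, params) rows and the total parameter count.

    Dropout layers are left out: they change neither the shape nor the parameter count.
    """
    rows = []
    s1 = conv_out(img, 3)                      # 28 -> 26
    rows.append(("Conv2D 32, 3x3", (s1, s1, 32), conv_params(3, channels, 32)))
    s2 = conv_out(s1, 3)                       # 26 -> 24
    rows.append(("Conv2D 64, 3x3", (s2, s2, 64), conv_params(3, 32, 64)))
    s3 = s2 // 2                               # 2x2 max pooling halves each side: 24 -> 12
    rows.append(("MaxPooling2D 2x2", (s3, s3, 64), 0))
    flat = s3 * s3 * 64                        # 12 * 12 * 64 values per image
    rows.append(("Flatten", (flat,), 0))
    rows.append(("Dense 128", (128,), dense_params(flat, 128)))
    rows.append(("Dense softmax", (num_classes,), dense_params(128, num_classes)))
    return rows, sum(params for _, _, params in rows)


if __name__ == "__main__":
    rows, total = summarize()
    for name, shape, params in rows:
        print(f"{name:18} {str(shape):15} {params:>9,}")
    print(f"Total params: {total:,}")
```

The total comes to 1,199,882 parameters, of which 1,179,776 sit in the first dense layer: the convolutions are cheap, while connecting 9216 flattened values to 128 units is not.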

Kaggle What's Cooking competition

Solving Kaggle’s amazing What’s Cooking competition using a simple Bag of Words model, coded by hand without any machine learning library.
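As an illustration of a bag-of-words classifier written without any ML library, here is a minimal sketch using only the Python standard library. It scores each cuisine with multinomial Naive Bayes over ingredient counts (with add-one smoothing); that scoring rule is my choice for the example and may differ from the model in the post. The recipes are hypothetical toy data shaped like the competition's records (a cuisine label plus a list of ingredients).

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (cuisine, ingredients) pairs.
TOY_RECIPES = [
    ("italian", ["pasta", "tomato", "basil", "olive oil"]),
    ("italian", ["pasta", "parmesan", "garlic"]),
    ("mexican", ["tortilla", "beans", "salsa"]),
    ("mexican", ["tortilla", "avocado", "lime", "cilantro"]),
]


def train(recipes):
    """Count recipes per cuisine and ingredient occurrences per cuisine (the bag of words)."""
    cuisine_counts = Counter()
    ingredient_counts = defaultdict(Counter)
    vocabulary = set()
    for cuisine, ingredients in recipes:
        cuisine_counts[cuisine] += 1
        ingredient_counts[cuisine].update(ingredients)
        vocabulary.update(ingredients)
    return cuisine_counts, ingredient_counts, vocabulary


def predict(model, ingredients):
    """Return the cuisine with the highest Naive Bayes log-score for the ingredient list."""
    cuisine_counts, ingredient_counts, vocabulary = model
    total_recipes = sum(cuisine_counts.values())
    best, best_score = None, -math.inf
    for cuisine, n_recipes in cuisine_counts.items():
        counts = ingredient_counts[cuisine]
        total_words = sum(counts.values())
        score = math.log(n_recipes / total_recipes)  # prior: share of recipes
        for ingredient in ingredients:
            if ingredient not in vocabulary:
                continue  # ingredients never seen in training carry no information
            # Add-one smoothing so an unseen (cuisine, ingredient) pair is not log(0).
            score += math.log((counts[ingredient] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best, best_score = cuisine, score
    return best


if __name__ == "__main__":
    model = train(TOY_RECIPES)
    print(predict(model, ["pasta", "garlic", "tomato"]))
```

Working in log space keeps long ingredient lists from underflowing to zero, and the whole model is just two dictionaries of counts.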

Run Jupyter notebooks with Docker

Let me show how I run Jupyter notebooks on my laptop. No surprise, I do it with Docker. Why? Because it is fast, convenient and platform-agnostic, and it keeps the host system clean.

Log loss metric explained

LogLoss is a classification metric based on probabilities. It measures the performance of a classification model whose prediction is a probability value between 0 and 1. For any given problem, a smaller LogLoss value means better predictions.
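In formula form, for N samples and M classes, LogLoss = -(1/N) Σ_i Σ_j y_ij log(p_ij), where y_ij is 1 if sample i belongs to class j and 0 otherwise, and p_ij is the predicted probability of class j. For two classes this reduces to -(1/N) Σ_i [y_i log(p_i) + (1 - y_i) log(1 - p_i)]. Below is a minimal pure-Python sketch; clipping probabilities to [eps, 1 - eps] is a common convention (the eps value here is an assumption) that avoids log(0) when a model predicts exactly 0.

```python
import math
from typing import Sequence


def log_loss(y_true: Sequence[int], y_prob: Sequence[Sequence[float]], eps: float = 1e-15) -> float:
    """Multi-class log loss.

    y_true: integer class labels, one per sample.
    y_prob: one row of class probabilities per sample (each row assumed to sum to 1).
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # y_ij is 1 only for the true class, so only its probability contributes.
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


if __name__ == "__main__":
    confident_right = log_loss([0, 1], [[0.9, 0.1], [0.1, 0.9]])
    unsure = log_loss([0, 1], [[0.6, 0.4], [0.4, 0.6]])
    confident_wrong = log_loss([0, 1], [[0.1, 0.9], [0.9, 0.1]])
    print(confident_right, unsure, confident_wrong)
```

Confident correct predictions give about 0.105 (that is, -ln 0.9), hesitant ones about 0.511 and confident wrong ones about 2.303, which is why LogLoss punishes overconfident mistakes so heavily.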

Great Github and Reddit resources for machine learning

Here are links to 5 GitHub repositories and 2 Reddit discussions devoted to machine learning. This is mostly a copy of the original post on Analytics Vidhya: 5 Amazing Machine Learning GitHub Repositories & Reddit Threads from September 2018. GitHub repos: 1. Papers with code (link: https://github.com/zziz/pwc): a list of research articles with links to the original PDFs. Each article is accompanied by the source code of its software, so it should not be hard to understand how the algorithms are implemented.

Useful ML articles

I got a list of useful ML articles from the ODS Slack (message link).

Pandas dataset analysis example

How to stand out as an entry-level data scientist candidate