XGBoost applied to Fashion MNIST

Now let’s consider applying XGBoost to the Fashion MNIST dataset. As in the 2 previous posts about XGBoost, the data are ready to use and do not require any additional preprocessing to reach accuracy near 90%. This makes this case similar to the previous two (Iris and MNIST).

XGBoost applied to MNIST

Let’s continue playing with XGBoost. Today let’s apply it to the MNIST dataset. In fact, there is very little difference between applying XGBoost to Iris and applying it to MNIST. I will not comment much here and will just give a Python script with the solution. If something is unclear, read my previous post about XGBoost and Iris.

XGBoost applied to Iris dataset

MNIST digit recognition with CNN and Keras

In this post I will briefly go through the application of a CNN (Convolutional Neural Network) to the well-known MNIST dataset. I will use Keras for this. There is a well-known example in the Keras repo, mnist_cnn.py, and I will use its code for this blog post. So there is nothing new in this blog post. Rather, it is an attempt to put some basics into my head for further use.

Kaggle What's cooking competition

Solving Kaggle’s amazing What’s cooking competition using a simple Bag of Words model, coded by hand without using any machine learning library.

Run Jupyter notebooks with Docker

Let me show how I run Jupyter notebooks on my laptop. No surprise, I am doing it with Docker. Why? Because it is fast, convenient, platform-agnostic and keeps the host system clean.

Log loss metric explained

LogLoss is a classification metric based on probabilities. It measures the performance of a classification model where the prediction input is a probability value between 0 and 1. For any given problem, a smaller LogLoss value means better predictions.
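To make the definition concrete, here is a minimal sketch of binary log loss in plain Python. The function name `log_loss`, the `eps` clipping value and the sample predictions are assumptions made for illustration; they are not taken from the original post.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss; y_true holds 0/1 labels, y_prob holds P(label == 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


labels = [1, 0, 1]
confident = log_loss(labels, [0.9, 0.1, 0.8])  # predictions close to the labels
unsure = log_loss(labels, [0.6, 0.4, 0.5])     # predictions close to 0.5
print(confident, unsure)
```

The confident predictions give the smaller value, which matches the rule that a smaller LogLoss means better predictions.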

Great Github and Reddit resources for machine learning

Here are links to 5 GitHub repositories and 2 Reddit discussions devoted to machine learning. Actually, this is mostly a copy of the original post on Analytics Vidhya: 5 Amazing Machine Learning GitHub Repositories & Reddit Threads from September 2018.

GitHub repos

1. Papers with code
Link: https://github.com/zziz/pwc
A list of research articles (links to the original PDFs are given). Each article is accompanied by the source code of the software, so it should not be hard to understand the implementation of the algorithms.

Simple MySQL backup to Amazon S3 using Docker

This is just a short note about a convenient way of making daily MySQL backups to Amazon S3. Docker is used here to spin up MySQL and the AWS CLI tool.

Trie or Prefix Tree

Let’s consider another popular data structure: the Trie, or Prefix tree.
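As a rough illustration of the idea, here is a minimal trie sketch in Python. The class and method names (`Trie`, `insert`, `contains`, `starts_with`) are assumptions chosen for this example, not necessarily those used in the post.

```python
class Trie:
    """Each node maps a character to a child node; is_word marks the end of a stored word."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            # Reuse the existing child for this character or create a new one.
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def _find(self, prefix):
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._find(prefix) is not None


trie = Trie()
for w in ("car", "card", "dog"):
    trie.insert(w)
```

Words that share a prefix share the nodes along that prefix, so "car" and "card" differ by only one extra node.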

Binary search tree

A binary search tree (BST) is a data structure that allows fast lookup, insertion and removal of items (O(log n) on average while the tree stays balanced).
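A minimal sketch of insertion and lookup in an unbalanced BST is given below. The names `Node`, `insert`, `contains` and `in_order` are assumptions for this example; removal is left out to keep it short.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def contains(root, key):
    """Walk down the tree, going left or right depending on the comparison."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def in_order(root):
    """In-order traversal yields the keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


root = None
for k in (8, 3, 10, 1, 6, 14):
    root = insert(root, k)
```

Each comparison discards one subtree, which is where the fast lookup comes from.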

Dynamic programming problems

Here is a list of all dynamic programming tasks from this blog.

Leetcode 303. Immutable range sum query

Leetcode 198. House robber

Leetcode 53. Maximum subarray

Find the contiguous subarray with the largest sum in a given array (Leetcode 53).
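One common way to solve this is Kadane's algorithm, sketched below. This is an illustration under the assumption that the array is non-empty; the post itself may use a different approach, and the name `max_subarray` is hypothetical.

```python
def max_subarray(nums):
    """Return the largest sum over all non-empty contiguous subarrays of nums."""
    best = current = nums[0]
    for x in nums[1:]:
        # Either extend the running subarray or start a new one at x.
        current = max(x, current + x)
        best = max(best, current)
    return best


print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # prints 6, from the subarray [4, -1, 2, 1]
```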

Leetcode 121. Best time to buy and sell stock

Leetcode 746. Min cost climbing stairs

Leetcode 70. Climbing stairs

How to run HUGO under Docker

It is convenient to run a tool like Hugo (and many others) under Docker, without installing it on the local machine. Here I will show you how to run Hugo under Docker, create new posts for the blog and generate static content, i.e. all the basic operations needed to add new posts to the blog and edit existing ones.

Useful ML articles

I have got a list of useful ML articles from ODS slack (message link).

Pandas dataset analysis example

How to stand out as an entry level data scientist candidate

Optimistic and pessimistic concurrency control

Here we consider the two most widely used approaches to transactional locking: pessimistic and optimistic locking.
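The optimistic approach can be sketched with a version check on write. The in-memory `VersionedStore` below is a hypothetical stand-in for a database table with a version column; all names in it are assumptions made for this example.

```python
import threading


class StaleVersionError(Exception):
    """Raised when a writer's expected version no longer matches the stored one."""


class VersionedStore:
    def __init__(self):
        self._rows = {}                 # key -> (value, version)
        self._guard = threading.Lock()  # protects the dict itself, not the transaction

    def read(self, key):
        with self._guard:
            return self._rows.get(key, (None, 0))

    def write(self, key, value, expected_version):
        """Succeed only if nobody has changed the row since it was read."""
        with self._guard:
            _, current = self._rows.get(key, (None, 0))
            if current != expected_version:
                raise StaleVersionError(key)
            self._rows[key] = (value, current + 1)
            return current + 1


store = VersionedStore()
_, v = store.read("balance")
store.write("balance", 100, v)      # first writer succeeds
try:
    store.write("balance", 200, v)  # second writer still holds the old version
    conflict = False
except StaleVersionError:
    conflict = True
```

With pessimistic locking the second writer would instead wait until the first one releases its lock; here it is rejected and has to re-read and retry.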

PostgreSQL transaction isolation levels

Details of transaction isolation in PostgreSQL.

Transaction isolation levels

Transaction isolation levels in relational databases. A reminder post.

Spring: destroy prototype beans

Destruction of prototype-scoped beans is not managed by the Spring container (only construction is managed). But we can manage it ourselves with Spring BeanPostProcessors.

Spring Bean PostProcessors

Spring BeanPostProcessor is a nice feature that puts a lot of power in your hands. You can do a lot of fun things with BeanPostProcessors, and this post demonstrates some of them with a series of short examples.

Copy folder from AWS S3

How can we copy a folder from AWS S3 to a local machine using the AWS CLI?

Java objects memory size

Here we try to analyze how much memory Java objects consume. All numbers are given for 64-bit machines.

Serialize list of dates into JSON with Jackson

There are a lot of examples of how to serialize Date objects into JSON. But I was not able to find examples that explain how to serialize a collection of dates: Collection<Date>. I had to dive into the code and figure it out myself. In this post I share my findings with you.

Spring Boot and AWT headless

The problem with AWT headless app in Spring Boot

Recently I needed to create a console Java Spring Boot app that does some stuff with AWT (not so important what exactly). I found that a simple run with a normal Spring Boot stub does not work. Running this:

    @SpringBootApplication
    public class MyAwtApplication {
        public static void main(String[] args) {
            SpringApplication.run(MyAwtApplication.class, args);
        }
    }

generates the following exception when AWT-related processing happens:

AWS permissions to S3 folder

Let’s imagine that we have a project that uses AWS S3 as file storage. The project consists of 2 parts: one part puts files into S3, and the other part only reads them from S3. Moreover, files are stored not in the bucket root, but in a folder placed in the bucket root. According to AWS best practices, we need 2 security and access control policies. One policy is for the part of the software that puts files into the S3 folder.

How SSH got port 22

A nice story from the creator of SSH (Tatu Ylonen) about how SSH got its port 22. Briefly, the initial SSH version was created in spring 1995, a time when FTP and Telnet were widely used. SSH was designed to replace FTP (port 21) and Telnet (port 23). Port 22 was free, so it was chosen for SSH. At that time the Internet was small, and port numbers were allocated by IANA, managed by Internet pioneers Jon Postel and Joyce K.

Interesting HackerNews posts by 21 April 2017

Recursive DNS Server Fingerprint
They try to identify DNS hijacking by creating a database of DNS server fingerprints and comparing the response of a particular DNS server (suspected of being hijacked) with its normal fingerprint.

Introducing Token
Token is a browser for the Ethereum network that provides universal access to financial services. It combines a messaging app, an Ethereum wallet and a browser for Ethereum apps.

Scalable, Lie-Detecting Timeserving with Roughtime
Roughtime is a protocol designed to provide internet-scale secure time synchronization and address shortcomings of older protocols like NTP.

Blockchain and Bitcoin links

This post contains links to some articles, blog posts and other materials that look interesting to me, give a good explanation or description of the field and, simply put, are a pleasure to read.

A blockchain in 200 lines of code
The blockchain explained to web developers. Part 1: The Theory
How might we use blockchains outside cryptocurrencies
Mastering Bitcoin - open source book on Bitcoin published by OReilly
Awesome Bitcoin - GitHub repo with bitcoin-related links collection

Timus 1005. Stone pile

This is the first post that describes how to solve a particular programming task. Today we will consider the Stone pile task from Timus. It is clear and easy. The task reads as follows.

Task: Stone pile
You have a number of stones with known weights w1, …, wn. Write a program that will rearrange the stones into two piles such that the weight difference between the piles is minimal.

Input
The input contains the number of stones n (1 ≤ n ≤ 20) and the weights of the stones w1, …, wn (integers, 1 ≤ wi ≤ 100000) delimited by white spaces.
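Since n is at most 20, one possible approach is to enumerate every weight that a single pile can reach and keep the split with the smallest difference. The sketch below is an illustration under that assumption and not necessarily the solution from the post; the function name `min_pile_difference` is hypothetical, and reading the input is left out.

```python
def min_pile_difference(weights):
    """Return the minimal possible weight difference between two piles."""
    total = sum(weights)
    # All sums that one pile can have; at most 2**n distinct values.
    reachable = {0}
    for w in weights:
        reachable |= {s + w for s in reachable}
    # If one pile weighs s, the other weighs total - s.
    return min(abs(total - 2 * s) for s in reachable)


print(min_pile_difference([5, 8, 13, 27, 14]))  # prints 3, e.g. piles {27, 5} and {8, 13, 14}
```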

First post

Hi, after a long pause I decided to relaunch my blog. Here I am planning to write mostly about technical things, share some useful findings, etc.