Log loss metric explained

LogLoss is a classification metric based on probabilities. It measures the performance of a classification model whose predictions are probability values between 0 and 1. For any given problem, a smaller LogLoss value means better predictions.
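
To make "smaller is better" concrete, here is a minimal sketch of the penalty a single prediction receives. It assumes the penalty is the negative natural logarithm of the probability assigned to the correct class; the helper name `single_loss` is illustrative only and does not come from any library.

```python
from math import log

def single_loss(p_correct):
    """Penalty for one sample, given the probability assigned to its true class."""
    return -log(p_correct)

# The lower the probability given to the correct class, the larger the penalty.
for p in (0.9, 0.5, 0.1):
    print(p, round(single_loss(p), 4))
# 0.9 0.1054
# 0.5 0.6931
# 0.1 2.3026
```

A confident correct prediction costs very little, while a confident wrong one is penalized heavily.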

Formula

To calculate LogLoss, the classifier must assign a probability to each class rather than simply output the most likely class (the class with the largest probability). The LogLoss formula is as follows:

$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})$$

Let’s find out what every symbol in this formula means.

• N - the number of samples (or instances)
• M - the number of possible labels
• y_ij - takes the value 1 if label j is the correct classification for instance i, and 0 otherwise
• p_ij - the probability the model assigns to label j for instance i
• log - the natural logarithm
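
Assuming each instance has exactly one correct label, exactly one y_ij equals 1 for each instance i, so the inner sum keeps a single term. Writing c_i for the index of the correct label of instance i (notation introduced here for illustration only), the formula can be rewritten as:

$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \log(p_{i c_i})$$

In other words, only the probability assigned to the correct label of each instance contributes to the score.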

Example calculation

Let’s calculate LogLoss in 2 ways:

• with the excellent sklearn library
• with pure Python, without any third-party libraries

Data

Let’s assume that we are predicting a car maker. There are only three possible labels: `audi`, `bmw`, `tesla`. We have 8 rows of data with known labels:

```python
data = ['audi', 'tesla', 'tesla', 'bmw', 'audi', 'bmw', 'audi', 'tesla']
```

We also have a classifier that gives the following probabilities for this data:

```python
probs = [
    [0.6, 0.3, 0.1], [0.45, 0.45, 0.1], [0.50, 0.00, 0.50], [1.00, 0.00, 0.00],
    [0.2, 0.6, 0.2], [0.10, 0.10, 0.8], [0.33, 0.33, 0.34], [0.30, 0.40, 0.30]
]
```

Please notice that the length of the `probs` array is equal to the length of the `data` array, and that each inner array in `probs` has size 3 (equal to the number of possible labels). Each inner array holds the label probabilities: the 1st item is for `audi`, the 2nd for `bmw`, and the 3rd for `tesla`. For example, in the first array, `[0.6, 0.3, 0.1]`, `0.6` is the probability of `audi`, `0.3` of `bmw`, and `0.1` of `tesla`. Also notice that the labels are sorted alphabetically, and so are the probabilities for them (I will rely on this fact when implementing our own function that calculates LogLoss).
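
The layout described above can be checked with a short sketch. It assumes the columns of `probs` follow the alphabetically sorted labels, as stated; the names `labels`, `column_of`, and `true_probs` are introduced here for illustration only.

```python
data = ['audi', 'tesla', 'tesla', 'bmw', 'audi', 'bmw', 'audi', 'tesla']
probs = [
    [0.6, 0.3, 0.1], [0.45, 0.45, 0.1], [0.50, 0.00, 0.50], [1.00, 0.00, 0.00],
    [0.2, 0.6, 0.2], [0.10, 0.10, 0.8], [0.33, 0.33, 0.34], [0.30, 0.40, 0.30]
]

# Assumption: columns of probs follow the alphabetically sorted labels.
labels = sorted(set(data))  # ['audi', 'bmw', 'tesla']
column_of = {label: idx for idx, label in enumerate(labels)}

# One row of probabilities per data row, one column per label.
assert len(probs) == len(data)
assert all(len(row) == len(labels) for row in probs)

# Probability the classifier assigned to the correct label of each row.
true_probs = [row[column_of[label]] for label, row in zip(data, probs)]
print(true_probs)  # [0.6, 0.1, 0.5, 0.0, 0.2, 0.1, 0.33, 0.3]
```

The fourth row gets probability `0.0` for its correct label (`bmw`), so zero needs special handling once logarithms are taken.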

Now let’s calculate LogLoss for this classifier.

LogLoss in sklearn

Here is how to use sklearn.metrics.log_loss for this:

```python
from sklearn.metrics import log_loss

data = ['audi', 'tesla', 'tesla', 'bmw', 'audi', 'bmw', 'audi', 'tesla']
probs = [
    [0.6, 0.3, 0.1], [0.45, 0.45, 0.1], [0.50, 0.00, 0.50], [1.00, 0.00, 0.00],
    [0.2, 0.6, 0.2], [0.10, 0.10, 0.8], [0.33, 0.33, 0.34], [0.30, 0.40, 0.30]
]
ll = log_loss(data, probs)
print(ll)
```

The result is `5.53374909081`.

Manual LogLoss calculation

Now let’s calculate LogLoss with pure Python, not using any fancy library. Here is the calculation for each data instance, with everything summed up at the end:

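```python
from math import log

ll = [None] * 8
EPS = 1e-15  # stands in for probabilities equal to 0.0
# Each row is the sum over labels of y_ij * log(p_ij),
# with columns ordered as audi, bmw, tesla.
ll[0] = 1*log(0.6)+0*log(0.3)+0*log(0.1)     # audi
ll[1] = 0*log(0.45)+0*log(0.45)+1*log(0.1)   # tesla
ll[2] = 0*log(0.5)+0*log(EPS)+1*log(0.5)     # tesla
ll[3] = 0*log(1.0)+1*log(EPS)+0*log(EPS)     # bmw
ll[4] = 1*log(0.2)+0*log(0.6)+0*log(0.2)     # audi
ll[5] = 0*log(0.1)+1*log(0.1)+0*log(0.8)     # bmw
ll[6] = 1*log(0.33)+0*log(0.33)+0*log(0.34)  # audi
ll[7] = 0*log(0.3)+0*log(0.4)+1*log(0.3)     # tesla
# Average over the 8 rows and flip the sign.
logloss = -sum(ll) / 8
print(logloss)
```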

The result will be the same as with `sklearn`: `5.53374909081`.

Let’s look at this calculation more closely. Instead of zero, I am using a small value, `EPS`, equal to `1e-15`. This is because the logarithm of `0.0` is minus infinity. To work around this, I substitute a reasonably small value: `1e-15`.
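
Here is a minimal sketch of this workaround. In Python, `math.log(0.0)` raises `ValueError` rather than returning minus infinity, so the probability is raised to at least `EPS` before the logarithm is taken. The helper name `safe_log` is hypothetical.

```python
from math import log

EPS = 1e-15

def safe_log(p, eps=EPS):
    """Natural logarithm of p, with p clipped from below at eps."""
    return log(max(p, eps))

# The plain logarithm of zero is not defined and raises an error.
try:
    log(0.0)
except ValueError as err:
    print('log(0.0) failed:', err)

print(safe_log(0.0))  # same as log(1e-15), about -34.54
print(safe_log(0.5))  # probabilities above eps are left unchanged
```

With `EPS = 1e-15`, a single zero-probability mistake adds about 34.54 to the sum, which is why the fourth row dominates the result in this example.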

Now let’s write our own function to calculate LogLoss. Let’s split it into 2 functions:

• one validates the input data: `logloss_validate`
• the other does the actual calculation, only if validation was successful: `logloss`

```python
from math import log


def logloss_validate(a_true, a_probs, eps):
    # 1. Validate input variables
    if len(a_true) != len(a_probs):
        raise ValueError('Length of a_true does not match length of a_probs')
    unique_labels = set(a_true)
    # 2. Validate number of unique labels
    if len(unique_labels) < 2:
        raise ValueError('Should have at least 2 unique labels')
    # 3. Validate eps value
    if eps <= 0.0:
        raise ValueError('Eps should be very small (near zero) positive value')
    # 4. Validate probabilities
    for item in a_probs:
        # 4.1. Check that item in a_probs is iterable
        try:
            iter(item)
        except TypeError:
            raise ValueError('Item in a_probs should be an iterable')
        # 4.2. Check that length of item in a_probs is equal to the number of unique labels
        if len(item) != len(unique_labels):
            raise ValueError('Size of item in a_probs does not match number of unique labels')
        # 4.3. Check that no probability is negative
        for i in item:
            if i < 0.0:
                raise ValueError('Some items of a_probs have negative values')


def logloss(a_true, a_probs, eps=1e-15):
    logloss_validate(a_true, a_probs, eps)

    result = 0.0
    # deduplicate labels and sort them alphabetically
    unique_sorted_labels = sorted(set(a_true))
    label_map = {}
    label_idx = 0
    # put index of each unique sorted label into a map
    # I will use this map to get this label probability by label value itself
    for label in unique_sorted_labels:
        label_map[label] = label_idx
        label_idx = label_idx + 1
    for true_label, probs in zip(a_true, a_probs):
        true_label_idx = label_map[true_label]
        # Here I rely on the fact that label probabilities are sorted alphabetically
        prob = probs[true_label_idx]
        # Replace too-small probabilities with eps to avoid log(0)
        if prob < eps:
            prob = eps
        result = result + log(prob)
    return -result / len(a_true)
```

Then, if we call `logloss(data, probs)`, we get the same result for the LogLoss metric: `5.53374909081`.
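
One limitation of inferring the labels from `a_true` is that a label which never occurs in the true data would shift the column mapping. A possible variation, sketched below, takes the column order explicitly, on the assumption that the caller knows it. The function `logloss_with_labels` is hypothetical and is not part of the code above.

```python
from math import log

def logloss_with_labels(a_true, a_probs, labels, eps=1e-15):
    """LogLoss where labels[k] names the k-th column of every row in a_probs."""
    if len(a_true) != len(a_probs):
        raise ValueError('Length of a_true does not match length of a_probs')
    # Map each label to its column index, using the order given by the caller.
    column_of = {label: idx for idx, label in enumerate(labels)}
    total = 0.0
    for true_label, row in zip(a_true, a_probs):
        # Only the probability of the correct label contributes;
        # it is clipped from below at eps to avoid log(0).
        prob = max(row[column_of[true_label]], eps)
        total += log(prob)
    return -total / len(a_true)

data = ['audi', 'tesla', 'tesla', 'bmw', 'audi', 'bmw', 'audi', 'tesla']
probs = [
    [0.6, 0.3, 0.1], [0.45, 0.45, 0.1], [0.50, 0.00, 0.50], [1.00, 0.00, 0.00],
    [0.2, 0.6, 0.2], [0.10, 0.10, 0.8], [0.33, 0.33, 0.34], [0.30, 0.40, 0.30]
]
# About 5.5337, in line with the results above.
print(logloss_with_labels(data, probs, ['audi', 'bmw', 'tesla']))
```

Passing the labels explicitly keeps the column mapping stable even when the true data happens to miss one of the classes.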