Commonly used evaluation metrics in machine learning

1. Classification problems
1.1 Confusion Matrix
In scikit-learn's confusion matrix, each row represents the true class of the instances and each column represents the predicted class.

 

True Positive (TP): Positive samples predicted to be positive by the model.
False Positive (FP): Negative samples predicted to be positive by the model.
False Negative (FN): Positive samples predicted to be negative by the model.
True Negative (TN): Negative samples predicted to be negative by the model.

True Positive Rate (TPR): TPR = TP/(TP+FN), i.e. the number of positive samples predicted as positive / the number of actual positive samples. This is the recall.
False Positive Rate (FPR): FPR = FP/(FP+TN), i.e. the number of negative samples predicted as positive / the number of actual negative samples.
False Negative Rate (FNR): FNR = FN/(TP+FN), i.e. the number of positive samples predicted as negative / the number of actual positive samples.
True Negative Rate (TNR): TNR = TN/(TN+FP), i.e. the number of negative samples predicted as negative / the number of actual negative samples.

 

1.1.1 sklearn corresponding packages
sklearn.metrics.confusion_matrix

from sklearn.metrics import confusion_matrix
# y_pred is the predicted label
y_pred, y_true =[1,0,1,0], [0,0,1,0]
confusion_matrix(y_true=y_true, y_pred=y_pred)
# array([[2, 1],
# [0, 1]], dtype=int64)

 

1.2 Accuracy
The proportion of correctly classified samples to the total samples.
1.2.1 sklearn corresponding packages
sklearn.metrics.accuracy_score

 

from sklearn.metrics import accuracy_score
# y_pred is the predicted label
y_pred, y_true=[1,2,3,4], [2,2,3,4]
accuracy_score(y_true=y_true, y_pred=y_pred)
# 0.75

1.3 Precision: the proportion of samples predicted as positive that are actually positive
Precision = TP / (TP + FP), i.e. the number of correctly predicted positive samples / the number of all samples predicted as positive.
1.3.1 sklearn corresponding packages
sklearn.metrics.precision_score

 

from sklearn.metrics import precision_score
# y_pred is the predicted label
y_pred, y_true =[1,0,1,0], [0,0,1,0]
precision_score(y_true=y_true, y_pred=y_pred)
# 0.5

 

1.4 Recall: the proportion of actual positive samples that are correctly predicted as positive
Recall = TP / (TP + FN), i.e. the number of correctly predicted positive samples / the number of all actual positive samples.
1.4.1 sklearn
sklearn.metrics.recall_score

 

from sklearn.metrics import recall_score
# y_pred is the predicted label
y_pred, y_true =[1,0,1,0], [0,0,1,0]
recall_score(y_true=y_true, y_pred=y_pred)
# 1.0

 

1.5 F1 score
Also known as the balanced F-score, it is defined as the harmonic mean of precision and recall: F1 = 2 · Precision · Recall / (Precision + Recall).
1.5.1 sklearn corresponding packages
sklearn.metrics.f1_score

from sklearn.metrics import f1_score
# y_pred is the predicted label
y_pred, y_true =[1,0,1,0], [0,0,1,0]
f1_score(y_true=y_true, y_pred=y_pred)

# classification_report directly outputs the precision, recall, f1-score and support of each class
from sklearn.metrics import classification_report
# y_pred is the predicted label
y_pred, y_true =[1,0,1,0], [0,0,1,0]
print(classification_report(y_true=y_true, y_pred=y_pred))

 

1.6 Gain and Lift charts
1.7 ROC curve
Horizontal axis: False Positive Rate (FPR = FP/(FP+TN)), the proportion of actual negative instances that are predicted positive; it equals 1 − Specificity.
Vertical axis: True Positive Rate (TPR = TP/(TP+FN)), also called Sensitivity (positive-class coverage), i.e. the recall.
1.7.1 sklearn corresponding packages
sklearn.metrics.roc_curve, sklearn.metrics.auc

 

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
# y_test: actual label, dataset_pred: predicted probability value.
fpr, tpr, thresholds = roc_curve(y_test, dataset_pred)
roc_auc = auc(fpr, tpr)
# Plotting only needs plt.plot(fpr, tpr); roc_auc just stores the AUC value computed by the auc() function
plt.plot(fpr, tpr, lw=1, label='ROC (area = %0.2f)' % roc_auc)
plt.xlabel("FPR (False Positive Rate)")
plt.ylabel("TPR (True Positive Rate)")
plt.title("Receiver Operating Characteristic, ROC (AUC = %0.2f)" % roc_auc)
plt.legend()
plt.show()

 

1.8 AUC (Area Under Curve)
AUC is the area under the ROC curve (the integral of the ROC); it lies between 0 and 1, and a useful classifier should score above 0.5, the value expected from random guessing.
The larger the AUC (area) of a classifier, the better its performance.
1.8.1 sklearn corresponding packages
sklearn.metrics.roc_auc_score

 

from sklearn.metrics import roc_auc_score
# y_test: actual label, dataset_pred: predicted probability value.
roc_auc_score(y_test, dataset_pred)

 

1.9 PR curve
Horizontal axis: recall R
Vertical axis: precision P
The evaluation criterion is similar to the ROC curve: in general, on the same test set, a curve that lies closer to the upper right (i.e. dominates another curve) indicates a better model.
When P and R are close in value, F1 is largest.
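A precision-recall curve can be plotted much like the ROC curve above; a minimal sketch, reusing the y_test / dataset_pred names from the ROC example:

import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score
# y_test: actual labels, dataset_pred: predicted probability values (as in the ROC example above)
precision, recall, thresholds = precision_recall_curve(y_test, dataset_pred)
ap = average_precision_score(y_test, dataset_pred)  # summary of the PR curve
plt.plot(recall, precision, lw=1, label='PR (AP = %0.2f)' % ap)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.legend()
plt.show()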
1.10 Multiple categories
sklearn.metrics.precision_recall_fscore_support: computes the precision, recall, F-score and support of each class.
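A small example (the labels are arbitrary, just for illustration):

from sklearn.metrics import precision_recall_fscore_support
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]
# average=None returns the precision, recall, F-score and support of each class separately
precision_recall_fscore_support(y_true=y_true, y_pred=y_pred, average=None)
# Use average='macro', 'micro' or 'weighted' to aggregate over classes.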
2. Regression problems
In sklearn, functions whose names end in _score return a value to be maximized (the higher the better), while functions ending in _error or _loss return a value to be minimized (the lower the better).
2.1 Mean absolute error (MAE)
Mean Absolute Error (MAE) is also called the l1 loss.
The mean absolute error is non-negative; the better the model, the closer MAE is to zero.
Formula:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
2.1.1 sklearn corresponding package
sklearn.metrics.mean_absolute_error

 

from sklearn.metrics import mean_absolute_error

y_true, y_pred = [3, -0.5, 2, 7], [2.5, 0.0, 2, 8]
mean_absolute_error(y_true, y_pred)
# 0.5

 

2.2 Mean Square Error (MSE)
Mean Squared Error (MSE) is also called the l2 loss.
It is essentially the residual sum of squares (RSS) divided by the number of samples, i.e. the average squared error per sample.
The mean squared error is non-negative; the better the model, the closer MSE is to zero.
Formula:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
2.2.1 sklearn corresponding package
sklearn.metrics.mean_squared_error

 

import numpy as np
from sklearn.metrics import mean_squared_error

y_true, y_pred = [3, -0.5, 2, 7], [2.5, 0.0, 2, 8]
mse = mean_squared_error(y_true, y_pred)
# 0.375
rmse = np.sqrt(mse)
# 0.6123724356957945

 

2.3 Root Mean Square Error (RMSE)
The Root Mean Squared Error (RMSE) is the square root of MSE.
Formula:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
2.4 Mean square logarithmic error (MSLE)
Mean Squared Logarithmic Error (MSLE) is the mean squared error computed on log-transformed values; it requires non-negative targets and predictions.
The mean squared logarithmic error is non-negative; the better the model, the closer MSLE is to zero.
Formula:
$$\mathrm{MSLE} = \frac{1}{n}\sum_{i=1}^{n}\left(\ln(1+y_i) - \ln(1+\hat{y}_i)\right)^2$$
2.4.1 sklearn corresponding package
sklearn.metrics.mean_squared_log_error
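A small example (the values are non-negative, as MSLE requires):

from sklearn.metrics import mean_squared_log_error
y_true, y_pred = [3, 5, 2.5, 7], [2.5, 5, 4, 8]
mean_squared_log_error(y_true, y_pred)
# approximately 0.0397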
2.5 Median absolute error (MedAE)
The median absolute error is non-negative; the better the model, the closer MedAE is to zero.
Formula:
$$\mathrm{MedAE} = \mathrm{median}\left(\left|y_1 - \hat{y}_1\right|, \ldots, \left|y_n - \hat{y}_n\right|\right)$$
2.5.1 sklearn corresponding package
sklearn.metrics.median_absolute_error
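A small example, using the same toy data as the earlier regression examples:

from sklearn.metrics import median_absolute_error
y_true, y_pred = [3, -0.5, 2, 7], [2.5, 0.0, 2, 8]
median_absolute_error(y_true, y_pred)
# 0.5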
2.6 Explained variance score (EVS)
The explained variance score is computed from the variance of the prediction errors.
The best possible score is 1; the worse the model, the smaller the value.
Formula:
$$\mathrm{EVS} = 1 - \frac{\mathrm{Var}(y - \hat{y})}{\mathrm{Var}(y)}$$
2.6.1 sklearn related packages
sklearn.metrics.explained_variance_score
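A small example, again on the same toy data:

from sklearn.metrics import explained_variance_score
y_true, y_pred = [3, -0.5, 2, 7], [2.5, 0.0, 2, 8]
explained_variance_score(y_true, y_pred)
# approximately 0.957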
2.7 Coefficient of Determination (R²)
The R² coefficient of determination (r2_score) measures how well the regression equation fits the data.
The best possible score is 1; a constant model that always predicts the mean of y scores 0, and the worse the model, the smaller the value.
Formula:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
2.7.1 sklearn related packages
sklearn.metrics.r2_score

 

from sklearn.metrics import r2_score
y_true, y_pred = [3, -0.5, 2, 7], [2.5, 0.0, 2, 8]
r2_score(y_true, y_pred)

 

3. Clustering problems
For clustering results, we want high intra-cluster similarity and low inter-cluster similarity.
Clustering performance measures fall roughly into two types:
External indices: compare the clustering result against a reference model.
Internal indices: evaluate the clustering result directly, without any reference model.
3.1 External indicators
For a data set D = {x_1, ..., x_m}, suppose a clustering gives the cluster partition C = {C_1, ..., C_k} and a reference model gives the partition C* = {C*_1, ..., C*_s}. Let λ and λ* denote the cluster label vectors of C and C* respectively. Considering samples in pairs, define
a = |SS|, b = |SD|, c = |DS|, d = |DD|, where
Set SS: sample pairs that belong to the same cluster in C and to the same cluster in C*
Set SD: sample pairs that belong to the same cluster in C but to different clusters in C*
Set DS: sample pairs that belong to different clusters in C but to the same cluster in C*
Set DD: sample pairs that belong to different clusters in C and to different clusters in C*
Since each sample pair can appear in only one of these sets, a + b + c + d = m(m−1)/2.
3.1.1 Commonly used external indicators
Jaccard Coefficient (JC)
Fowlkes and Mallows Index (FMI)
Rand Index (RI)
All of the above indices take values in [0, 1], and larger values are better; their definitions in terms of the pair counts are given below.
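In terms of the pair counts a = |SS|, b = |SD|, c = |DS|, d = |DD| defined above, these indices can be written as:

$$\mathrm{JC} = \frac{a}{a+b+c}, \qquad \mathrm{FMI} = \sqrt{\frac{a}{a+b}\cdot\frac{a}{a+c}}, \qquad \mathrm{RI} = \frac{2(a+d)}{m(m-1)}$$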
3.1.2 Mutual Information
Mutual Information (MI), also called transinformation, of two random variables is a measure of the mutual dependence between the variables.
3.1.3 sklearn corresponding package
FMI: sklearn.metrics.fowlkes_mallows_score
RI: sklearn.metrics.rand_score (the adjusted Rand index is sklearn.metrics.adjusted_rand_score)
MI: sklearn.metrics.mutual_info_score (the adjusted variant is sklearn.metrics.adjusted_mutual_info_score)
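A minimal usage sketch with toy label vectors (the labels are made up for illustration):

from sklearn.metrics import fowlkes_mallows_score, adjusted_rand_score, adjusted_mutual_info_score
# labels_true: labels from the reference model; labels_pred: labels from the clustering result
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]
fowlkes_mallows_score(labels_true, labels_pred)
adjusted_rand_score(labels_true, labels_pred)
adjusted_mutual_info_score(labels_true, labels_pred)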
3.2 Internal indicators
Consider the cluster partition C = {C_1, ..., C_k} of a clustering result, and define:
avg(C): the average distance between samples within cluster C
diam(C): the farthest distance between any two samples within cluster C
d_min(C_i, C_j): the distance between the nearest samples of clusters C_i and C_j
d_cen(C_i, C_j): the distance between the center points of clusters C_i and C_j
3.2.1 Commonly used internal indicators
Davies-Bouldin Index (DBI): the minimum possible value is 0, and the smaller the value, the better.
Dunn Index (DI): the larger the value, the better.
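Using the quantities avg, diam, d_min and d_cen defined above, one common formulation of these two indices is:

$$\mathrm{DBI} = \frac{1}{k}\sum_{i=1}^{k}\max_{j\ne i}\left(\frac{\mathrm{avg}(C_i)+\mathrm{avg}(C_j)}{d_{\mathrm{cen}}(C_i, C_j)}\right), \qquad \mathrm{DI} = \min_{1\le i\le k}\ \min_{j\ne i}\ \frac{d_{\min}(C_i, C_j)}{\max_{1\le l\le k}\mathrm{diam}(C_l)}$$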
3.2.2 Silhouette coefficient
It combines the cohesion and separation of a clustering to evaluate its quality. Its value lies in (-1, 1).
The closer the value is to 1, the more similar the sample is to the samples in its own cluster and the less similar it is to samples in other clusters; when a sample is more similar to samples outside its cluster, the silhouette coefficient is negative; a silhouette coefficient of 0 means the sample is equally similar to the two clusters, which should arguably be merged into one.
For sample i, s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the average distance between sample i and the other samples in its cluster, and b(i) is the minimum, over all other clusters, of the average distance between sample i and the samples in that cluster.
3.2.3 sklearn corresponding package
DBI: sklearn.metrics.davies_bouldin_score
Silhouette: sklearn.metrics.silhouette_score returns the mean silhouette coefficient over all samples in a data set.
sklearn.metrics.silhouette_samples takes the same parameters but returns the silhouette coefficient of every individual sample.
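A minimal sketch on toy data, using KMeans only to produce some cluster labels (the data and number of clusters are arbitrary):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score, silhouette_samples
# X: feature matrix; labels: cluster assignments produced by a clustering algorithm
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
davies_bouldin_score(X, labels)    # smaller is better, minimum 0
silhouette_score(X, labels)        # mean silhouette coefficient, in (-1, 1)
silhouette_samples(X, labels)      # silhouette coefficient of every individual sample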
4. Association rule problems
4.1 Support
Support(X→Y) is the probability that itemsets X and Y appear together in the data set: Support(X→Y) = P(X ∪ Y) = count(X ∪ Y) / N, i.e. the proportion of the N transaction records in which X and Y appear together.
4.2 Confidence
Confidence(X→Y) is the probability that Y occurs given that X has occurred, i.e. the proportion of transactions containing X that also contain Y: Confidence(X→Y) = P(Y | X) = Support(X ∪ Y) / Support(X).
4.3 Lift
Lift(X→Y) is the ratio of the probability of buying Y given that X was bought to the unconditional probability of buying Y: Lift(X→Y) = Confidence(X→Y) / Support(Y) = P(Y | X) / P(Y). A lift greater than 1 means that buying X raises the probability of buying Y, i.e. X has a promoting effect on Y.
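scikit-learn has no dedicated functions for these association measures; a minimal pure-Python sketch over a toy transaction list (the item names are invented for illustration):

# Toy transactions; the item names are purely illustrative.
transactions = [
    {"milk", "bread"},
    {"milk", "diaper", "beer"},
    {"bread", "diaper", "beer"},
    {"milk", "bread", "diaper", "beer"},
    {"milk", "bread", "diaper"},
]
N = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in itemset.
    return sum(itemset <= t for t in transactions) / N

def confidence(X, Y):
    # P(Y | X) = support(X and Y together) / support(X)
    return support(X | Y) / support(X)

def lift(X, Y):
    # confidence(X -> Y) / support(Y); greater than 1 means X raises the likelihood of Y
    return confidence(X, Y) / support(Y)

X, Y = {"diaper"}, {"beer"}
print(support(X | Y), confidence(X, Y), lift(X, Y))
# 0.6 0.75 1.25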
