Matthews correlation coefficient: Definition, Formula and Advantages



What is the Matthews correlation coefficient?

The Matthews correlation coefficient, abbreviated MCC, was introduced by Brian Matthews in 1975. MCC is a statistical measure used for model evaluation. It gauges the agreement between the predicted values and the actual values of a binary classification, and it is closely related to the chi-square statistic for a 2 × 2 contingency table: |MCC| = √(χ²/n), where n is the total number of observations.


Matthews correlation coefficient formula

MCC is a single-value classification metric that summarizes the confusion matrix (also called an error matrix). A binary confusion matrix has four entries:

  • True positives (TP)
  • True negatives (TN)
  • False positives (FP)
  • False negatives (FN)

MCC is calculated from these four counts by the formula:

$$
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
$$

MCC is considered a reliable measure because it produces a high score only when the prediction does well in all four of these categories. Like most correlation coefficients, MCC ranges from -1 to +1:

  • +1 is perfect agreement between the predicted and actual values.
  • 0 is no agreement, meaning the prediction is no better than random with respect to the actual values.
  • -1 is total disagreement between the predicted and actual values.
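The formula and its range can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `mcc_from_counts` is hypothetical, and returning 0.0 when the denominator is zero is a common convention assumed here, since the formula itself is undefined in that case.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: if any row or column of the confusion matrix sums to zero,
    # the formula is undefined; 0.0 is returned by convention.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Three illustrative (made-up) confusion matrices covering the range.
perfect = mcc_from_counts(tp=50, tn=50, fp=0, fn=0)     # every prediction correct
chance = mcc_from_counts(tp=25, tn=25, fp=25, fn=25)    # no better than random
inverted = mcc_from_counts(tp=0, tn=0, fp=50, fn=50)    # every prediction wrong

print(perfect, chance, inverted)
```

The three calls land on the ends and the midpoint of the range: 1.0, 0.0 and -1.0.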


Example of MCC

Consider a confusion matrix with the entries TP = 90, FP = 4, TN = 1 and FN = 5. Substituting these values into the formula gives an MCC of about 0.14.
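Written out step by step, the substitution looks like this (the intermediate values are rounded):

```latex
\mathrm{MCC}
  = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
  = \frac{90 \cdot 1 - 4 \cdot 5}{\sqrt{(90 + 4)(90 + 5)(1 + 4)(1 + 5)}}
  = \frac{70}{\sqrt{94 \cdot 95 \cdot 5 \cdot 6}}
  = \frac{70}{\sqrt{267900}}
  \approx \frac{70}{517.6}
  \approx 0.135
```

Rounded to two decimal places, this is 0.14.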

An MCC of 0.14 is very close to 0, the score of a random-guess classifier.

The MCC therefore exposes how poorly the classifier handles the negative class: only 1 of the 5 actual negative samples is identified correctly, even though 91 of the 100 samples overall are classified correctly.


Advantages of MCC over F1 score


Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite this being a crucial issue in machine learning, no widespread consensus has yet been reached on a single preferred measure. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular metrics in binary classification tasks. However, these statistical measures can show dangerously overoptimistic, inflated results, especially on imbalanced datasets.

Results: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate that produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset.

Conclusions: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
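The contrast between the three metrics can be checked numerically. The sketch below is an illustration under stated assumptions rather than the method of any particular study: the helper `scores` is hypothetical, the first confusion matrix is the one from the example earlier in this article, and the second is a made-up imbalanced case in which the classifier labels every sample as positive. As before, MCC is set to 0.0 by convention when its denominator is zero.

```python
import math


def scores(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (accuracy, F1 score, MCC) for one binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total

    # F1 uses only TP, FP and FN; true negatives never enter the score.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    # MCC uses all four counts. Assumption: 0.0 when the denominator is zero.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0

    return accuracy, f1, mcc


# Confusion matrix from the example above: TP = 90, FP = 4, TN = 1, FN = 5.
article_example = scores(tp=90, tn=1, fp=4, fn=5)

# Hypothetical imbalanced dataset (95 positives, 5 negatives) with a
# classifier that predicts "positive" for every sample.
always_positive = scores(tp=95, tn=0, fp=5, fn=0)

for name, (accuracy, f1, mcc) in [
    ("article example", article_example),
    ("always positive", always_positive),
]:
    print(f"{name}: accuracy={accuracy:.2f}  F1={f1:.2f}  MCC={mcc:.2f}")
```

In both cases accuracy and F1 score come out above 0.9, while MCC stays at roughly 0.14 and 0.00 respectively, because it is the only one of the three that penalizes the poor performance on the negative class.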
