Mosaic Plot

A mosaic plot is a kind of stacked bar chart that shows the percentages of data within groups. It is a graphical depiction of a contingency table.

Mosaic plots are used to show relationships between variables and to compare groups visually.

What Is A Mosaic Plot?

A mosaic plot (also called a Marimekko diagram) is a graphical method for visualizing data from two or more qualitative variables. It is the multidimensional extension of the spineplot, which graphically displays the same kind of information for only one variable. It gives an overview of the data and makes it possible to recognize relationships between different variables. For example, independence is indicated when the boxes across categories all have the same relative areas. Hartigan and Kleiner introduced mosaic plots in 1981, and Friendly extended them in 1994. Because of their similarity to Marimekko charts, mosaic plots are also known as Mekko charts. As with bar charts and spineplots, the area of each tile, also called the bin size, is proportional to the number of observations in that category.

EXAMPLE

A classic example of a mosaic plot uses data on the people aboard the Titanic. The data set in this example has 2201 observations and three variables:

  • the individual’s gender (male / female)
  • the class (first, second, or third class, or crew)
  • whether the person survived the sinking (yes / no)

The observations were gathered into the following table:

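| Gender | Survived | 1st Class | 2nd Class | 3rd Class | Crew |
|---|---|---|---|---|---|
| Male | No | 118 | 154 | 422 | 670 |
| Male | Yes | 62 | 25 | 88 | 192 |
| Female | No | 4 | 13 | 106 | 3 |
| Female | Yes | 141 | 93 | 90 | 20 |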
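As a quick check on the proportions quoted in the construction steps, the table can be encoded as a nested dictionary and its margins summed. This is a minimal Python sketch. The name `TITANIC` and the dictionary layout are our own choices for illustration, not part of any library.

```python
# Titanic counts from the table above: gender -> survived -> class -> count
TITANIC = {
    "Male":   {"No":  {"1st": 118, "2nd": 154, "3rd": 422, "Crew": 670},
               "Yes": {"1st": 62,  "2nd": 25,  "3rd": 88,  "Crew": 192}},
    "Female": {"No":  {"1st": 4,   "2nd": 13,  "3rd": 106, "Crew": 3},
               "Yes": {"1st": 141, "2nd": 93,  "3rd": 90,  "Crew": 20}},
}


def total(table):
    """Sum every count in the nested table."""
    return sum(n for by_survived in table.values()
               for by_class in by_survived.values()
               for n in by_class.values())


def gender_totals(table):
    """Marginal count per gender (summed over survival and class)."""
    return {g: sum(sum(c.values()) for c in s.values()) for g, s in table.items()}


def survived_totals(table):
    """Marginal count per survival outcome (summed over gender and class)."""
    out = {}
    for by_survived in table.values():
        for outcome, by_class in by_survived.items():
            out[outcome] = out.get(outcome, 0) + sum(by_class.values())
    return out


print(total(TITANIC))            # 2201
print(gender_totals(TITANIC))    # {'Male': 1731, 'Female': 470}
print(survived_totals(TITANIC))  # {'No': 1490, 'Yes': 711}
```

These margins give the shares used below: about 21% of the observations are female (470 / 2201), and about 32% survived (711 / 2201).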

Construction: 

The categorical variables are first put in order, and each variable is then assigned to an axis. The table below shows the order and axis assignment used for this data set. A different ordering produces a different mosaic plot; as with all multivariate plots, the order of the variables matters.

We first place "Gender" along the left edge, which means we divide the data vertically into two blocks: the bottom (much smaller) block corresponds to females, and the top (much larger) block to males. One can readily see that a little over one-fifth of the people on board (470 of 2201) were female, and the rest were male.

| Order | Variable | Axis |
|---|---|---|
| 1 | Gender | Vertical |
| 2 | Class | Horizontal |
| 3 | Survived | Vertical |

The second variable, "Class", is then assigned to the top edge. The four vertical columns therefore represent the four values of that variable (1st, 2nd, 3rd, and crew). The columns differ in width because each column's width shows the relative share of that class. The crew is by far the largest and most male-dominated group, whereas third class contains the largest number of women. The small number of female crew members also stands out.
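Because Class is split inside each gender block in this ordering, a column's width is that class's share of the gender, not of the whole population. A minimal sketch that computes those widths from the table, with the "No" and "Yes" rows summed. `BY_GENDER_CLASS` and `column_widths` are illustrative names.

```python
# Counts by gender and class, summing the "No" and "Yes" rows of the table
BY_GENDER_CLASS = {
    "Male":   {"1st": 118 + 62, "2nd": 154 + 25, "3rd": 422 + 88, "Crew": 670 + 192},
    "Female": {"1st": 4 + 141,  "2nd": 13 + 93,  "3rd": 106 + 90, "Crew": 3 + 20},
}


def column_widths(counts):
    """Share of each class within one gender block (the widths sum to 1)."""
    block_total = sum(counts.values())
    return {cls: n / block_total for cls, n in counts.items()}


for gender, counts in BY_GENDER_CLASS.items():
    widths = column_widths(counts)
    widest = max(widths, key=widths.get)  # the widest column in this block
    print(gender, {c: round(w, 3) for c, w in widths.items()}, "widest:", widest)
```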

Finally, the third variable, "Survived", is applied, this time along the left side again, with the outcome highlighted by shading: dark grey rectangles represent those who did not survive the disaster, whereas light grey rectangles represent those who did. It is immediately clear that women in first class had the best chance of survival. Marginalized over all classes, women appear to have had a much higher survival rate than men. Similarly, marginalizing over gender shows that first-class passengers were the most likely to survive. Overall, roughly one-third of all people on board survived (the proportion of light grey area).
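The readings above are conditional survival rates, and they can be computed directly from the table. A minimal sketch, assuming the counts are stored per gender-and-class tile. `COUNTS` and `survival_rate` are illustrative names, not from any library.

```python
# Titanic counts from the table: (gender, class) -> (died, survived)
COUNTS = {
    ("Male", "1st"): (118, 62),   ("Male", "2nd"): (154, 25),
    ("Male", "3rd"): (422, 88),   ("Male", "Crew"): (670, 192),
    ("Female", "1st"): (4, 141),  ("Female", "2nd"): (13, 93),
    ("Female", "3rd"): (106, 90), ("Female", "Crew"): (3, 20),
}


def survival_rate(keep):
    """Share of survivors among the tiles for which keep(gender, cls) is True."""
    died = sum(d for (g, c), (d, s) in COUNTS.items() if keep(g, c))
    survived = sum(s for (g, c), (d, s) in COUNTS.items() if keep(g, c))
    return survived / (died + survived)


# One rate per gender x class tile
cell_rates = {key: survival_rate(lambda g, c, key=key: (g, c) == key) for key in COUNTS}
# Marginalized over class (per gender) and over gender (per class)
gender_rates = {g: survival_rate(lambda gg, c, g=g: gg == g) for g in ("Male", "Female")}
class_rates = {k: survival_rate(lambda g, c, k=k: c == k) for k in ("1st", "2nd", "3rd", "Crew")}
overall = survival_rate(lambda g, c: True)

print(max(cell_rates, key=cell_rates.get))  # ('Female', '1st')
print(gender_rates)
print(class_rates)
print(round(overall, 3))                    # 0.323
```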

Properties Of Mosaic Plot

  • The variables shown are measured on categorical (nominal) or ordinal scales.
  • The plot shows at least two variables. There is no upper limit, but too many variables can make the plot confusing or misleading.
  • The number of observations is not limited, but it cannot be read from the plot itself.
  • Each rectangle corresponds to one combination of category values, and its area is proportional to the number of observations with that combination.
  • Unlike a boxplot or a QQ plot, a mosaic plot does not display confidence intervals. As a result, the significance of differences between the frequencies of the various category values cannot be judged visually.
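The area property can be checked with a minimal sketch of the recursive splitting described under Construction. The sketch makes three assumptions:

- The plot occupies a unit square, with y increasing upward.
- The data is a nested dictionary whose nesting follows the variable order.
- A list gives the split directions: 'v' stacks blocks vertically and 'h' places columns side by side.

`count` and `mosaic` are hypothetical helper names, not a plotting library's API.

```python
from typing import Dict, List, Tuple, Union

# A nested table: each level maps a category label to a sub-table or to a count
Table = Union[int, Dict[str, "Table"]]
Rect = Tuple[float, float, float, float]  # (x, y, width, height) in the unit square


def count(t: Table) -> int:
    """Total number of observations below a node of the nested table."""
    return t if isinstance(t, int) else sum(count(v) for v in t.values())


def mosaic(t: Table, axes: List[str],
           rect: Rect = (0.0, 0.0, 1.0, 1.0),
           path: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], Rect]:
    """Split rect recursively; nesting level i is split along axes[i] ('v' or 'h')."""
    if isinstance(t, int):
        return {path: rect}                     # a leaf: one tile of the mosaic
    x, y, w, h = rect
    n = count(t)
    tiles: Dict[Tuple[str, ...], Rect] = {}
    offset = 0.0                                # share of the parent already used
    for label, sub in t.items():
        share = count(sub) / n if n else 0.0    # this category's share of the parent
        if axes[0] == "v":                      # stack blocks vertically
            sub_rect = (x, y + offset * h, w, share * h)
        else:                                   # place columns side by side
            sub_rect = (x + offset * w, y, share * w, h)
        offset += share
        tiles.update(mosaic(sub, axes[1:], sub_rect, path + (label,)))
    return tiles


# Titanic counts nested as Gender -> Class -> Survived (Female listed first so
# that, with y increasing upward, the female block sits at the bottom)
TITANIC = {
    "Female": {"1st": {"No": 4, "Yes": 141}, "2nd": {"No": 13, "Yes": 93},
               "3rd": {"No": 106, "Yes": 90}, "Crew": {"No": 3, "Yes": 20}},
    "Male":   {"1st": {"No": 118, "Yes": 62}, "2nd": {"No": 154, "Yes": 25},
               "3rd": {"No": 422, "Yes": 88}, "Crew": {"No": 670, "Yes": 192}},
}

tiles = mosaic(TITANIC, ["v", "h", "v"])
for path, (x, y, w, h) in tiles.items():
    print(path, f"area={w * h:.4f}")
```

Each split multiplies a tile's area by that category's share of its parent, so every leaf tile ends up with an area equal to its count divided by 2201. Reordering the nesting or `axes` changes the shapes and positions of the tiles but not their areas. This is one way to see why the variable order changes the picture.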

Uses Of Mosaic Plot

Mosaic plots are useful when:

  • Relationships that are part-to-whole or part-to-part-to-whole should be stressed.
  • If required, exact values can be retrieved using another mechanism (e.g., a table).
  • Space is limited, thus comparisons of tiny multiples are reasonable.

Independence

A mosaic plot usually makes it easy to see whether two variables are independent. When they are, the proportions are the same in every column, so the boxes line up in a grid. This is illustrated with the UCBAdmissions dataset that ships with R. The following is a graph of student admissions by gender:

This appears to show gender bias. However, there is a hidden variable: the department to which each student applied. What happens when we stratify by department?

Most departments appear to be gender neutral, and those that are skewed favor women. First, there are very few female applicants in departments A and B (the columns are narrow). These departments are also much easier to get into: a far smaller share of applicants is rejected than in other departments, particularly F. One possibility is that more men are admitted overall because they tend to apply to the departments that admit more applicants, perhaps the faster-growing ones.
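The pooled and stratified comparisons can be reproduced from the counts in R's UCBAdmissions table, transcribed here into a Python dictionary. This is a minimal sketch. `UCB`, `rate`, and `pooled_rate` are our own names.

```python
# Counts transcribed from R's UCBAdmissions table:
# department -> gender -> (admitted, rejected)
UCB = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138),  "Female": (94, 299)},
    "F": {"Male": (22, 351),  "Female": (24, 317)},
}


def rate(admitted, rejected):
    """Share of applicants admitted."""
    return admitted / (admitted + rejected)


def pooled_rate(gender):
    """Admission rate for one gender with all departments pooled together."""
    admitted = sum(dept[gender][0] for dept in UCB.values())
    rejected = sum(dept[gender][1] for dept in UCB.values())
    return rate(admitted, rejected)


# Pooled over departments, men appear to be favored
print({g: round(pooled_rate(g), 3) for g in ("Male", "Female")})  # {'Male': 0.445, 'Female': 0.304}

# Within departments, the gap mostly shrinks or reverses
for name, dept in UCB.items():
    rates = {g: round(rate(*counts), 3) for g, counts in dept.items()}
    applicants = {g: sum(counts) for g, counts in dept.items()}
    print(name, rates, "applicants:", applicants)
```

Within department A, the few women who applied were admitted at a higher rate than the men. Department F rejected more than 90% of applicants of both genders. A reversal of this kind under aggregation is known as Simpson's paradox.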

Residuals

Mosaic plots show the data exactly as observed, with no attempt to generalize to the entire population. To draw conclusions about the population, we need measures of statistical significance. Pearson residuals, which are motivated by the chi-square test, quantify how far each cell departs from independence. Because the residuals are expressed roughly in standard deviations, a residual greater than 2 or less than -2 signals a significant deviation at approximately the 95 percent level.
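A minimal Python sketch of the Pearson residual, r = (O - E) / sqrt(E). Here O is a cell's observed count and E = (row total x column total) / n is its expected count under independence. The hair and eye color counts are not reproduced in this article, so the example uses the Titanic table from earlier, summed over class. The function name `pearson_residuals` is our own.

```python
import math


def pearson_residuals(observed):
    """Pearson residuals (O - E) / sqrt(E) for a two-way table given as a list of rows."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)
    residuals = []
    for i, row in enumerate(observed):
        out = []
        for j, o in enumerate(row):
            e = rows[i] * cols[j] / n  # expected count if the variables were independent
            out.append((o - e) / math.sqrt(e))
        residuals.append(out)
    return residuals


# Titanic, summed over class. Rows: Male, Female; columns: died (No), survived (Yes)
titanic = [[1364, 367],
           [126, 344]]

for label, row in zip(("Male", "Female"), pearson_residuals(titanic)):
    print(label, [round(r, 2) for r in row])
```

All four residuals fall far outside the range -2 to 2. More men died and more women survived than independence would predict. The sum of the squared residuals is the Pearson chi-square statistic.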

Here is a mosaic plot of hair color versus eye color for a group of statistics students, with residual shading.

The shading can be read as follows. If we are confident that a cell is taller than the other cells in the same row, it is colored blue. If we are confident that a cell is shorter than the other cells in the same row, it is colored red. If a cell is clearly short but not red, there is not enough data to establish that it would remain short if we took another sample. A blue cell is often accompanied by a red cell in the same row, although not always; see, for example, the bottom row of the figure (green eyes). Note that the shading says nothing about the relative heights of boxes in the same column.

In a table with a lot of data, shading might seem unnecessary, because nearly all differences are significant and can be seen from the box heights. Even so, when boxes are not lined up, as in the "hazel eyes" row, heights can be hard to compare, and the coloring draws attention to where the important relationships lie.
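The coloring rule described above can be sketched as a small function that maps a tile's Pearson residual to a color. The sketch assumes a single cutoff of 2 in each direction, as in the text. Real tools often use more levels: R's `mosaicplot` with `shade = TRUE`, for example, cuts absolute residuals at 2 and 4. The residual values in the example are hypothetical.

```python
def shade(residual, cutoff=2.0):
    """Color for one mosaic tile under a simple two-sided cutoff rule."""
    if residual > cutoff:
        return "blue"     # significantly more observations than independence predicts
    if residual < -cutoff:
        return "red"      # significantly fewer observations than independence predicts
    return "neutral"      # not enough evidence of a departure from independence


# Hypothetical residuals for one row of a table, for illustration only
row = [3.1, -0.4, -2.6, 1.5]
print([shade(r) for r in row])  # ['blue', 'neutral', 'red', 'neutral']
```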

Advantages Of Mosaic Plot

A mosaic plot gives an overview of the data and makes it possible to identify relationships between variables. For example, independence is indicated when the boxes across categories all have the same relative areas, so that they line up in a grid.

Disadvantages Of Mosaic Plot

  • It is hard to compare lengths or heights that are not aligned along a common baseline.

For example, in the graph below, the two highlighted rectangles are not aligned along a common baseline, so comparing their heights is harder than it would be if they were.

  • Categorical items are often difficult to classify.
  • One variable is represented by the heights of the rectangles and the other by their widths, but it is difficult to focus on either dimension alone when both vary.
  • Comparing rectangle areas is made harder by the fact that the rectangles' aspect ratios can vary widely.