Measuring perceived usability: The CSUQ, SUS, and UMUX

Published: 24 Feb 2019
Updated: 14 Nov 2021
Written by: Chun Fei Lung

It’s a battle of standardised usability questionnaires.

Three scales, one cup

The perceived usability of an application can be measured using short questionnaires. Most practitioners will be familiar with the SUS, but it’s far from the only one. Lewis compared the SUS with two other questionnaires with acronym-y names, the CSUQ and UMUX, and discovered that they have a lot in common (which is a good thing).

Why it matters

Usability consists of three components. Two of these, efficiency and effectiveness, are objective and can be directly measured by means of observation. The third, perceived usability, is subjective and therefore not as easy to quantify.

Usability practitioners therefore often rely on questionnaires, which can be used to get an estimate of the perceived usability. The SUS and CSUQ are currently the most popular usability questionnaires. But there’s a new contender: UMUX!

That’s a lot of acronyms, so let’s have a brief look at each of them:

The Computer System Usability Questionnaire (CSUQ) consists of 16 questions that can be answered on a 7-point Likert scale. The answers can be used to compute not just one, but four different scores between 1 and 7: an overall score, and scores for the system’s usefulness, information quality, and interface quality.
The System Usability Scale (SUS) is the most popular questionnaire by far. It consists of only 10 questions, which must be answered on a 5-point Likert scale. Answers can be used to compute a score between 0 and 100.

Because SUS usage is so widespread, practitioners and researchers can easily compare any system’s perceived usability with that of others, either by directly comparing the computed score or by mapping scores to grades (ranging from F to A+).
The Usability Metric for User Experience (UMUX) consists of only 4 questions, which must be answered on a 7-point Likert scale in order to obtain a score between 0 and 100.
UMUX-LITE is a variant of UMUX that consists of just two questions (side note: “But wait, there’s more! … Or less.”). Confusingly, there are actually two variants of this light variant. The first results in a score between 0 and 100, but appears to be slightly biased. The second variant, UMUX-LITEr, results in scores that are closer to scores obtained using SUS, but can only take values between 22.9 and 87.9.
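
The score ranges above follow from how each questionnaire is usually scored. Here is a minimal sketch, assuming the standard published scoring rules rather than anything spelled out in this summary: items alternate between positive and negative wording in SUS and UMUX, UMUX-LITE keeps only two positively worded items, and UMUX-LITEr applies the regression adjustment 0.65 × UMUX-LITE + 22.9 (which is consistent with the 22.9 to 87.9 range mentioned above). The function names are made up for illustration.

```python
def sus_score(responses):
    """SUS: 10 responses on a 1-5 scale, returns a score from 0 to 100."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for index, response in enumerate(responses):
        # Items 1, 3, 5, ... are positively worded; items 2, 4, 6, ... negatively.
        total += (response - 1) if index % 2 == 0 else (5 - response)
    return total * 2.5


def umux_score(responses):
    """UMUX: 4 responses on a 1-7 scale, returns a score from 0 to 100."""
    if len(responses) != 4:
        raise ValueError("UMUX needs exactly 4 responses")
    total = 0
    for index, response in enumerate(responses):
        # Same alternating wording as SUS, but on a 7-point scale.
        total += (response - 1) if index % 2 == 0 else (7 - response)
    return total * 100 / 24


def umux_lite_score(responses):
    """UMUX-LITE: 2 positively worded items on a 1-7 scale, returns 0 to 100."""
    if len(responses) != 2:
        raise ValueError("UMUX-LITE needs exactly 2 responses")
    return sum(response - 1 for response in responses) * 100 / 12


def umux_liter_score(responses):
    """UMUX-LITEr: regression-adjusted UMUX-LITE (assumed coefficients 0.65, 22.9)."""
    return 0.65 * umux_lite_score(responses) + 22.9
```

A respondent who answers every SUS question with the neutral middle option ends up at exactly 50, and the most negative possible UMUX-LITEr respondent still scores 22.9, which is why that variant can never reach 0.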

The study basically aims to investigate whether all these questionnaires measure the same thing. If that’s the case, we can compare results from different usability studies, even if different questionnaires were used!

How the study was conducted

The author created a survey that included the CSUQ, SUS, and UMUX (in that order), and some additional questions about the system’s and respondent’s backgrounds.

The survey was completed by 746 IBM employees, all of whom were residents of the United States.

What discoveries were made

Results from the four questionnaires are fairly similar to each other, which suggests that they all measure roughly the same thing.

Reliability and validity

Cronbach’s α values for all four questionnaires are above 0.70 (CSUQ: 0.97; SUS: 0.93; UMUX: 0.88; UMUX-LITE(r): 0.79). This suggests that they’re all reliable enough.
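
For readers who have not computed Cronbach’s α before, here is a minimal sketch of the textbook formula: α = k / (k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The function name and the tiny data set are invented for illustration and have nothing to do with the study’s data. The sketch also assumes that negatively worded items have already been reverse-coded.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Compute Cronbach's alpha.

    rows: one list per respondent, each holding k item scores
    (negatively worded items must already be reverse-coded).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses: three respondents, two items that agree perfectly.
example = [[1, 1], [2, 2], [3, 3]]
alpha = cronbach_alpha(example)
```

When all items move together perfectly, as in the made-up example, α comes out at 1. The more the items disagree with each other, the lower it gets.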

Correlations with SUS are 0.76 for CSUQ, 0.79 for UMUX, and 0.74 for UMUX-LITE(r): the questionnaires appear to be fairly aligned with each other (side note: I’m glossing over the details here. Read the original article if you want to learn more about this.).

SUS and CSUQ

To compare the CSUQ with the SUS, the CSUQ’s score is converted to the 100-point scale that’s used by SUS. Lower CSUQ scores are better, so the CSUQ’s best, worst, and midpoint scores (1, 7, and 4 respectively) map to SUS’s 100, 0, and 50.
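
That conversion can be written as a single linear formula. Here is a minimal sketch that reproduces the three anchor points given above (1 to 100, 7 to 0, and 4 to 50). The function name is hypothetical and the formula is inferred from those anchor points.

```python
def csuq_to_100_point_scale(csuq_score):
    """Map a CSUQ score (1 = best, 7 = worst) to a 0-100 scale (100 = best)."""
    if not 1 <= csuq_score <= 7:
        raise ValueError("CSUQ scores must lie between 1 and 7")
    # Flip the direction and stretch the 6-point range to 100 points.
    return 100 - (csuq_score - 1) * 100 / 6
```

For example, a raw CSUQ score of 3 would land at roughly 66.7 on the 100-point scale.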

After conversion the overall mean CSUQ score is 66.7, which is 2 points lower than the mean SUS score (68.7).

The large sample size makes this a statistically significant difference. When the Sauro–Lewis curved grading scale is used, both scores map to a C, i.e. there’s no practically significant difference.

Interestingly, scores from macOS users (CSUQ: 76.6 or B; SUS: 76.8 or B) are significantly higher than scores from Windows users (CSUQ: 64.1 or C-; SUS: 66.9 or C).
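
The letter grades are simple range lookups. Here is a minimal sketch, assuming the grade boundaries of the Sauro–Lewis curved grading scale as they are commonly published. They are not listed in this summary, so check them against the original source before relying on them. The names are made up for illustration.

```python
# Lowest score that still earns each grade (assumed boundaries, see note above).
GRADE_FLOORS = [
    (84.1, "A+"), (80.8, "A"), (78.9, "A-"),
    (77.2, "B+"), (74.1, "B"), (72.6, "B-"),
    (71.1, "C+"), (65.0, "C"), (62.7, "C-"),
    (51.7, "D"),
]


def curved_grade(score):
    """Return the letter grade for a 0-100 score such as a SUS score."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"
```

With these boundaries, the scores quoted above map to the same grades as in the study: 76.8 is a B, 66.9 is a C, and 64.1 is a C-.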

SUS and UMUX

Results are fairly similar for UMUX and UMUX-LITE(r):

OS	SUS	UMUX	UMUX-LITE	UMUX-LITEr
macOS	76.8 (B)	79.2 (A-)	79.9 (A-)	74.9 (B)
Windows	66.9 (C)	66.5 (C)	68.5 (C)	67.4 (C)

Note that UMUX-LITEr’s scores are consistently close to those obtained using the SUS.

Summary

CSUQ, SUS, and UMUX are all fine choices, so those who already use one of them can keep doing so
SUS is widely used in studies, which makes it a safe choice for those who do not yet use any of these questionnaires
CSUQ is the best choice if you need a multidimensional instrument
Use UMUX-LITEr if you need to ask as few questions as possible
Be aware of bias: macOS users may be more positive about a system’s usability than Windows users

Title	Measuring perceived usability: The CSUQ, SUS, and UMUX
Year	2018
Author(s)	James R. Lewis (IBM Corporation)
Venue	International Journal of Human–Computer Interaction