Chuniversiteit.nl
The Toilet Paper

Measuring perceived usability: The CSUQ, SUS, and UMUX

It’s a battle of standardised usability questionnaires.

[Figure: contest participants with scales of varying lengths. Caption: Three scales, one cup]

The perceived usability of an application can be measured using short questionnaires. Most practitioners will be familiar with the SUS, but it’s far from the only one. Lewis compared two other questionnaires with acronym-y names, the CSUQ and UMUX, with SUS and discovered that they have a lot in common (which is a good thing).

Why it matters

Usability consists of three components. Two of these, efficiency and effectiveness, are objective and can be directly measured by means of observation. The third, perceived usability, is subjective and therefore not as easy to quantify.

Usability practitioners therefore often rely on questionnaires, which can be used to get an estimate of the perceived usability. The SUS and CSUQ are currently the most popular usability questionnaires. But there’s a new contender: UMUX!

That’s a lot of acronyms, so let’s have a brief look at each of them:

  • The Computer System Usability Questionnaire (CSUQ) consists of 16 questions that can be answered on a 7-point Likert scale. The answers can be used to compute not just one, but four different scores between 1 and 7: an overall score, and scores for the system’s usefulness, information quality, and interface quality.

  • The System Usability Scale (SUS) is the most popular questionnaire by far. It consists of only 10 questions, which must be answered on a 5-point Likert scale. Answers can be used to compute a score between 0 and 100.

    Because SUS usage is so widespread, practitioners and researchers can easily compare any system’s perceived usability with that of others, either by directly comparing the computed score or by mapping scores to grades (ranging from F to A+).

  • The Usability Metric for User Experience (UMUX) consists of only 4 questions, which must be answered on a 7-point Likert scale in order to obtain a score between 0 and 100.

  • UMUX-LITE is a variant of UMUX that uses only two of its four questions. Confusingly, there are actually two variants of this light variant. The first results in a score between 0 and 100, but appears to be slightly biased. The second variant, UMUX-LITEr, results in scores that are closer to scores obtained using SUS, but can only take values between 22.9 and 87.9.
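The score ranges above follow from how each questionnaire is scored. Here is a minimal sketch of the commonly published scoring rules; the function names are made up for illustration, and the item order (odd items positively worded, even items negatively worded, UMUX-LITE using UMUX items 1 and 3) is an assumption that should be checked against the original questionnaires.

```python
def sus_score(responses):
    """SUS: 10 answers on a 1..5 scale, scored 0..100."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 answers")
    total = 0
    for number, answer in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if number % 2 == 1 else (5 - answer)
    return total * 2.5


def umux_score(responses):
    """UMUX: 4 answers on a 1..7 scale, scored 0..100."""
    if len(responses) != 4:
        raise ValueError("UMUX needs exactly 4 answers")
    total = 0
    for number, answer in enumerate(responses, start=1):
        total += (answer - 1) if number % 2 == 1 else (7 - answer)
    return total * 100 / 24


def umux_lite_score(item1, item3):
    """UMUX-LITE: the two positively worded UMUX items, scored 0..100."""
    return (item1 + item3 - 2) * 100 / 12


def umux_lite_r_score(item1, item3):
    """UMUX-LITEr: regression-adjusted UMUX-LITE, scored 22.9..87.9."""
    return 0.65 * umux_lite_score(item1, item3) + 22.9
```

The adjustment in `umux_lite_r_score` is what squeezes the range: the lowest possible UMUX-LITE score of 0 becomes 22.9, and the highest possible score of 100 becomes 87.9.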

The study basically aims to investigate whether all these questionnaires measure the same thing. If that’s the case, we can compare results from different usability studies, even if different questionnaires were used!

How the study was conducted

The author created a survey that included the CSUQ, SUS, and UMUX (in that order), and some additional questions about the system’s and respondent’s backgrounds.

The survey was completed by 746 IBM employees, all of whom were residents of the United States.

What discoveries were made

Results from the four questionnaires are fairly similar to each other, which suggests that they all measure roughly the same thing.

Reliability and validity

Cronbach’s α values for all four questionnaires are above 0.70 (CSUQ: 0.97; SUS: 0.93; UMUX: 0.88; UMUX-LITE(r): 0.79). This suggests that they’re all reliable enough.
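For readers who want to compute Cronbach’s α for their own survey data, here is a minimal sketch using only the standard library. It assumes one row per respondent and one column per question, with negatively worded questions already reverse-coded; the function name is hypothetical and this is not the study’s own analysis code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' answer lists."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two questions")
    # Sample variance of each question across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

When all questions move in perfect lockstep, the function returns 1; the less the questions agree with each other, the lower the value gets.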

Correlations with the SUS are 0.76 for the CSUQ, 0.79 for the UMUX, and 0.74 for the UMUX-LITE(r): the questionnaires appear to be strongly related to one another.
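Correlations like these are typically Pearson correlation coefficients between the per-respondent scores of two questionnaires. A minimal sketch, assuming two equally long lists of scores (the function name is hypothetical, and whether the study used exactly this coefficient is an assumption):

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return covariance / (spread_x * spread_y)
```

A value close to 1 means that respondents who score a system highly on one questionnaire also tend to score it highly on the other.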

SUS and CSUQ

To compare the CSUQ with the SUS, the CSUQ’s score is converted to the 100-point scale that’s used by SUS. Lower CSUQ scores are better, so the CSUQ’s best, worst, and midpoint scores (1, 7, and 4 respectively) map to SUS’s 100, 0, and 50.

After conversion the overall mean CSUQ score is 66.7, which is 2 points lower than the mean SUS score (68.7).
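The conversion can be sketched as a straight line through the three anchor points mentioned above. The function name is made up, and the linear form is an assumption that happens to fit those three points:

```python
def csuq_to_100(csuq_score):
    """Map a CSUQ score (1 = best, 7 = worst) onto a 0..100 scale."""
    if not 1 <= csuq_score <= 7:
        raise ValueError("CSUQ scores range from 1 to 7")
    # 1 -> 100, 4 -> 50, 7 -> 0
    return 100 - (csuq_score - 1) * 100 / 6
```

Under this mapping, a converted mean of 66.7 corresponds to a raw CSUQ mean of about 3.0.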

The large sample size makes this a statistically significant difference. When the Sauro–Lewis curved grading scale is used, both scores map to a C, i.e. there’s no practically significant difference.

Interestingly, scores from macOS users (CSUQ: 76.6 or B; SUS: 76.8 or B) are significantly higher than scores from Windows users (CSUQ: 64.1 or C-; SUS: 66.9 or C).
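Mapping a 0 to 100 score to a grade is a simple lookup. The sketch below uses grade boundaries as they are commonly published for the Sauro–Lewis curved grading scale; treat the exact cut-offs as an assumption and verify them against the original source before relying on them. The names `CURVED_GRADES` and `curved_grade` are made up for this example.

```python
# (lower bound, grade) pairs, highest grade first.
# Cut-offs are assumed from the commonly published scale.
CURVED_GRADES = [
    (84.1, "A+"), (80.8, "A"), (78.9, "A-"),
    (77.2, "B+"), (74.1, "B"), (72.6, "B-"),
    (71.1, "C+"), (65.0, "C"), (62.7, "C-"),
    (51.7, "D"), (0.0, "F"),
]


def curved_grade(score):
    """Return the letter grade for a score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, grade in CURVED_GRADES:
        if score >= lower_bound:
            return grade
```

With these cut-offs, the scores quoted in this section come out as stated: 66.7 and 68.7 are both a C, 76.6 and 76.8 are both a B, and 64.1 is a C-.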

SUS and UMUX

Results are fairly similar for UMUX and UMUX-LITE(r):

OS       SUS       UMUX       UMUX-LITE  UMUX-LITEr
macOS    76.8 (B)  79.2 (A-)  79.9 (A-)  74.9 (B)
Windows  66.9 (C)  66.5 (C)   68.5 (C)   67.4 (C)

Note that UMUX-LITEr’s scores are consistently close to those obtained using the SUS.

Summary

  1. CSUQ, SUS, and UMUX are all fine, so those who already use one of them can keep doing that

  2. SUS is widely used in studies, which makes it a safe choice for those who do not yet use SUS, CSUQ, or UMUX

  3. CSUQ is the best choice if you need a multidimensional instrument

  4. Use UMUX-LITEr if you need to ask as few questions as possible

  5. Be aware of bias: macOS users may be more positive about a system’s usability than Windows users