Measuring perceived usability: The CSUQ, SUS, and UMUX (2018)

The perceived usability of an application can be measured using short questionnaires. Most practitioners will be familiar with the SUS, but it’s far from the only one. Lewis compared two other questionnaires with acronym-y names, the CSUQ and UMUX, with SUS and discovered that they have a lot in common (which is a good thing).

Why it matters

Usability consists of three components. Two of these, efficiency and effectiveness, are objective and can be directly measured by means of observation. The third, perceived usability, is subjective and therefore not as easy to quantify.

Usability practitioners therefore often rely on questionnaires, which can be used to get an estimate of the perceived usability. The SUS and CSUQ are currently the most popular usability questionnaires. But there’s a new contender: UMUX!

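That’s a lot of acronyms, so let’s spell them out: SUS is the System Usability Scale, CSUQ is the Computer System Usability Questionnaire, and UMUX is the Usability Metric for User Experience.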

The study basically aims to investigate whether all these questionnaires measure the same thing. If that’s the case, we can compare results from different usability studies, even if different questionnaires were used!

How the study was conducted

The author created a survey that included the CSUQ, SUS, and UMUX (in that order), and some additional questions about the system’s and respondent’s backgrounds.

The survey was completed by 746 IBM employees, all of whom were residents of the United States.

What discoveries were made

Results from the four questionnaires (CSUQ, SUS, UMUX, and UMUX-LITE, a two-item subset of UMUX) are fairly similar to each other, which suggests that they all measure roughly the same thing.

Reliability and validity

Cronbach’s α values for all four questionnaires are above 0.70 (CSUQ: 0.97; SUS: 0.93; UMUX: 0.88; UMUX-LITE(r): 0.79). This suggests that they’re all reliable enough.
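For readers who want to check reliability on their own survey data, here is a minimal sketch of how Cronbach’s α is commonly computed. The function name and the sample responses are made up for illustration; this is not the author’s analysis code.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses the textbook formula:
        alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: four respondents, three items.
sample = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2]]
alpha = cronbach_alpha(sample)
```

Values above 0.70 are the threshold the article uses for "reliable enough".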

Correlations of the SUS with the CSUQ, UMUX, and UMUX-LITE(r) are 0.76, 0.79, and 0.74 respectively: the questionnaires appear to be fairly aligned with each other. (I’m glossing over the details here. Read the original article if you want to learn more about this.)

SUS and CSUQ

To compare the CSUQ with the SUS, the CSUQ’s score is converted to the 100-point scale that’s used by SUS, such that CSUQ’s best, worst, and midpoint scores (1, 7, and 4 respectively; lower is better on the CSUQ) map to SUS’s 100, 0, and 50.
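The conversion can be sketched as a simple linear rescaling. The function below assumes the input is a respondent’s mean CSUQ item score on the 1 to 7 scale; the function name is mine, and the linear form is an assumption that matches the three anchor points mentioned above.

```python
def csuq_to_sus_scale(mean_item_score):
    """Map a mean CSUQ item score (1 = best, 7 = worst) onto a 0-100 scale.

    Linear rescaling, assumed from the anchor points 1 -> 100, 4 -> 50, 7 -> 0.
    """
    if not 1 <= mean_item_score <= 7:
        raise ValueError("mean CSUQ item score must be between 1 and 7")
    return (7 - mean_item_score) / 6 * 100
```

For example, a mean item score of 3 lands at roughly 66.7 on the 100-point scale.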

After conversion the overall mean CSUQ score is 66.7, which is 2 points lower than the mean SUS score (68.7).

The large sample size makes this a statistically significant difference. When the Sauro–Lewis curved grading scale is used, both scores map to a C, i.e. there’s no practically significant difference.
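To make the grading step concrete, here is a small lookup sketch. The grade boundaries are the ones commonly published for the Sauro–Lewis curved grading scale as I recall them, so treat them as an assumption and check them against the original source before relying on them. They do reproduce every grade quoted in this post.

```python
# (lower bound, grade), highest first. Boundaries are an assumption based on
# the commonly published Sauro-Lewis curved grading scale.
GRADE_BOUNDARIES = [
    (84.1, "A+"), (80.8, "A"), (78.9, "A-"),
    (77.2, "B+"), (74.1, "B"), (72.6, "B-"),
    (71.1, "C+"), (65.0, "C"), (62.7, "C-"),
    (51.7, "D"), (0.0, "F"),
]


def curved_grade(score):
    """Return the letter grade for a score on the 0-100 SUS scale."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    score = round(score, 1)  # the scale is defined to one decimal place
    for lower_bound, grade in GRADE_BOUNDARIES:
        if score >= lower_bound:
            return grade
    raise AssertionError("unreachable: 0.0 is the lowest boundary")
```

With this table, the converted CSUQ mean (66.7) and the SUS mean (68.7) both come out as a C, which is why the 2-point gap has no practical significance.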

Interestingly, scores from macOS users (CSUQ: 76.6 or B; SUS: 76.8 or B) are significantly higher than scores from Windows users (CSUQ: 64.1 or C-; SUS: 66.9 or C).

SUS and UMUX

Results are fairly similar for UMUX and UMUX-LITE(r):

OS       SUS       UMUX       UMUX-LITE  UMUX-LITEr
macOS    76.8 (B)  79.2 (A-)  79.9 (A-)  74.9 (B)
Windows  66.9 (C)  66.5 (C)   68.5 (C)   67.4 (C)

Note that UMUX-LITEr’s scores are consistently close to those obtained using the SUS.
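The post doesn’t spell out how UMUX-LITEr is derived, so the following is a sketch based on the commonly cited definitions rather than on this study’s text. It assumes that UMUX-LITE is the rescaled sum of two 7-point items, and that UMUX-LITEr applies the regression adjustment 0.65 × UMUX-LITE + 22.9. Function names are mine.

```python
def umux_lite(item_1, item_2):
    """UMUX-LITE score (0-100) from two 7-point agreement items.

    Assumed formula: (item_1 + item_2 - 2) * 100 / 12.
    """
    for item in (item_1, item_2):
        if not 1 <= item <= 7:
            raise ValueError("items must be rated from 1 to 7")
    return (item_1 + item_2 - 2) / 12 * 100


def umux_lite_r(umux_lite_score):
    """Regression-adjusted UMUX-LITE, intended to correspond to SUS scores.

    Assumed formula: 0.65 * UMUX-LITE + 22.9.
    """
    return 0.65 * umux_lite_score + 22.9
```

Applying the adjustment to the UMUX-LITE means in the table gives values within a rounding error of the reported UMUX-LITEr means (about 74.8 for macOS and 67.4 for Windows), which suggests the assumed formula is the one that was used.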

The important bits

  1. The CSUQ, SUS, and UMUX are all fine, so those who already use one of them can keep doing so
  2. SUS is widely used in studies, which makes it a safe choice for those who don’t use any of the three yet
  3. CSUQ is the best choice if you need a multidimensional instrument
  4. Use UMUX-LITEr if you need to ask as few questions as possible
  5. Be aware of bias: macOS users may be more positive about a system’s usability than Windows users