Measuring perceived usability: The CSUQ, SUS, and UMUX (2018)
The perceived usability of an application can be measured using short questionnaires. Most practitioners will be familiar with the SUS, but it’s far from the only one. Lewis compared SUS with two other questionnaires with acronym-y names, the CSUQ and UMUX, and discovered that they have a lot in common (which is a good thing).
Why it matters
Usability consists of three components. Two of these, efficiency and effectiveness, are objective and can be directly measured by means of observation. The third, perceived usability, is subjective and therefore not as easy to quantify.
Usability practitioners therefore often rely on questionnaires to estimate perceived usability. The SUS and CSUQ are currently the most popular usability questionnaires. But there’s a new contender: UMUX!
That’s a lot of acronyms, so let’s have a brief look at each of them:
The Computer System Usability Questionnaire (CSUQ) consists of 16 questions that are answered on a 7-point Likert scale, where lower scores are better. The answers can be used to compute not just one, but four different scores between 1 and 7: an overall score, and scores for the system’s usefulness, information quality, and interface quality.
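Each CSUQ score is the mean of the responses to a group of items. The sketch below assumes the item groupings commonly published for version 3 of the questionnaire (usefulness: items 1–6, information quality: items 7–12, interface quality: items 13–15, overall: all 16 items) and that skipped items are simply left out of the mean. The function name `csuq_scores` is hypothetical, not part of any published tool.

```python
from statistics import mean
from typing import Dict, Optional, Sequence

# Assumed item groupings (1-based item numbers); check against the version you use.
SUBSCALES = {
    "overall": range(1, 17),
    "system_usefulness": range(1, 7),
    "information_quality": range(7, 13),
    "interface_quality": range(13, 16),
}


def csuq_scores(responses: Sequence[Optional[int]]) -> Dict[str, float]:
    """Return mean 1-7 scores (lower is better); None marks a skipped item."""
    if len(responses) != 16:
        raise ValueError("CSUQ expects 16 responses")
    scores = {}
    for name, items in SUBSCALES.items():
        # Ignore skipped items instead of treating them as a score.
        answered = [responses[i - 1] for i in items if responses[i - 1] is not None]
        if not answered:
            raise ValueError(f"no answered items for {name}")
        scores[name] = float(mean(answered))
    return scores
```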
The System Usability Scale (SUS) is by far the most popular questionnaire. It consists of only 10 questions, which are answered on a 5-point Likert scale. Answers can be used to compute a score between 0 and 100.
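The standard SUS scoring rule alternates between positively worded (odd) and negatively worded (even) items: odd items contribute their response minus 1, even items contribute 5 minus their response, and the sum (0 to 40) is multiplied by 2.5. Here is a minimal sketch of that rule; `sus_score` is a hypothetical name.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute a 0-100 SUS score from ten 1-5 responses in questionnaire order."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses between 1 and 5")
    total = 0
    for index, r in enumerate(responses):
        # Items 1, 3, 5, 7, 9 (even 0-based index) are positively worded.
        total += (r - 1) if index % 2 == 0 else (5 - r)
    # Scale the 0-40 sum to 0-100.
    return total * 2.5
```

For example, `sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])` gives 75.0, and answering 3 on every item gives 50.0.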
Because SUS usage is so widespread, practitioners and researchers can easily compare any system’s perceived usability with that of others, either by directly comparing the computed score or by mapping scores to grades (ranging from F to A+).
The Usability Metric for User Experience (UMUX) consists of only 4 questions, which are answered on a 7-point Likert scale and combined into a score between 0 and 100.
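UMUX is usually scored much like SUS: items 1 and 3 are positively worded and contribute their response minus 1, items 2 and 4 are negatively worded and contribute 7 minus their response, and the 0 to 24 sum is rescaled to 0 to 100. A sketch under that assumption, with the hypothetical function name `umux_score`:

```python
from typing import Sequence


def umux_score(responses: Sequence[int]) -> float:
    """Compute a 0-100 UMUX score from four 1-7 responses in questionnaire order."""
    if len(responses) != 4 or any(not 1 <= r <= 7 for r in responses):
        raise ValueError("UMUX expects four responses between 1 and 7")
    total = 0
    for index, r in enumerate(responses):
        # Items 1 and 3 (even 0-based index) are positive, items 2 and 4 negative.
        total += (r - 1) if index % 2 == 0 else (7 - r)
    # Rescale the 0-24 sum to 0-100.
    return total / 24 * 100
```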
UMUX-LITE is a variant of UMUX that consists of just two questions (“But wait, there’s more! … Or less.”). Confusingly, there are actually two variants of this light variant. The first results in a score between 0 and 100, but these scores appear to be slightly biased relative to SUS. The second variant, UMUX-LITEr, applies a correction that brings scores closer to those obtained using SUS, but as a result they can only take values between 22.9 and 87.9.
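The odd-looking 22.9 to 87.9 range follows from the correction. A sketch, assuming the commonly published formulas: UMUX-LITE rescales the two positively worded 7-point items (UMUX items 1 and 3) to 0 to 100, and UMUX-LITEr applies the linear adjustment 0.65 × UMUX-LITE + 22.9, which maps 0 to 22.9 and 100 to 87.9. The function names are hypothetical.

```python
def umux_lite(q1: int, q2: int) -> float:
    """Compute a 0-100 UMUX-LITE score from two 1-7 responses."""
    for r in (q1, q2):
        if not 1 <= r <= 7:
            raise ValueError("responses must be between 1 and 7")
    # Both items are positively worded; the 0-12 sum is rescaled to 0-100.
    return (q1 - 1 + q2 - 1) / 12 * 100


def umux_lite_r(q1: int, q2: int) -> float:
    """Compute the regression-adjusted UMUX-LITEr score (22.9-87.9)."""
    return 0.65 * umux_lite(q1, q2) + 22.9
```

Because the adjustment is linear, it can equally be applied to a mean UMUX-LITE score instead of to individual responses.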
The study investigates whether all these questionnaires measure the same thing. If they do, results from different usability studies can be compared, even if the studies used different questionnaires!
How the study was conducted
The author created a survey that included the CSUQ, SUS, and UMUX (in that order), as well as some additional questions about the system and the respondent’s background.
The survey was completed by 746 IBM employees, all of whom were residents of the United States.
What discoveries were made
Results from the four questionnaires are fairly similar to each other, which suggests that they all measure roughly the same thing.
Reliability and validity
Cronbach’s α values for all four questionnaires exceed the commonly used threshold of 0.70 (CSUQ: 0.97; SUS: 0.93; UMUX: 0.88; UMUX-LITE(r): 0.79), which suggests that they’re all sufficiently reliable.
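Cronbach’s α compares the sum of the individual item variances with the variance of respondents’ total scores: α = k/(k − 1) × (1 − Σ item variances / total variance), where k is the number of items. The sketch below assumes that negatively worded items have already been recoded so that all items point in the same direction; `cronbach_alpha` is a hypothetical name. Population variance is used throughout, but sample variance gives the same result because the n versus n − 1 factor cancels in the ratio.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(data: Sequence[Sequence[float]]) -> float:
    """data[r][i] is respondent r's (recoded) response to item i."""
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in data]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```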
The correlations of CSUQ, UMUX, and UMUX-LITE(r) with SUS are 0.76, 0.79, and 0.74, respectively, so the questionnaires appear to be fairly well aligned with each other. (I’m glossing over the details here; read the original article if you want to learn more about this.)
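For readers who want a feel for what such a correlation coefficient is, here is a sketch of Pearson’s r, assuming (not confirmed here) that the reported values are Pearson correlations between each respondent’s score on one questionnaire and their SUS score. `pearson_r` is a hypothetical name.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired scores, e.g. SUS and UMUX per respondent."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    # Co-deviation and squared deviations from the means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Note that UMUX-LITE and UMUX-LITEr share a single correlation because one is a linear transformation of the other, and Pearson’s r is unaffected by such transformations (with a positive slope).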
SUS and CSUQ
To compare the CSUQ with the SUS, the CSUQ’s score is converted to the 100-point scale used by SUS, such that the CSUQ’s best, worst, and midpoint scores (1, 7, and 4, respectively) map to 100, 0, and 50 on the SUS scale.
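A linear mapping that satisfies all three anchor points is 100 × (7 − CSUQ) / 6, which is the same as 100 − (CSUQ − 1) × 100/6. A minimal sketch, with the hypothetical name `csuq_to_100`:

```python
def csuq_to_100(score: float) -> float:
    """Map a 1-7 CSUQ score (lower is better) onto SUS's 0-100 scale."""
    if not 1 <= score <= 7:
        raise ValueError("CSUQ scores lie between 1 and 7")
    # 1 -> 100, 4 -> 50, 7 -> 0.
    return (7 - score) / 6 * 100
```

For example, a raw CSUQ score of 3.0 converts to roughly 66.7.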
After conversion the overall mean CSUQ score is 66.7, which is 2 points lower than the mean SUS score (68.7).
Thanks to the large sample size, this difference is statistically significant. However, when the Sauro–Lewis curved grading scale is used, both scores map to a C, i.e. there’s no practically significant difference.
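The curved grading scale is essentially a lookup table of score ranges. The sketch below uses grade boundaries as I understand them from the published Sauro–Lewis scale (for example, C covers 65.0–71.0 and B covers 74.1–77.1); treat them as an assumption and check them against the source before reuse. `cgs_grade` is a hypothetical name.

```python
from bisect import bisect_right

# Assumed lower bound of each grade on the Sauro-Lewis curved grading scale.
_BOUNDS = [0.0, 51.7, 62.7, 65.0, 71.1, 72.6, 74.1, 77.2, 78.9, 80.8, 84.1]
_GRADES = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]


def cgs_grade(score: float) -> str:
    """Return the letter grade for a 0-100 score."""
    if not 0 <= score <= 100:
        raise ValueError("scores lie between 0 and 100")
    # Find the last lower bound that is <= score.
    return _GRADES[bisect_right(_BOUNDS, score) - 1]
```

With these boundaries, both 66.7 (CSUQ) and 68.7 (SUS) fall in the C range, matching the grades reported above.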
Interestingly, scores from macOS users (CSUQ: 76.6 or B; SUS: 76.8 or B) are significantly higher than those from Windows users (CSUQ: 64.1 or C-; SUS: 66.9 or C).
SUS and UMUX
Results are fairly similar for UMUX and UMUX-LITE(r):
| Platform | SUS | UMUX | UMUX-LITE | UMUX-LITEr |
|---|---|---|---|---|
| macOS | 76.8 (B) | 79.2 (A-) | 79.9 (A-) | 74.9 (B) |
| Windows | 66.9 (C) | 66.5 (C) | 68.5 (C) | 67.4 (C) |
Note that UMUX-LITEr’s scores are consistently close to those obtained using the SUS.