How to use the System Usability Scale (SUS) in 2021
There are several standardised questionnaires for the assessment of perceived usability. The System Usability Scale (SUS), introduced in 1996 by Brooke, is arguably the most popular questionnaire and is thought to be used in close to half of all industrial usability studies.
The SUS was not the first standardised usability questionnaire, but it had two major benefits over its contemporary competitors: it’s simple and free (as in beer). The only requirement is that its source is acknowledged in published usability reports.
In 2021 the standard version of the SUS looks like this:
# | Statement | 1 (Strongly disagree) | 2 | 3 | 4 | 5 (Strongly agree) |
---|---|---|---|---|---|---|
1 | I think that I would like to use this system frequently. | ❍ | ❍ | ❍ | ❍ | ❍ |
2 | I found the system unnecessarily complex. | ❍ | ❍ | ❍ | ❍ | ❍ |
3 | I thought the system was easy to use. | ❍ | ❍ | ❍ | ❍ | ❍ |
4 | I think that I would need the support of a technical person to be able to use this system. | ❍ | ❍ | ❍ | ❍ | ❍ |
5 | I found the various functions in this system were well integrated. | ❍ | ❍ | ❍ | ❍ | ❍ |
6 | I thought there was too much inconsistency in this system. | ❍ | ❍ | ❍ | ❍ | ❍ |
7 | I would imagine that most people would learn to use this system very quickly. | ❍ | ❍ | ❍ | ❍ | ❍ |
8 | I found the system very cumbersome to use. | ❍ | ❍ | ❍ | ❍ | ❍ |
9 | I felt very confident using the system. | ❍ | ❍ | ❍ | ❍ | ❍ |
10 | I needed to learn a lot of things before I could get going with this system. | ❍ | ❍ | ❍ | ❍ | ❍ |
The SUS is typically administered to participants right after they complete a task-based usability test. It consists of ten statements about the tested system, for which a participant can indicate their level of agreement on a five-point Likert scale.
Scoring is somewhat complicated due to the alternating tone of the items:
- Convert Likert responses to a numeric scale from 1 to 5, as shown above. Unanswered items should be given a score of 3 (the center of the five-point scale).
- Subtract 1 from the raw score for odd-numbered items. Subtract the raw score from 5 for even-numbered items.
- Calculate the total sum of the adjusted scores, then multiply by 2.5 to get the standard SUS score.
Alternatively the following equation can be used:
SUS = 2.5(20 + SUM(SUS01,SUS03,SUS05,SUS07,SUS09) − SUM(SUS02,SUS04,SUS06,SUS08,SUS10))
Either way, the result is a score between 0 and 100. Higher scores are better.
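The scoring steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `sus_score` and the use of `None` to mark an unanswered item are assumptions made for this example.

```python
from typing import Optional, Sequence


def sus_score(responses: Sequence[Optional[int]]) -> float:
    """Compute the standard SUS score from ten raw responses (1 to 5).

    Responses are given in item order (item 1 first). A response of None
    marks an unanswered item and is scored as 3, the center of the scale.
    """
    if len(responses) != 10:
        raise ValueError("the standard SUS has exactly ten items")

    total = 0
    for item_number, raw in enumerate(responses, start=1):
        raw = 3 if raw is None else raw
        if not 1 <= raw <= 5:
            raise ValueError(f"item {item_number}: response must be 1 to 5")
        if item_number % 2 == 1:
            total += raw - 1   # odd-numbered (positively worded) item
        else:
            total += 5 - raw   # even-numbered (negatively worded) item
    return total * 2.5


# A respondent who answers 4 on every odd item and 2 on every even item:
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

The same respondent scores 2.5(20 + 20 − 10) = 75 with the equation, so both routes agree.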
Scores obtained for two different systems can be compared against each other, but widespread usage of the SUS has also allowed for the development of norms for what scores should be seen as acceptable.
The Sauro–Lewis curved grading scale (CGS) below provides mappings from SUS scores to letter grades:
SUS score range | Grade | Percentile range |
---|---|---|
84.1–100 | A+ | 96–100 |
80.8–84.0 | A | 90–95 |
78.9–80.7 | A− | 85–89 |
77.2–78.8 | B+ | 80–84 |
74.1–77.1 | B | 70–79 |
72.6–74.0 | B− | 65–69 |
71.1–72.5 | C+ | 60–64 |
65.0–71.0 | C | 41–59 |
62.7–64.9 | C− | 35–40 |
51.7–62.6 | D | 15–34 |
0.0–51.6 | F | 0–14 |
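Looking up a grade from this table is easy to automate. The sketch below is one possible approach: the names `CGS_LOWER_BOUNDS` and `sus_grade` are made up for this example, and it assumes that a score falling between two listed ranges (such as 84.05) belongs to the lower grade.

```python
# Lower bound of each SUS score range and its grade, copied from the table.
# Grades use an ASCII hyphen ("A-") instead of the typographic minus sign.
CGS_LOWER_BOUNDS = [
    (84.1, "A+"),
    (80.8, "A"),
    (78.9, "A-"),
    (77.2, "B+"),
    (74.1, "B"),
    (72.6, "B-"),
    (71.1, "C+"),
    (65.0, "C"),
    (62.7, "C-"),
    (51.7, "D"),
    (0.0, "F"),
]


def sus_grade(score: float) -> str:
    """Map a SUS score (0 to 100) to a curved grading scale letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    # Bounds are sorted from high to low, so the first match is the grade.
    for lower_bound, grade in CGS_LOWER_BOUNDS:
        if score >= lower_bound:
            return grade
    raise AssertionError("unreachable: the last bound is 0.0")


print(sus_grade(68))  # C
```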
Most projects should aim for a SUS score of at least 80. However, there are a few potential gotchas that you should be aware of:
- Different types of products and interfaces may differ significantly in perceived usability. For example, a score of 80 is unrealistically high when one is developing a complex spreadsheet application, but unrealistically low when one is developing a new search interface.
- When conducting a within-subjects study, respondents may give lower ratings to harder products and higher ratings to easier products than they otherwise might.
- The SUS is a reliable and valid instrument for usability, but it is also sensitive to the amount of experience that users have with the system that they rated. You should therefore track and control for differences in the amount of experience.
- Certain personality traits, like Openness to Experience and Agreeableness, may affect SUS scores.
The SUS is designed to be unidimensional, i.e. it measures one thing and one thing only. For a while it seemed that the SUS might actually measure two factors: learnability (items 4 and 10) and usability (the remaining items). However, recent studies have not been able to confirm this two-factor structure, which puts it on shaky ground.
The SUS is flexible. Minor changes to its wording have little effect on its accuracy, and in recent years the SUS has also been used in the procurement of products for the US military.
Here we discuss three larger changes that are often made to the SUS itself.
The original SUS uses items with an alternating tone to control for acquiescence bias. This also makes it easier for researchers to identify respondents who did not pay attention to the statements that they rated.
However, there is evidence that including a mix of positively and negatively worded items creates more problems than it solves. It makes it more likely that respondents make mistakes while completing the questionnaire and that responses are miscoded when computing overall scores.
Studies with a positive version of the SUS suggest that its results are not significantly different from those of the original version:
# | Statement | 1 (Strongly disagree) | 2 | 3 | 4 | 5 (Strongly agree) |
---|---|---|---|---|---|---|
1 | I think that I would like to use this system frequently. | ❍ | ❍ | ❍ | ❍ | ❍ |
2 | I found the system to be simple. | ❍ | ❍ | ❍ | ❍ | ❍ |
3 | I thought the system was easy to use. | ❍ | ❍ | ❍ | ❍ | ❍ |
4 | I think that I could use the system without the support of a technical person. | ❍ | ❍ | ❍ | ❍ | ❍ |
5 | I found the various functions in this system were well integrated. | ❍ | ❍ | ❍ | ❍ | ❍ |
6 | I thought there was a lot of consistency in the system. | ❍ | ❍ | ❍ | ❍ | ❍ |
7 | I would imagine that most people would learn to use this system very quickly. | ❍ | ❍ | ❍ | ❍ | ❍ |
8 | I found the system very intuitive. | ❍ | ❍ | ❍ | ❍ | ❍ |
9 | I felt very confident using the system. | ❍ | ❍ | ❍ | ❍ | ❍ |
10 | I could use the system without having to learn anything new. | ❍ | ❍ | ❍ | ❍ | ❍ |
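Scoring the positive version is simpler because no item needs to be reversed. The sketch below assumes that every item is treated like an odd-numbered item of the original SUS (raw score minus 1) and that the 2.5 multiplier stays the same. The text above does not spell this out, so treat it as an assumption. The function name `positive_sus_score` is also made up for this example.

```python
from typing import Sequence


def positive_sus_score(responses: Sequence[int]) -> float:
    """Score the all-positive SUS variant from ten raw responses (1 to 5).

    Assumption: all ten items are positively worded, so each one
    contributes (raw - 1) and the sum is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("the positive SUS variant has exactly ten items")
    for item_number, raw in enumerate(responses, start=1):
        if not 1 <= raw <= 5:
            raise ValueError(f"item {item_number}: response must be 1 to 5")
    return sum(raw - 1 for raw in responses) * 2.5


print(positive_sus_score([4] * 10))  # 75.0
```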
Respondents are supposed to provide a rating for each item. But some items may be confusing or distracting in some contexts. For instance, the first item does not make a lot of sense for systems that will only be used infrequently, like systems for registering complaints.
A study by Lewis and Sauro shows that nine-item variants are basically just as good as ten-item versions, as long as the calculation of the SUS score is adjusted accordingly by multiplying the total sum by 100/36 rather than 100/40 (2.5).
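The adjusted multiplier follows from the maximum possible sum: nine items contribute at most 4 points each, so the total runs from 0 to 36 instead of 0 to 40. A minimal sketch, assuming the original alternating-tone items and a hypothetical function name `sus_score_nine_items` that takes a mapping from item number to raw response:

```python
from typing import Mapping


def sus_score_nine_items(responses: Mapping[int, int]) -> float:
    """Score a nine-item SUS variant on the usual 0 to 100 scale.

    `responses` maps the original item number (1 to 10) to the raw
    response (1 to 5); exactly one item is left out.
    """
    if len(responses) != 9 or not set(responses) <= set(range(1, 11)):
        raise ValueError("expected nine distinct item numbers from 1 to 10")

    total = 0
    for item_number, raw in responses.items():
        if not 1 <= raw <= 5:
            raise ValueError(f"item {item_number}: response must be 1 to 5")
        # Same adjustment as in the ten-item version.
        total += raw - 1 if item_number % 2 == 1 else 5 - raw
    # The maximum total is 9 * 4 = 36, hence 100/36 instead of 100/40.
    return total * 100 / 36


# Item 1 dropped; 4 on the remaining odd items and 2 on the even items.
answers = {2: 2, 3: 4, 4: 2, 5: 4, 6: 2, 7: 4, 8: 2, 9: 4, 10: 2}
print(sus_score_nine_items(answers))  # 75.0
```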
A number of translations of the SUS have been published, including Arabic, Slovene, Polish, Italian, Persian, and Portuguese. The average reliability of translated versions is a bit lower than that of the English version, but the resulting scores are still reasonably close to the scores that would have been obtained using the original SUS.
- The System Usability Scale (SUS) is a simple, battle-tested instrument for measuring the usability of a product or system.
- Variations of the SUS that are exclusively positively worded or only contain 9 items work just as well.