Rethinking thinking aloud: A comparison of three think-aloud protocols
Think-aloud is one of the most popular methods used to evaluate the usability of websites and other types of systems. Various think-aloud methods exist. Alhadreti and Mayhew compared three of them (concurrent, retrospective, and hybrid think-aloud) and found that one clearly outperforms the other two.
The general idea of think-aloud is that participants verbalise all their thoughts: what they see, what actions they want to perform and why, how they feel about something, and so on. By letting participants do this while they carry out a set of predefined tasks on a system, a usability researcher can discover which issues users might face when using the system and why.
Each of the methods described in the paper has its own advantages and disadvantages:
Concurrent think-aloud requires participants to verbalise their thoughts while they are carrying out tasks using the evaluated system. This means you get feedback in real time. Unfortunately, thinking aloud might influence the way participants carry out their tasks, and hence the issues that you find using this method might not be entirely representative.
Retrospective think-aloud requires participants to verbalise their thoughts only after they have completed their tasks. Because participants work in silence, their behaviour while executing the tasks is more likely to be natural. The flipside is that post hoc verbalisation might not accurately reflect participants’ original thought processes.
Hybrid think-aloud combines the two methods within the same test. By asking participants to verbalise their thoughts both during and after carrying out the tasks, one might gain additional insights into the things that caused issues.
The study compares the pros and cons of these methods.
For each of the think-aloud methods, a usability study was conducted on a university library website, with tasks of medium difficulty and 20 participants per group. Participants’ characteristics were kept similar across groups to minimise the impact of individual differences, and the tasks were identical between groups.
Several variables were measured to determine the effectiveness and efficiency of each think-aloud method:
task performance, i.e. the number of tasks completed successfully;
“side-effects” due to thinking aloud and the presence of an observer;
the number of observed and/or verbalised issues;
the time needed to conduct each study.
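The measures above could be recorded per session and then averaged per method. What follows is a minimal sketch, not the authors' analysis: the `Session` record and the `summarise` helper are hypothetical names, side-effects are left out because they would need questionnaire scores, and the example values are placeholders rather than data from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One participant's test session (hypothetical record layout)."""
    method: str            # "concurrent", "retrospective" or "hybrid"
    tasks_completed: int   # task performance
    issues_found: int      # observed and/or verbalised issues
    minutes: float         # time needed for the session


def summarise(sessions):
    """Average each measure per think-aloud method."""
    by_method = {}
    for s in sessions:
        by_method.setdefault(s.method, []).append(s)
    return {
        method: {
            "participants": len(group),
            "mean_tasks_completed": mean(s.tasks_completed for s in group),
            "mean_issues_found": mean(s.issues_found for s in group),
            "mean_minutes": mean(s.minutes for s in group),
        }
        for method, group in by_method.items()
    }


# Placeholder values for illustration only, not results from the paper.
example = [
    Session("concurrent", tasks_completed=5, issues_found=4, minutes=30.0),
    Session("concurrent", tasks_completed=6, issues_found=6, minutes=34.0),
    Session("retrospective", tasks_completed=6, issues_found=3, minutes=50.0),
]
summary = summarise(example)
```

Grouping by method keeps the comparison symmetric: every method is summarised with the same measures, so differences in issue counts and session time can be read side by side.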
Thinking aloud does not seem to affect how well participants are able to complete tasks, how they perceive the system’s usability, or how they feel during the evaluation.
However, when the number of identified usability issues is compared, it’s clear that retrospective think-aloud is much less effective than the other two methods. When it’s used, the number of identified issues is significantly lower, presumably because participants have forgotten about some of the issues they’ve encountered by the time they’re asked questions about them.
Many of the overlooked issues are minor and often related to layout, e.g. things that aren’t as clear or consistent as they should be. Unfortunately, retrospective think-aloud has few other redeeming qualities. To make matters worse, it’s also about 60% more expensive than concurrent think-aloud.
This is also where hybrid think-aloud falls flat on its face: it detects about as many issues as concurrent think-aloud, but is much more expensive (about 70% more).
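The cost argument can be made concrete as a cost per identified issue, relative to concurrent think-aloud. The sketch below is illustrative only: `relative_cost_per_issue` is a hypothetical helper, the cost ratios of 1.6 and 1.7 come from the percentages quoted above, and the issue ratio is left as an input because this summary gives no exact issue counts.

```python
def relative_cost_per_issue(cost_ratio, issue_ratio):
    """Cost per identified issue, relative to concurrent think-aloud.

    cost_ratio:  cost of the method divided by the cost of concurrent
    issue_ratio: issues found by the method divided by issues found
                 by concurrent (an input; not given in this summary)
    """
    if issue_ratio <= 0:
        raise ValueError("issue_ratio must be positive")
    return cost_ratio / issue_ratio


# Cost ratios from the text: about 60% and 70% more expensive.
RETROSPECTIVE_COST = 1.6
HYBRID_COST = 1.7

# Assumption for illustration: hybrid finds as many issues as concurrent.
hybrid_equal_yield = relative_cost_per_issue(HYBRID_COST, issue_ratio=1.0)

# Break-even: a method matches concurrent on cost per issue only when
# its issue ratio equals its cost ratio.
hybrid_break_even = relative_cost_per_issue(HYBRID_COST, issue_ratio=1.7)
```

Under these assumptions, a method that costs 1.7 times as much would have to find 1.7 times as many issues just to match concurrent think-aloud on cost per issue. Retrospective think-aloud, which finds fewer issues at 1.6 times the cost, necessarily ends up above 1.6.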
Concurrent think-aloud is the most cost-effective think-aloud method
Retrospective think-aloud is generally best avoided