The Toilet Paper

Measuring programming experience

Studies about programming often need to control for programming experience. What’s the best way to do that?

[Image: a whizz-kid, a sysadmin, a fresh graduate, and an idiot hold up number signs. Caption: “I think this is pretty grade”]

Software engineering studies often involve the measurement of variables. For example, a study could measure the effect of test-driven development on code quality and productivity.

Such studies typically also need to deal with so-called confounding variables, like programming experience, which can affect the outcome of surveys or experiments. If a study does not take the influence of confounding variables into account, its results could be severely biased.

Why it matters

Many software engineering studies involve human participants, so one would think that researchers always control for confounding variables like programming experience in their studies.

In reality, this doesn’t always happen, and when it does, it is not always clear how it is done. Researchers also use different methods to control for programming experience. This makes it harder to interpret and compare results from such studies.

The goal of this study is therefore to evaluate how reliable different ways to measure programming experience are, so that we all know how we should do it from now on.

How the study was conducted

The first step towards that goal involves a systematic literature review on how researchers measure programming experience.

Based on the results of this review, the authors create a questionnaire that is designed to evaluate the efficacy of those measurements. This questionnaire consists of two parts:

  • questions that existing studies have used to measure programming experience;
  • Java programming tasks that participants need to complete.

It is assumed that participants with more programming experience will solve more tasks correctly and will be able to complete more tasks within the given time.

The purpose of the study was only disclosed to participants after the experiment had concluded.

What discoveries were made

The literature review yields nine ways of managing programming experience:

  1. programming experience in number of years;
  2. level of education;
  3. self-estimation by participants, e.g. on a five-point scale;
  4. some unspecified questionnaire;
  5. the size of the (largest) programs that participants have written;
  6. performance on unspecified pre-tests that are conducted prior to the actual study;
  7. estimation by participants’ supervisors;
  8. experience is measured, but it is not specified how this happens;
  9. experience is not controlled for at all.


Based on these findings, the authors create the following questions for their questionnaire:

| Category | Question | Scale | Abbreviation |
|---|---|---|---|
| Self-estimation | On a scale from 1 to 10, how do you estimate your programming experience? | 1: very inexperienced to 10: very experienced | s.PE |
| Self-estimation | How do you estimate your programming experience compared to experts with 20 years of practical experience? | 1: very inexperienced to 5: very experienced | s.Experts |
| Self-estimation | How do you estimate your programming experience compared to your class mates? | 1: very inexperienced to 5: very experienced | s.ClassMates |
| Self-estimation | How experienced are you with the following languages: Java/C/Haskell/Prolog | 1: very inexperienced to 5: very experienced | s.Java / s.C / s.Haskell / s.Prolog |
| Self-estimation | How many additional languages do you know (medium experience or better)? | Integer | s.NumLanguages |
| Self-estimation | How experienced are you with the following programming paradigms: functional/imperative/logical/object-oriented programming? | 1: very inexperienced to 5: very experienced | s.Functional / s.Imperative / s.Logical / s.ObjectOriented |
| Years | For how many years have you been programming? | Integer | y.Prog |
| Years | For how many years have you been programming for larger software projects, e.g. in a company? | Integer | y.ProgProf |
| Education | What year did you enroll at university? | Integer | e.Years |
| Education | How many courses did you take in which you had to implement source code? | Integer | e.Courses |
| Size | How large were the professional projects typically? | NA, <900, 900-40000, >40000 | z.Size |
| Other | How old are you? | Integer | o.Age |

Most questions should speak for themselves, although a few might be a bit unclear without context:

  • The languages Java, C, Haskell and Prolog were chosen ;

  • s.NumLanguages should only include languages for which one has at least “medium experience”, but the paper never explains what that means;

  • e.Years requires a conversion from a year (e.g. 2019) to the number of years in which a participant was enrolled (e.g. 2);

  • z.Size refers to the number of lines of code.
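
The two conversions mentioned in this list (e.Years and z.Size) could be coded roughly as follows. This is only a sketch: the function names, the `survey_year` parameter and the handling of the exact bin boundaries (900 and 40000) are assumptions, because the article only gives the answer options and a single example.

```python
from typing import Optional


def years_enrolled(enrollment_year: int, survey_year: int) -> int:
    """Turn the answer to e.Years (a calendar year) into a number of years enrolled.

    survey_year is a hypothetical parameter: the year in which the
    questionnaire was filled in.
    """
    if enrollment_year > survey_year:
        raise ValueError("enrollment year lies after the survey year")
    return survey_year - enrollment_year


def size_category(lines_of_code: Optional[int]) -> str:
    """Map a project size in lines of code onto the z.Size answer options.

    Assumption: exactly 900 and exactly 40000 lines fall into the middle bin.
    """
    if lines_of_code is None:
        return "NA"  # participant has no professional projects
    if lines_of_code < 900:
        return "<900"
    if lines_of_code <= 40000:
        return "900-40000"
    return ">40000"


# With a hypothetical survey year of 2021, enrolling in 2019 gives 2 years.
print(years_enrolled(2019, survey_year=2021))  # 2
print(size_category(12000))  # 900-40000
```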


The authors find small to strong Spearman rank correlations between the answers and the number of correctly solved tasks for about half of the questions. These are listed in the table below (ρ). Bold values denote significant correlations (p < .05).
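
For readers who have not met Spearman’s ρ before: it is the ordinary (Pearson) correlation computed on the ranks of the values rather than on the values themselves, so it only cares about whether two variables rise and fall together. The sketch below is a plain-Python illustration, not the authors’ analysis script. The sample numbers are made up, and it does not compute p-values.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: self-estimates (1-10) and number of correctly solved tasks.
self_estimates = [2, 4, 4, 7, 9]
correct_tasks = [1, 3, 2, 5, 6]
print(round(spearman_rho(self_estimates, correct_tasks), 3))  # 0.975
```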


The authors further explore their data using stepwise regression and exploratory factor analysis.

Questions that matter most

To determine which questions are the best indicators of programming experience, one can start by looking at questions with at least a medium correlation with the number of correctly solved tasks. However, questions can also be correlated with each other. If we do not take this into account, we would overestimate the importance of these questions.

This is where stepwise regression comes in, which helps you create the smallest model that can explain the number of correctly solved tasks.

The authors conclude that s.Logical and s.ClassMates can be used to explain 24.1% of the variance in the number of correct answers and are thus the most important questions.
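
To make the idea of “smallest model that explains the most variance” more concrete, here is a rough sketch of forward selection, one simple flavour of stepwise regression. It is not the authors’ procedure: real stepwise regression usually decides with significance tests or information criteria and may also drop variables again, whereas this sketch simply keeps adding the question that raises R² (the share of explained variance) the most until the gain falls below a hypothetical `min_gain` threshold. The predictor names and all numbers are made up.

```python
from typing import Dict, List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def r_squared(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Share of the variance in y explained by a least-squares fit with intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)  # normal equations
    mean_y = sum(y) / len(y)
    ss_res = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(predictors: Dict[str, Sequence[float]], y: Sequence[float],
                      min_gain: float = 0.05) -> Tuple[List[str], float]:
    """Greedily add the predictor that raises R² most; stop when the gain is small."""
    selected: List[str] = []
    remaining = list(predictors)
    best_r2 = 0.0
    while remaining:
        scores = {name: r_squared([predictors[n] for n in selected + [name]], y)
                  for name in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] - best_r2 < min_gain:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_r2 = scores[candidate]
    return selected, best_r2


# Made-up data: q_c is nearly a copy of q_a, so it adds little once q_a is in.
answers = {
    "q_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "q_b": [2, 1, 4, 3, 1, 5, 2, 4],
    "q_c": [1, 2, 3, 5, 4, 6, 8, 7],
}
correct_tasks = [8, 7, 18, 17, 13, 27, 20, 28]
selected, r2 = forward_selection(answers, correct_tasks)
print(selected, round(r2, 3))  # ['q_a', 'q_b'] 1.0
```

Note how `q_c` correlates well with the outcome on its own, yet is left out because it mostly repeats what `q_a` already explains. That is the overestimation problem described above.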


One might wonder why s.Logical is more important than s.Java, which is the language used for the programming tasks. A possible explanation is that Java is a “beginner language” that all participants know relatively well, while only those who actively pursue logical programming will be very familiar with logical programming languages.

Grouping questions into factors

Exploratory factor analysis can be used to reduce the number of variables to a smaller number of underlying latent variables or factors, which are easier to reason about. It works by identifying groups of variables that correlate with each other.
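
The sketch below illustrates only the intuition of “groups of variables that correlate with each other”. It is not exploratory factor analysis, which estimates how strongly each variable loads on each factor. Instead, it uses a much cruder stand-in: it puts two questions in the same group whenever their Pearson correlation exceeds a hypothetical threshold of 0.7. The answers are made up for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlated_groups(data: Dict[str, Sequence[float]],
                      threshold: float = 0.7) -> List[List[str]]:
    """Group variables that are linked by a chain of strong pairwise correlations."""
    names = list(data)
    unvisited = set(names)
    groups: List[List[str]] = []
    for name in names:
        if name not in unvisited:
            continue
        unvisited.remove(name)
        group, frontier = [name], [name]
        while frontier:
            current = frontier.pop()
            for other in names:
                if other in unvisited and abs(pearson(data[current], data[other])) >= threshold:
                    unvisited.remove(other)
                    group.append(other)
                    frontier.append(other)
        groups.append(group)
    return groups


# Made-up answers of six participants on a 1-5 scale.
answers = {
    "s.Functional": [1, 2, 4, 5, 3, 2],
    "s.Haskell": [1, 2, 5, 4, 3, 1],
    "s.Logical": [4, 1, 2, 3, 5, 1],
    "s.Prolog": [5, 1, 2, 2, 4, 1],
}
print(correlated_groups(answers))
# [['s.Functional', 's.Haskell'], ['s.Logical', 's.Prolog']]
```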

The authors extracted five factors that together make up programming experience:

  1. experience with mainstream languages (s.C, s.ObjectOriented, s.Imperative, s.Experts, and s.Java);

  2. professional experience (y.ProgProf, z.Size, s.NumLanguages, s.ClassMates);

  3. functional experience (s.Functional, s.Haskell);

  4. experience from education (e.Courses, e.Years, y.Prog); and

  5. logical experience (s.Logical, s.Prolog).


The authors make the following recommendations in their paper.

  1. Report precisely which measures you use to control for programming experience

  2. Use self-estimation questions to judge programming experience among undergraduate students

  3. Combine multiple questions whenever possible. Some may serve as control questions to see whether subjects answered honestly

  4. Use s.ClassMates and s.Logical, as these were most capable of predicting the number of correct tasks in the authors’ experiments

  5. The identified factors for programming experience are useful for developing a theory on programming experience
