Flat Earth

Spicy papers with a grain of salt, tl;dr edition

I spend a lot of time reading papers. I spend even more time summarising them. Now I’ve made oversimplified summaries of those summaries.

[Image: a red pepper and a salt shaker dancing above a pizza. Caption: Would you like some paper-oni with that?]

The Toilet Paper is the largest and most frequently visited section on my website. It’s where I publish summaries of mostly computer science and software engineering research papers.

I do my best to keep my summaries short enough for toilet and lunch breaks, but for some people (like my dearest colleagues) even these summaries contain too many words. If you are one of those people, this page might be for you! I’ve summarised each of the summaries from last year’s Toilet Paper series. This may have resulted in the loss of “some” nuance here and there, but if you normally get your science news from Reddit you’re probably already used to that.

By the way, if you were wondering why I haven’t published any new summaries in a while, it’s because I also have other things to read. 🙃 But I’ve got some good news, because I’m starting a new series this month!

Software development


Can the understandability of code be measured?

Maybe? This paper describes the cognitive complexity measure, which reportedly can show how hard it is to understand a piece of code.

🧂 The grain of salt

We don’t actually know how any of this really works. The paper mentions that the cognitive complexity measure is an improvement over (the ancient) cyclomatic complexity measure, but curiously enough doesn’t say a lot about other, newer complexity measures.

Does refactoring make code easier to understand?

Not necessarily. Clean code can take more time to understand than “dirty” code that can be read in a linear fashion. But somehow the overall conclusion still is that you should refactor your code.

🧂 The grain of salt

This feels a little bit biased (even if the conclusion is probably correct).

How should agile teams document requirements?

Requirements should be documented in user stories, which should contain information about interactions, non-functional requirements, and technical constraints.

🧂 The grain of salt

These conclusions are partially based on interviews and surveys with a ✨ tiny sample size ✨.

Should you upgrade official Docker Hub images in production environments?

No, test them first to make sure everything still works.

🧂 The grain of salt

A bit handwavy. Also, “Docker image upgrades don’t kill production environments, people (who don’t test their stuff) kill production environments”.

Can mutation testing be done more efficiently and effectively?

Sort of. You could start by:

  • only generating mutants that have the potential to introduce bugs;
  • only considering syntactically valid mutants; and
  • making the test results actionable for developers.
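For those who have never seen a mutant in the wild: a mutant is a copy of your code with one small change, and a test suite “kills” it when at least one test fails. Below is a minimal sketch of that idea. The function names and the mutation are made up by me for illustration and do not come from the paper.

```python
def is_adult(age):
    """Original code under test (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the >= operator has been replaced with >."""
    return age > 18


def suite_passes(implementation):
    """Return True if a tiny test suite passes for the given implementation."""
    checks = [
        implementation(30) is True,
        implementation(5) is False,
        implementation(18) is True,  # boundary case, the only one that catches the mutant
    ]
    return all(checks)


original_ok = suite_passes(is_adult)               # the real code passes
mutant_killed = not suite_passes(is_adult_mutant)  # the mutant makes a test fail
```

Without the boundary check the mutant would survive, which is exactly the kind of actionable hint a developer can do something with.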

🧂 The grain of salt

This study was conducted at a FAANG. Most developers do not work at a FAANG.

Does code become more readable when it’s split into multiple parts?

Not necessarily, it depends on the code that’s being refactored. It might become more readable if the intermediate variables for the results of each part are given meaningful names.
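To give an idea of what “splitting into parts” looks like, here is a small sketch with some math-y code of my own (not taken from the study). Both functions compute the same thing; the second one stores intermediate results in variables with meaningful names.

```python
import math


def distance_one_liner(x1, y1, x2, y2):
    """Everything in a single expression."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


def distance_split(x1, y1, x2, y2):
    """Same calculation, split into named intermediate steps."""
    horizontal_offset = x2 - x1
    vertical_offset = y2 - y1
    squared_length = horizontal_offset ** 2 + vertical_offset ** 2
    return math.sqrt(squared_length)
```

If the intermediate variables had been called `a`, `b` and `c`, the split version would mostly just be longer.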

🧂 The grain of salt

These conclusions are based on a study with ✨ students ✨ and very math-y code, which most code is not.

Do Java developers make better Python developers than actual Python developers?

Java and C++ developers might not always be aware of Pythonic conventions, but the best practices for their own language often also apply to Python. This sometimes results in Python code that is cleaner than code that would normally be written by Python developers.

🧂 The grain of salt

These conclusions are based on a study that only looked at public GitHub repositories.

How do developers tweet about GitHub projects?

Project owners and happy users tend to promote repositories, while maintainers tend to focus on interaction with the community.

🧂 The grain of salt

Why did I think it was a good idea to pick this paper?

What types of bugs are common in regular expressions?

Often there’s a bug in the regular expression itself. In other cases the regular expression itself is technically fine, but used in an incorrect way.
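Both categories are easy to reproduce. The sketch below uses Python’s standard `re` module; the patterns and input strings are my own examples, not ones from the paper.

```python
import re

# Case 1: the bug is in the regular expression itself.
# The unescaped dot matches ANY character, not just a literal ".".
buggy_decimal = re.compile(r"\d+.\d+")
fixed_decimal = re.compile(r"\d+\.\d+")

buggy_accepts_garbage = buggy_decimal.fullmatch("3x14") is not None
fixed_rejects_garbage = fixed_decimal.fullmatch("3x14") is None

# Case 2: the regular expression is fine, but it is used incorrectly.
# match() only anchors at the start of the string, so trailing junk slips
# through; fullmatch() requires the whole string to match.
match_accepts_junk = fixed_decimal.match("3.14abc") is not None
fullmatch_rejects_junk = fixed_decimal.fullmatch("3.14abc") is None
```

All four variables end up being `True`.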

🧂 The grain of salt

The paper contains some pretty good advice. There isn’t much else to say about this paper. 🤷‍♂️

How good are static analysis tools at measuring technical debt?

They’re not.

🧂 The grain of salt

This study assumes that technical debt only means that files have to be modified more often or are associated with bugs more often.

Should you avoid using single-letter variable names?

It probably doesn’t matter, as long as they are used within small scopes and follow certain conventions (e.g. i is always used for loop indices).

🧂 The grain of salt

Further research is needed.

What is the best programming language for beginners?

It depends, but in most cases Java is a clear winner. Python is a good alternative for prospective researchers, while C++ is a good choice for electrical engineers.

🧂 The grain of salt

I think someone forgot to edit this paper.

Why is camelCase better than snake_case?

CamelCased strings take more time to read and are therefore more likely to be read correctly.

🧂 The grain of salt

This paper has a rather unconventional structure. That’s not necessarily a bad sign, but…

Why is snake_case better than camelCase?

Forget what I said above, the difference between the two is probably negligible. In fact, long camelCased strings are likely easy to misread.

🧂 The grain of salt

This study (and the previous one) only looks at identifiers and does not take their context into account.

At work


How should you onboard developers during a pandemic?

  1. Promote communication and asking for help
  2. Encourage teams to turn cameras on
  3. Schedule 1:1 meetings
  4. Provide information about the organisation
  5. Emphasise team building
  6. Assign an onboarding buddy
  7. Assign an onboarding technical mentor
  8. Support multiple onboarding speeds
  9. Assign a simple first task
  10. Provide up-to-date documentation

🧂 The grain of salt

These recommendations aren’t backed by empirical evidence yet – they’re based on struggles that were often reported in a survey for new hires.

Do men and women express their emotions differently in pull requests?

Male developers express more and stronger sentiments in code reviews, and are more likely to act like dicks than female developers.

🧂 The grain of salt

These conclusions are based on a ✨ tiny sample size ✨ of six open source projects, using automated tools which may or may not be accurate.

Do codes of ethics work?

No, they don’t.

🧂 The grain of salt

This conclusion is based on a multiple-choice questionnaire. Real-life decisions are not made using a multiple-choice questionnaire.

How does one coordinate multiple agile teams?

Informal forms of communication work better than formal meetings.

🧂 The grain of salt

These conclusions are based on a ✨ case study ✨.

Should you learn trendy technology to land that next job (interview)?

Companies often claim that they want applicants who know trendy technologies, but in reality all they want is someone who can do the job.

🧂 The grain of salt

You’d better know some trendy technologies if you want to work at a Web 3.0 blockchain crypto planet-destroying startup that makes you rich.

How can organisations transition to DevOps?

DevOps adoption requires a culture change, for which you probably need people with DevOps experience and managerial support. Everything else is nice to have.

🧂 The grain of salt

These conclusions are based on a ✨ case study ✨.

How well does burnout prevention work?

It works very well.

🧂 The grain of salt

These conclusions are based on a study that was conducted at a single company.

Why are Zoom meetings so tiring?

It’s harder to process social cues in virtual meetings and it also doesn’t help that you constantly see an image of yourself. You might want to keep your camera turned off.

🧂 The grain of salt

I can’t wait for someone to do a Metaverse version of this study.

How can the quality of workplace discussions be improved?

Some people talk a lot, some people talk less. Sometimes that’s good, sometimes that’s bad. Stimulate the right types of talking and silence, and suppress the unproductive types.

🧂 The grain of salt

Easier said than done.

How much time should you spend on work?

Probably less than you currently do.

🧂 The grain of salt

Ideally you should earn just enough money to do all the things that you want to do. Everything beyond that is basically wasted effort.

How does a Scrum master help their teams?

By implementing (more) Scrum practices and taking on a leadership role, Scrum masters can help teams work more effectively.

🧂 The grain of salt

This was by far the most underwhelming paper from this series.

Research and education


How do you measure programming experience in a survey?

People who claim that they have experience with logic programming languages or are more experienced than their direct peers probably really are experienced in computer programming. Several standardised questions about mainstream languages can be combined to get a more balanced understanding of someone’s programming experience.

🧂 The grain of salt

These conclusions are based on a study with ✨ undergraduate students ✨.

Why are things often “beyond the scope” of the paper?

The phrase “… is beyond the scope of this paper” is typically used to remind readers what the study is and isn’t about, and also to show why more research (presumably by the same authors) is necessary.

🧂 The grain of salt

This is not a very rigorous study.

What should your Likert scales look like?

Use vertical lists for absolute judgments and horizontal lists for relative judgments.

🧂 The grain of salt

This assumes that you have enough space to display all the options.

How many people do you need for a reliable A/B test?

A lot. Even the smallest sub-population that’s served by your software should be sufficiently represented in the sample.
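To get a feel for how quickly “a lot” adds up, here is a rough sketch that uses the textbook normal-approximation formula for comparing two proportions. This is not the method from the paper, and the conversion rates, significance level, power and 5% sub-population share are all numbers that I made up.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_control, p_variant, alpha=0.05, power=0.8):
    """Users needed per variant to detect p_control vs p_variant (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def users_needed_for_subgroup(per_variant, subgroup_share):
    """Scale up so that a sub-population with the given share is big enough."""
    return math.ceil(per_variant / subgroup_share)


# Hypothetical scenario: conversion goes from 10% to 12%.
needed = sample_size_per_variant(0.10, 0.12)
# If the smallest sub-population is only 5% of all users, each variant needs
# roughly twenty times as many users for that group to reach the same size.
needed_overall = users_needed_for_subgroup(needed, 0.05)
```

Smaller effects or smaller sub-populations make these numbers grow very quickly.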

🧂 The grain of salt

When A/B testing there’s usually no way to tell whether someone is a member of a particular sub-population.

Does the Semantic Web still exist?

The dream is dead, but the Web lives on… within the confines of megacorporations like Google and Meta.

🧂 The grain of salt

This isn’t a research paper.

How do computer science students work on group projects?

Most of the time very little collaboration takes place and the work is simply divided. However, there is usually some form of coordination, which is typically done by a leader of some sort.

🧂 The grain of salt

These conclusions are based on qualitative research.

Who is behind those essay ghostwriting services?

Mostly Kenyans, Indians and Pakistanis.

🧂 The grain of salt

This research is based on data for a single day for one specific platform.

Why don’t girls like STEM?

Primarily an overabundance of male role models and a lack of female role models.

🧂 The grain of salt

This study was conducted in the United States. You’re probably not from the United States.

Do financial incentives for surveys work?

Yes, they do. The more you pay, the more (good) respondents you get.

🧂 The grain of salt

These conclusions are based on a study with ✨ students ✨.

What should you keep in mind when conducting a controlled experiment for code comprehension?

Controlled experiments for program comprehension are affected by the code, tasks, metrics, and experimental subjects that are chosen.

🧂 The grain of salt

I got nothing, this is a pretty useful manual.

Machine learning


How can MTurk be used for research purposes?

Amazon Mechanical Turk (MTurk) is a crowdsourcing marketplace where, for not a lot of money, one can find people who are willing to complete small tasks that are hard or impossible to automate. It works pretty well, as long as you only hand out the tasks to workers with a good reputation, explicitly filter out workers with undesirable characteristics, and verify that workers take your task seriously.

🧂 The grain of salt

It kind of feels like this study was primarily conducted to justify the use of MTurk?

Things to consider when you let people annotate things

  1. Different people may interpret examples differently

  2. Disagreement in annotations is a sign that an example is too vague

  3. Annotation guidelines should be kept as simple as possible

  4. Examples should be annotated by multiple people

  5. Experts do not make better annotations than non-experts

  6. Examples that are clear should be given a higher weight than examples that are vague

  7. New data should be collected continuously to capture changes in interpretation
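Points 4 and 6 from the list above can be combined in a few lines of code. In the sketch below, each example is labelled by several annotators, the majority label wins, and the share of annotators who agree with it is used as a weight. Using agreement as a weight is my own simplification and not necessarily what the paper proposes; the labels and example names are made up as well.

```python
from collections import Counter


def aggregate(labels):
    """Return the majority label and the fraction of annotators who chose it."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


# Hypothetical annotations: one clear example and one vague example.
annotations = {
    "example_1": ["positive", "positive", "positive"],
    "example_2": ["positive", "negative", "positive", "negative", "positive"],
}

# example_1 gets weight 1.0, the vaguer example_2 only gets 0.6.
weighted = {name: aggregate(labels) for name, labels in annotations.items()}
```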

🧂 The grain of salt

I couldn’t find any, which probably has nothing to do with the fact that one of the authors of this paper was my thesis supervisor.

What should you use for named entity recognition (NER)?

You probably should use Stanford CoreNLP.

🧂 The grain of salt

You may also want to consider things like the performance for your specific problem domain, and development, maintenance and hosting costs.

What is the best way to predict text readability?

There are many different formulas and virtually all of them basically measure the same thing in roughly the same way. All are fine, but do keep in mind that some formulas are meant for texts of fairly specific lengths.
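To illustrate how similar these formulas are, here is a sketch of two well-known ones: Flesch Reading Ease and the Flesch-Kincaid Grade Level. Both are built from the same two ingredients, words per sentence and syllables per word. The constants are the published ones, but the syllable counter is a crude heuristic of my own, so don’t expect the scores to match those of a proper tool.

```python
import re


def count_syllables(word):
    """Very rough heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_stats(text):
    """Return (words per sentence, syllables per word) for a piece of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(word) for word in words)
    return len(words) / len(sentences), syllables / len(words)


def flesch_reading_ease(text):
    """Higher scores mean easier text."""
    words_per_sentence, syllables_per_word = text_stats(text)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(text):
    """Higher scores mean harder text (roughly a US school grade)."""
    words_per_sentence, syllables_per_word = text_stats(text)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

Longer sentences and longer words make a text “harder” in both formulas; only the weights and the direction of the scale differ.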

🧂 The grain of salt

In just a few years this paper will be old enough to get its driving license.

How do you use the VIF to get rid of multicollinearity?

Incorrectly. You’re probably doing it incorrectly. The VIF is only one of many indicators about a model, so any decision you make should be based on many different indicators.
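In case you have forgotten what the VIF is: for predictor j it is 1 / (1 - R²), where R² comes from regressing predictor j on all other predictors. The sketch below only covers the special case with two predictors, where R² is simply their squared Pearson correlation. The data is made up, and a real analysis would normally use a statistics package instead.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5


def vif_two_predictors(xs, ys):
    """VIF of either predictor in a model with exactly two predictors."""
    r_squared = pearson_r(xs, ys) ** 2
    return 1.0 / (1.0 - r_squared)


# Hypothetical predictors: two that measure almost the same thing,
# and one that has little to do with the first.
height_cm = [150, 160, 170, 180, 190]
height_inch = [59, 63, 66, 71, 75]
lucky_number = [3, 1, 4, 1, 5]

vif_redundant = vif_two_predictors(height_cm, height_inch)   # very large
vif_unrelated = vif_two_predictors(height_cm, lucky_number)  # close to 1
```

A large value tells you that two predictors overlap, but not which one to drop or whether dropping one is a good idea at all.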

🧂 The grain of salt

Having said that, it’s all okay as long as you can justify why you do something in a particular way.

Why does ensemble learning work?

It’s basically wisdom of the crowds. Every model can be right and wrong in different ways. The right aspects support each other, while the wrong ones cancel each other out.
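Here is a toy example of that cancelling-out effect, using plain majority voting. The three “models” are just hard-coded lists of predictions that I rigged so that each one is wrong on different items. Real models often share their mistakes, in which case voting helps a lot less.

```python
from collections import Counter

truth = [1, 0, 1, 1, 0, 1, 0, 0]

# Three hypothetical models, each wrong on two different items.
model_a = [1, 0, 1, 1, 0, 0, 1, 0]
model_b = [1, 0, 0, 1, 0, 1, 0, 1]
model_c = [0, 1, 1, 1, 0, 1, 0, 0]


def accuracy(predictions, expected):
    """Fraction of predictions that match the expected labels."""
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected)


def majority_vote(*models):
    """For each item, pick the label predicted by most models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*models)]


ensemble = majority_vote(model_a, model_b, model_c)
individual_scores = [accuracy(m, truth) for m in (model_a, model_b, model_c)]
ensemble_score = accuracy(ensemble, truth)
```

Each individual model scores 0.75, while the ensemble gets everything right because no two models are wrong about the same item.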

🧂 The grain of salt

Ensemble learning can be kind of expensive and may not always work.

User experience design


How should data be visualised?

Most readers prefer simple data visualisations that are easy to understand (like bar charts) over creative visualisations. Designers are more fond of creative, non-standard forms of visualisations, but it’s probably best to ignore their opinions.

🧂 The grain of salt

This research is purely about general preferences and does not take context into account. It probably matters where and why a visualisation is used.

How should the SUS be used nowadays?

The original version of the System Usability Scale (SUS) still works well, but can be improved by wording every item positively and (optionally) removing the least relevant item.
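For reference, this is roughly how the scoring works. The first function follows the standard scoring rules for the original ten-item SUS, where odd-numbered items are worded positively and even-numbered items negatively. The second function is my own sketch for an all-positive version; the rescaling for a version with fewer than ten items is an assumption on my part, not something prescribed by the paper.

```python
def sus_original(responses):
    """Score the original SUS: ten responses on a 1-5 scale, result 0-100."""
    if len(responses) != 10:
        raise ValueError("the original SUS has exactly ten items")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            total += response - 1  # items 1, 3, 5, 7, 9 are worded positively
        else:
            total += 5 - response  # items 2, 4, 6, 8, 10 are worded negatively
    return total * 2.5


def sus_all_positive(responses):
    """Score a version in which every item is worded positively.

    The result is rescaled to 0-100 so that it also works when an item has
    been removed (assumption: each item contributes equally).
    """
    total = sum(response - 1 for response in responses)
    return total * 100 / (4 * len(responses))
```

For example, `sus_original([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])` and `sus_all_positive([5] * 10)` both return the maximum score of 100.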

🧂 The grain of salt

Apparently you should also take the experience level of participants into account, but the paper does not tell you how.

Classical antiquity


How hygienic were toilets in the Roman Empire?

They were not.

🧂 The grain of salt

Maybe the world was created only last Thursday and none of this really matters.

How important were centurions in the Roman army?

A good centurion could use his exceptional leadership qualities to limit the number of casualties on the Roman side, but only very good centurions would really be able to make a difference.

🧂 The grain of salt

These conclusions are based on simulations which are based on assumptions.

Did lead poisoning affect the health of citizens in the Roman Empire?

Yes, they died earlier.

🧂 The grain of salt

This conclusion is based on analysis of human remains, but it’s impossible to identify the cause of death with 100% certainty.