The Toilet Paper

Efficient and green LLMs for software engineering

Training and using large language models to develop software is bad for the planet – but it doesn’t have to be that way.

Citi Zēni – Eat Your Salad
Being green is hot / Being green is cool / Eat your salad, save the planet / Being green is sexy as fuck

Large language models (LLMs) can help software engineers with common tasks such as writing and summarising code, and finding and repairing bugs. However, LLMs are computationally intensive and energy-demanding, so training and running them usually requires deep pockets. Unless we find ways to drastically reduce their computational costs and energy use, this is unlikely to improve.

Techniques to make large language models for software engineering greener and more efficient can be categorised from four perspectives: data-centric, model-centric, system-centric, and program-centric.

Data-centric techniques reduce or optimise the data required to train LLMs; one illustrative example follows below.
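
To make the data-centric angle concrete: deduplication is a well-known way to shrink a training corpus without losing much information. The sketch below is my own illustration, not code from the paper; the `normalise` and `deduplicate` helpers are hypothetical names.

```python
import hashlib

def normalise(code: str) -> str:
    """Normalise whitespace so trivially different copies hash identically."""
    return "\n".join(line.strip() for line in code.splitlines() if line.strip())

def deduplicate(samples: list[str]) -> list[str]:
    """Keep only the first occurrence of each normalised training sample."""
    seen: set[str] = set()
    unique: list[str] = []
    for sample in samples:
        digest = hashlib.sha256(normalise(sample).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique

corpus = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n  return a + b",  # same code, different indentation
]
print(len(deduplicate(corpus)))  # 1: the near-duplicates collapse into one sample
```

Every duplicate that is removed is a sample the model never has to be trained on, which directly cuts compute and energy use.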

Model-centric techniques optimise the LLMs themselves; the paper distinguishes three main approaches. One widely used example, weight quantisation, is sketched below.
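
Here is a minimal sketch of post-training int8 weight quantisation, a common model compression technique. It is illustrative only and not taken from the paper:

```python
import numpy as np

def quantise_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights to int8 values plus a single scale factor."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)  # avoid division by zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximately reconstruct the original weights."""
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantise_int8(weights)
error = np.abs(weights - dequantise(q, scale)).max()
print(f"max reconstruction error: {error:.4f}")
```

Storing int8 weights plus one float scale takes roughly a quarter of the memory of float32 weights, at the cost of a small reconstruction error.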

System-centric techniques optimise parts of the system or pipeline, such as the inference process or decoding strategy; a batching sketch follows below.
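
One familiar system-centric optimisation is batching inference requests, so that several prompts share a single forward pass instead of each paying the model's overhead separately. The sketch below is illustrative; the commented-out `model.generate` call is hypothetical.

```python
from typing import Iterable, Iterator

def batched(prompts: Iterable[str], batch_size: int = 8) -> Iterator[list[str]]:
    """Group prompts so several of them share one model forward pass."""
    batch: list[str] = []
    for prompt in prompts:
        batch.append(prompt)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller batch
        yield batch

requests = [f"Summarise function {i}" for i in range(20)]
for batch in batched(requests, batch_size=8):
    # responses = model.generate(batch)  # hypothetical LLM call
    print(f"One forward pass for {len(batch)} prompts")
```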

Program-centric techniques optimise the input programs that are fed into LLMs; a prompt-shrinking sketch follows below.
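
For instance, stripping comments and blank lines from a source file before prompting reduces the number of tokens the model has to process. The naive helper below is my own illustration, not code from the paper:

```python
def shrink(source: str) -> str:
    """Drop blank lines and full-line comments to cut prompt tokens.

    Deliberately naive: it would also drop '#' lines inside multi-line
    strings, so a real implementation should use a proper parser.
    """
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            kept.append(line)
    return "\n".join(kept)

example = "# adds two numbers\ndef add(a, b):\n\n    return a + b\n"
print(shrink(example))  # two lines instead of four
```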

The main audience for this paper is the research community working on large language models for software engineering. Many of its proposals for the future – more efficient training, improved inference acceleration, and program optimisation – will be of limited immediate use to practising software engineers.

There is, however, one additional technique worth calling out for readers who may not be familiar with it: retrieval-augmented generation (RAG). RAG retrieves texts that semantically match a query from an external knowledge base and passes them to an LLM, which then generates an appropriate answer. This allows LLMs to generate factually accurate responses without the need for extensive retraining.

RAG is of course more of a workaround than a genuine efficiency technique. It adds latency and operational cost compared with simple prompting, but it can be a practical way to teach an LLM new facts when you lack the resources or expertise to retrain models.
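
A minimal sketch of the RAG idea follows, using a toy bag-of-words "embedding" in place of a real embedding model; the knowledge base, `embed`, and the commented-out `llm.generate` call are all illustrative stand-ins, not from the paper.

```python
import math
from collections import Counter

# Tiny stand-in knowledge base; in practice this would be a document store.
KNOWLEDGE_BASE = [
    "Quantisation stores model weights in fewer bits to cut memory use.",
    "Knowledge distillation trains a small student model to mimic a large teacher.",
    "Dynamic batching groups inference requests to improve GPU utilisation.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents that best match the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def answer(query: str) -> str:
    """Assemble retrieved context into a prompt for the LLM."""
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return prompt  # a real system would return llm.generate(prompt)

print(answer("How does quantisation make models smaller?"))
```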

Summary

  1. LLMs can be made more efficient by reducing the amount of data or processing that they have to do.