Predicting estimated time of arrival for commercial flights

Published: 11 Nov 2018
Written by: Chun Fei Lung

Achieving better ETA prediction using machine learning.

Machine learning can show you things that are hidden in plane sight

Arrival times for commercial flights are currently estimated using deterministic models that fail to account for many variables that affect flight time. Ayhan, Costas, and Samet created a model that leverages these traditionally overlooked variables to make more accurate arrival time predictions.

About the article

Title	Predicting estimated time of arrival for commercial flights
Year	2018
Author(s)	Samet Ayhan (University of Maryland), Pablo Costas (Boeing Research & Technology Europe), Hanan Samet (University of Maryland)
Venue	Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Why it matters

The estimated time of arrival (ETA) tells you when a flight will probably land at an airport.

That information is useful for passengers and those who pick them up, but possibly even more so for airlines: knowing the ETA of a flight well in advance allows an airline to coordinate personnel and equipment at airports, which minimises turnaround times and lowers costs.

Traditional ETA prediction methods typically use a deterministic approach that only takes the flight trajectory and possibly some of the aircraft’s characteristics into account. However, actual time of arrival is also affected by factors like wind, temperature, and congestion, so a prediction that ignores these variables is not likely to be very accurate.

How the study was conducted

The authors aimed to create a system that accurately predicts the ETA of a commercial flight before it even departs.

Note that a flight usually has more than one ETA. Passengers typically care only about the gate arrival time. In this study, the authors look only at the runway arrival time (i.e., when the wheels touch the ground), since they do not take taxiways and aprons into account.

The system makes use of several types of data:

The airline, flight number, and type of aircraft;
The approximate route that the aircraft will take;
Over 40 meteorological attributes, like temperature, wind speed and direction, humidity, and air pressure;
The number of flights at the departure and arrival airports;
The number of aircraft in a sector along the flight path.

Multiple models were constructed and ranked based on their predictive performance. Two of the models make use of boosting methods:

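Adaptive boosting is a meta-algorithm that generates predictions by iteratively weighting and combining the outputs of several weak learners, giving more weight in each round to the examples that earlier learners predicted poorly;
Gradient boosting is another meta-algorithm that builds its prediction iteratively, but additively: each new learner is fitted to the errors that remain after combining all previous learners.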
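To make the idea behind gradient boosting more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes a single input feature, squared-error loss, and one-split "stumps" as weak learners, and the headwind and flight time values are made up purely for illustration. All function names are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single threshold split on xs that best fits the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical data: headwind in knots versus flight time in minutes.
headwind_knots = [0, 5, 10, 15, 20, 25, 30, 35]
flight_minutes = [60, 61, 61, 63, 66, 66, 69, 70]

model = fit_gradient_boosting(headwind_knots, flight_minutes)
print(round(predict(model, 0), 1), round(predict(model, 35), 1))
```

Each round only has to correct the errors left by the rounds before it, which is why many very simple learners can add up to an accurate predictor. A real system would use many features and deeper trees, typically through an existing library.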

The system’s performance was evaluated on 10 major flight routes in Spain, using 11 different machine learning algorithms and one baseline algorithm based on averages of historical flight times for the same route and period.
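The historical-average baseline and the error measure used in the results below (root mean squared error, RMSE) are both simple enough to sketch in a few lines. The example assumes that "period" means calendar month, which the summary above does not specify, and it uses invented route labels and flight times. The function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def fit_historical_average(flights):
    """Average past flight times (in minutes) per (route, month) group."""
    totals = defaultdict(lambda: [0.0, 0])
    for route, month, minutes in flights:
        entry = totals[(route, month)]
        entry[0] += minutes
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def rmse(actual, predicted):
    """Root mean squared error: large misses are penalised more than small ones."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical history: (route, month, flight time in minutes).
history = [
    ("route_1", 7, 62.0),
    ("route_1", 7, 66.0),
    ("route_1", 8, 70.0),
    ("route_2", 7, 95.0),
    ("route_2", 7, 101.0),
]
averages = fit_historical_average(history)

# Hypothetical new flights to evaluate the baseline on.
new_flights = [("route_1", 7, 65.0), ("route_2", 7, 100.0)]
predicted = [averages[(route, month)] for route, month, _ in new_flights]
actual = [minutes for _, _, minutes in new_flights]

baseline_error = rmse(actual, predicted)
print(baseline_error)
```

Because this baseline only looks at the route and the period, it predicts the same flight time on a calm day as on a stormy one, which is exactly the gap the machine learning models try to close.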

What discoveries were made

The resulting system achieves a higher accuracy than EUROCONTROL’s ETA prediction system.

Algorithms

The table below shows, for each algorithm, the root mean squared error (RMSE) averaged over all routes, together with its standard deviation. Lower is better. The boosting methods have the lowest average error, and their standard deviations are among the lowest as well.

Method	Algorithm	Average RMSE	Standard deviation
Traditional	Historical average	4.454057	0.910959
Linear	Linear regression	5.224831	0.840323
	Lasso regression	4.204375	0.858625
	Elastic net regression	4.153771	0.805265
Non-linear	Classification and regression trees	4.660715	0.453537
	Support vector regression	3.886390	0.733385
	k-nearest neighbours	3.643647	0.751386
Ensemble	Adaptive boosting 🥈	3.364734	0.531285
	Gradient boosting 🥇	3.346209	0.461617
	Random forest regression	3.498223	0.512965
	Extra trees regression 🥉	3.491921	0.503401
Recurrent neural network	Long short-term memory	4.298340	1.438574

More detailed results are available in the original article.

Features

The table below lists the relative importance of the top 10 features. Five of the ten are meteorological, which suggests that weather data are invaluable if you want to make accurate ETA predictions.

Rank	Feature	Score
1	Arrival airport	1.0
2	Atmospheric pressure	0.67854
3	Atmospheric wind speed	0.66231
4	Atmospheric wind direction	0.65224
5	Atmospheric humidity	0.63331
6	Atmospheric temperature	0.61314
7	Airport congestion rate	0.53212
8	Sector congestion rate	0.31153
9	Flight number	0.29192
10	Aircraft type	0.13221
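One rough way to read the table is to group the ten features by type and compare the groups. The sketch below copies the scores from the table; the three category labels are an assumption made for this example and do not come from the original article. Summing relative scores is only a crude heuristic, not a formal measure of a group's contribution.

```python
from collections import defaultdict

# (feature, assumed category, score from the table above)
top_features = [
    ("Arrival airport", "flight plan", 1.0),
    ("Atmospheric pressure", "meteorological", 0.67854),
    ("Atmospheric wind speed", "meteorological", 0.66231),
    ("Atmospheric wind direction", "meteorological", 0.65224),
    ("Atmospheric humidity", "meteorological", 0.63331),
    ("Atmospheric temperature", "meteorological", 0.61314),
    ("Airport congestion rate", "congestion", 0.53212),
    ("Sector congestion rate", "congestion", 0.31153),
    ("Flight number", "flight plan", 0.29192),
    ("Aircraft type", "flight plan", 0.13221),
]


def summarise_by_category(features):
    """Return the summed score and the number of features per category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _name, category, score in features:
        totals[category] += score
        counts[category] += 1
    return dict(totals), dict(counts)


totals, counts = summarise_by_category(top_features)
for category in sorted(totals, key=totals.get, reverse=True):
    print(f"{category}: {counts[category]} features, total score {totals[category]:.5f}")
```

Under this grouping, the meteorological features make up half of the top 10 and have the largest combined score, followed by the flight plan features and the two congestion rates.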

Summary

Of the algorithms tested, gradient boosting and adaptive boosting achieve the highest ETA prediction accuracy.
Meteorological data and congestion rates are important predictors for the ETA of a commercial flight.