When machine learning deciphers the ‘language’ of atmospheric air masses

A collaboration between scientists from LSCE (CEA-CNRS-UVSQ), LISN and SPEC has, for the first time, applied a machine learning technique used in linguistics to daily weather maps covering 70 years. The way is now open for climatological analyses that are beyond the reach of human experts!

How can the evolution of the climate be ‘read’ in the history of daily weather reports?

To meet this challenge, atmospheric scientists use algorithms (Empirical Orthogonal Functions or k-means) to reduce the complexity of weather maps of pressure, temperature or precipitation. This gives them a small number of basic elements which can, however, be difficult to interpret, or which combine objects that are interrelated and therefore impossible to study separately, such as cyclones and anticyclones.
LSCE-IPSL climatologists, together with LISN and SPEC physicists, have implemented LDA (Latent Dirichlet Allocation), a machine learning algorithm that isolates these large-scale structures (cyclones and anticyclones) so that they can be analyzed individually. This is a valuable asset for studying extreme weather events such as cold waves or extra-tropical storms!

LDA is capable of analyzing thousands of documents in a short time and highlighting important elements, recurrences and anomalies. It is generally used in linguistics to study natural language: its word analysis reveals the theme(s) of a document, each theme being identified by a specific vocabulary or, more precisely, by a particular statistical distribution of word frequency.

In the climatologists’ use of LDA, the document is a daily weather map and the word is a pixel of the map. The theme, with its characteristic vocabulary, becomes a cyclone or an anticyclone and, more generally, a ‘pattern’ that the scientists term a motif.
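To make the analogy concrete, here is a minimal sketch of how one map could be turned into a ‘document’ of pixel words. The function name map_to_document, the quantization step and the choice to keep the sign of the anomaly in the word are illustrative assumptions, not details taken from the study.

```python
from collections import Counter


def map_to_document(anomaly_map, step=1.0):
    """Turn a 2-D grid of anomalies into a bag of 'pixel words'.

    Assumption for illustration: each grid cell (row, col) is one word of the
    vocabulary, and its count is the anomaly magnitude quantized in units of
    `step`. The sign is kept in the word, so highs and lows use different words.
    """
    counts = Counter()
    for i, row in enumerate(anomaly_map):
        for j, value in enumerate(row):
            n = int(round(abs(value) / step))
            if n > 0:
                sign = "+" if value > 0 else "-"
                counts[f"{sign}r{i}c{j}"] = n
    return counts


# Hypothetical 2 x 3 map of anomalies (arbitrary units).
toy_map = [
    [0.2, 2.1, 0.0],
    [-3.0, 0.4, 1.2],
]
doc = map_to_document(toy_map)
```

In this toy case, cells with small anomalies contribute no words, while the strong negative anomaly in the second row becomes the most frequent word of the document.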

Artificial intelligence – a sort of incredibly fast robot meteorologist – looks for correlations both between different places on the same map, and between successive maps over time. In a sense, it ‘notices’ that a particular location is often correlated with another location, recurrently throughout the database, and this set of correlated locations constitutes a specific pattern.

The algorithm performs statistical analyses at two distinct levels. At the word (pixel) level, LDA defines each motif by assigning a weight to every pixel, which fixes the motif’s shape and position. At the document (map) level, LDA breaks down each daily weather map into these motifs, each of which is assigned a weight.
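The two levels can be illustrated with a small sketch in which the pixel distribution of a map is a weighted mixture of motifs. The motifs, the weights and the helper mix_motifs are hypothetical values chosen for illustration; in practice both sets of weights are estimated by LDA from the data.

```python
def mix_motifs(map_weights, motifs):
    """Pixel distribution of one map: sum over motifs of weight * motif[pixel]."""
    n_pixels = len(motifs[0])
    return [
        sum(w * motif[p] for w, motif in zip(map_weights, motifs))
        for p in range(n_pixels)
    ]


# Level 1: each motif is a distribution over pixels (weights sum to 1).
motifs = [
    [0.5, 0.5, 0.0, 0.0],    # hypothetical motif A, concentrated on pixels 0-1
    [0.0, 0.0, 0.25, 0.75],  # hypothetical motif B, concentrated on pixels 2-3
]

# Level 2: each map is a distribution over motifs (weights sum to 1).
map_weights = [0.5, 0.5]

pixel_probs = mix_motifs(map_weights, motifs)
```

Because both levels are probability distributions, the resulting pixel weights also sum to 1.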

In concrete terms, the basic data are the daily weather maps from 1948 to the present over the North Atlantic basin and Europe. LDA identifies a dozen or so spatially defined motifs, many of which are familiar meteorological patterns such as the Azores High, the Genoa Low or even the Scandinavian Blocking. A combination of just a few of these motifs can then be used to describe any of the maps.
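As a rough illustration of describing a map with a few motifs, the sketch below keeps the largest motif weights of a single map and renormalizes them. The weights, the catch-all "other" entry and the helper dominant_motifs are hypothetical; only the motif names come from the text above.

```python
def dominant_motifs(weights, names, k=3):
    """Return the k largest motif weights of one map, renormalized to sum to 1."""
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)[:k]
    total = sum(w for _, w in ranked)
    return {name: w / total for name, w in ranked}


names = ["Azores High", "Genoa Low", "Scandinavian Blocking", "other"]
weights = [0.5, 0.125, 0.25, 0.125]  # hypothetical motif weights for one day

top = dominant_motifs(weights, names, k=2)
```

Here the day would be summarized as mostly Azores High with a smaller Scandinavian Blocking contribution.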

These motifs and the statistical analyses associated with them allow researchers to study weather phenomena such as extreme events, as well as longer-term climate trends, and possibly to understand their mechanisms in order to better predict them in the future.


The preprint of the study is available as:
Lucas Fery, Berengere Dubrulle, Berengere Podvin, Flavio Pons, Davide Faranda. Learning a weather dictionary of atmospheric patterns using Latent Dirichlet Allocation. 2021. ⟨hal-03258523⟩


Davide Faranda, LSCE-IPSL

Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)