Cahora Bassa Inflow Forecasting System

The Tethys Forecasting System for inflows at Cahora Bassa, in Mozambique, is under active development by mandate of Hidroeléctrica de Cahora Bassa (HCB). It relies on the Global Forecast System (GFS) of the National Oceanic and Atmospheric Administration (NOAA), near real-time satellite observations, and ground measurements to produce inflow forecasts for the Cahora Bassa Reservoir. Data is processed using the Generalized Pareto Uncertainty algorithm and the Routing System Minerve hydrological modeling software.

More information and details about the system can be found below or through the menu on the left. The data and forecasts can be accessed from the map. Because the data is sensitive, the contents can only be viewed by registered users. If you would like to have access to the system, please contact the administrator.

Overview

The map

The map is the access point to the bulk of the information provided by the system. From the map, all hydrological time series and areal information (such as satellite measurements and weather forecasts) can be inspected. As shown below, the map can represent spatial vector information such as watersheds and river networks. Most importantly, it can display time series of different types (which types and how many are shown is up to the administrator).

Map view

Updated hydrologic information

Tethys can be connected to other databases or data sources and thus always remain up to date. The series can be dynamically visualized, compared, and downloaded directly from the website.

Series

Meteorological forecasts and satellite products

Hydrological forecasts depend heavily on meteorological predictions. Tethys accesses data from several sources and uses it to produce its hydrological forecasts. Areal data is used whether it is derived from satellite observations (see below a 3-hour time step evolution of precipitation from a storm cell passing over the Rogun catchment) or produced by meteorological forecasts.

TRMM precipitation (six 3-hour time steps)

Hydrological forecasts

Producing reliable hydrological forecasts is the main objective of Tethys. The notion of reliability is closely linked to that of probability. However good it may be, any operational hydrological forecast displays errors. Accordingly, for informed decision making it is vital to know what those errors may be.

Usually, operational forecast tools rely on results from deterministic hydrological models. Moving from a deterministic to a probabilistic mindset allows uncertainty to be fully taken into account. With probabilistic models, the predictions are no longer a "best guess", but rather a full distribution of values and their corresponding probabilities (see the figure below).

From deterministic to probabilistic

The Tethys Forecasting System relies on a new forecasting technique: the Generalized Pareto Uncertainty. The technique is rooted in artificial intelligence and, in essence, allows the forecasting system to learn from what was observed in the past in order to forecast the future. The same principle can be used to predict discharges, water levels, accumulated inflow volumes, and many other hydrological variables of interest. Below, a 15-day-ahead example forecast for Rogun can be compared with observations.

Forecast

The technique is versatile and has been employed in a number of different settings. Below are two examples shared at the European Geosciences Union General Assembly 2017. In the first, the Generalized Pareto Uncertainty is used to predict suspended sediment concentration in the Yangtze River. The second illustrates the application of the methodology to a relatively small Alpine catchment in Switzerland.

EGU Yangtze

EGU Switzerland

System management

Behind the scenes, all major aspects of the system can be managed through a comprehensive interface. Notably, new forecasts can be prepared and experimented with.

Site administration

Technical details

The website

The Tethys Forecasting System is based on open-source technology. Its base programming language is Python, which offers the almost unique possibility of using a single language to program both the data management and visualisation tool (the website) and the scientific code required to prepare and run the forecasts. The database behind Tethys is MySQL and Tethys itself runs on a Django web server. To speed things up, the code at the heart of the forecasting system was written in OpenCL, so that computations are performed in parallel on a Graphics Processing Unit (instead of, or alongside, multiple CPU cores). With this, years' worth of probabilistic forecasts can be computed in a matter of seconds.

Technologies
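
The production OpenCL kernels are not reproduced here. Purely as an illustration of the idea of evaluating many small candidate models in parallel on a GPU, a minimal sketch using PyOpenCL could look like the following; the kernel, the names w, b, and evaluate, and the toy linear model are all hypothetical and are not the actual Tethys code.

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    kernel_src = """
    __kernel void evaluate(__global const float *w,
                           __global const float *b,
                           const float x,
                           __global float *y) {
        int i = get_global_id(0);   // one candidate model per work-item
        y[i] = w[i] * x + b[i];     // toy linear model y = w*x + b
    }
    """
    program = cl.Program(ctx, kernel_src).build()

    n_models = 4096
    w = np.random.rand(n_models).astype(np.float32)
    b = np.random.rand(n_models).astype(np.float32)
    mf = cl.mem_flags
    w_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=w)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, w.nbytes)

    # Launch one work-item per candidate model and copy the results back
    program.evaluate(queue, (n_models,), None, w_buf, b_buf, np.float32(1.5), y_buf)
    y = np.empty_like(w)
    cl.enqueue_copy(queue, y, y_buf)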

Data

The system operates based on numerical weather forecasts and ground stations. The weather forecast data comes from the Global Forecast System (GFS) of the National Oceanic and Atmospheric Administration (NOAA). The forecasts extend up to 16 days ahead and are retrieved daily at a resolution of 0.5 x 0.5 degrees. Ground data comes mainly from ZRA. The database collects information daily from both automatic and manual weather and hydrometric stations within and in the vicinity of the Vakhsh catchment. Below, examples of temperature, precipitation, and snow depth forecasts produced by GFS are displayed.

GFS temperature, GFS precipitation, GFS snow depth

Hydrological modelling

Optionally, the forecasts produced by Tethys can also be combined with deterministic conceptual hydrological models, reaping the best of both the deterministic and probabilistic approaches.

The models are specifically calibrated for the region in question, and the Tethys system acts as a filter: internal states and parameters are automatically adapted whenever the simulations depart significantly from the observations. The quality of the deterministic predictions is evaluated using the Nash-Sutcliffe Efficiency.
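
For reference, the Nash-Sutcliffe Efficiency compares the squared errors of the simulation against the variance of the observations: a value of 1 indicates a perfect fit, while a value of 0 means the model performs no better than simply predicting the mean observed flow. A minimal sketch of the metric (an illustration with a hypothetical helper name, not the operational implementation) could look like this:

    import numpy as np

    def nash_sutcliffe(observed, simulated):
        # NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
        observed = np.asarray(observed, dtype=float)
        simulated = np.asarray(simulated, dtype=float)
        residuals = np.sum((observed - simulated) ** 2)
        variance = np.sum((observed - observed.mean()) ** 2)
        return 1.0 - residuals / variance

    # Example with made-up daily inflows (m3/s)
    nse = nash_sutcliffe([2100.0, 2300.0, 2550.0, 2400.0],
                         [2050.0, 2280.0, 2600.0, 2350.0])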

Probabilistic forecasting

Probabilistic forecasting extends the traditional deterministic approach by associating uncertainty (i.e. potential errors) with every prediction.

From deterministic to probabilistic

Doing so is not straightforward because the error is not always the same. Predictive errors of hydrological series in particular are affected by issues such as heteroscedasticity (errors are usually greatest during high flows), non-normality (often hydrological probability distributions differ from the Gaussian distribution), or autocorrelation (errors in consecutive timesteps are usually related, meaning that errors cannot be considered independent). Such features render the implementation of analytical solutions for probabilistic forecasting impractical.

This does not mean that attempts at estimating uncertainty are not made. In fact, recognizing its importance, operational forecasting systems often include ways to model uncertainty. In most cases, however, such efforts are computationally expensive and constrained by strong modeling assumptions, leading only to rough estimates. Fortunately, it is easy to assess whether a probabilistic prediction is reliable or not. One way to do this is by computing a predictive quantile-quantile plot (see example below). In a predictive quantile-quantile plot, a statistically reliable prediction will fall on the diagonal line. Departures from that line indicate discrepancies between observations and the prediction and can be used to learn what may be wrong with the model.

Quantile-quantile plot
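
As an illustration, and assuming for the sake of this sketch that the probabilistic forecast is available as an ensemble of values for each time step (an assumption of the example, not a statement about the Tethys internals), the coordinates of a predictive quantile-quantile plot could be computed as follows:

    import numpy as np

    def predictive_qq(ensemble, observations):
        # ensemble: (n_times, n_members) probabilistic forecast
        # observations: (n_times,) observed values
        ensemble = np.asarray(ensemble, dtype=float)
        observations = np.asarray(observations, dtype=float)
        # Probability that each forecast assigns to values at or below the observation
        p = (ensemble <= observations[:, None]).mean(axis=1)
        # Sorted probabilities vs. uniform quantiles: a reliable forecast falls on the diagonal
        empirical = np.sort(p)
        theoretical = (np.arange(1, len(p) + 1) - 0.5) / len(p)
        return theoretical, empirical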

Generalized Pareto Uncertainty

The probabilistic forecasting capabilities of Tethys are its distinguishing feature. The Generalized Pareto Uncertainty algorithm used by Tethys does not assume much about the process being simulated, learning everything it needs to know from historical data. Due to this, a large array of variables can be predicted if there is a sufficiently long historical series available (preferably 10 years or more).

The Generalized Pareto Uncertainty works by combining a very large number of regression models to produce its probabilistic forecasts. Each one of these regression models is a deterministic function of the type y = f(X, W), where X gathers the input variables and W is a matrix of model parameters. For example, we may want to predict discharge one week from now based on today's observations of discharge, temperature, and precipitation; that would look something like this: Qt+7 = f([Qt, Tt, Pt], W).

The choice of an adequate regression model depends on the problem at hand. To keep things general, and to allow Tethys to predict a wide range of variables, artificial neural networks were chosen. Artificial neural networks are machine learning or artificial intelligence models that come in different shapes and sizes (see figure below). In Tethys, the multi-layer perceptron type was chosen. These models emulate the human brain at a sub-symbolic level. Based on layers of "neurons" and the "synapses" connecting them, multi-layer perceptrons can be trained with historical observations and learn to predict the behavior of complex systems.

Artificial neural networks
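
A minimal sketch of one such regression model is shown below: a one-hidden-layer perceptron mapping today's [Qt, Tt, Pt] to Qt+7. The function name, the network size, and the weights are placeholders chosen for illustration; in the real system the parameters would be learned from historical data.

    import numpy as np

    def predict_discharge(x, W1, b1, W2, b2):
        # One-hidden-layer perceptron: y = W2 . tanh(W1 . x + b1) + b2
        x = np.asarray(x, dtype=float)      # inputs [Qt, Tt, Pt]
        hidden = np.tanh(W1 @ x + b1)       # layer of "neurons"
        return float(W2 @ hidden + b2)      # predicted Qt+7

    # Arbitrary (untrained) weights: 3 inputs, 5 hidden neurons, 1 output
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
    W2, b2 = rng.normal(size=5), rng.normal()
    q_in_7_days = predict_discharge([2500.0, 24.0, 3.5], W1, b1, W2, b2)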

But that was only one model. For a probabilistic prediction, the Generalized Pareto Uncertainty combines thousands of such models, each with its own specific parameters (W). The key is to find adequate parameters for each regression model f. The Generalized Pareto Uncertainty uses a custom multi-objective optimization code that aims for:

1) predictions that are as close as possible to observations;

2) regression models whose predictions range from always below observations to always above them.

The training process is rather complicated (follow this link for further information), but it can be summarized in the figure below. In the figure, each point represents a regression model (in this case an artificial neural network). The x-axis represents non-exceedance, which can be related to probability. For example, a non-exceedance of 0 means that, for the historical data set used to train the models, predictions were always exceeded by observations. Conversely, a non-exceedance of 1 is obtained when historical predictions have consistently been above observations. In order to guarantee that, for all non-exceedances, predictions are as close as possible to observations, it is important to account for the error associated with each model (y-axis). Through several epochs (or iterations) of training, the Generalized Pareto Uncertainty algorithm will find the sets of parameters that allow the many regression models to cover the full range of non-exceedances (from 0 to 1) with as little error as possible.

Generalized Pareto Uncertainty
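
Illustratively, and using the definitions above (the exact objective functions of the Generalized Pareto Uncertainty may differ; the root-mean-square error is used here only as an example of an error measure), the two coordinates of each point in the figure could be computed for a given regression model as follows:

    import numpy as np

    def non_exceedance_and_error(predictions, observations):
        predictions = np.asarray(predictions, dtype=float)
        observations = np.asarray(observations, dtype=float)
        # x-axis: fraction of historical time steps in which the observation
        # did not exceed the prediction
        non_exceedance = float(np.mean(observations <= predictions))
        # y-axis: a measure of error, here the root-mean-square error
        rmse = float(np.sqrt(np.mean((observations - predictions) ** 2)))
        return non_exceedance, rmse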

An example of the Generalized Pareto Uncertainty model in operation is shown below, where three figures illustrate the start, middle, and end phases of model training. On the left plot the non-exceedance vs. error plane is portrayed; in the middle, the quantile-quantile plot is displayed (perfect reliability along the diagonal); and on the right plot the time series is shown (observations in red, probabilistic predictions in shaded gray). When the training starts, predictions are very poor (right plot) and reliability is low (middle plot). As the training progresses (second figure), better parameters for each artificial neural network are found and the predictive error is reduced (right plot). Also, reliability is dramatically improved (middle plot) and predictions become much better (right plot). When training is allowed to continue (third figure), the optimal parameters of each artificial neural network will be found (left plot). Reliability will typically be excellent (middle plot), and the uncertainty associated with the predictions is further reduced (right plot).

Training 1 Training 2 Training 3

Forecasts and status

Please log in.

Access

Register

The information shared here is potentially sensitive. Please contact the site administrator in order to gain access.

Access forecasts and other information

Even after registering, some information will remain protected. If you believe that this information should be provided to you, please contact the site administrator.

Receive periodical updates by e-mail

The system sends periodical e-mails with important information. If you do not yet receive these e-mails and would like to be included among the recipients, please contact the site administrator.

About

Development

The Tethys Forecasting System was developed at the École Polytechnique Fédérale de Lausanne (EPFL), in Switzerland, as part of the ADAPT Database research project. The initial development has been financed by the Swiss Competence Centre Environment and Sustainability (CCES) and EPFL. This version of the forecasting system has been further developed by Stucky Ltd for HCB with the main aim of producing operational inflow forecasts for the Cahora Bassa Dam.

The logo

The logo is inspired by the representation of Nyami Nyami, the Zambezi River God of the Tonga, an ethnic group living mostly in Zambia and Zimbabwe, in the region of the Kariba Dam.

Nyami nyami

The name

The name of the system - Tethys - has a double relationship with its purpose and history. Firstly, Tethys is a Titaness in Greek mythology, known as the mother of the Greek river gods. Secondly, Tethys gave her name to the Tethys Ocean which, when the dinosaurs still roamed the Earth, bathed the eastern shore of the Gondwana continent (and thus ancient Africa). The fact that Tethys is historically connected to rivers and that the Zambezi drained into the Tethys Ocean inspired the choice of name.

Tethys