Density forecasting for long-term electricity demand in South Africa using quantile regression


Introduction
Electricity load is the amount of electricity that balances the amount generated with that drawn from the grid. In the absence of black-outs, load-shedding and electricity generated from renewable sources, the electricity load is equivalent to the electricity demand. Therefore, in this study, the hourly electricity demand is defined as the amount of electricity (load) in kW sent out every hour by Eskom to meet consumers' demand.
The 1996 census showed that only 57.6% of South African households had access to electricity for lighting (Statistics South Africa 1998). The 2001 census showed that this percentage had risen to 70.2% (Lehohla 2005). The 2007 Community Survey indicated that 80.1% of South African households had access to electricity for lighting (Statistics South Africa 2008), and the 2011 census showed that this percentage had risen to 84.7% (Statistics South Africa 2011). The high rate at which new households were connected to the electricity grid between 1996 and 2007 implies that residential electricity demand would be expected to have increased over the same period. The rate of new household connections stabilised between 2007 and 2011 and, therefore, residential electricity demand would be expected to have stabilised during this period. Slowing South African economic growth between 2007 and 2015 could also have contributed to a decline in electricity demand: South Africa experienced an average real growth rate of approximately 5% between 2004 and 2007, whereas the period 2008 to 2012 recorded average growth of just above 2% (Statistics South Africa n.d.). The penetration of other sources of electricity, such as renewables (for example solar and wind), could also have contributed to a decline in electricity demand from Eskom. In addition, because of the shortage of generation capacity experienced by Eskom in 2007 (Inglesi & Pouris 2010), some companies and households had to find other sources of electricity, which would have reduced electricity demand from Eskom. Unfortunately, the actual size of the electricity demand market is still unknown because certain types of data, such as data on renewable energy and other forms of electricity generation, are unavailable. The combined effect of all these changes in demography, the economy and usage patterns can be investigated using historical patterns, but it contributes to uncertainty when forecasting future electricity demand.
Uncertainty arises in estimation, prediction and forecasting. When statisticians develop predictions (forecasts) for an uncertain future, they need to quantify the uncertainty around them for those who must make decisions in the face of that uncertainty. Sigauke (2014) indicates that uncertainty in future electricity demand could emanate from the increasing number of technologies that use electricity, population growth, general randomness in individual usage of electricity, seasonal effects, prevailing economic patterns, changes in weather conditions, escalating costs, the use of power-saving electrical appliances and the growing sources of renewable energy. The inherent uncertainty in predictions implies that forecasts should ideally be probabilistic; in other words, they should take the form of probability distributions over future quantities or events (Gneiting & Katzfuss 2014). Probabilistic forecasts can take the form of quantiles, prediction intervals or density forecasts, and they are an essential ingredient of optimal decision-making (Gneiting & Katzfuss 2014). Quantifying the uncertainty around demand forecasts is important for planning, both to avoid building unnecessary infrastructure and to ensure that future electricity demand is met. Tay and Wallis (2000) define a density forecast of the realisation of a random variable at some future time as an estimate of the probability distribution of the possible future values of that variable. Hong, Wilson and Xie (2014) argue that forecasting is by nature a stochastic problem, but that most utilities still develop and use point forecasts. They state that it would be better to use probabilistic forecasts, which provide estimates of the full distribution of possible future values, as a way of quantifying the uncertainty in the forecasts.
In the late 1880s, when lighting was the sole end use of electricity, forecasting electricity demand was straightforward (Hong & Shahidehpour 2015). Power generating companies would count the number of light bulbs they had installed and planned to install, and would then roughly estimate the level of demand in the evening. As electric appliances such as irons, radios, television sets, geysers, stoves and washing machines were invented and came into common use in households, the complexity of forecasting electricity demand grew. The penetration of air conditioners into homes and offices to keep temperatures within comfort zones, together with industrial uses of electricity, became important drivers of electricity demand. These drivers add complexity to electricity demand forecasting and create uncertainty around the forecasts. Electricity forecasting methods have evolved from counting light bulbs, through engineering approaches that used charts and tables to forecast future demand manually, to sophisticated forecasting techniques. The availability of powerful computers and statistical software today enables forecasters to produce more accurate forecasts with these techniques.
Electricity demand forecasts can be developed for short-, medium- or long-term horizons, and they can be provided either as point forecasts, which give one value at each time interval, or as probabilistic forecasts, which give a full distribution of future values and therefore allow the uncertainty around the forecasts to be assessed. Quantifying this uncertainty is even more important for long-term forecasts because, as Sigauke and Chikobvu (2011) indicated, long-term decision-making in the electricity sector involves planning under substantial uncertainty.
In the literature to date, short-term electricity demand forecasting has attracted substantial attention because of its importance for power system control, unit commitment and electricity markets. Medium- and long-term forecasting have not received much attention, despite their value for system planning and budget allocation (Hyndman & Fan 2010). The international literature on probabilistic load forecasting is very limited, and load forecasting research is still dominated by short-term point forecasting.
Some literature is available on long-term forecasting of annual electricity demand and of peak electricity demand in South Africa (Inglesi-Lotz 2011; Koen, Magadla & Mokilane 2014; Rasuba, Khuluse & Elphinstone 2010; Sigauke 2014; Sigauke & Chikobvu 2011; Ziramba 2008). However, the models used to forecast electricity demand in South Africa do not forecast the full distribution of demand, and most of them are for short-term electricity demand (Sigauke 2014, among others). The objectives of this study were (1) to apply a quantile regression (QR) model to forecast the hourly distribution of electricity demand in South Africa; (2) to investigate variability in the forecasts and evaluate the uncertainty around point forecasts; and (3) to determine whether future peak electricity demand is likely to increase or decrease. Weron and Misiorek (2004) indicate that forecasting models can be classified into two broad streams: those that use statistical methods (e.g. multiple regression, autoregressive [AR], autoregressive integrated moving average [ARIMA], autoregressive generalised autoregressive conditional heteroscedasticity [AR-GARCH], jump diffusion, factor models, regime-switching models, multilevel models, mixed models and semi-parametric models) and those that use computational intelligence techniques (such as fuzzy techniques, support vector machines and, in particular, artificial neural networks [ANNs]).

Methodology framework
Statistical methods differ from ANNs in that the former forecast the current value of a variable using a mathematical combination of previous values of that variable and, sometimes, previous values of exogenous factors (Weron & Misiorek 2004). Weron and Misiorek (2004) pointed out that reviewers of ANN-based forecasting systems have concluded that much work still needs to be done before these are accepted as established forecasting techniques. ANN is considered a black-box modelling approach. In electricity demand forecasting, statistical models are attractive because physical interpretations can be attached to their components, which allows forecasters to understand the system's behaviour (Weron & Misiorek 2004). Suganthi and Samuel (2012) give a comprehensive review of demand forecasting models commonly used in the energy sector. Electricity demand data consist of a sequence of observations collected over equally spaced time periods (hourly) with no missing data, and the observations are serially correlated. Statistical approaches to forecasting electricity demand can be divided into three main groups. Firstly, there are approaches that treat demand as a univariate time series, that is, a load forecasting process that produces one forecasted value at each step (point forecasts). Secondly, there are approaches that treat each intraday period as a separate parametric regression and estimate each model's parameters separately, ignoring the intraday correlation in the process. Thirdly, there are approaches that treat each intraday period as a separate parametric regression model but estimate the model parameters jointly, in a way that takes the intraday correlations into account.
Within the univariate time series framework, the stochastic nature of electricity demand as a function of time has frequently been modelled with seasonal autoregressive integrated moving average (SARIMA) and state space models (Taylor, De Menezes & McSharry 2006). However, electricity data typically exhibit not only a non-constant mean and variance, but also multiple seasonalities corresponding to daily, weekly, monthly and yearly periodicity. The homoscedasticity assumption of SARIMA models is therefore inappropriate for forecasting electricity demand. Furthermore, SARIMA models are used for point forecasting and cannot forecast the full demand distribution.
SARIMA models can be extended to SARIMA-GARCH models to account for possible heteroscedasticity. A GARCH modelling approach can be used to capture potential conditional heteroscedasticity in electricity data (Byström 2005; Taylor 2006). However, this modelling approach does not accommodate exogenous drivers of electricity demand and is used for point forecasting.
Seasonal autoregressive integrated moving average models with exogenous variables (SARIMAX), also known as regression-SARIMA models, have been used in load forecasting to incorporate important drivers of demand such as calendar variables and temperature (Bunn 1982; Suganthi & Samuel 2012; Weron 2007). This method uses an ordinary least squares (OLS) regression model, which may be affected by outliers and could underestimate the peaks because it models the mean of the distribution.
Structural time series (STS) models have also been used successfully in demand forecasting. STS modelling was developed by Harvey (1990) and involves decomposing a time series into trend, seasonal, cyclical and irregular (noise) components. This modelling approach can accommodate drivers of electricity demand such as temperature, but it is also used for point forecasting. Hyndman and Fan (2010) propose a semi-parametric additive regression model that includes nonlinear relationships and serially correlated errors. The proposed models allow for nonlinear and non-parametric terms within the framework of additive models, and the authors applied this method to develop long-term probabilistic load forecasts.
OLS regression models the relationship between covariates X and the conditional mean of a response variable Y given X = x. Koenker and Bassett (1978) argue that a regression curve gives a summary of the averages of the distributions corresponding to the set of Xs. One could go further and compute several different regression curves corresponding to the various percentage points of the distributions, and thus obtain a more complete picture of the set (Koenker & Bassett 1978). Ordinarily this is not done, so regression often gives a rather incomplete picture. In forecasting electricity demand, least squares regression models the mean of electricity demand as the dependent variable. With $y_t$ the demand at time $t$ and $x_{ti}$ the value of the $i$th of $p$ covariates, OLS regression determines the coefficients $\alpha_0$ and $\alpha_i$ that minimise

$$\sum_{t=1}^{n}\left(y_t - \alpha_0 - \sum_{i=1}^{p}\alpha_i x_{ti}\right)^2 .$$

To apply an OLS regression model and draw valid inferences from it, the data must meet stringent assumptions: the residuals should be normally distributed, the observations should be independent and the variance of the residuals should be homoscedastic. Because we are dealing with time series data, the observations are not independent, and for electricity demand data the variance is heteroscedastic. Some of the OLS assumptions are therefore violated by the time series data used in this study.
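As a minimal illustration of the OLS objective, the sketch below (Python with NumPy, on synthetic data rather than the Eskom series) estimates the intercept and slope by least squares. The data-generating process, the random seed and the function name `ols_fit` are hypothetical; the noise is made to grow with the covariate so that the heteroscedasticity discussed above shows up in the residuals.

```python
import numpy as np


def ols_fit(X, y):
    """Return [alpha_0, alpha_1, ..., alpha_p] minimising
    sum_t (y_t - alpha_0 - sum_i alpha_i * x_ti) ** 2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical data: one covariate, noise spread increasing with x (heteroscedastic).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=500)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1 + 0.2 * x)

coef = ols_fit(x.reshape(-1, 1), y)
residuals = y - (coef[0] + coef[1] * x)

# OLS describes only the conditional mean; the residual spread still depends on x.
spread_low = residuals[x < 5].std()
spread_high = residuals[x >= 5].std()
print(coef, spread_low, spread_high)
```

The fitted line summarises only the conditional mean, whatever the spread around it, which is the limitation that motivates QR.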
In this article, QR is proposed for developing long-term probabilistic forecasts. QR was developed as an extension of OLS regression for estimating rates of change in all parts of the distribution of the response variable (Cade & Noon 2003). QR offers a comprehensive strategy for completing the regression picture and has been applied in ecology (Cade & Noon 2003). It has also been widely used in financial economics, where data are volatile and extremes are important to study. Gibbons and Faruqui (2014) applied QR to forecast annual peak electricity demand. Cornec (2014) proposes QR as a way of estimating the distribution of forecasts and uses the dispersion of the estimated quantiles to calculate an uncertainty index. QR imposes no normality assumption and therefore allows, for example, for fat-tailed distributions, which is useful for forecasting extreme events. In electricity demand forecasting, QR can be used to model the median, the 1st, 5th, 10th, 90th, 95th and 99th percentiles, or all quantiles, to describe the full distribution of forecasted electricity demand at each hour. For a quantile level $\tau \in (0, 1)$, with $y_t$ the response at time $t$ and $x_{ti}$ the value of the $i$th of $p$ covariates, QR finds the coefficients $a_0$ and $a_i$ that minimise

$$\sum_{t=1}^{n}\rho_\tau\left(y_t - a_0 - \sum_{i=1}^{p}a_i x_{ti}\right),$$

where $\rho_\tau$ is the check function defined below. QR does not require any distributional assumptions regarding the population and can estimate the parameters nonparametrically (Koenker & Bassett 1982). A linear model for the $\tau$th quantile is given by

$$y_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + \epsilon_i, \qquad i = 1, \ldots, n,$$

where $\mathbf{x}_i^{T}$ is the transpose (indicated by $T$) of the $i$th row of the design matrix (matrix of covariates), $\boldsymbol{\beta}$ is the vector of regression coefficients and the $\tau$th quantile of $\epsilon_i$ is assumed to be zero. The standard QR model is given by

$$Q_{y_i}(\tau \mid \mathbf{x}_i) = \mathbf{x}_i^{T}\boldsymbol{\beta}(\tau),$$

where $Q_{y_i}(\tau \mid \mathbf{x}_i)$ is the conditional $\tau$th quantile of the response $y_i$ given the covariates $\mathbf{x}_i$ and is a non-decreasing function of $\tau$ for any given $\mathbf{x}_i$. Here $\boldsymbol{\beta}(\tau)$ is the vector of parameters, and $\partial Q_{y_i}(\tau \mid \mathbf{x}_i)/\partial \mathbf{x}_i = \boldsymbol{\beta}(\tau)$ is the marginal change in the quantile resulting from a marginal change in $\mathbf{x}_i$. In estimating the QR model for a given quantile, we follow Koenker (2005) and Yue and Rue (2011), who used the standard approach of Koenker and Bassett (1978). QR minimises the tilted absolute value function $\rho_\tau(\cdot)$, known as the check function (Maistre, Lavergne & Patilea 2017), which weights the residuals from the model asymmetrically, to a degree that depends on $\tau$:

$$\rho_\tau(\epsilon) = \epsilon\left(\tau - I(\epsilon < 0)\right) = \begin{cases} \tau\,\epsilon, & \epsilon \ge 0,\\ (\tau - 1)\,\epsilon, & \epsilon < 0, \end{cases}$$

where $I(\cdot)$ is the indicator function. $\rho_\tau$ is a continuous piecewise linear function that is non-differentiable at $\epsilon = 0$ but differentiable everywhere else (it has directional derivatives in all directions) (Yue & Rue 2011). The check function ensures that every $\rho_\tau$ value is non-negative, with the asymmetry of the weights determined by the probability $\tau$. For the median ($\tau = 0.5$), for which $\rho_{0.5}(\epsilon) = \lvert\epsilon\rvert/2$, the linear model is estimated by solving

$$\min_{\boldsymbol{\beta}}\sum_{i=1}^{n}\left\lvert y_i - \mathbf{x}_i^{T}\boldsymbol{\beta}\right\rvert .$$

This concept extends to any quantile, such as the 75th, 90th and 99th percentiles. The QR estimator of $\boldsymbol{\beta}$ at quantile $\tau$ minimises the objective function

$$\hat{\boldsymbol{\beta}}(\tau) = \arg\min_{\boldsymbol{\beta}}\sum_{i=1}^{n}\rho_\tau\left(y_i - \mathbf{x}_i^{T}\boldsymbol{\beta}\right).$$

This objective is not differentiable and there is no closed-form solution for $\hat{\boldsymbol{\beta}}$; instead, the parameters can be found using a linear programming algorithm (Gibbons & Faruqui 2014). The minimisation is carried out separately for each quantile level $\tau$, and the estimate of the $\tau$th conditional quantile function is the parametric function $\mathbf{x}_i^{T}\hat{\boldsymbol{\beta}}(\tau)$.
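To make the check function concrete, the following Python sketch implements $\rho_\tau$ directly and verifies numerically, for an intercept-only model, that minimising the summed check loss over a constant $c$ returns an empirical $\tau$-quantile of the data. The function names and the toy data are illustrative assumptions. Searching only the observed values is enough here because the objective is convex and piecewise linear in $c$, with its kinks at the data points.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0)).

    Positive residuals are weighted by tau, negative residuals by (1 - tau),
    so the loss is never negative."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_check_loss(c, data, tau):
    """Summed check loss of the residuals y - c for a constant fit c."""
    return sum(check_loss(y - c, tau) for y in data)


def best_constant(data, tau):
    """Minimise total_check_loss over c.

    The objective is convex and piecewise linear with kinks at the observations,
    so a minimiser is always among the observed values; scanning them suffices."""
    return min(sorted(data), key=lambda c: total_check_loss(c, data, tau))


data = list(range(1, 102))  # hypothetical observations 1, 2, ..., 101
print(best_constant(data, 0.9))  # prints 91, the 90th percentile of the toy data
print(best_constant([3, 1, 4, 1, 5, 9, 2, 6, 5], 0.5))  # prints 4, the sample median
```

With $\tau = 0.5$ the weights are symmetric and the minimiser is the median; moving $\tau$ away from 0.5 tilts the weights and moves the minimiser towards the corresponding tail.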
Features that characterise QR and differentiate it from other regression methods are:
• QR computes several different regression curves corresponding to the various percentile points of the distribution and thus provides a more complete picture of the relationship between the response variable and the covariates.
• Heteroscedasticity can be detected and, if the data are heteroscedastic, median regression estimators can be used instead of mean regression estimators.
• Median regression is more robust to outliers than regression methods that use mean estimators, and it is semi-parametric, thereby avoiding assumptions about the parametric distribution of the error process.
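Both the linear programming estimation of QR and the detection of heteroscedasticity through quantile-specific slopes can be sketched in Python with NumPy and SciPy. The sketch uses the standard LP formulation of QR: each residual is split into non-negative parts $u_i - v_i$, and the objective $\tau\sum u_i + (1-\tau)\sum v_i$ is minimised with `scipy.optimize.linprog` and its HiGHS backend. This is not necessarily the solver the authors used, and the simulated data are hypothetical. When the noise spread grows with the covariate, the fitted slope at $\tau = 0.9$ is clearly larger than at $\tau = 0.1$, which is how QR reveals heteroscedasticity.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau):
    """Estimate beta(tau) by linear programming.

    Decision variables are [beta (free), u (>= 0), v (>= 0)] with
    X @ beta + u - v = y, so u and v are the positive and negative parts of
    the residuals; minimising tau*sum(u) + (1 - tau)*sum(v) is exactly
    minimising the summed check loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:p]


# Hypothetical heteroscedastic data: the spread of y grows with x.
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + (0.1 + 0.2 * x) * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])  # intercept plus one covariate

beta_10 = quantile_regression(X, y, 0.1)
beta_50 = quantile_regression(X, y, 0.5)
beta_90 = quantile_regression(X, y, 0.9)

# Diverging slopes across quantiles indicate heteroscedasticity.
print("slopes:", beta_10[1], beta_50[1], beta_90[1])
# Roughly a share tau of the observations lies below each fitted quantile line.
print("share below 0.9 line:", np.mean(y < X @ beta_90))
```

For production use, a dedicated QR routine (for example the one in statsmodels) would normally be preferred over a hand-built dense LP, which grows quickly with the number of observations.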
QR is, therefore, considered to be more suitable than other methods, given the type of data used, as well as its ability to provide the full distribution of forecasted electricity demand.

Data and analysis
The hourly electricity demand data for South Africa for the period 1997-2015 were provided by Eskom. For developing the long-term forecasting model, a transformed series was created using a logarithmic transformation. The logarithmic transformation is convenient for turning a highly skewed variable into one that is more approximately normal (Benoit 2011). Fourier series or harmonic terms were used, where applicable, to capture the cycles inherent in the demand data.
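Because the models are fitted to log demand, forecasts have to be mapped back to the demand scale. A useful property here is that quantiles are equivariant under increasing transformations: the exponential of the $\tau$th quantile of log demand is the $\tau$th quantile of demand, whereas the exponential of the mean of log demand is not the mean of demand. The Python sketch below checks both statements on a handful of hypothetical hourly demand values (not Eskom data), using a simple inverse-CDF definition of the empirical quantile. It assumes, as the use of exp(10.4) later in the article suggests, that log-scale forecasts are back-transformed with the exponential function.

```python
import math
from statistics import mean


def empirical_quantile(values, tau):
    """Inverse-CDF quantile: the smallest value v with F(v) >= tau."""
    ordered = sorted(values)
    rank = max(1, math.ceil(tau * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical hourly demands (same units as the article), for illustration only.
demand = [21000, 24500, 27800, 30100, 32500, 19800, 26000, 28900, 33400, 23100]
log_demand = [math.log(d) for d in demand]

for tau in (0.01, 0.5, 0.99):
    back_transformed = math.exp(empirical_quantile(log_demand, tau))
    direct = empirical_quantile(demand, tau)
    print(tau, round(back_transformed, 6), direct)  # the two columns agree

# The mean is not equivariant: exp(mean of logs) is the geometric mean,
# which is smaller than the arithmetic mean unless all values are equal.
print(math.exp(mean(log_demand)), mean(demand))
```

This is one reason why modelling quantiles on the log scale is convenient: no bias correction is needed when the quantile forecasts are back-transformed.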
Future hourly electricity demand was forecast at the 0.01, 0.02, 0.03, …, 0.99 quantiles of the distribution using QR, so that each hour of the day has 99 forecasted demand values instead of the single overall value that an OLS model would give. To avoid graphs that are too busy to read, only the 1st, 50th and 99th percentile graphs are shown and discussed.
The uncertainty in the forecasts is captured by the interval between the 1st and 99th percentiles of the demand distribution, as this is the interval into which 98% of the possible future hourly demands are expected to fall. The wider the interval, the more uncertain the forecasted hourly demand, because the possible future values are more widely spread. The forecasts at the 50th percentile (median) are important because they can be used as point forecasts, that is, our best guess of demand at a given hour.
The density functions give the full distribution of the hourly electricity demand. The probability that the hourly demand lies between two demand values, say $a$ and $b$, is the area under the demand density function between these two points, which is calculated by integrating the density function from $a$ to $b$. The probability of exceedance is the probability that electricity demand exceeds a specified hourly demand $c$; it is likewise calculated by integrating the density function from the specified demand upwards. For hourly demand $D$ with density $f$:

$$P(a < D \le b) = \int_a^b f(d)\,\mathrm{d}d, \qquad P(D > c) = \int_c^{\infty} f(d)\,\mathrm{d}d .$$

The forecasted demand density functions for 2013 to 2023 are compared.
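The integration step can be sketched in Python under explicit assumptions: the 99 quantile forecasts for an hour (or the pooled forecasts for several hours, as is done for the peak periods) are treated as a sample, a Gaussian kernel density estimate with a normal-reference bandwidth is placed over them, and the integrals then have a closed form through the normal CDF. The article does not state which density estimator was used, so the kernel, the bandwidth rule and the hypothetical log-normal quantile forecasts below are all illustrative choices.

```python
import math
from statistics import NormalDist, stdev


def bandwidth(sample):
    """Normal-reference rule of thumb h = 1.06 * s * n^(-1/5) (an assumption)."""
    return 1.06 * stdev(sample) * len(sample) ** (-1 / 5)


def _phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_between(sample, a, b, h=None):
    """Area under a Gaussian KDE between a and b:
    the average over sample points x of Phi((b - x)/h) - Phi((a - x)/h)."""
    h = h or bandwidth(sample)
    return sum(_phi((b - x) / h) - _phi((a - x) / h) for x in sample) / len(sample)


def prob_exceed(sample, threshold, h=None):
    """Area under the KDE from threshold upwards (probability of exceedance)."""
    h = h or bandwidth(sample)
    return sum(1.0 - _phi((threshold - x) / h) for x in sample) / len(sample)


# Hypothetical quantile forecasts for one hour: the 1st..99th percentiles of a
# log-normal demand distribution with median exp(10.3), back-transformed with exp.
log_dist = NormalDist(mu=10.3, sigma=0.05)
forecasts = [math.exp(log_dist.inv_cdf(k / 100)) for k in range(1, 100)]

threshold = math.exp(10.4)  # the exp(10.4) threshold used in the article
print(round(threshold))  # prints 32860
print(prob_exceed(forecasts, threshold))
print(prob_between(forecasts, 28000, 31000))
```

For a peak-period density, the forecasts for the three peak hours (for example 07:00, 08:00 and 09:00) would simply be concatenated into one list before calling `prob_exceed`.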
If the density curves shift towards higher demands over time, an increase in future hourly demand is expected; if they shift towards lower demands, a decrease in future hourly demand is expected.
South African electricity demand has two daily peaks, which are especially noticeable in winter: a morning peak at around 08:00 and an afternoon peak at around 19:00. Because the winter peak represents the highest annual demand, it is important for planning electricity generation, so the densities of both the morning and afternoon peak forecasts need to be examined over the years. The morning peak demand density is generated from all possible demand forecasts at 07:00, 08:00 and 09:00, whereas the afternoon peak demand density is generated from all possible demand forecasts at 18:00, 19:00 and 20:00. The probability of future peak electricity demand exceeding a given value was then calculated by integrating these density functions.
The performance of the QR model is evaluated by comparing the predicted demand density functions with the actual demand density functions. The predicted demand density is generated from all demand forecasts from the 1st to the 99th percentile of the demand distribution. If the forecasted demand density closely tracks the actual demand density, the model is forecasting well and can be considered reliable. The model is also evaluated by how closely the actual demand is bracketed by the predicted interval between the 1st and 99th percentiles; if this interval is narrow, the predictions exhibit sharpness. The mean absolute percentage error (MAPE),

$$\text{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left\lvert\frac{A_t - F_t}{A_t}\right\rvert,$$

where $A_t$ is the actual demand and $F_t$ the 50th percentile forecast for hour $t$, is used to compare the forecasts at the 50th percentile with the actual demands, mainly to determine how far the point forecasts are from the actual demands.
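The evaluation measures described above are simple to compute. The Python sketch below implements the MAPE formula and, as an assumed complement to the visual checks, the empirical coverage and mean width of the interval between the 1st and 99th percentile forecasts (nominal coverage 98%). The function names and the toy numbers are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent: (100/n) * sum |(A_t - F_t) / A_t|."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def interval_coverage(actual, lower, upper):
    """Share of actual values falling inside [lower, upper].

    For a 1st-99th percentile interval the nominal coverage is 0.98."""
    inside = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return inside / len(actual)


def mean_width(lower, upper):
    """Average interval width; narrower intervals indicate sharper forecasts."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Hypothetical hourly values for three hours (same units as the article).
actual = [25000.0, 31000.0, 28000.0]
median_fc = [24500.0, 31800.0, 27600.0]
q01 = [23000.0, 29500.0, 26000.0]
q99 = [26500.0, 33000.0, 29800.0]

print(mape(actual, median_fc))
print(interval_coverage(actual, q01, q99), mean_width(q01, q99))
```

Coverage and width are best read together: an interval can always reach full coverage by being made very wide, so sharpness is only meaningful when the actual values still fall inside the interval at roughly the nominal rate.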
Figures 1 and 2 depict the time series of the actual and log-transformed hourly electricity demand, respectively, over the period 1997-2015. During this period, the highest demand reached was 36 826 kW in 2011, whereas the minimum was 13 533 kW in 1998.
The historical demand data from 1997 to 2015 indicate that the demand for electricity increased steadily between 1997 and 2007 (Figure 1). During this period, South Africa experienced accelerated economic growth, and a large number of new households were connected to the grid as the government sought to make electricity accessible to all South Africans. Electricity demand from Eskom stabilised between 2007 and 2012 and declined over the last 4 years up to 2015 (see Figure 1). This decline could partly be attributed to slowing economic growth between 2007 and 2015 and to the growth of renewable sources of electricity. Table 1 provides a summary of all the variables considered in modelling hourly demand. The demand data have periods of 6, 12, 18 and 24 hours, as shown in Figure 1-A2, and the Fourier series terms were formed using these periods.
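Fourier (harmonic) regressors for these periods can be generated as sine and cosine pairs. The Python sketch below builds them for an hourly time index, interpreting the periods as hours because the data are hourly. Using a single harmonic per period and the column names shown are assumptions for illustration, since the article does not list the exact terms that entered the model.

```python
import math

PERIODS = (6, 12, 18, 24)  # periods (in hours) identified in the demand data


def fourier_terms(t, periods=PERIODS, harmonics=1):
    """Sine/cosine regressors for hourly index t.

    For each period P and harmonic k = 1..harmonics, returns
    sin(2*pi*k*t/P) and cos(2*pi*k*t/P)."""
    row = {}
    for p in periods:
        for k in range(1, harmonics + 1):
            angle = 2.0 * math.pi * k * t / p
            row[f"sin_P{p}_k{k}"] = math.sin(angle)
            row[f"cos_P{p}_k{k}"] = math.cos(angle)
    return row


# Regressors for the first two days of hourly data (t = 0, 1, ..., 47).
design_rows = [fourier_terms(t) for t in range(48)]
print(len(design_rows[0]))  # prints 8: a sine and a cosine for each period
print(design_rows[19]["cos_P24_k1"])
```

Because 72 hours is a common multiple of all four periods, the full set of terms repeats every 72 hours; each term on its own repeats with its own period.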

Model assessment
For each hour, demand at the 1st to 99th percentiles was forecast for 2006 to 2023, so that the full demand distribution was forecast for every hour.
The actual and predicted demand densities for 2012 to 2015, illustrated in Figure 3, were used to assess the model fit. If the predicted demand distribution is close to the actual demand distribution, the forecasts are considered reliable. The closer the 1st and 99th percentile forecasts lie to the actual demand, the sharper the forecasts (Figures 4-6). Sharpness refers to how tightly the predicted distribution covers the actual distribution.
The MAPE values between the hourly demand forecasts at the 50th percentile and the actual hourly demand over the 4-year period were all below 5%, and the overall MAPE was 2.77%, as shown in Table 2-A1. Lewis (1982) indicates that a MAPE of less than 10% can be classified as a highly accurate forecast.
The QR model therefore provides very good demand forecasts at the 50th percentile.

Model estimates
For illustration purposes, estimates for only three of the QR models (at the 0.01, 0.5 and 0.99 quantile levels) are given in Table 1-A1. At the 5% level of significance, some variables in Table 1-A1 are significant at certain percentiles of the demand distribution but not at others. For example, the variable 'Month1' is significant at the 1st percentile but not at the 99th percentile of the distribution.

Probabilistic forecasts of the daily profiles over the years
For illustration purposes, 4 days in June (22-25 June) were selected, and their results for 2013 to 2015 are discussed. (Note that June falls in the high-demand winter period of the year.) The panels in Figure 4 give the hourly electricity demand for each of the 4 days in 2013. For each day, the green circles represent hourly demands at the 1st percentile of the distribution; these are the points below which 1% of all possible future hourly demands are expected to fall. The grey line represents future hourly demands at the 99th percentile of the demand distribution; these are the points below which 99% (and above which 1%) of all possible future hourly demands are expected to fall. The blue circles represent the forecasted hourly demands at the 50th percentile, and the red circles represent the actual hourly demands. Figures 5 and 6 give the hourly demand forecasts for the same 4 days in 2014 and 2015, respectively.
Figures 4-6 confirm that the demand forecasts at the 50th percentile are close to the actual hourly electricity demand in winter.
In addition to the 50th percentile serving as a 'best guess' or point forecast, the other quantile forecasts provide probabilistic information that may also be useful in the planning process. Figures 4-6 show that the interval between the 1st and 99th percentiles was in fact fairly narrow, so the uncertainty around the point forecasts was not large.

Using probabilistic forecasts of hourly demand distributions for comparing demand over the years
The hourly electricity demand distribution in South Africa is bimodal, as shown in Figure 7. Comparing the demand density functions over the years gives insight into expected shifts in demand patterns. The forecasted demand distributions obtained from the QR models suggest that hourly electricity demand from Eskom is likely to shift towards lower demands over the years up to 2023 (Figure 7). The apparent year-to-year decline in electricity demand from Eskom between 2012 and 2015 could be attributed, among other factors, to the increase in the number of households and companies generating their own electricity from renewable sources, slowing economic growth and rising electricity prices. The renewable electricity market in South Africa is growing.
In addition, the forecasted hourly demand densities (Figures 7-9) can be used to calculate probabilities of exceedance, for example, the probability of hourly demand exceeding 32 860 kW (exp(10.4), i.e. 10.4 on the log scale). This probability is the area under the density curve for all demands of 32 860 kW and above, which can be computed by integrating the density functions in Figure 7. Demand exceeding 32 860 kW is less likely in 2023 than in any previous year, while 2015 had the highest probability of demand exceeding 32 860 kW in the period 2013 to 2023, as shown in Figure 7.
Finally, the QR forecasts can be used to investigate expected future peak demands and their probabilities of exceedance over the years. Annual peak demands are very important for planning because they represent the maximum that would need to be supplied in an hour: if the power generating company can meet the daily peak hourly demand, it can meet any hourly demand. Figures 8 and 9 suggest that the morning and afternoon peak demand distributions are likely to shift towards lower demands over the years up to 2023.

Conclusions and discussion
The daily electricity demand in South Africa generally has two peaks, which are more noticeable in winter than in summer. The morning peak occurs at around 08:00 and the afternoon peak at around 19:00. OLS would most likely underestimate the peaks because it models the mean of the demand distribution, whereas QR models demand at all percentiles of the distribution and can therefore provide better peak forecasts. In addition, because QR gives the full hourly demand distribution, the uncertainty around the forecasts can be quantified. While the best guess of future hourly electricity demand can be obtained from the forecasted demand at the 50th percentile, QR gives forecasts at all percentiles of the distribution, so the potential variability in the forecasts can be evaluated by comparing the 50th percentile forecasts with those at other percentiles. Additional planning information, such as expected pattern shifts and probable peak values, can also be obtained from the forecasts produced by the QR model; such information would not easily be obtained from other forecasting approaches.
The first important finding of this article is that the demand forecasts at the 50th percentile from the QR model closely estimate the actual hourly demands (see the red and blue circles in Figures 4-6 and the MAPE values in Table 2-A1 in Appendix 1). The second is that the distributions of hourly demand and of daily peak demand in South Africa are shifting towards lower demands over the years up to 2023, as shown in Figures 7-9. The third is that QR allows the uncertainty around point forecasts to be assessed. This was illustrated by calculating, from the forecasted hourly demand densities (Figures 7-9), the probability of demand exceeding 32 860 kW; this probability was found to be lower in 2023 than in any previous year.
The forecasted electricity demand distribution closely matched the actual demand distribution between 2012 and 2015, as shown in Figure 3. This suggests that the forecasted demand distribution can be expected to continue to represent the actual demand distribution up to 2023. Using a QR approach to obtain long-term forecasts of hourly load profile patterns is, therefore, recommended.
http://www.sajems.org Open Access

FIGURE 2 :
FIGURE 1: Electricity demand between 1997 and 2015 in South Africa.

FIGURE 8: The morning peak demand distributions.

FIGURE 9: The afternoon peak demand distributions.
The hourly demands were forecast from 2006 to 2023. The data from 2013 to 2015 were withheld in order to validate the model; the forecasts for 2006 to 2012 were therefore in-sample forecasts, whereas the forecasts for 2013 to 2023 were out-of-sample forecasts. Various time-related variables were used as covariates, namely day, public holidays, months, weekends, the December break and seasons (see Table 1 for details). Lagged demand variables were included in the model to test whether suspected lagged demand effects arising from the high degree of diurnal activity in electricity usage were significant, that is, whether South African electricity consumers typically exhibit consistent daily usage patterns (as in Farland 2013). For example, in the afternoon, when people return from work at around 19:00, they start cooking, watching TV and bathing, and household electricity demand could go up at this time.

TABLE 1: Variables used in the quantile regression.