The Evaluate Forecasts by Location tool evaluates and merges multiple forecasts of the same underlying time series data at a set of locations. At each location, the most accurate forecast method is selected to represent the forecast for that location, so you can try multiple forecast methods and choose the most accurate one location by location. The primary output is a map of the final forecasted time step of the selected forecast method at each location; the tool also produces informative messages and pop-up charts.

The inputs to this tool must be created by tools in the Time Series Forecasting toolset that used the same space-time cube as input. The most accurate forecast method at each location can be determined by how closely the model fits the measured values of the space-time cube or by how accurately it predicts to withheld time steps at the end of each time series.

It is recommended that you read the documentation for each forecast method that you provide to this tool to learn about each method's forecast model, validation model, and root mean square error (RMSE) statistics.

Learn more about how Curve Fit Forecast works

Learn more about how Exponential Smoothing Forecast works

Learn more about how Forest-based Forecast works

## Evaluate the forecast method at each location

The goal of the tool is to select the most accurate forecast method at each location of a space-time cube. However, there is more than one way to measure the accuracy of a forecast method. This tool uses one of two criteria for determining the most accurate forecast at each location.
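Conceptually, the selection is an argmin over the candidate methods' RMSE values at each location. The following sketch illustrates the idea with a hypothetical `rmse_by_method` structure; it is an illustration of the selection rule only, not the tool's implementation.

```python
def select_methods(rmse_by_method):
    """Return the name of the smallest-RMSE method at each location.

    rmse_by_method is a hypothetical structure for illustration:
    method name -> list of RMSE values, one entry per location.
    """
    methods = list(rmse_by_method)
    n_locations = len(next(iter(rmse_by_method.values())))
    # Independently pick the minimum-RMSE method at each location.
    return [min(methods, key=lambda m: rmse_by_method[m][loc])
            for loc in range(n_locations)]

rmse = {
    "Forest-based": [1.2, 0.9, 2.0],
    "Exponential Smoothing": [1.5, 0.8, 1.1],
}
# A different method can win at neighboring locations.
print(select_methods(rmse))
```

Whether the RMSE values compared are Validation RMSE or Forecast RMSE depends on the criterion described below.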

### Evaluate using Validation RMSE

The default option of the tool selects the forecast method with the smallest Validation RMSE at each location. To use this option, keep the Evaluate Using Validation Results parameter checked. The Validation RMSE is calculated by withholding some of the final time steps at each location and using the remaining time steps to predict the values that were withheld. The predicted values are then compared to the true values to see how closely they align. It is usually recommended that you evaluate using validation results because predicting to withheld time steps at the end of the time series is analogous to forecasting to future time steps, which is the goal of time series forecasting.

To use this option, all input space-time cubes must exclude the same number of time steps for validation, and that number must be greater than 0.
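The withholding procedure can be sketched as follows. The `forecast_fn` argument is a hypothetical stand-in for any Time Series Forecasting method; the sketch only illustrates how withheld time steps yield a Validation RMSE.

```python
import math

def validation_rmse(series, forecast_fn, n_withheld):
    """Hold out the final n_withheld time steps, forecast them from the
    remaining steps, and report RMSE against the true withheld values.
    A sketch for illustration, not the tool's implementation."""
    train, held_out = series[:-n_withheld], series[-n_withheld:]
    predicted = forecast_fn(train, n_withheld)
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predicted, held_out)) / n_withheld
    )

# Example with a naive "repeat the last value" forecaster (hypothetical):
naive = lambda train, steps: [train[-1]] * steps
print(validation_rmse([2, 4, 6, 8, 10, 12], naive, 2))
```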

### Evaluate using Forecast RMSE

You can also choose to select the forecast method with the smallest Forecast RMSE at each location. To use this option, uncheck the Evaluate Using Validation Results parameter. The Forecast RMSE measures how closely the forecast model fits the measured values of the time series at each location. Because the Forecast RMSE measures the fit of the data that was used to estimate the forecast model, the forecast model often fits the measured values of the time series more accurately than it forecasts the values of future time steps.

This option is recommended when relatively few time steps are excluded for validation. This situation is common for space-time cubes with a small number of time steps where it is not feasible to exclude more than a few time steps for validation. This option is also recommended when you need to test whether the selected forecast method provides a statistically significantly better fit than the other methods.
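In contrast to the validation case, the Forecast RMSE is an in-sample measure: it compares the model's fitted values against the same measured values used to estimate the model. A minimal sketch, assuming fitted values are already available:

```python
import math

def forecast_rmse(measured, fitted):
    """In-sample RMSE: how closely a model's fitted values match the
    measured values it was estimated from. A sketch for illustration."""
    n = len(measured)
    return math.sqrt(sum((m - f) ** 2 for m, f in zip(measured, fitted)) / n)

# Hypothetical measured series and fitted model values:
print(forecast_rmse([3, 5, 7, 9], [2.5, 5.5, 7.0, 9.5]))
```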

#### Test for equivalent accuracy of forecast methods

If you evaluate using Forecast RMSE, the selected method at each location is statistically compared to each of the methods that were not selected. The tool selects the method with the smallest Forecast RMSE, but this does not mean that the selected method is significantly more accurate than the other methods. To determine whether the selected method provides a significantly better fit, a statistical test is needed.

For each comparison, either the Diebold-Mariano (DM) test or the Harvey, Leybourne, and Newbold (HLN) test is performed at a 95 percent confidence level. The DM and HLN tests are statistical hypothesis tests of whether two forecast models have equivalent accuracy. The HLN test is a modified version of the DM test that corrects for small sample sizes; for large sample sizes, the two tests are equivalent. If the number of time steps in the forecast models is 30 or greater, the DM test is performed at the location. Otherwise, the HLN test is performed.

The DM and HLN tests both calculate their test statistic based on the fit of the forecast models to the measured values of the time series. The calculations do not use the validation models in any capacity, so the tests are not applicable when you evaluate using validation results. The null hypothesis of each test is that both forecast models provide an equally accurate fit to the measured values of the time series. If this null hypothesis is rejected, the selected method is determined to be significantly more accurate than the method it was compared to. If the null hypothesis is not rejected, both methods are determined to have equivalent accuracy. Complete details of the DM and HLN tests can be found in the Additional resources section.
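The published form of these tests can be sketched as follows, assuming squared-error loss and one-step-ahead errors by default; the tool's exact implementation is not shown in this document. The HLN statistic rescales the DM statistic by a small-sample factor and is compared against a t distribution with n - 1 degrees of freedom rather than the standard normal.

```python
import math

def dm_hln_test(errors_a, errors_b, h=1):
    """Diebold-Mariano statistic and its HLN small-sample correction.

    errors_a, errors_b: forecast errors of two competing models on the
    same measured values; h: forecast horizon. A sketch assuming
    squared-error loss, for illustration only.
    """
    n = len(errors_a)
    # Loss differential under squared-error loss.
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    d_bar = sum(d) / n
    # Long-run variance of d: autocovariances up to lag h - 1.
    lrv = sum((x - d_bar) ** 2 for x in d) / n
    for k in range(1, h):
        cov = sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n
        lrv += 2 * cov
    dm = d_bar / math.sqrt(lrv / n)
    # HLN correction factor; compare the corrected statistic to a
    # t distribution with n - 1 degrees of freedom.
    hln = dm * math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return dm, hln
```

A negative statistic indicates that the first model's squared errors are smaller on average; the null of equal accuracy is rejected when the statistic exceeds the critical value in magnitude.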

By performing the DM or HLN test between the selected method and every other method, the tool generates a list of methods that are equivalently accurate to the selected method. This information is summarized in geoprocessing messages and charts.

## Best practices and limitations

When deciding whether this tool is appropriate for your data and which parameters to choose, take the following into account:

- For each location, this tool selects the forecast method that provides the smallest Validation or Forecast RMSE, and this can result in different methods being selected for locations that are near each other. For example, if your data represents the yearly population of counties, one county may use a forest-based method, and two neighboring counties may use a Gompertz curve and a seasonal exponential smoothing method. Consider whether it makes sense for different locations to use different forecast methods with very different shapes, and check if selecting the forecast method location by location really provides a notable reduction in the Forecast or Validation RMSE at the locations. If using a single method at every location is nearly as accurate as choosing a different method location by location, the principle of parsimony states that you should use a single forecast method for all locations.
- Choosing whether to evaluate using validation results has advantages and disadvantages. Predicting to withheld time steps is the closest equivalent to forecasting unknown future values, so using validation will more frequently select the method that forecasts future values most accurately. However, the DM and HLN tests are only performed if you do not evaluate using validation results: they are goodness-of-fit tests that only measure how well a model fits the measured values at a location, so they are not applicable to validation results. You must decide which is more important: selecting the method that forecasts future values most accurately or testing whether the selected method provides a significantly better fit to the time series.
- Forecast methods created using the Forest-based Forecast tool usually provide the best fit to the time series of a location, but they often do not forecast future values more accurately than other methods. If any of the input forecast space-time cubes represent a forest-based method, it is recommended that you evaluate using validation results.

## Tool outputs

The primary output of this tool is a 2D feature class showing each location in the Input Space Time Cube symbolized by the final forecasted time step of the selected method. The forecasted values from the selected method at all other time steps are stored as fields. Although the method at each location is independently selected and spatial relationships are not taken into account, the map may display spatial patterns for areas with similar time series.

### Pop-up charts

Clicking any feature on the map using the Explore navigation tool displays an interactive chart in the Pop-up pane showing the fitted values, forecast values, and confidence interval (if the method supports confidence intervals) of the selected method at the location along with a vertical gray line at the start of the forecast. For all other methods, the forecast values are shown.

The selected method is highlighted in the chart's legend, and if the same method is used more than once, an index number is used to distinguish them. The following image shows a pop-up chart of two forest-based methods, a linear curve fit method, and an exponential smoothing method. The first forest-based method is the method selected at the location:

You can click any other method in the legend to display its fitted values and confidence interval (if supported). The following image shows the same chart after clicking the exponential smoothing method:

Hovering over the pop-up creates an interactive time slider (vertical cyan line) that displays all values of the chart at that time step:

##### Note:

Pop-up charts are not created when the output features are saved as a shapefile (.shp). Additionally, if any confidence intervals extend off the chart, a Show Full Data Range button appears above the chart that allows you to extend the chart to show the entire confidence interval.

### Geoprocessing messages

The tool provides a number of messages with information about the tool execution. The messages have several sections.

The Analysis Details section displays properties of the input space-time cubes, including the forecast methods of each cube, the number of forecasted time steps, the number of time steps excluded for validation, the percent of locations that were modeled with seasonality, and information about the forecasted time steps. The properties displayed in this section depend on how the cubes were originally created, so the provided information can vary.

The Summary of Forecast RMSE and Summary of Validation RMSE sections display summary statistics for the Forecast RMSE and Validation RMSE among all of the locations. For each value, the minimum, maximum, mean, median, and standard deviation are displayed. Only one of these two sections is displayed in the messages for each run of the tool. If you choose to evaluate using validation results, the summary statistics for Validation RMSE are shown. Otherwise, the summary statistics for Forecast RMSE are shown.

The Summary of Selected Forecast Methods section summarizes which forecast methods were most frequently selected for the locations. For each input space-time cube, the section displays the number and percent of locations where that method was selected. This allows you to quickly compare how well the various methods performed across all of the locations. If you choose not to evaluate using validation results, the section additionally displays the number and percent of locations where each method was not significantly less accurate than the selected method. The method selected at a location is considered equivalently accurate to itself, so it is included in this count and percent.

##### Note:

Geoprocessing messages appear at the bottom of the Geoprocessing pane during tool execution. You can access the messages by hovering over the progress bar, clicking the pop-out button, or expanding the messages section in the Geoprocessing pane. You can also access the messages for a previously run tool using geoprocessing history.

### Fields of the output features

In addition to Object ID, geometry fields, and the field containing the pop-up charts, the output features have the following fields:

- Location ID (LOCATION)—The Location ID of the corresponding location of the space-time cube.
- Forecast for (Analysis Variable) in (Time Step) (FCAST_1, FCAST_2, and so on)—The forecasted value of the selected forecast method at each future time step. The field alias displays the name of the Analysis Variable and the date of the forecast. A field of this type is created for each forecasted time step.
- High Interval for (Analysis Variable) in (Time Step) (HIGH_1, HIGH_2, and so on)—The upper bound of a 90 percent confidence interval for the forecasted value of the selected forecast method at each future time step. The field alias displays the name of the Analysis Variable and the date of the forecast. A field of this type is created for each forecasted time step. If the selected forecast method at a location does not provide confidence intervals, the value in this field is null. If none of the methods provide confidence intervals, this field is not created.
- Low Interval for (Analysis Variable) in (Time Step) (LOW_1, LOW_2, and so on)—The lower bound of a 90 percent confidence interval for the forecasted value of the selected forecast method at each future time step. The field alias displays the name of the Analysis Variable and the date of the forecast. A field of this type is created for each forecasted time step. If the selected forecast method at a location does not provide confidence intervals, the value in this field is null. If none of the methods provide confidence intervals, this field is not created.
- Best Forecast Root Mean Square Error (F_RMSE)—The Forecast RMSE of the selected method at the location.
- Best Validation Root Mean Square Error (V_RMSE)—The Validation RMSE of the selected method at the location. If the Evaluate Using Validation Results parameter is unchecked, this field is not created.
- Season Length (SEASON)—The number of time steps corresponding to one season for the location. If the selected forecast method at the location does not support seasonality, the value in this field is -1.
- Time Window (TIMEWINDOW)—The time step window used at the location. If the selected forecast method at the location does not support time windows, the value in this field is -1.
- Is Seasonal (IS_SEASON)—A Boolean variable indicating whether seasonality was determined by spectral density. A value of 1 indicates seasonality was detected by spectral density, and a value of 0 indicates no seasonality was used or that the selected forecast method does not support seasonality.
- Forecast Method (METHOD)—The forecast method that was selected at the location.
- (Method name) Forecast RMSE (F_RMSE_1, F_RMSE_2, and so on)—The Forecast RMSE of each forecast method at the location. The field alias displays the name of the method. A field of this type is created for each space-time cube provided in the Input Forecast Space Time Cubes parameter. If the Evaluate Using Validation Results parameter is checked, this field is not created.
- (Method name) Validation RMSE (V_RMSE_1, V_RMSE_2, and so on)—The Validation RMSE of each forecast method at the location. The field alias displays the name of the input space-time cube. A field of this type is created for each space-time cube provided in the Input Forecast Space Time Cubes parameter. If the Evaluate Using Validation Results parameter is not checked, this field is not created.
- Equally Accurate Fit Methods (EQUAL_MTHD)—A text field listing the forecast methods that were not significantly less accurate than the selected method at the location. If more than one method was not significantly less accurate, each method is separated by a vertical line |. If more than one of the same type of method (for example, two forest-based methods using different forest parameters) are listed, the method name will contain an index number to differentiate them. If the Evaluate Using Validation Results parameter is checked, this field is not created.
- Is Optimal Method: (Method name) (OPT_(Method))—A Boolean variable indicating whether the forecast method was not significantly less accurate than the selected method at the location. The name of the forecast method appears in the field name and field alias. A value of 1 indicates that the method was not significantly less accurate than the selected method. A field of this type is created for every forecast method, and the method selected at the location always contains the value 1. If the Evaluate Using Validation Results parameter is checked, these fields are not created.

### Output space time cube

If an Output Space Time Cube is specified, the output cube contains all of the original values of the input space-time cubes with the forecasted values of the selected forecast method appended. This new space-time cube can be displayed using the Visualize Space Time Cube in 2D or Visualize Space Time Cube in 3D tools and can be used as input to the tools in the Space Time Pattern Mining toolbox, such as Emerging Hot Spot Analysis and Time Series Clustering.

### Summary charts of the DM and HLN tests

If you choose not to evaluate using validation results by unchecking the Evaluate Using Validation Results parameter, the output features contain two charts summarizing the results of the DM and HLN tests.

The Forecast Methods and Fit Methods with Equivalent Accuracy chart allows you to see which forecast methods were selected most frequently and, if a different method was selected, how frequently each method was equivalently accurate to the selected method. The chart displays side-by-side bar charts for the three methods that were selected most frequently (if only two space-time cubes were provided, only two side-by-side bar charts are displayed). For each of the three methods, the chart displays a bar chart only for locations where that method was selected. Among these locations, the bar chart shows the number of locations where each method is equivalently accurate (determined by the DM or HLN tests). The highest bar always corresponds to the selected method, and this allows you to compare the relative scale. The names of the forecast methods on the x-axis are usually truncated in the chart, so you can hover over any of the bars to see the names of the methods.

The Distribution of Forecast Method Combinations with Equivalent Accuracy chart displays a bar chart for different combinations of forecast methods that were equivalently accurate. This allows you to see which methods often represented the locations equivalently well. Each bar corresponds to a particular combination of forecast methods, and the height of the bar indicates the number of locations where those methods were equivalently accurate. The names of the combinations on the x-axis are usually truncated in the chart, so you can hover over any of the bars to see the names of the forecast methods in the combination.

## Additional resources

For more information about the DM and HLN tests, see the following references:

- Harvey, D., Leybourne, S., and Newbold, P. (1998). "Tests for Forecast Encompassing." Journal of Business and Economic Statistics, 16:254-259.
- Diebold, F., and Mariano, R. (1995). "Comparing Predictive Accuracy." Journal of Business and Economic Statistics, 13:253-263.