Summary
Forecasts the future values of each location of a space-time cube using an adaptation of Leo Breiman's random forest algorithm. The forest regression model is trained using time windows at each location of the space-time cube.
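As a rough illustration of what training "using time windows" means, the sketch below turns one location's time series into explanatory/dependent pairs: each run of `window` consecutive values is paired with the value that follows it. The function `make_training_windows` and the toy series are hypothetical and not part of arcpy. The tool's internal construction may differ in details such as detrending (see the forecast_approach parameter).

```python
def make_training_windows(values, window):
    """Pair each run of `window` consecutive values with the value that follows it.

    Returns a list of (explanatory, dependent) tuples, where `explanatory` is a
    list of `window` earlier values and `dependent` is the next value.
    Hypothetical helper for illustration; not an arcpy function.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    samples = []
    for start in range(len(values) - window):
        explanatory = list(values[start:start + window])
        dependent = values[start + window]
        samples.append((explanatory, dependent))
    return samples


# A toy series for a single location (hypothetical data).
series = [3, 5, 4, 6, 8, 7]
for x, y in make_training_windows(series, window=3):
    print(x, "->", y)
```

With six values and a window of three, only three complete training pairs can be formed, which is one reason longer time windows need longer time series.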
Usage
This tool accepts netCDF files created by the Create Space Time Cube By Aggregating Points, Create Space Time Cube From Defined Locations, and Create Space Time Cube From Multidimensional Raster Layer tools.
Compared to other forecasting tools in the Time Series Forecasting toolset, this tool is the most complex but makes the fewest assumptions about the data. It is recommended for time series with complicated shapes and trends that are difficult to model with simple mathematical functions or when the assumptions of other methods are not satisfied.
Multiple forecasted space-time cubes can be compared and merged using the Evaluate Forecasts By Location tool. This allows you to create multiple forecast cubes using different forecasting tools and parameters, and the tool will identify the best forecast for each location using either Forecast root mean square error (RMSE) or Validation RMSE.
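The per-location selection can be pictured as taking the minimum RMSE across cubes. The following is a minimal sketch, assuming the RMSE for every location in every candidate cube has already been extracted into Python dictionaries. The function `best_forecast_by_location`, the cube names, and the RMSE values are hypothetical; this is not how Evaluate Forecasts By Location is implemented internally.

```python
def best_forecast_by_location(rmse_by_cube):
    """Pick, for each location, the cube whose forecast has the lowest RMSE.

    rmse_by_cube maps a cube name to {location_id: rmse}. On ties, the cube
    listed first wins. Returns {location_id: cube_name}.
    Hypothetical helper for illustration; not an arcpy function.
    """
    best = {}  # location_id -> (cube_name, rmse)
    for cube_name, rmse_by_location in rmse_by_cube.items():
        for location_id, rmse in rmse_by_location.items():
            current = best.get(location_id)
            if current is None or rmse < current[1]:
                best[location_id] = (cube_name, rmse)
    return {location_id: cube for location_id, (cube, _) in best.items()}


# Hypothetical Validation RMSE values for two locations in three forecast cubes.
validation_rmse = {
    "curve_fit.nc": {1: 4.2, 2: 1.9},
    "exp_smoothing.nc": {1: 3.8, 2: 2.5},
    "forest.nc": {1: 3.1, 2: 2.2},
}
print(best_forecast_by_location(validation_rmse))  # {1: 'forest.nc', 2: 'curve_fit.nc'}
```

Because the choice is made independently at each location, the merged result can mix forecasts from different tools, as in the output above.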
For each location in the Input Space Time Cube, the tool builds two models that serve different purposes.
 Forecast model: This model is used to forecast future values of the space-time cube by building a forest from the values of the time series and using this forest to forecast the values of future time steps. The fit of the forecast model to the values of the space-time cube is measured by the Forecast RMSE value.
 Validation model: This model is used to validate the forecast model and test how accurately it can forecast future values. If a number greater than 0 is specified for the Number of Time Steps to Exclude for Validation parameter, this model is built using the time steps that were not excluded and is used to forecast the values of the time steps that were excluded. This allows you to see how well the forest can forecast future values. The fit of the forecasted values to the excluded values is measured by the Validation RMSE value.
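The validation procedure described above can be sketched in plain Python. This is a minimal sketch under two labeled assumptions: a simple window-mean predictor (`window_mean`, hypothetical) stands in for the trained forest so the example runs without dependencies, and multistep forecasts are produced recursively, with each forecast fed back into the next time window. All function names are illustrative, not arcpy APIs.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def recursive_forecast(history, window, steps, predict):
    """Forecast `steps` values, appending each forecast to the history
    so it becomes part of the next time window (an assumption)."""
    values = list(history)
    forecasts = []
    for _ in range(steps):
        next_value = predict(values[-window:])
        forecasts.append(next_value)
        values.append(next_value)
    return forecasts


def validation_rmse(series, window, n_validation, predict):
    """Exclude the last `n_validation` values, forecast them from the
    remaining values, and measure the fit with RMSE."""
    if n_validation < 1:
        raise ValueError("n_validation must be at least 1 to build a validation model")
    training, excluded = series[:-n_validation], series[-n_validation:]
    forecasts = recursive_forecast(training, window, n_validation, predict)
    return rmse(excluded, forecasts)


def window_mean(window_values):
    """Hypothetical stand-in for a trained forest: predict the window mean."""
    return sum(window_values) / len(window_values)


series = [10, 12, 11, 13, 12, 14, 13, 15]  # hypothetical values for one location
print(round(validation_rmse(series, window=2, n_validation=2, predict=window_mean), 3))  # prints 1.061
```

Swapping `window_mean` for any model trained only on the non-excluded values keeps the same structure: the excluded time steps are never seen during training, so the RMSE reflects genuine forecasting error.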
Learn more about the forecast model, validation model, and RMSE statistics
The Output Features will be added to the Contents pane with rendering based on the final forecasted time step.

This tool creates geoprocessing messages and pop-up charts to help you understand and visualize the forecast results. The messages contain information about the structure of the space-time cube and summary statistics of the RMSE values and season lengths. Clicking a feature using the Explore navigation tool displays a line chart in the Pop-up pane showing the values of the space-time cube, fitted forest values, forecasted values, and confidence bounds for that location.
Deciding how many time steps to exclude for validation is an important choice. The more time steps are excluded, the fewer time steps remain to estimate the validation model. However, if too few time steps are excluded, the Validation RMSE will be estimated using a small amount of data and may be misleading. It is recommended that you exclude as many time steps as possible while still maintaining sufficient time steps to estimate the validation model. It is also recommended that you withhold at least as many time steps for validation as the number of time steps you intend to forecast, if your space-time cube has enough time steps to allow this.
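The defaults and limits for the validation and forecast step counts, taken from the parameter descriptions in the Syntax section, can be checked before running the tool. `resolve_validation_steps` is a hypothetical helper, not part of arcpy, and the tool performs its own checks; the warning simply encodes the recommendation to withhold at least as many time steps as you forecast.

```python
import warnings


def resolve_validation_steps(total_steps, n_forecast=1, n_validation=None):
    """Return the number of time steps to exclude for validation.

    Applies the documented limits for number_of_time_steps_to_forecast
    (at most 50 percent of the time steps) and number_for_validation
    (default 10 percent rounded down, at most 25 percent), and warns when
    fewer steps are withheld than will be forecast.
    Hypothetical helper for illustration; not an arcpy function.
    """
    if not 1 <= n_forecast <= 0.5 * total_steps:
        raise ValueError("number_of_time_steps_to_forecast must be 1 to 50% of the time steps")
    if n_validation is None:
        n_validation = total_steps // 10  # default: 10 percent, rounded down
    if not 0 <= n_validation <= 0.25 * total_steps:
        raise ValueError("number_for_validation must be 0 to 25% of the time steps")
    if n_validation < n_forecast:
        warnings.warn("fewer time steps withheld for validation than forecast")
    return n_validation


# A cube with 48 time steps, forecasting 4 steps with the default validation count.
print(resolve_validation_steps(48, n_forecast=4))  # 4
```

For a 48-step cube, up to 12 steps could be withheld, so a user following the recommendation above might pass a larger `n_validation` than the default.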
Syntax
ForestBasedForecast(in_cube, analysis_variable, output_features, {output_cube}, {number_of_time_steps_to_forecast}, {time_window}, {number_for_validation}, {number_of_trees}, {minimum_leaf_size}, {maximum_depth}, {sample_size}, {forecast_approach})
Parameter  Explanation  Data Type 
in_cube  The netCDF cube containing the variable you want to forecast to future time steps. This file must have an .nc file extension and must have been created using the Create Space Time Cube By Aggregating Points, Create Space Time Cube From Defined Locations, or Create Space Time Cube From Multidimensional Raster Layer tools.  File 
analysis_variable  The numeric variable in the netCDF file that will be forecasted to future time steps.  String
output_features  The output feature class of all locations in the space-time cube with forecasted values stored as fields. The layer displays the forecast for the final time step and contains pop-up charts showing the time series, forecasts, and 90 percent confidence bounds for each location.  Feature Class
output_cube (Optional)  A new space-time cube (.nc file) containing the values of the input space-time cube with the forecasted time steps appended. The Visualize Space Time Cube in 3D tool can be used to see all of the observed and forecasted values simultaneously.  File
number_of_time_steps_to_forecast (Optional)  A positive integer specifying the number of time steps to forecast. This value cannot be larger than 50 percent of the total time steps in the input space-time cube. The default value is one time step.  Long
time_window (Optional)  The number of previous time steps to use when training the forest. If your data displays seasonality (repeating cycles), provide the number of time steps corresponding to one season for this parameter. This value cannot be larger than one-third of the number of time steps in the input space-time cube. If left empty, a time window is estimated for each location using a spectral density function.  Long
number_for_validation (Optional)  The number of time steps at the end of each time series to exclude for validation. The default value is 10 percent (rounded down) of the number of input time steps, and this value cannot be larger than 25 percent of the number of time steps. Provide a value of 0 to exclude no time steps.  Long
number_of_trees (Optional)  The number of trees to create in the forest model. More trees will generally result in more accurate model prediction, but the model will take longer to calculate. The default number of trees is 100, and the value must be at least 1 and not greater than 1,000.  Long 
minimum_leaf_size (Optional)  The minimum number of observations that are required to keep a leaf (the terminal node on a tree without further splits). For very large data, increasing this number will decrease the run time of the tool.  Long 
maximum_depth (Optional)  The maximum number of splits that will be made down a tree. With a larger maximum depth, more splits are created, which may increase the chance of overfitting the model. If left empty, a value will be determined by the tool based on the number of trees created by the model and the size of the time window.  Long
sample_size (Optional)  The percent of training data that will be used to fit the forecast model. The training data consists of associated explanatory and dependent variables constructed using time windows. All remaining training data will be used to optimize the parameters of the forecast model. The default is 100 percent.  Long 
forecast_approach (Optional)  Specifies how the explanatory and dependent variables will be represented when training the forest model at each location. To train the forest that will be used to forecast, sets of explanatory and dependent variables must be created using time windows. Use this parameter to specify whether these variables will be linearly detrended and whether the dependent variable will be represented by its raw value or by the residual of a linear regression model. This linear regression model uses all time steps within a time window as explanatory variables and uses the following time step as the dependent variable. The residual is calculated by subtracting the predicted value based on linear regression from the raw value of the dependent variable. Learn more about the Forecast Approach parameter  String
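The residual representation in forecast_approach can be sketched as follows, reading the description literally: one ordinary least-squares regression is fit across all time windows at a location, with the window values (plus an intercept) as explanatory variables and the following value as the dependent variable, and each residual is the raw value minus the regression's prediction. The intercept term, the function names (`make_training_windows`, `solve`, `linear_residuals`), and the use of normal equations are assumptions for illustration, not the tool's documented implementation; a production version would more likely use a numerical library such as `numpy.linalg.lstsq`.

```python
def make_training_windows(values, window):
    """Pair each run of `window` values with the value that follows it."""
    return [(list(values[i:i + window]), values[i + window])
            for i in range(len(values) - window)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def linear_residuals(samples):
    """Fit next_value ~ intercept + window values by OLS; return raw minus predicted."""
    rows = [[1.0] + [float(v) for v in window] for window, _ in samples]
    targets = [float(y) for _, y in samples]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    predicted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    return [y - p for y, p in zip(targets, predicted)]


series = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]  # hypothetical values for one location
residuals = linear_residuals(make_training_windows(series, window=3))
print([round(r, 2) for r in residuals])
```

Under this reading, the forest would then be trained to predict these residuals instead of the raw next values, so it only has to learn the part of each time step that the linear relationship with the time window does not already explain.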
Code sample
The following Python script demonstrates how to use the ForestBasedForecast tool:
import arcpy
arcpy.env.workspace = "C:/Analysis"
# Forecast four time steps using a random forest with detrending.
arcpy.stpm.ForestBasedForecast("CarTheft.nc", "Cars_NONE_ZEROS",
                               "Analysis.gdb/Forecasts", "outForecastCube.nc",
                               4, 3, 5, 100, "", "", 100, "VALUE_DETREND")
The following Python script demonstrates how to use the ForestBasedForecast tool to forecast counts of car theft:
# Forecast car thefts using a random forest.
# Import system modules.
import arcpy
# Set property to overwrite existing output, by default.
arcpy.env.overwriteOutput = True
# Set workspace.
workspace = r"C:\Analysis"
arcpy.env.workspace = workspace
# Forecast four time steps using a random forest built on residuals.
arcpy.stpm.ForestBasedForecast("CarTheft.nc", "Cars_NONE_ZEROS",
                               "Analysis.gdb/Forecasts", "outForecastCube.nc",
                               4, 3, 5, 100, "", "", 100, "RESIDUAL")
# Create a feature class visualizing the forecasts.
arcpy.stpm.VisualizeSpaceTimeCube3D("outForecastCube.nc", "Cars_NONE_ZEROS",
                                    "VALUE", "Analysis.gdb/ForecastsFC")
Licensing information
 Basic: Yes
 Standard: Yes
 Advanced: Yes
Related topics
 An overview of the Space Time Pattern Mining toolbox
 An overview of the Time Series Forecasting toolset
 Curve Fit Forecast
 Exponential Smoothing Forecast
 Evaluate Forecasts By Location
 How Forest-based Forecast works
 Forest-based Classification and Regression
 How Forest-based Classification and Regression works
 Find a geoprocessing tool