A spatial statistics model (.ssm) file contains the trained model, properties, and model diagnostics of an analysis performed by several tools in the Modeling Spatial Relationships toolset. You can use a spatial statistics model file to make predictions using new datasets and securely share it with others who can use it with their data. For example, a wildlife ecologist can collect field data of known locations of an endangered species and build a model to predict other likely locations of the species within their study area. They can then share the .ssm file with other ecologists who can use it to predict locations of the species within their own study areas. Because the data used to train the model is not stored in the .ssm file, the highly sensitive original locations of the endangered species will not be revealed by sharing the model file.
You can use the following tools to manage and predict with .ssm files:
- Set Spatial Statistics Model File Properties—Set the properties of an .ssm file, including variable descriptions and units. This allows you to explain the variables of the model and their units so that others can use the file appropriately. For example, you can specify that an explanatory distance feature represents distances to hospitals measured in U.S. miles so that others can know to only use the model for a particular type of data.
- Describe Spatial Statistics Model File—View the properties of an .ssm file, including the analysis method, dataset names, properties, and model diagnostics. This allows you to understand what each variable means so that you can correctly match all variables, datasets, and units when using the file to make predictions with new data.
- Predict Using Spatial Statistics Model File—Use the .ssm file to make predictions with new datasets. You must match each variable or dataset in the .ssm file with a new dataset that has the same type and unit. For example, an explanatory variable in the model file may require a raster of temperature values measured in degrees Celsius.
You can create an .ssm file using the Output Trained Model File parameter in the following tools:
- Generalized Linear Regression
- Forest-based and Boosted Classification and Regression
- Presence-only Prediction (MaxEnt)
Example applications
The following scenarios describe analytical workflows in which an .ssm file may be useful.
Scenario 1: Reuse the model to reduce model training time
If you perform analytical modeling with large datasets, the training process can be very time-consuming and require expensive computer hardware. In many cases, you will also need to train the model multiple times to fine-tune the settings. After choosing model settings that yield the best results, you do not want to repeat this training process for every future dataset that you will use to make predictions. Creating an .ssm file with the initial training results will allow you to reuse it with all future datasets without needing to train the model again. Reusing the same trained model also ensures that predictions for all future datasets are consistent.
Scenario 2: Share a trained model file with others
You can share the .ssm file with others who want to use the model with their own data. Because the data used to create the model is not directly accessible from the model file, you can share it without revealing sensitive data that was used to train it. Before sharing the model, you can use the Set Spatial Statistics Model File Properties tool to add variable descriptions and variable units. This tells others which types of data and which units to use when making predictions with the file. After receiving the model file, the recipient can view properties and model diagnostics with the Describe Spatial Statistics Model File tool and then make predictions with their data using the Predict Using Spatial Statistics Model File tool.
Scenario 3: Automate analysis of streaming data services
When working with data that updates regularly, such as a streaming data service of wildfire locations, using an .ssm file allows for simple automation as new data becomes available. Each time the data is updated, you can quickly reuse the .ssm file in the Predict Using Spatial Statistics Model File tool with the updated data.
Contents of an .ssm file
The model file stores comprehensive information about models. In addition to the variable descriptions and units that are created by the Set Spatial Statistics Model File Properties tool, .ssm files also contain model diagnostics that you can use to investigate the accuracy and reliability of the model.
ArcGIS Pro 3.2 and later versions allow training and predicting using data with 64-bit ObjectID and Big Integer field types.
For the Generalized Linear Regression tool, the .ssm file includes the regression coefficients and diagnostics such as AICc, R², Adjusted R², Joint F-Statistics, and Joint Wald Statistics. See the Interpreting message diagnostics section for a complete list and descriptions of the model diagnostics.
For the Forest-based and Boosted Classification and Regression tool, the .ssm file includes decision trees, characteristics of the model, validation diagnostics, top variable importance, and explanatory variable range diagnostics. Model Out Of Bag (OOB) errors are not included because this diagnostic is not relevant for making new predictions and would significantly increase the file size of the .ssm file. Model files created using the gradient boosted model type are supported in ArcGIS Pro 3.2 and later versions.
See the Output message and diagnostics section for more information.
For the Presence-only Prediction (MaxEnt) tool, the .ssm file includes important information about the trained model, model characteristics and summary, regression coefficients, categorical summary (if any explanatory variables are categorical), and explanatory variable range diagnostics for training data. Cross validation results and counts of presence and background points are not included because they can potentially be used to reverse-engineer sensitive data used to train the model, such as the locations of an endangered species. See the Geoprocessing messages section for more information.
Best practices
The following considerations should be made when creating and using .ssm files:
- To make the model more transparent and meaningful for sharing, use the Set Spatial Statistics Model File Properties tool to specify the description and unit for every variable. Documentation of the variables and their usage is important for scientific accuracy and reproducibility.
- Although .ssm files do not directly package the training data (only the training results) and do not store the most sensitive model diagnostics, data privacy and security are still potential concerns. Some complex model diagnostics such as the confusion matrix can potentially be used to reverse-engineer some of the original training data.
- When using an .ssm file created by others, you should investigate the properties using the Describe Spatial Statistics Model File tool. The variable descriptions and units are especially important, and you may need to manually convert the data to the units assumed by the model before using it for predictions. For example, you may need to convert temperature values from degrees Fahrenheit to degrees Celsius for the predictions to be accurate.
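The unit conversion mentioned in the last consideration can be sketched in plain Python. This is a minimal illustration and not part of any ArcGIS tool: the function names and sample values are hypothetical, and it assumes the model expects degrees Celsius while the new data is recorded in degrees Fahrenheit.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def to_model_units(values, data_unit, model_unit):
    """Return values expressed in the unit that the model expects.

    Only the Fahrenheit-to-Celsius case is handled in this sketch. Any other
    mismatch raises an error so that it is not silently ignored.
    """
    if data_unit == model_unit:
        return list(values)
    if data_unit == "degrees Fahrenheit" and model_unit == "degrees Celsius":
        return [fahrenheit_to_celsius(v) for v in values]
    raise ValueError(f"No conversion defined from {data_unit} to {model_unit}")


# Hypothetical field values recorded in degrees Fahrenheit
new_temperatures_f = [32.0, 50.0, 212.0]
converted = to_model_units(
    new_temperatures_f, "degrees Fahrenheit", "degrees Celsius"
)
print(converted)
```

Raising an error for an unknown unit pair, rather than passing the values through, makes a mismatch visible before the data reaches the prediction step.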
HDF5 data model
The .ssm file uses the Hierarchical Data Format version 5 (HDF5) data model to store the model results and metadata. HDF5 has the following benefits:
- HDF5 stores large data in an organized structure that can be highly compressed. For example, it can store a forest-based regression model trained using 600,000 features and 10,000 trees in a file that is under 20 GB. A less efficient data model would struggle to store such a complex model result in a conventional file that can be conveniently shared.
- HDF5 is a self-describing data model, meaning that you can attach metadata directly to the datasets rather than having to separate the data and metadata into different files. This synchronization allows HDF5 data to be transparent and accessible without needing to manage multiple files that must be kept together.
- HDF5 allows high performance reading and writing of data. For example, choosing to create an .ssm file when using a Spatial Statistics tool will not increase the run time of the tool by a noticeable amount. When using the model to make predictions with new data, the model can be quickly accessed to minimize overhead.
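The compression and self-describing properties listed above can be illustrated with a small sketch using the h5py package. It writes a demonstration file to a temporary folder, not a real .ssm file; the dataset name, attribute names, and values are hypothetical and do not reflect the internal layout of an .ssm file.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a small demonstration file (not a real .ssm file) to a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with h5py.File(demo_path, "w") as f:
    # Store a highly repetitive array with gzip compression enabled
    values = np.zeros(1_000_000, dtype="float64")
    dset = f.create_dataset("coefficients", data=values, compression="gzip")
    # Attach metadata directly to the dataset and to the file itself
    dset.attrs["unit"] = "degrees Celsius"
    f.attrs["description"] = "Demonstration file"

with h5py.File(demo_path, "r") as f:
    # The metadata is read back from the same file as the data
    unit = f["coefficients"].attrs["unit"]
    description = f.attrs["description"]
    n_values = f["coefficients"].shape[0]

raw_size = n_values * 8  # bytes needed for uncompressed float64 values
file_size = os.path.getsize(demo_path)
print(unit, description, file_size < raw_size)
```

The array of zeros is deliberately easy to compress, so the size comparison shows only that compression is applied. It does not indicate the compression ratio of a real model file.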
In addition to the Set Spatial Statistics Model File Properties, Describe Spatial Statistics Model File, and Predict Using Spatial Statistics Model File tools, you can also inspect .ssm files using standard HDF5 libraries.
The following Python code example shows how to inspect and print the properties of an .ssm file using the h5py package:
# Import the h5py package
import h5py

# Open the .ssm file in read-only mode; the with statement closes it automatically
with h5py.File(r'C:/MyData/MySSMFile.ssm', 'r') as spatialStatsModel:
    # Get a list of keys of the variables
    ls = list(spatialStatsModel.keys())
    # Get the attributes of the model
    attrs = list(spatialStatsModel.attrs)
    # Print all the datasets and attributes
    print("The variables in the model:")
    for k in ls:
        value = spatialStatsModel[k][()]
        print("{}---{}, --- {}".format(k, value, type(value)))
    print("The attributes in the model:")
    for k in attrs:
        value = spatialStatsModel.attrs.get(k)
        print("{}---{}, --- {}".format(k, value, type(value)))
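The example above reads every top-level key as a dataset. If a file also contains nested groups (an assumption, because the internal layout of .ssm files is not described here), a traversal with the visititems method of h5py may be more robust. The following sketch builds a small demonstration file so that it can run on its own; the function name describe_hdf5 and the demonstration contents are hypothetical.

```python
import os
import tempfile

import h5py


def describe_hdf5(path):
    """Return one summary line per attribute, group, or dataset in an HDF5 file."""
    lines = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            lines.append(f"dataset {name}: shape={obj.shape}, dtype={obj.dtype}")
        else:
            lines.append(f"group {name}")
        # Attributes can be attached to both groups and datasets
        for key, value in obj.attrs.items():
            lines.append(f"  attribute {key} = {value}")

    with h5py.File(path, "r") as f:
        for key, value in f.attrs.items():
            lines.append(f"file attribute {key} = {value}")
        # visititems calls visit for every group and dataset below the root
        f.visititems(visit)
    return lines


# Build a small demonstration file (not a real .ssm file)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.attrs["method"] = "demo"
    group = f.create_group("diagnostics")
    group.create_dataset("r2", data=0.5)

for line in describe_hdf5(demo_path):
    print(line)
```

To inspect a model file instead, pass its path to the function, for example describe_hdf5(r'C:/MyData/MySSMFile.ssm').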