Fill Missing Values (Space Time Pattern Mining)

Summary

Replaces missing (null) values with estimated values based on spatial neighbors, space-time neighbors, or time-series values.

Learn more about how Fill Missing Values works

Illustration

Fill Missing Values tool graphic

Usage

  • Input Features can be points or polygons.

  • The output contains three fields for each of the Fields to Fill. The first contains both the original and filled values, and the second indicates whether each value was estimated. The estimated field keeps its original field name, but field aliases are created using the following naming convention: <field>_FILLED and <field>_ESTIMATED. The third field, <field>_N_NEIGHBORS, is the number of neighbors used in the calculations for each estimated value.

  • The output also contains additional fields that can help you understand the number of neighbors and the range of neighbor values used in the calculations for each missing value. If the Fill Method is Average, the standard deviation (<field>_STD) of the neighbors used in the calculations is reported. If the Fill Method is Minimum, the maximum neighbor value is reported; if it is Maximum, the minimum neighbor value is reported. If the Fill Method is Median, the mean absolute deviation of the neighbors is reported. If missing values are filled using Temporal Trend, the field contains the sum of squared residuals of the spline. The NNBRS field contains the count of neighbors used to calculate the estimated values.

  • You can include fields that do not contain null values. These fields will be copied to the output but will have no additional fields associated with them in the output (such as <field>_FILLED or <field>_ESTIMATED). Alternatively, you can provide a Unique ID that will be added to the output and can then be used to join your results back to your Input Feature Class.

  • The field NUM_EST (TOT_EST if you're using a related table) is the total number of estimated variables for the associated record. This field is used to render the output map.

  • This tool can be used with panel data stored either as repeated shapes or with a related table. If a Location ID is specified, the tool will recognize that the input is panel data and a Time Field will be required.

  • The Location ID is an integer field and should represent a unique and stationary location. It should not have different X,Y coordinates over time.

  • If Fixed distance, Contiguity edges only, or Contiguity edges corners is chosen as the Conceptualization of Spatial Relationships, a space time window can be simulated by choosing a Distance Band and a Temporal Neighborhood.

  • If Fixed distance, Contiguity edges only, or Contiguity edges corners is chosen as the Conceptualization of Spatial Relationships, a Number of Spatial Neighbors can be set to specify a minimum number of neighbors.

  • The option to fill missing values according to Temporal Trend is only available if a Location ID is specified. A Time Field is also required.

  • When using Temporal Trend to fill values, a null value can only be filled if at least two time periods with values come before it and at least two time periods with values come after it. Because of this requirement, nulls in the first two or last two time steps can never be filled using Temporal Trend.

  • The Temporal Trend Fill Method uses the Interpolated Univariate Spline method in the SciPy Interpolation package.

  • Missing values that could not be estimated and filled are reported in the output in the same format in which the nulls existed originally.

  • When filling the missing values of panel data with spatial neighbors only, the Temporal Neighborhood should be set to 0.

  • If your data is panel data, the Temporal Neighborhood parameter can be used as a way to filter by time. Alternatively, a Temporal Neighborhood of 0 allows you to look at spatial neighbors only.

  • It is important to inspect the resulting filled values to make sure they make sense for your analysis. For instance, if your original field was an integer and the tool was set to fill with the average of spatial neighbors, you will end up with decimals in the results, which may not make sense if your input field was a count. Additionally, due to the method employed when using Temporal Trend, it may be possible to get a negative number as a result even if none of your existing values were negative. This would not make sense if the field you were filling was population.

  • The field N_NEIGHBORS reports the number of neighbors included in the calculations for that feature. If your Fill Method is Temporal Trend, this number is the number of values that exist in your time series for that Location ID (for instance, if you were only missing one value in your time series, it would report the number of time steps in your dataset minus 1). If you are using a Conceptualization of Spatial Relationships of K nearest neighbors and also a Temporal Neighborhood, the number of neighbors reported will include the k neighbors for the feature that fall within the time window specified.

  • Messages describing details of the analysis and characteristics of your filled fields are written at the bottom of the Geoprocessing pane during tool execution. You can access the messages by hovering over the progress bar, clicking the pop-out button, or expanding the messages section in the Geoprocessing pane. You can also access the messages for a previous run of the Fill Missing Values tool via Geoprocessing History.
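
The Temporal Trend requirement described above (at least two time periods with values before the null and at least two after it) can be checked ahead of time. The following is a minimal sketch in plain Python, not the tool's implementation. The function name can_fill_with_trend and the sample series are hypothetical, and it assumes that the periods with values do not need to be immediately adjacent to the null.

```python
from typing import Optional, Sequence


def can_fill_with_trend(series: Sequence[Optional[float]], index: int) -> bool:
    """Return True if the null at `index` has at least two observed values
    before it and at least two observed values after it.

    Assumption: observed values anywhere before/after the null count,
    not only the immediately adjacent time steps.
    """
    if series[index] is not None:
        return False  # nothing to fill at this position
    observed_before = sum(value is not None for value in series[:index])
    observed_after = sum(value is not None for value in series[index + 1:])
    return observed_before >= 2 and observed_after >= 2


# Hypothetical time series for one Location ID; None marks a missing value.
counts = [None, 12.0, 15.0, None, 14.0, 18.0, None]
fillable = [i for i, v in enumerate(counts)
            if v is None and can_fill_with_trend(counts, i)]
print(fillable)  # [3]
```

In this sample, only the null at position 3 qualifies. The nulls at the first and last time steps remain unfilled, which matches the limitation on the first two and last two time steps.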

Syntax

arcpy.stpm.FillMissingValues(in_features, out_features, fields_to_fill, fill_method, {conceptualization_of_spatial_relationships}, {distance_band}, {temporal_neighborhood}, {time_field}, {number_of_spatial_neighbors}, {location_id}, {related_table}, {related_location_id}, {weights_matrix_file}, {unique_id}, {null_value}, {out_table})
Parameter | Explanation | Data Type
in_features

The feature class containing the null values to be filled.

Feature Layer
out_features

The output that will include the filled (estimated) values.

If related_table is specified, out_features will contain the number of estimated values at each location, and out_table will contain the filled (estimated) values.

Feature Class
fields_to_fill
[fields_to_fill,...]

The numeric fields containing missing data (null values).

Field
fill_method

The type of calculation that will be applied. TEMPORAL_TREND is only available if location_id and time_field are specified.

  • AVERAGE: Replaces null values with the average (mean) value of the feature's neighbors.
  • MINIMUM: Replaces null values with the minimum (smallest) value of the feature's neighbors.
  • MAXIMUM: Replaces null values with the maximum (largest) value of the feature's neighbors.
  • MEDIAN: Replaces null values with the median (sorted middle value) of the feature's neighbors.
  • TEMPORAL_TREND: Replaces null values based on the trend at that unique location.
String
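
The four neighbor-based options can be illustrated with a short sketch. This is plain Python rather than the tool's own code. The function fill_from_neighbors and the sample values are hypothetical, and the sketch assumes that neighbors that are themselves null are left out of the calculation.

```python
import statistics
from typing import Callable, Dict, List, Optional, Sequence

# Map each neighbor-based fill_method keyword to a summary function.
FILL_FUNCTIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "AVERAGE": statistics.mean,
    "MINIMUM": min,
    "MAXIMUM": max,
    "MEDIAN": statistics.median,
}


def fill_from_neighbors(neighbor_values: Sequence[Optional[float]],
                        fill_method: str) -> Optional[float]:
    """Estimate a missing value from the values of its neighbors.

    Assumption: null neighbors are ignored; if no neighbor has a value,
    the missing value stays unfilled (None).
    """
    values: List[float] = [v for v in neighbor_values if v is not None]
    if not values:
        return None
    return FILL_FUNCTIONS[fill_method](values)


# Hypothetical neighbor values for one feature with a missing COUNT.
neighbors = [4.0, 10.0, 7.0, None, 3.0]
for method in ("AVERAGE", "MINIMUM", "MAXIMUM", "MEDIAN"):
    print(method, fill_from_neighbors(neighbors, method))
```

Note that MEDIAN returns 5.5 for these whole-number inputs, so an estimate for a count field may need to be reviewed before use.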
conceptualization_of_spatial_relationships
(Optional)

Specifies how spatial relationships among features are defined.

  • FIXED_DISTANCE: Neighboring features within a specified critical distance (distance_band) of each feature are included in the calculations; everything outside the critical distance is excluded.
  • K_NEAREST_NEIGHBORS: The closest k features are included in the calculations; k is a specified numeric parameter.
  • CONTIGUITY_EDGES_ONLY: Only neighboring polygon features that share a boundary or overlap will influence computations for the target polygon feature.
  • CONTIGUITY_EDGES_CORNERS: Polygon features that share a boundary, share a node, or overlap will influence computations for the target polygon feature.
  • GET_SPATIAL_WEIGHTS_FROM_FILE: Spatial relationships are defined by a specified spatial weights file. The path to the spatial weights file is specified by the weights_matrix_file parameter.
String
distance_band
(Optional)

Specifies a cutoff distance for the FIXED_DISTANCE option. Features outside the specified cutoff for a target feature are ignored in calculations for that feature. This parameter is not available for CONTIGUITY_EDGES_ONLY or CONTIGUITY_EDGES_CORNERS.

Linear Unit
temporal_neighborhood
(Optional)

Specifies an interval forward and backward in time to determine which features will be used in calculations for the target feature. Features that are not within this interval of the target feature are ignored in calculations for that feature.

Time Unit
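
As an illustration of the forward-and-backward interval, the sketch below filters candidate time stamps against a target time stamp. It is not the tool's implementation. The function name and dates are hypothetical, and treating the interval boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def within_temporal_neighborhood(target_time: datetime,
                                 candidate_time: datetime,
                                 interval: timedelta) -> bool:
    """Return True if the candidate falls within `interval` of the target,
    looking both forward and backward in time.

    Assumption: a candidate exactly `interval` away is still included.
    """
    return abs(candidate_time - target_time) <= interval


# Hypothetical target record and candidate neighbor time stamps.
target = datetime(2015, 10, 15)
candidates = [
    datetime(2015, 10, 1),   # 14 days before the target
    datetime(2015, 10, 10),  # 5 days before the target
    datetime(2015, 10, 20),  # 5 days after the target
    datetime(2015, 11, 5),   # 21 days after the target
]
window = timedelta(days=7)

temporal_neighbors = [t for t in candidates
                      if within_temporal_neighborhood(target, t, window)]
print(temporal_neighbors)  # the 10 October and 20 October records
```
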
time_field
(Optional)

The field containing the time stamp for each record in the dataset. This field must be of type Date.

This is required if location_id has been provided.

Field
number_of_spatial_neighbors
(Optional)

The number of nearest neighbors to be included in calculations.

If FIXED_DISTANCE, CONTIGUITY_EDGES_ONLY, or CONTIGUITY_EDGES_CORNERS is chosen, this number is the minimum number of neighbors to include in calculations.

Long
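
One way to picture a minimum number of neighbors combined with FIXED_DISTANCE is sketched below. How the tool itself satisfies the minimum is not described here, so the fallback used in this sketch (taking the nearest features beyond the distance band until the minimum is reached) is an assumption. The function select_neighbors and the coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def select_neighbors(target: Point,
                     candidates: Sequence[Point],
                     distance_band: float,
                     minimum_neighbors: int) -> List[Point]:
    """Select neighbors within `distance_band` of the target.

    Assumption: if fewer than `minimum_neighbors` fall inside the band,
    the nearest candidates outside the band are added to reach the minimum.
    """
    ranked = sorted(candidates, key=lambda point: math.dist(target, point))
    within = [p for p in ranked if math.dist(target, p) <= distance_band]
    if len(within) >= minimum_neighbors:
        return within
    return ranked[:minimum_neighbors]


# Hypothetical planar coordinates in the same linear unit as the band.
target_point = (0.0, 0.0)
candidate_points = [(1.0, 0.0), (0.0, 2.0), (5.0, 0.0), (0.0, 9.0)]

# Only two candidates fall within 3 units, so a third is added.
print(select_neighbors(target_point, candidate_points, 3.0, 3))
```
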
location_id
(Optional)

An integer field containing a unique ID number for each location.

location_id is used to match features from the in_features to rows in the related_table or to specify a unique location ID for determining temporal neighbors.

Field
related_table
(Optional)

The table or table view containing the temporal data for each of the in_features.

Table View
related_location_id
(Optional)

An integer field in the related_table that contains the location_id on which the relate will be based.

Field
weights_matrix_file
(Optional)

The path to a file containing weights that define spatial, and potentially temporal, relationships among features.

File
unique_id
(Optional)

An integer field containing a different value for every record in the in_features. This field can be used to join your results back to your original dataset.

If you don't have a unique_id field, you can create one by adding an integer field to your feature class table and calculating the field values equal to the FID or OBJECTID field.

Field
null_value
(Optional)

The value that represents null (missing) values. If no value is specified, <Null> is assumed for geodatabase feature classes. For shapefile input, a numeric placeholder value that represents null is required.

Double
out_table
(Optional)

The output table containing the filled (estimated) values.

The output table is required if a related table is entered.

Table

Code sample

FillMissingValues example 1 (Python window)

The following Python window script demonstrates how to use the FillMissingValues tool.

import arcpy
arcpy.env.workspace = r"C:\STPM\Chicago.gdb"
arcpy.FillMissingValues_stpm("Chicago_Data", "Chicago_Filled", "COUNT", "AVERAGE",
                             "K_NEAREST_NEIGHBORS", "", "", "", 8)
FillMissingValues example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the FillMissingValues tool.

# Fill missing values using a feature set and related table
# Use the results to create a space-time cube from defined locations
# Run Emerging Hot Spot Analysis on the data
# Visualize the results in 3d

# Import system modules
import arcpy

# Set geoprocessor object property to overwrite existing output, by default
arcpy.env.overwriteOutput = True

# Set the workspace that contains the input data
arcpy.env.workspace = r"C:\STPM\Chicago.gdb"

try:

    # Fill missing values in a feature class containing block group polygon shapes and a related table containing the incidents
    # Since some of the values are missing we will fill them using the temporal trend method.
    arcpy.FillMissingValues_stpm("Chicago_Feature", "Chicago_FilledFeature", "COUNT", "TEMPORAL_TREND", "", "", None,
                                 "TIME", "", "MYID", "Chicago_Table", "MYID", "", "", "", "Chicago_FilledTable")

    # Create a defined location space time cube using a related table
    # Using a reference time at the start of the month to force bins to fall on month breaks
    # Using temporal aggregation to sum multiple entries into one month
    # Dropping locations with missing values, since values were already filled using Fill Missing Values
    arcpy.CreateSpaceTimeCubeDefinedLocations_stpm("Chicago_FilledFeature", r"C:\STPM\Chicago_Cube.nc", "MYID",
                                                   "APPLY_TEMPORAL_AGGREGATION", "TIME", "1 Months", "REFERENCE_TIME",
                                                   "10/1/2015", "", "COUNT SUM DROP_LOCATIONS", "Chicago_FilledTable",
                                                   "MYID")

    # Run an emerging hot spot analysis on the defined locations cube
    # Using contiguity edges so only block groups which bound each other are considered neighbours
    arcpy.EmergingHotSpotAnalysis_stpm(r"C:\STPM\Chicago_Cube.nc", "COUNT_SUM_NONE",
                                       "Chicago_Cube_EmergingHotSpot", "", 1, "",
                                       "CONTIGUITY_EDGES_ONLY")

    # Use Visualize Cube in 3d to see the hot spot results for each time slice
    arcpy.VisualizeSpaceTimeCube3D_stpm(r"C:\STPM\Chicago_Cube.nc", "COUNT_SUM_NONE", "HOT_AND_COLD_SPOT_RESULTS",
                                        "Chicago_Cube_Visualize3d")

except arcpy.ExecuteError:
    # If any error occurred when running the tool, print the messages
    print(arcpy.GetMessages())

Licensing information

  • Basic: Yes
  • Standard: Yes
  • Advanced: Yes

Related topics