Fill Missing Values (Space Time Pattern Mining)

Summary

Replaces missing (null) values with estimated values based on spatial neighbors, space-time neighbors, time-series, or global statistic values.

Learn more about how Fill Missing Values works

Illustration

Fill Missing Values tool illustration

Usage

  • The Input Features or Table parameter value can be point or polygon features or a stand-alone table.

  • For an input feature, the missing values can be estimated using spatial neighbors, space-time neighbors, or time-series values. The missing values can be in the input features or in a related table. For stand-alone tables, the missing values can be estimated using global statistics of the input field or time-series values. Because stand-alone tables do not have spatial information, spatial neighbors cannot be defined for tables.

  • The output will contain three fields for each field of the Fields to Fill parameter. The first will contain both the original and filled values, and the second will contain an indicator that the value was estimated. The estimated field will keep its original field name, but field aliases will be created using the following naming convention: <field>_FILLED and <field>_ESTIMATED. For input features, the third field is the number of neighbors field, <field>_N_NEIGHBORS, used in the calculations for each estimated value. For stand-alone input tables, the third field is the number of records field, <field>_NUM_REC_USED, used in the calculations for each estimated value.

  • For input features, the output will also include fields containing values that can help you understand the number of neighbors and range of neighbors values used in the calculations for the target missing value. If the Fill Method parameter is set to Average, the standard deviation (<field>_STD) of the neighbors used in calculations will be reported. The maximum neighbor value (<field>_MAX) will be reported for the Maximum option and the minimum neighbor value (<field>_MIN) for the Minimum option. If the Fill Method parameter is set to Median, the mean absolute deviation (<field>_MAD) of the neighbors will be reported. If missing values are filled using the Temporal Trend option, the <field>_RES field will contain the sum of squared residuals of the spline. The NNBRS field will contain the count of neighbors used to calculate the estimated values.

  • For stand-alone tables, the output will include fields containing values that can help you understand the statistics and range of the nonnull values of the field used in the calculations for the target missing value. If the Fill Method parameter is set to Average, the standard deviation (<field>_STD) of all nonnull values in the field used in calculations will be reported. The input field’s maximum value (<field>_MAX) will be reported for the Maximum option and the minimum value (<field>_MIN) for the Minimum option. If the Fill Method parameter is set to Median, the mean absolute deviation (<field>_MAD) of the nonnull values will be reported. If missing values are filled using the Temporal Trend option, the <field>_RES field will contain the sum of squared residuals of the spline.

  • You can include fields that do not contain null values. These fields will be copied to the output but will have no additional fields associated with them in the output (such as <field>_FILLED or <field>_ESTIMATED). Alternatively, you can provide a value for the Unique ID parameter that will be added to the output and can be used to join the results back to the input features or table.

  • The NUM_EST field (TOT_EST if you're using a related table) is the total number of estimated variables for the associated record. This field is used to render the output map.

  • You can append the additional fields to the input features or table using the Append Fields to Input Data parameter. If you append these fields, a related table cannot be provided.

  • For input features, the Location ID parameter can be used in different ways depending on the structure of the input space-time data.

    • If the data is stored in a related table (that is, you have a set of features in a feature class with a related table containing attributes over time) and you want to fill the missing values in the related table, you can use the Related Table parameter. The Location ID parameter value matches each feature in the input feature class to a set of records in the related table and must be unique for every input feature.
    • If the data is stored in the same feature class (that is, by repeating shapes or geometry), the Location ID parameter will specify each unique location in the feature class. For example, if you have the U.S. county level population data for 10 years, each county will be repeated 10 times in the feature class, and the county ID will be used to specify each unique county location. The location ID must be unique to every location but not necessarily unique to every feature.

  • The Location ID value is an integer or text field and should represent a unique and stationary location. It should not have different x,y coordinates over time.

  • This tool can be used with panel data that is stored either as repeated shapes or with a related table. If a value is provided for the Location ID parameter, the tool will recognize that the input is panel data and the Time Field parameter will be required.

  • For stand-alone tables, if a value is provided for the Location ID parameter and no value is provided for the Time Field parameter, the estimated values will be calculated using the records with the same location ID as the location with the null value being filled. For example, if you have U.S. county-level data and you want to fill the missing values using the average of all the counties in the same state, you can use a field representing the state as the location ID.

  • If both the Location ID and Time Field parameter values are provided for a stand-alone table, only the Temporal Trend option for the Fill Method parameter will be available.

  • If the Fixed distance, Contiguity edges only, or Contiguity edges corners option is chosen for the Conceptualization of Spatial Relationships parameter, a space-time window can be simulated by choosing a value for the Distance Band and Temporal Neighborhood parameters.

  • If the Fixed distance, Contiguity edges only, or Contiguity edges corners option is chosen for the Conceptualization of Spatial Relationships parameter, a Number of Spatial Neighbors parameter value can be set to specify a minimum number of neighbors.

  • The Temporal Trend option for the Fill Method parameter is only available if values have been set for the Location ID and Time Field parameters.

  • When using the Temporal Trend option to fill values, the location with a null value being filled must have at least two time periods with values in the beginning and at least two time periods with values at the end of the time series to be filled. Because of this requirement, nulls existing in the first two or last two time steps cannot be filled using the Temporal Trend option.

  • The Temporal Trend option uses the Interpolated Univariate Spline method in the SciPy Interpolation package.

  • When filling the missing values of panel data with spatial neighbors, set the Temporal Neighborhood parameter to 0.

  • If the data is panel data, you can use the Temporal Neighborhood parameter to filter by time. Alternatively, a Temporal Neighborhood value of 0 allows you to look at spatial neighbors only.

  • It is important to inspect the resulting filled values to ensure that they make sense for your analysis. For example, if your original field was an integer and you set the tool to fill with the average of spatial neighbors, decimals will be included in the results, which may not make sense if your input field was a count. Additionally, when the Temporal Trend option is used for the Fill Method parameter, the result may be a negative number even if none of your existing values were negative. This would not make sense if the field you were filling was population.

  • The N_NEIGHBORS field reports the number of neighbors included in the calculations for that feature. If the Fill Method parameter is set to Temporal Trend, this number is the number of values that exist in the time series for that Location ID value (for example, if you were only missing one value in the time series, it would report the number of time steps in your dataset minus one). If you are using the Conceptualization of Spatial Relationships parameter value of K nearest neighbors and a Temporal Neighborhood value, the number of neighbors reported will include the k neighbors for the feature that fall within the time window specified.

  • The Null Value parameter represents the null (missing) values. This parameter is used in different ways depending on the input and output formats.

    • For geodatabase feature classes or tables, <Null> is assumed to be the null (missing value) if no value is provided for the Null Value parameter. If a value is provided, that value and the <Null> values will be estimated in the tool output.
    • For shapefiles and dBASE tables, the Null Value parameter is required. You must provide a value that represents null or missing values in the input data (for example, -9999).
    • If the input is a file geodatabase feature class or table and the output is a shapefile or dBASE table, this parameter is required to specify how the missing values that cannot be estimated will be represented in the tool output.

  • Missing values that cannot be estimated and filled will be reported in the output in the format in which the nulls originally existed or as specified in the Null Value parameter.

  • Messages describing details of the analysis and characteristics of the filled fields are written at the bottom of the Geoprocessing pane during tool execution. To access the messages, hover over the progress bar and click the pop-out button, or expand the messages section in the Geoprocessing pane. You can also access the messages for a previous run of the Fill Missing Values tool through the geoprocessing history.
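
The output fields described in the usage notes can be illustrated with a minimal pure-Python sketch. It is not the tool's implementation: the neighbor values are supplied by hand, the field name COUNT and the helper fill_with_neighbor_average are hypothetical, and both the 1/0 encoding of <field>_ESTIMATED and the use of the sample standard deviation for <field>_STD are assumptions. It only shows how a filled value, an estimated flag, a neighbor count, and a standard deviation relate to one another when the Fill Method parameter is set to Average.

```python
from statistics import mean, stdev


def fill_with_neighbor_average(value, neighbor_values, field="COUNT"):
    """Return output-style fields for one record (illustrative only)."""
    # Neighbors that are themselves missing cannot contribute to the estimate.
    usable = [v for v in neighbor_values if v is not None]
    if value is not None or not usable:
        # Nothing to fill, or nothing to fill it with: keep the value as-is.
        return {
            f"{field}_FILLED": value,
            f"{field}_ESTIMATED": 0,
            f"{field}_N_NEIGHBORS": 0,
            f"{field}_STD": None,
        }
    return {
        f"{field}_FILLED": mean(usable),
        f"{field}_ESTIMATED": 1,  # assumed encoding: 1 = estimated, 0 = original
        f"{field}_N_NEIGHBORS": len(usable),
        # Sample standard deviation is an assumption; it needs two or more values.
        f"{field}_STD": stdev(usable) if len(usable) > 1 else 0.0,
    }


# One record with a missing COUNT and four neighbors, one of which is also null.
record = fill_with_neighbor_average(None, [4, 6, None, 8])
print(record)
```

Calling the same helper with neighbors such as [4, 7] yields a fractional estimate, which is the situation the usage notes warn about when the field being filled is an integer count.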

Parameters

Label | Explanation | Data Type
Input Features or Table

The point or polygon feature class or stand-alone table containing the null values to be filled.

If the Related Table parameter value is provided, the null values to be filled will be contained in the related table. The input features will be matched to the rows in the related table to specify the space-time neighborhood.

Feature Layer; Table View
Output Features or Table
(Optional)

The output features or stand-alone table that will include the filled (estimated) values.

If the Related Table parameter value is provided, the output of this parameter will contain the number of estimated values at each location, and the Output Table parameter value will contain the filled (estimated) values.

Feature Class; Table
Fields to Fill

The numeric fields containing missing data (null values).

Field
Fill Method

Specifies the type of calculation that will be applied. The Temporal Trend option is only available if the Location ID and Time Field parameter values are provided.

  • Average: Null values will be replaced with the mean (average) value of the feature's neighbors or the mean value of the field to be filled for stand-alone tables.
  • Minimum: Null values will be replaced with the minimum (smallest) value of the feature's neighbors or the minimum value of the field to be filled for stand-alone tables.
  • Maximum: Null values will be replaced with the maximum (largest) value of the feature's neighbors or the maximum value of the field to be filled for stand-alone tables.
  • Median: Null values will be replaced with the median (sorted middle value) of the feature's neighbors or the median of the field to be filled for stand-alone tables.
  • Temporal Trend: Null values will be replaced based on the trend at that unique location.
String
Conceptualization of Spatial Relationships
(Optional)

Specifies how spatial relationships among features will be defined.

  • Fixed distance: Neighboring features within a specified critical distance (the Distance Band parameter value) of each feature will be included in the calculations; everything outside the critical distance will be excluded.
  • K nearest neighbors: The closest k features will be included in the calculations; k is a specified numeric parameter.
  • Contiguity edges only: Only neighboring polygon features that share a boundary or overlap will influence computations for the target polygon feature.
  • Contiguity edges corners: Polygon features that share a boundary, share a node, or overlap will influence computations for the target polygon feature.
  • Get spatial weights from file: Spatial relationships will be defined by a specified spatial weights file. The path to the spatial weights file is specified by the Spatial Weights Matrix File parameter.
String
Distance Band
(Optional)

The cutoff distance for the Conceptualization of Spatial Relationships parameter's Fixed distance option. Features outside the specified cutoff for a target feature will be ignored in calculations for that feature. This parameter is not available for the Contiguity edges only or Contiguity edges corners options.

Linear Unit
Temporal Neighborhood
(Optional)

An interval forward and backward in time that determines the features that will be used in calculations for the target feature. Features that are not within this interval of the target feature will be ignored in calculations for that feature.

Time Unit
Time Field
(Optional)

The field containing the time stamp for each record in the dataset. This field must be of type Date.

For feature input, the time field will define temporal neighbors while filling missing values. A value must be provided if a related table is provided.

For feature and table input, the time field will be used when filling missing values using temporal trend at the location.

Field
Number of Spatial Neighbors
(Optional)

The number of nearest neighbors that will be included in calculations.

If the Conceptualization of Spatial Relationships parameter's Fixed distance, Contiguity edges only, or Contiguity edges corners option is chosen, this number is the minimum number of neighbors to include in calculations.

Long
Location ID
(Optional)

An integer or text field containing a unique ID for each location.

If a related table is provided, this field is used to match each input feature to rows in the related table; the values of this field must be unique for every input feature. If a related table is not provided, this field is used to specify each unique location in the input features to determine temporal neighbors. In this case, the values of this field must be unique to every location but do not need to be unique for each feature (because more than one feature can have the same location).

Field
Related Table
(Optional)

The table or table view containing the temporal data for each feature of the Input Features or Table parameter.

Table View
Related Location ID
(Optional)

An integer or text field in the Related Table parameter that contains the Location ID parameter value on which the relate will be based.

Field
Spatial Weights Matrix File
(Optional)

The path to a file containing weights that define spatial, and potentially temporal, relationships among features.

File
Unique ID
(Optional)

An integer field containing a different value for every record in the Input Features or Table parameter value. This field can be used to join the results back to the original dataset.

If you don't have a Unique ID field, you can create one by adding an integer field to the input feature's attribute table and calculating the field values equal to the FID or OBJECTID field.

Field
Null Value
(Optional)

The value that represents null (missing) values. If no value is provided, <Null> is assumed for geodatabase feature classes and tables. If a value is provided, both the value and all <Null> values will be filled. If the input or output is a shapefile or dBASE table, a numeric value of the null placeholder is required.

Double
Output Table
(Optional)

The output table containing the filled (estimated) values.

The output table is required if a related table is provided.

Table
Append Fields to Input Data
(Optional)

Specifies whether the filled value fields will be appended to the input data or an output feature class or table will be created with the filled value fields. If you append the fields, you cannot provide a related table and the output coordinate system environment will be ignored.

  • Checked—The fields containing the filled values will be appended to the input data. This option modifies the input data.
  • Unchecked—An output feature class or table will be created containing the filled value fields. This is the default.

Boolean
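
How the Distance Band and Temporal Neighborhood parameters combine into a space-time window can be sketched in plain Python. This is an illustration under stated assumptions, not the tool's algorithm: the Record class and the helper space_time_neighbors are hypothetical, distances are planar Euclidean, time is measured in integer steps, and values at the target's own location in other time steps are counted as neighbors.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass
class Record:
    location_id: str
    x: float
    y: float
    time_step: int
    value: Optional[float]


def space_time_neighbors(target, records, distance_band, temporal_neighborhood):
    """Select records inside the space-time window around the target record."""
    neighbors = []
    for rec in records:
        if rec is target or rec.value is None:
            continue  # skip the target itself and records with no value to offer
        close_in_space = hypot(rec.x - target.x, rec.y - target.y) <= distance_band
        close_in_time = abs(rec.time_step - target.time_step) <= temporal_neighborhood
        if close_in_space and close_in_time:
            neighbors.append(rec)
    return neighbors


records = [
    Record("A", 0.0, 0.0, 1, None),    # target: missing value at time step 1
    Record("A", 0.0, 0.0, 0, 10.0),    # same location, previous time step
    Record("B", 3.0, 4.0, 1, 20.0),    # 5 units away, same time step
    Record("B", 3.0, 4.0, 3, 30.0),    # 5 units away, two steps later
    Record("C", 30.0, 40.0, 1, 40.0),  # 50 units away
]
target = records[0]

# A window of one time step either side of the target.
window = space_time_neighbors(target, records, distance_band=5.0,
                              temporal_neighborhood=1)
# A temporal neighborhood of 0 leaves spatial neighbors in the same time step only.
spatial_only = space_time_neighbors(target, records, distance_band=5.0,
                                    temporal_neighborhood=0)
print([r.value for r in window], [r.value for r in spatial_only])
```

Setting temporal_neighborhood to 0 in the sketch mirrors the usage note that a Temporal Neighborhood value of 0 restricts the calculation to spatial neighbors.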

Derived Output

Label | Explanation | Data Type
Updated Input Features or Table

The updated input features or table containing the filled value fields.

Feature Layer, Table View

arcpy.stpm.FillMissingValues(in_features, {out_features}, fields_to_fill, fill_method, {conceptualization_of_spatial_relationships}, {distance_band}, {temporal_neighborhood}, {time_field}, {number_of_spatial_neighbors}, {location_id}, {related_table}, {related_location_id}, {weights_matrix_file}, {unique_id}, {null_value}, {out_table}, {append_to_input})
Name | Explanation | Data Type
in_features

The point or polygon feature class or stand-alone table containing the null values to be filled.

If the related_table parameter value is provided, the null values to be filled will be contained in the related table. The input features will be matched to the rows in the related table to specify the space-time neighborhood.

Feature Layer; Table View
out_features
(Optional)

The output features or stand-alone table that will include the filled (estimated) values.

If the related_table parameter value is provided, the output of this parameter will contain the number of estimated values at each location, and the out_table parameter value will contain the filled (estimated) values.

Feature Class; Table
fields_to_fill
[fields_to_fill,...]

The numeric fields containing missing data (null values).

Field
fill_method

Specifies the type of calculation that will be applied. The TEMPORAL_TREND option is only available if the location_id and time_field parameter values are provided.

  • AVERAGE: Null values will be replaced with the average (mean) value of the feature's neighbors.
  • MINIMUM: Null values will be replaced with the minimum (smallest) value of the feature's neighbors.
  • MAXIMUM: Null values will be replaced with the maximum (largest) value of the feature's neighbors.
  • MEDIAN: Null values will be replaced with the median (sorted middle value) of the feature's neighbors.
  • TEMPORAL_TREND: Null values will be replaced based on the trend at that unique location.
String
conceptualization_of_spatial_relationships
(Optional)

Specifies how spatial relationships among features will be defined.

  • FIXED_DISTANCE: Neighboring features within a specified critical distance (the distance_band parameter value) of each feature will be included in the calculations; everything outside the critical distance will be excluded.
  • K_NEAREST_NEIGHBORS: The closest k features will be included in the calculations; k is a specified numeric parameter.
  • CONTIGUITY_EDGES_ONLY: Only neighboring polygon features that share a boundary or overlap will influence computations for the target polygon feature.
  • CONTIGUITY_EDGES_CORNERS: Polygon features that share a boundary, share a node, or overlap will influence computations for the target polygon feature.
  • GET_SPATIAL_WEIGHTS_FROM_FILE: Spatial relationships will be defined by a specified spatial weights file. The path to the spatial weights file is specified by the weights_matrix_file parameter.
String
distance_band
(Optional)

The cutoff distance for the conceptualization_of_spatial_relationships parameter's FIXED_DISTANCE option. Features outside the specified cutoff for a target feature will be ignored in calculations for that feature. This parameter is not available for the CONTIGUITY_EDGES_ONLY or CONTIGUITY_EDGES_CORNERS options.

Linear Unit
temporal_neighborhood
(Optional)

An interval forward and backward in time that determines the features that will be used in calculations for the target feature. Features that are not within this interval of the target feature will be ignored in calculations for that feature.

Time Unit
time_field
(Optional)

The field containing the time stamp for each record in the dataset. This field must be of type Date.

For feature input, the time field will define temporal neighbors while filling missing values. A value must be provided if a related table is provided.

For feature and table input, the time field will be used when filling missing values using temporal trend at the location.

Field
number_of_spatial_neighbors
(Optional)

The number of nearest neighbors that will be included in calculations.

If the conceptualization_of_spatial_relationships parameter's FIXED_DISTANCE, CONTIGUITY_EDGES_ONLY, or CONTIGUITY_EDGES_CORNERS option is chosen, this number is the minimum number of neighbors to include in calculations.

Long
location_id
(Optional)

An integer or text field containing a unique ID for each location.

If a related table is provided, this field is used to match each input feature to rows in the related table; the values of this field must be unique for every input feature. If a related table is not provided, this field is used to specify each unique location in the input features to determine temporal neighbors. In this case, the values of this field must be unique to every location but do not need to be unique for each feature (because more than one feature can have the same location).

Field
related_table
(Optional)

The table or table view containing the temporal data for each feature of the in_features parameter.

Table View
related_location_id
(Optional)

An integer or text field in the related_table parameter that contains the location_id parameter value on which the relate will be based.

Field
weights_matrix_file
(Optional)

The path to a file containing weights that define spatial, and potentially temporal, relationships among features.

File
unique_id
(Optional)

An integer field containing a different value for every record in the in_features parameter value. This field can be used to join the results back to the original dataset.

If you don't have a unique_id field, you can create one by adding an integer field to the feature class table and calculating the field values equal to the FID or OBJECTID field.

Field
null_value
(Optional)

The value that represents null (missing) values. If no value is provided, <Null> is assumed for geodatabase feature classes and tables. If a value is provided, both the value and all <Null> values will be filled. If the input or output is a shapefile or dBASE table, a numeric value of the null placeholder is required.

Double
out_table
(Optional)

The output table containing the filled (estimated) values.

The output table is required if a related table is provided.

Table
append_to_input
(Optional)

Specifies whether the filled value fields will be appended to the input data or an output feature class or table will be created with the filled value fields. If you append the fields, you cannot provide a related table and the output coordinate system environment will be ignored.

  • APPEND_TO_INPUT: The fields containing the filled values will be appended to the input data. This option modifies the input data.
  • NEW_FEATURES: An output feature class or table will be created containing the filled value fields. This is the default.
Boolean
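
The null_value behavior, where a placeholder and true nulls are both treated as missing, can be mimicked when checking data before running the tool. The following is a minimal sketch; the helper is_missing is hypothetical, the placeholder -9999 is the example value mentioned in the usage notes, and comparing values as floats is an assumption made for illustration.

```python
def is_missing(value, null_value=None):
    """Return True when a value should be treated as null (missing)."""
    if value is None:
        return True  # a true <Null>, as stored in a geodatabase table
    # A placeholder such as -9999 stands in for null in shapefiles and dBASE tables.
    return null_value is not None and float(value) == float(null_value)


values = [12, None, -9999, 7]
missing_flags = [is_missing(v, null_value=-9999) for v in values]
count_missing = sum(missing_flags)
print(missing_flags, count_missing)
```

Without a null_value argument the sketch treats only None as missing, which corresponds to the default described for geodatabase feature classes and tables.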

Derived Output

Name | Explanation | Data Type
updated_features

The updated input features or table containing the filled value fields.

Feature Layer, Table View
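
The usage notes state that the TEMPORAL_TREND option needs at least two time periods with values at the beginning and at the end of a location's time series. A pre-check along those lines might look like the sketch below. The helper temporal_trend_fillable is hypothetical, and it assumes the requirement means that the first two and last two time steps must all be nonnull, which is one reading of the statement that nulls in those steps cannot be filled.

```python
def temporal_trend_fillable(series):
    """Return indexes of nulls that the stated requirement would allow to be filled."""
    if len(series) < 4:
        return []  # too short to have two leading and two trailing values
    ends_present = all(v is not None for v in series[:2] + series[-2:])
    if not ends_present:
        return []
    # Only interior nulls remain once the first two and last two steps have values.
    return [i for i, v in enumerate(series) if v is None]


complete_ends = [5, 6, None, 9, None, 12, 13]
missing_start = [None, 6, 7, 9, 10, 12, 13]
print(temporal_trend_fillable(complete_ends))
print(temporal_trend_fillable(missing_start))
```

A check like this can flag locations whose nulls would be left unfilled before the tool is run, so that another fill method can be considered for them.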

Code sample

FillMissingValues example 1 (Python window)

The following Python window script demonstrates how to use the FillMissingValues function.

import arcpy
arcpy.env.workspace = r"C:\STPM\Chicago.gdb"
arcpy.stpm.FillMissingValues("Chicago_Data", "Chicago_Filled", "COUNT", "AVERAGE",
                             "K_NEAREST_NEIGHBORS", "", "", "", 8)

FillMissingValues example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the FillMissingValues function.

# Fill missing values using a feature set and related table
# Use the results to create a space-time cube from defined locations
# Run Emerging Hot Spot Analysis on the data
# Visualize the results in 3d

# Import system modules
import arcpy

# Set overwriteOutput property to overwrite existing output, by default
arcpy.env.overwriteOutput = True

# Set the workspace
arcpy.env.workspace = r"C:\STPM\Chicago.gdb"

try:
    # Fill missing values in a feature class containing block group polygon 
    # shapes and a related table containing the incidents. Since some of the 
    # values are missing, you will fill them using the temporal trend method.
    arcpy.stpm.FillMissingValues(
            "Chicago_Feature", "Chicago_FilledFeature", "COUNT", 
            "TEMPORAL_TREND", "", "", None, "TIME", "", "MYID", 
            "Chicago_Table", "MYID", "", "", "", "Chicago_FilledTable")

    # Create a defined location space-time cube using a related table. Use a 
    # reference time at the start of the month so that bins fall on month 
    # breaks. Use temporal aggregation to sum multiple entries into one month.
    # Drop locations with missing values, since the values were already filled 
    # using Fill Missing Values.
    arcpy.stpm.CreateSpaceTimeCubeDefinedLocations(
            "Chicago_FilledFeature", r"C:\STPM\Chicago_Cube.nc", "MYID",
            "APPLY_TEMPORAL_AGGREGATION", "TIME", "1 Months", "REFERENCE_TIME", 
            "10/1/2015", "", "COUNT SUM DROP_LOCATIONS", "Chicago_FilledTable",
            "MYID")

    # Run an emerging hot spot analysis on the defined locations cube. Using 
    # contiguity edges so only block groups that bound each other are considered 
    # neighbors.
    arcpy.stpm.EmergingHotSpotAnalysis(
            r"C:\STPM\Chicago_Cube.nc", "COUNT_SUM_NONE", 
            "Chicago_Cube_EmergingHotSpot", "", 1, "", "CONTIGUITY_EDGES_ONLY")

    # Use Visualize Cube in 3d to see the hot spot results for each time slice
    arcpy.stpm.VisualizeSpaceTimeCube3D(
            r"C:\STPM\Chicago_Cube.nc", "COUNT_SUM_NONE", 
            "HOT_AND_COLD_SPOT_RESULTS", "Chicago_Cube_Visualize3d")

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the messages
    print(arcpy.GetMessages())
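
Because FillMissingValues takes many optional positional arguments, a long run of empty strings is easy to miscount. One option is to collect only the arguments you need by name and pass them as keywords. The sketch below builds such a dictionary without importing arcpy, so it runs anywhere. The helper build_fill_args is hypothetical, the parameter names are those listed in the syntax above, and the final call is left as a comment on the assumption that the function accepts its parameters as keywords.

```python
PARAMETER_NAMES = (
    "in_features", "out_features", "fields_to_fill", "fill_method",
    "conceptualization_of_spatial_relationships", "distance_band",
    "temporal_neighborhood", "time_field", "number_of_spatial_neighbors",
    "location_id", "related_table", "related_location_id",
    "weights_matrix_file", "unique_id", "null_value", "out_table",
    "append_to_input",
)


def build_fill_args(**kwargs):
    """Validate names against the documented signature and drop unset values."""
    unknown = set(kwargs) - set(PARAMETER_NAMES)
    if unknown:
        raise TypeError(f"Unknown parameter(s): {sorted(unknown)}")
    # None and "" both mean "not set"; a value such as 0 is kept.
    return {k: v for k, v in kwargs.items() if v not in (None, "")}


fill_args = build_fill_args(
    in_features="Chicago_Feature",
    out_features="Chicago_FilledFeature",
    fields_to_fill="COUNT",
    fill_method="TEMPORAL_TREND",
    distance_band="",
    time_field="TIME",
    location_id="MYID",
    related_table="Chicago_Table",
    related_location_id="MYID",
    out_table="Chicago_FilledTable",
)
print(sorted(fill_args))

# With ArcGIS Pro available, the call would then be (untested here):
# import arcpy
# arcpy.stpm.FillMissingValues(**fill_args)
```

Naming each argument trades a little verbosity for protection against shifting a value into the wrong position, which is hard to spot in a call with many empty placeholders.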

Licensing information

  • Basic: Yes
  • Standard: Yes
  • Advanced: Yes

Related topics