Summarize Center And Dispersion (GeoAnalytics)

Summary

Finds central features and directional distributions and calculates mean and median locations from the input.

Illustration

Summarize Center And Dispersion tool illustration

Usage

  • This tool measures the centrality and dispersion of features. The following are examples of situations in which this tool is beneficial:

    • A local government is planning to open a new library for an underserved community. Centroids from block groups with the appropriate zoning and available lots have been collected. Calculating a central feature with a weight on population can be used to identify the central block group that will best serve the community.
    • A GIS analyst is analyzing the locations of 911 calls and the locations of emergency response stations (police, fire, and ambulance). A mean center result can be used to compare the mean center of the emergency calls and the mean center of the response stations to optimize response time.
    • A crime analyst wants to determine if the median center for burglaries shifts when evaluating daytime versus nighttime incidents. Calculating a median center with a group by time of day can be used to determine where crimes are occurring during the day and at night.
    • A GIS analyst for a nongovernmental organization is analyzing the spread of an infectious disease. An ellipse can be used to model the dispersion of the outbreak.

  • For input line and polygon features, feature centroids are used in distance computations.

  • The Weight Field parameter is used to weight locations according to their relative importance. For example, stores in a retail chain can be weighted by total sales, or polygon features can be weighted by their area. See Using weights to learn more about how weights are applied in analysis.

  • The Group By Field parameter is used to group features for separate computations of central features or dispersion. For example, wildlife observations taken throughout the year can be grouped by season or month. The field can be of integer, date, or string type. Records with null values will be grouped together.

  • The central feature is the feature associated with the smallest accumulated distance to all other features in the dataset. This feature is identified and included in the Central Feature Layer output. It is possible to have more than one feature sharing the smallest accumulated distance to all other features. If this happens, all of the most centrally located features are included in the Central Feature Layer output. When a Group By Field parameter value is specified, the input features are first grouped according to the field values; then a central feature is identified for each group. The geometry type of the output central feature will be the same as the input features.

  • The mean center is a point constructed from the average x- and y-coordinates. The mean center features are included in the Mean Center Layer output. When a Group By Field value is specified, the input features are first grouped according to the field values; then the mean center is calculated for each group.

  • Median center uses an iterative algorithm to find the geometric median point that minimizes Euclidean distance to all features in the dataset. The median center features are included in the Median Center Layer output. When a Group By Field value is specified, the input features are first grouped according to the field values; then the median center is calculated for each group. Unlike the results of the mean center operation, the median center results are less influenced by outlier features.

  • Standard deviational ellipses are created to summarize the spatial characteristics of geographic features: central tendency, dispersion, and directional trends. The ellipses can be sized as 1, 2, or 3 standard deviations. The ellipse features are included in the Ellipse Layer output. When a Group By Field value is specified, the input features are first grouped according to the field values; then an ellipse is calculated for each group.

  • You can specify one or more summary types to output. Each summary type will be output to a unique feature layer.

  • If the input layer includes features with null values for time or geometry, those features will not be used in analysis.

  • In addition to the fields from the input layer, the output Central Feature summary type result will include the following fields:

    Field name | Description
    CoordX | The x-coordinate of the central feature. If the feature is a line or polygon, the value will represent the centroid of the feature.
    CoordY | The y-coordinate of the central feature. If the feature is a line or polygon, the value will represent the centroid of the feature.
    instant_datetime | If the input layer is time enabled with time type instant, the output result will include an instant date field representing the time of the output feature.
    start_datetime | If the input layer is time enabled with time type interval, the output result will include a start date field representing the start time of the output feature.
    end_datetime | If the input layer is time enabled with time type interval, the output result will include an end date field representing the end time of the output feature.

  • In addition to the optional Group By Field parameter value used in analysis, the output Mean Center and Median Center summary type results will include the following fields:

    Field name | Description
    CoordX | The x-coordinate of the mean or median feature.
    CoordY | The y-coordinate of the mean or median feature.
    instant_datetime | If the input layer is time enabled, the output result will include an instant date field representing the mean or median time of the input features. This applies to input layers of both interval and instant time types.

  • In addition to the optional Group By Field parameter value used in analysis, the output Ellipse summary type will include the following fields:

    Field name | Description
    CenterX | The x-coordinate for the mean center of the ellipse.
    CenterY | The y-coordinate for the mean center of the ellipse.
    CenterT | The time value for the mean center of the ellipse.
    Rotation | The rotation of the long axis measured clockwise from noon. The rotation is measured in the units of the input's spatial reference. For example, a projected dataset could be measured in meters, and a geographic dataset could be measured in degrees.
    MajStdDist | The standard distance for the major axis. The distance is measured in the units of the input's spatial reference. For example, a dataset with a projected spatial reference could be measured in meters, and a dataset with a geographic spatial reference could be measured in degrees.
    MinStdDist | The standard distance for the minor axis. The distance is measured in the units of the input's spatial reference. For example, a dataset with a projected spatial reference could be measured in meters, and a dataset with a geographic spatial reference could be measured in degrees.
    TmStdDist | The temporal standard distance. This value is a duration measured in milliseconds.

  • Coordinate value attributes, for example CoordX and CoordY, will be calculated using the spatial reference of the analysis. By default, the spatial reference of the analysis will be the same as the input layer. Optionally, you can specify the spatial reference used in the analysis using the Output Coordinate System environment variable.

    If you are writing results to the spatiotemporal data store, the result features will be represented by the WGS 1984 (WKID 4326) coordinate system. This means the geometry values of your result features may be stored in different coordinate systems than the output attribute values. For example, if you output a mean center layer to the spatiotemporal data store and specify the Output Coordinate System environment value of NAD 1983 UTM Zone 1N (WKID 26901), the calculated values for the CoordX and CoordY fields will be in NAD 1983 UTM Zone 1N (WKID 26901), but the features on the map will be in the WGS 1984 (WKID 4326) coordinate system.

  • You can improve the performance of the Summarize Center And Dispersion tool by doing one or more of the following:

    • Set the extent environment so that you only analyze data of interest.
    • Use data that is local to where the analysis is being run.
    • Group your data using the Group By Field parameter.
    • For larger datasets, avoid Median Center for the Generate Types parameter where possible, as it may be the least performant summary type due to its iterative calculations.

  • This geoprocessing tool is powered by ArcGIS GeoAnalytics Server. Analysis is completed on your GeoAnalytics Server, and results are stored in your content in ArcGIS Enterprise.

  • When running GeoAnalytics Server tools, the analysis is completed on the GeoAnalytics Server. For optimal performance, make data available to the GeoAnalytics Server through feature layers hosted on your ArcGIS Enterprise portal or through big data file shares. Data that is not local to your GeoAnalytics Server will be moved to your GeoAnalytics Server before analysis begins. This means that it will take longer to run a tool, and in some cases, moving the data from ArcGIS Pro to your GeoAnalytics Server may fail. The threshold for failure depends on your network speeds, as well as the size and complexity of the data. Therefore, it is recommended that you always share your data or create a big data file share.

    Learn more about sharing data to your portal

    Learn more about creating a big data file share through Server Manager
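
The centrality measures described in the usage notes above can be illustrated with a minimal pure-Python sketch. It does not use ArcGIS and is not the tool's implementation. The function names (mean_center, central_feature, median_center) are hypothetical, inputs are assumed to be planar (x, y) tuples, and the way weights scale distances is an assumption made for illustration. The median center here uses Weiszfeld's iteration, one common iterative method for the geometric median; the usage notes only state that the tool's algorithm is iterative.

```python
import math


def mean_center(points, weights=None):
    """Weighted average of the x- and y-coordinates (weights default to 1)."""
    weights = weights or [1.0] * len(points)
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)


def central_feature(points, weights=None):
    """Indices of the point(s) with the smallest accumulated distance to all
    other points. Ties are all returned. Assumption: each distance is
    multiplied by the weight of the point it leads to."""
    weights = weights or [1.0] * len(points)
    totals = [
        sum(w * math.dist(p, q) for q, w in zip(points, weights))
        for p in points
    ]
    best = min(totals)
    return [i for i, t in enumerate(totals) if math.isclose(t, best)]


def median_center(points, weights=None, tol=1e-9, max_iter=1000):
    """Approximate geometric median using Weiszfeld's iteration."""
    weights = weights or [1.0] * len(points)
    x, y = mean_center(points, weights)  # start from the mean center
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for (px, py), w in zip(points, weights):
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Simplification: skip a point that coincides with the
                # current estimate to avoid dividing by zero.
                continue
            num_x += w * px / d
            num_y += w * py / d
            denom += w / d
        if denom == 0.0:
            break
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return (x, y)


# Four clustered points plus one distant outlier.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (100.0, 100.0)]
print("mean center:", mean_center(points))
print("median center:", median_center(points))
print("central feature index:", central_feature(points))
```

With these sample points, the outlier at (100, 100) pulls the mean center to (20.8, 20.8), while the median center stays inside the cluster, which is the lower sensitivity to outliers noted above. Removing the outlier leaves four points that share the smallest accumulated distance, so central_feature returns all four indices.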

Parameters

Label | Explanation | Data Type
Input Layer

The point, line, or polygon layer to be summarized.

Feature Set
Output Name

The name of the output feature service.

String
Generate Types

Specifies the summary types to be generated. You can use one or more summary types. A unique layer will be created for each summary type selected.

  • Central Feature: A layer will be created that contains a copy of the most central feature from the input layer.
  • Mean Center: A point layer will be created that represents the mean center of the input layer.
  • Median Center: A point layer will be created that represents the median center of the input layer.
  • Ellipse: A polygon layer will be created that represents the directional ellipse of the input layer.
String
Ellipse Size
(Optional)

Specifies the size of output ellipses in standard deviations.

  • One standard deviation: Output ellipses will cover one standard deviation of the input features. This is the default.
  • Two standard deviations: Output ellipses will cover two standard deviations of the input features.
  • Three standard deviations: Output ellipses will cover three standard deviations of the input features.
String
Weight Field
(Optional)

A numeric field used to weight locations according to their relative importance. This applies to all summary types.

Field
Group By Field
(Optional)

The field used to group similar features. This applies to all summary types. For example, if you choose a field named PlantType that contains values of tree, bush, and grass, all of the features with the value tree will be analyzed for their own center or dispersion. This example will result in three features, one for each group of tree, bush, and grass.

Field
Data Store
(Optional)

Specifies the ArcGIS Data Store where the output will be saved. The default is Spatiotemporal big data store. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • Spatiotemporal big data store: Output will be stored in a spatiotemporal big data store. This is the default.
  • Relational data store: Output will be stored in a relational data store.
String

Derived Output

Label | Explanation | Data Type
Central Feature Layer

The layer containing the central feature from the input layer.

Feature Class
Mean Center Layer

The point layer containing the mean center representations of the input layer.

Feature Class
Median Center Layer

The point layer containing the median center representations of the input layer.

Feature Class
Ellipse Layer

The polygon layer containing the ellipse representations of the input layer.

Feature Class
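
The Ellipse Layer output and the Ellipse Size options can be illustrated with a small pure-Python sketch. It derives the ellipse axes from the covariance of the coordinates (a principal-axes approach), which is one common way to describe a standard deviational ellipse and is not necessarily the formula the tool uses. The function name standard_deviational_ellipse, the returned key names, and the choice to multiply the axis lengths by the number of standard deviations are all assumptions made for illustration.

```python
import math


def standard_deviational_ellipse(points, size=1):
    """Describe an ellipse around planar (x, y) points.

    size is the number of standard deviations (1, 2, or 3).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Population variances and covariance of the coordinates.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Eigenvalues of the 2x2 covariance matrix give the variance along
    # the major and minor axes.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(half_trace - spread, 0.0))

    # Angle of the major axis, counterclockwise from the +x axis, then
    # converted to degrees clockwise from north (the +y axis).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    rotation_deg = (90.0 - math.degrees(theta)) % 180.0

    return {
        "center": (cx, cy),
        "rotation_deg": rotation_deg,
        "major": size * major,
        "minor": size * minor,
    }


# Points stretched mostly east-west, so the long axis points east.
sample = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (10.0, 1.0), (10.0, -1.0)]
print(standard_deviational_ellipse(sample, size=2))
```

For the sample points, the long axis runs east-west, so the sketch reports a rotation of 90 degrees clockwise from north, and the axis lengths double when size is 2.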

arcpy.geoanalytics.SummarizeCenterAndDispersion(input_layer, output_name, generate_types, {ellipse_size}, {weight_field}, {group_by_field}, {data_store})
Name | Explanation | Data Type
input_layer

The point, line, or polygon layer to be summarized.

Feature Set
output_name

The name of the output feature service.

String
generate_types
[generate_types,...]

Specifies the summary types to be generated. You can use one or more summary types. A unique layer will be created for each summary type selected.

  • CENTRAL_FEATURE: A layer will be created that contains a copy of the most central feature from the input layer.
  • MEAN_CENTER: A point layer will be created that represents the mean center of the input layer.
  • MEDIAN_CENTER: A point layer will be created that represents the median center of the input layer.
  • ELLIPSE: A polygon layer will be created that represents the directional ellipse of the input layer.
String
ellipse_size
(Optional)

Specifies the size of output ellipses in standard deviations.

  • 1_STANDARD_DEVIATION: Output ellipses will cover one standard deviation of the input features. This is the default.
  • 2_STANDARD_DEVIATIONS: Output ellipses will cover two standard deviations of the input features.
  • 3_STANDARD_DEVIATIONS: Output ellipses will cover three standard deviations of the input features.
String
weight_field
(Optional)

A numeric field used to weight locations according to their relative importance. This applies to all summary types.

Field
group_by_field
(Optional)

The field used to group similar features. This applies to all summary types. For example, if you choose a field named PlantType that contains values of tree, bush, and grass, all of the features with the value tree will be analyzed for their own center or dispersion. This example will result in three features, one for each group of tree, bush, and grass.

Field
data_store
(Optional)

Specifies the ArcGIS Data Store where the output will be saved. The default is SPATIOTEMPORAL_DATA_STORE. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • SPATIOTEMPORAL_DATA_STORE: Output will be stored in a spatiotemporal big data store. This is the default.
  • RELATIONAL_DATA_STORE: Output will be stored in a relational data store.
String

Derived Output

Name | Explanation | Data Type
out_central_feature_layer

The layer containing the central feature from the input layer.

Feature Class
out_mean_center_layer

The point layer containing the mean center representations of the input layer.

Feature Class
out_median_center_layer

The point layer containing the median center representations of the input layer.

Feature Class
out_ellipse_layer

The polygon layer containing the ellipse representations of the input layer.

Feature Class
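
The grouping behavior described for group_by_field can be sketched in plain Python, without ArcGIS. The example below assumes each record is a dictionary with hypothetical x and y keys, groups records by a field such as PlantType, keeps records with a null group value together in their own group, skips records with null geometry, and computes one weighted mean center per group. The function name mean_center_by_group is hypothetical, and the sketch only mirrors the documented behavior; it is not how the tool is implemented.

```python
from collections import defaultdict


def mean_center_by_group(records, group_field, weight_field=None):
    """Return {group value: (mean x, mean y)} for dictionary records."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get("x") is None or rec.get("y") is None:
            continue  # features with null geometry are not analyzed
        # None is a valid dictionary key, so null group values share a group.
        groups[rec.get(group_field)].append(rec)

    centers = {}
    for key, recs in groups.items():
        weights = [rec[weight_field] if weight_field else 1.0 for rec in recs]
        total = sum(weights)
        x = sum(w * rec["x"] for rec, w in zip(recs, weights)) / total
        y = sum(w * rec["y"] for rec, w in zip(recs, weights)) / total
        centers[key] = (x, y)
    return centers


observations = [
    {"PlantType": "tree", "x": 0.0, "y": 0.0},
    {"PlantType": "tree", "x": 4.0, "y": 2.0},
    {"PlantType": "bush", "x": 10.0, "y": 10.0},
    {"PlantType": None, "x": 1.0, "y": 1.0},
    {"PlantType": None, "x": 3.0, "y": 5.0},
    {"PlantType": "grass", "x": None, "y": None},
]
for group, center in mean_center_by_group(observations, "PlantType").items():
    print(group, center)
```

In this sample, the only grass record has no geometry, so no grass group is produced, while the two records with a null PlantType yield a single shared center.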

Code sample

SummarizeCenterAndDispersion (stand-alone script)

The following stand-alone script demonstrates how to use the SummarizeCenterAndDispersion function.

# Name: SummarizeCenterAndDispersion.py
# Description: Calculate a standard deviational ellipse of contagious disease 
#              data to understand the spread of the disease over time. 
#
# Requirements: ArcGIS GeoAnalytics Server

# Import system modules
import arcpy

# Set local variables
# This example calculates a standard deviational ellipse for three standard 
# deviations of the data
inFeatures = "https://sampleserver6.com/arcgis/rest/services/DataStoreCatalogs/bigDataFileShares_myBDFS/BigDataCatalogServer/diseaseRecords"
outFS = "disease_movement_ellipse"
summaryType = "ELLIPSE"
dataStore = "RELATIONAL_DATA_STORE"

# Execute SummarizeCenterAndDispersion. The weight and group-by fields are left
# empty; arguments follow the order of the documented signature.
arcpy.geoanalytics.SummarizeCenterAndDispersion(inFeatures, outFS, summaryType, 
                                                "3_STANDARD_DEVIATIONS", "", 
                                                "", dataStore)

Environments

Output Coordinate System

The coordinate system that will be used for analysis. Analysis will be completed in the input coordinate system unless a different coordinate system is specified by this environment. For GeoAnalytics Tools, final results written to the spatiotemporal data store will be stored in WGS84.

Licensing information

  • Basic: Requires ArcGIS GeoAnalytics Server
  • Standard: Requires ArcGIS GeoAnalytics Server
  • Advanced: Requires ArcGIS GeoAnalytics Server
