Find Point Clusters (GeoAnalytics)

Summary

Finds clusters of point features in surrounding noise based on their spatial or spatiotemporal distribution.

Learn more about how Density-based Clustering works

Illustration

Density-based Clustering Diagram

Usage

  • This geoprocessing tool is available with ArcGIS Enterprise 10.6.1 or later.

  • The input for Find Point Clusters is a point layer. This tool extracts clusters from the Input Point Layer and identifies any surrounding noise.

  • Find Point Clusters requires that the Input Point Layer be projected or that the output coordinate system be set to a projected coordinate system.

  • There are two Clustering Method parameter options:

    • Defined distance (DBSCAN)—Uses the DBSCAN algorithm to find clusters of points that are in close proximity based on a specified search distance.
    • Self-adjusting (HDBSCAN)—Uses the HDBSCAN algorithm (available with ArcGIS Enterprise 10.7 and later) to find clusters of points, similar to DBSCAN, but using varying distances, allowing for clusters with varying densities based on cluster probability (or stability).

    If DBSCAN is chosen, clusters can be found either in two-dimensional space only or in both space and time. If you check Use Time to Find Clusters and the input layer is time enabled with a time type of instant, DBSCAN will discover spatiotemporal clusters of points that are in close proximity based on a specified search distance and search duration (supported with ArcGIS Enterprise 10.8 and later).

  • The Minimum Features per Cluster parameter is used differently, depending on the clustering method:

    • Defined distance (DBSCAN)—Specifies the number of features that must be found within a search distance of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search distance is set using the Search Distance parameter. When using time to find clusters, Search Duration is required. When searching for cluster members, Minimum Features per Cluster must be found within Search Distance and Search Duration to form a cluster. Note that this distance and duration are not related to the diameter or time extent of the point clusters discovered.
    • Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.
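
The Defined distance (DBSCAN) rule described above can be illustrated with a short sketch. This is not the tool's implementation; the function name and the tuple layout are hypothetical, and counting the point itself toward the minimum is an assumption borrowed from common DBSCAN formulations.

```python
from datetime import datetime, timedelta
from math import hypot


def is_core_point(index, points, min_features, search_distance, search_duration=None):
    """Return True if points[index] has enough neighbors to start forming a cluster.

    points is a list of (x, y, time) tuples in projected units.
    Assumption: the count includes the point itself.
    """
    x0, y0, t0 = points[index]
    count = 0
    for x, y, t in points:
        # Spatial test: the neighbor must lie within the search distance.
        if hypot(x - x0, y - y0) > search_distance:
            continue
        # Temporal test: applied only when a search duration is supplied.
        if search_duration is not None and abs(t - t0) > search_duration:
            continue
        count += 1
    return count >= min_features


# Hypothetical sample data: three nearby sightings and one distant sighting.
start = datetime(2020, 1, 1, 12)
sightings = [
    (0.0, 0.0, start),
    (100.0, 0.0, start + timedelta(hours=1)),
    (0.0, 150.0, start + timedelta(hours=2)),
    (5000.0, 5000.0, start),
]
spatial_only = is_core_point(0, sightings, 3, 200.0)
space_and_time = is_core_point(0, sightings, 3, 200.0, timedelta(hours=1))
```

In this sample, the first sighting qualifies when only distance is considered but not when a one-hour search duration is added, because the third sighting falls outside the time window.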

  • This tool produces an output feature class with a new integer field, CLUSTER_ID, that identifies the cluster where each feature is located. Default rendering is based on the COLOR_ID field. Multiple clusters will be assigned each color. Colors will be assigned and repeated so that each cluster is visually distinct from its neighboring clusters.

  • If the Defined distance (DBSCAN) clustering method is used with time to discover spatiotemporal clusters, results will also include the following fields:

    • FEAT_TIME—The original instant time of each feature.
    • START_DATETIME—The start time of the time extent of the cluster a feature belongs to.
    • END_DATETIME—The end time of the time extent of the cluster a feature belongs to.

    The result layer's time properties will be set as an interval on the START_DATETIME and END_DATETIME fields, ensuring that all cluster members are drawn together when visualizing spatiotemporal clusters with a time slider. These fields are used for visualization only. For noise features, START_DATETIME and END_DATETIME will be equal to FEAT_TIME.
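
The relationship between these three fields can be sketched as follows. This is an illustration only: it assumes that a cluster's time extent runs from the earliest to the latest FEAT_TIME of its members and that noise features carry a CLUSTER_ID of -1. The function name and the dictionary layout are hypothetical.

```python
from datetime import datetime

NOISE_ID = -1  # assumption: CLUSTER_ID value that marks noise features


def add_cluster_time_extent(features):
    """Return copies of the feature dicts with START_DATETIME and END_DATETIME added."""
    # First pass: track the earliest and latest FEAT_TIME for each cluster.
    extents = {}
    for feature in features:
        cluster_id = feature["CLUSTER_ID"]
        if cluster_id == NOISE_ID:
            continue
        time = feature["FEAT_TIME"]
        start, end = extents.get(cluster_id, (time, time))
        extents[cluster_id] = (min(start, time), max(end, time))

    # Second pass: cluster members share the cluster extent;
    # noise features fall back to their own FEAT_TIME.
    result = []
    for feature in features:
        own_time = feature["FEAT_TIME"]
        start, end = extents.get(feature["CLUSTER_ID"], (own_time, own_time))
        result.append({**feature, "START_DATETIME": start, "END_DATETIME": end})
    return result
```

Because every member of a cluster receives the same interval, all members appear and disappear together when the layer is stepped through with a time slider.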

  • If Clustering Method is Self-adjusting (HDBSCAN), the output feature class will also contain the following fields:

    • PROB—The probability that a feature belongs in its assigned cluster.
    • OUTLIER—The likelihood that a feature is an outlier within its own cluster. A larger value indicates that the feature is more likely to be an outlier.
    • EXEMPLAR—The features that are most representative of each cluster. These features are indicated by a value of 1.
    • STABILITY—The persistence of each cluster across a range of scales. A larger value indicates that a cluster persists over a wider range of distance scales.
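
One way these fields might be used after the tool runs is sketched below. The rows are assumed to have already been read from the output layer into dictionaries. The OBJECTID key, the 0.5 probability threshold, and the noise value of -1 are assumptions for illustration, not values defined by the tool.

```python
NOISE_ID = -1  # assumption: CLUSTER_ID value that marks noise features


def summarize_clusters(rows, min_prob=0.5):
    """Group rows by CLUSTER_ID, listing exemplars and low-probability members."""
    summary = {}
    for row in rows:
        cluster_id = row["CLUSTER_ID"]
        if cluster_id == NOISE_ID:
            continue
        entry = summary.setdefault(cluster_id, {"exemplars": [], "weak_members": []})
        # EXEMPLAR is 1 for the features most representative of the cluster.
        if row["EXEMPLAR"] == 1:
            entry["exemplars"].append(row["OBJECTID"])
        # A low PROB value suggests a weaker fit to the assigned cluster.
        if row["PROB"] < min_prob:
            entry["weak_members"].append(row["OBJECTID"])
    return summary
```

A summary like this can help decide which features to label on a map (exemplars) and which cluster assignments deserve a closer look (weak members).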

  • When using the HDBSCAN algorithm with an input layer containing more than 3 million features, the tool may fail unless your administrator increases the value of the javaHeapSize parameter on the GeoAnalyticsTools GP Service. Approximately 2 GB of heap space is needed per 3 million features. The amount of RAM specified by javaHeapSize should be available on each GeoAnalytics Server machine in addition to the 16 GB typically required by GeoAnalytics Server. For example, to cluster 9 million features with HDBSCAN, set javaHeapSize to no less than 6144 MB, or 6 GB. In this case, each GeoAnalytics Server machine should have a total of at least 22 GB of RAM available.
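
The sizing arithmetic above can be written as a small helper. The function names are hypothetical, and rounding up to whole blocks of 3 million features is a conservative assumption rather than documented behavior.

```python
import math

MB_PER_GB = 1024
HEAP_GB_PER_BLOCK = 2            # about 2 GB of heap space ...
FEATURES_PER_BLOCK = 3_000_000   # ... per 3 million features
BASE_RAM_GB = 16                 # RAM typically required by GeoAnalytics Server


def hdbscan_heap_mb(feature_count):
    """Estimate a javaHeapSize value in MB for an HDBSCAN run."""
    # Assumption: round up to whole 3-million-feature blocks.
    blocks = math.ceil(feature_count / FEATURES_PER_BLOCK)
    return blocks * HEAP_GB_PER_BLOCK * MB_PER_GB


def total_ram_gb(feature_count):
    """Estimate the RAM in GB that each GeoAnalytics Server machine should have."""
    return BASE_RAM_GB + hdbscan_heap_mb(feature_count) / MB_PER_GB


heap_for_nine_million = hdbscan_heap_mb(9_000_000)
ram_for_nine_million = total_ram_gb(9_000_000)
```

For 9 million features, this reproduces the figures in the example: a heap size of 6144 MB and 22 GB of RAM per machine.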

  • You can improve the performance of the Find Point Clusters tool by using one or more of the following tips:

    • Set the extent environment so you only analyze data of interest.
    • Be selective in the search distance and duration. A narrower search distance or shorter search duration may perform better on the same data.
    • Use data that is local to where the analysis is being run.

  • This geoprocessing tool is powered by ArcGIS GeoAnalytics Server. Analysis is completed on your GeoAnalytics Server, and results are stored in your content in ArcGIS Enterprise.

  • When running GeoAnalytics Server tools, the analysis is completed on the GeoAnalytics Server. For optimal performance, make data available to the GeoAnalytics Server through feature layers hosted on your ArcGIS Enterprise portal or through big data file shares. Data that is not local to your GeoAnalytics Server will be moved to your GeoAnalytics Server before analysis begins. This means that it will take longer to run a tool, and in some cases, moving the data from ArcGIS Pro to your GeoAnalytics Server may fail. The threshold for failure depends on your network speeds, as well as the size and complexity of the data. It is recommended that you always share your data or create a big data file share.

    Learn more about sharing data to your portal

    Learn more about creating a big data file share through Server Manager

Parameters

Label | Explanation | Data Type
Input Point Layer

The point feature class containing the point clusters.

Feature Set
Output Name

The name of the output feature service.

String
Minimum Features per Cluster

This parameter is used differently depending on the clustering method chosen as follows:

  • Defined distance (DBSCAN)—Specifies the number of features that must be found within a certain distance of a point for that point to start to form a cluster. The distance is defined using the Search Distance parameter.
  • Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.

Long
Search Distance

The maximum distance to be considered.

The Minimum Features per Cluster specified must be found within this distance for cluster membership. Individual clusters will be separated by at least this distance. If a feature is located farther than this distance from the next closest feature in the cluster, it will not be included in the cluster.

Linear Unit
Data Store
(Optional)

Specifies the ArcGIS Data Store where the output will be saved. The default is Spatiotemporal big data store. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • Spatiotemporal big data store—Output will be stored in a spatiotemporal big data store. This is the default.
  • Relational data store—Output will be stored in a relational data store.
String
Clustering Method
(Optional)

Specifies the method that will be used to define clusters.

  • Defined distance (DBSCAN)—Uses a specified distance to separate dense clusters from sparser noise. DBSCAN is the fastest of the clustering methods but is only appropriate if there is a clear distance that works well to define all clusters that may be present. This results in clusters that have similar densities. This is the default.
  • Self-adjusting (HDBSCAN)—Uses varying distances to separate clusters of varying densities from sparser noise. HDBSCAN is the most data driven of the clustering methods and requires the least user input.
String
Use Time to Find Clusters
(Optional)

Specifies whether or not time will be used to discover clusters with DBSCAN.

  • Checked—Spatiotemporal clusters will be found using both a search distance and a search duration.
  • Unchecked—Spatial clusters will be found using a search distance and time will be ignored. This is the default.
Boolean
Search Duration
(Optional)

When searching for cluster members, the specified minimum number of points must be found within this time duration to form a cluster.

Time Unit

Derived Output

Label | Explanation | Data Type
Output Feature Layer

The output point clusters.

Feature Set

arcpy.geoanalytics.FindPointClusters(input_points, output_name, minimum_points, search_distance, {data_store}, {clustering_method}, {use_time}, {search_duration})
Name | Explanation | Data Type
input_points

The point feature class containing the point clusters.

Feature Set
output_name

The name of the output feature service.

String
minimum_points

This parameter is used differently depending on the clustering method chosen as follows:

  • Defined distance (DBSCAN)—Specifies the number of features that must be found within a certain distance of a point for that point to start to form a cluster. The distance is defined using the search_distance parameter.
  • Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.

Long
search_distance

The maximum distance to be considered.

The number of features specified by minimum_points must be found within this distance for cluster membership. Individual clusters will be separated by at least this distance. If a feature is located farther than this distance from the next closest feature in the cluster, it will not be included in the cluster.

Linear Unit
data_store
(Optional)

Specifies the ArcGIS Data Store where the output will be saved. The default is SPATIOTEMPORAL_DATA_STORE. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • SPATIOTEMPORAL_DATA_STORE—Output will be stored in a spatiotemporal big data store. This is the default.
  • RELATIONAL_DATA_STORE—Output will be stored in a relational data store.
String
clustering_method
(Optional)

Specifies the method that will be used to define clusters.

  • DBSCAN—Uses a specified distance to separate dense clusters from sparser noise. DBSCAN is the fastest of the clustering methods but is only appropriate if there is a clear distance that works well to define all clusters that may be present. This results in clusters that have similar densities. This is the default.
  • HDBSCAN—Uses varying distances to separate clusters of varying densities from sparser noise. HDBSCAN is the most data driven of the clustering methods and requires the least user input.
String
use_time
(Optional)

Specifies whether or not time will be used to discover clusters with DBSCAN.

  • TIME—Spatiotemporal clusters will be found using both a search distance and a search duration.
  • NO_TIME—Spatial clusters will be found using a search distance and time will be ignored. This is the default.
Boolean
search_duration
(Optional)

When searching for cluster members, the specified minimum number of points must be found within this time duration to form a cluster.

Time Unit

Derived Output

Name | Explanation | Data Type
output

The output point clusters.

Feature Set

Code sample

FindPointClusters example (stand-alone script)

The following stand-alone script demonstrates how to use the FindPointClusters tool.

#-------------------------------------------------------------------------------
# Name: FindPointClusters.py
# Description: Finds Point Clusters of rodent infestations
#
# Requirements: ArcGIS GeoAnalytics Server

# Import system modules
import arcpy

# Set local variables
inputPoints = "https://myGeoAnalyticsMachine.domain.com/geoanalytics/rest/services/DataStoreCatalogs/bigDataFileShares_countyData/BigDataCatalogServer/rat_sightings"
minimumPoints = 10
outputName = "RodentClusters"
searchDistance = "1 Kilometers"
dataStore = "SPATIOTEMPORAL_DATA_STORE"
clusterMethod = "DBSCAN"

# Execute Find Point Clusters
arcpy.geoanalytics.FindPointClusters(inputPoints, outputName, minimumPoints,
                                     searchDistance, dataStore, clusterMethod)
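
A spatiotemporal variation of the script above might look like the following sketch. The helper names are hypothetical, and the "1 Days" duration string is an assumed format for the Time Unit data type. The checks mirror the rules stated on this page: time is used only with DBSCAN, and a search duration is required when time is used. The arcpy import is deferred so the argument checks can run without an ArcGIS installation.

```python
def build_find_point_clusters_args(input_points, output_name, minimum_points,
                                   search_distance,
                                   data_store="SPATIOTEMPORAL_DATA_STORE",
                                   clustering_method="DBSCAN",
                                   use_time="NO_TIME",
                                   search_duration=None):
    """Validate the arguments and return them in the tool's positional order."""
    if clustering_method not in ("DBSCAN", "HDBSCAN"):
        raise ValueError("clustering_method must be DBSCAN or HDBSCAN")
    if use_time not in ("TIME", "NO_TIME"):
        raise ValueError("use_time must be TIME or NO_TIME")
    if use_time == "TIME":
        if clustering_method != "DBSCAN":
            raise ValueError("time can only be used with DBSCAN")
        if not search_duration:
            raise ValueError("search_duration is required when use_time is TIME")
    return [input_points, output_name, minimum_points, search_distance,
            data_store, clustering_method, use_time, search_duration]


def run_find_point_clusters(args):
    """Run the tool; requires ArcGIS Pro and a GeoAnalytics Server connection."""
    import arcpy  # imported here so the validation above runs without ArcGIS
    return arcpy.geoanalytics.FindPointClusters(*args)


# Example arguments (not executed here); the layer URL is a placeholder.
example_args = build_find_point_clusters_args(
    "https://myGeoAnalyticsMachine.domain.com/.../rat_sightings",
    "RodentClustersInTime", 10, "1 Kilometers",
    use_time="TIME", search_duration="1 Days")
```

Calling run_find_point_clusters(example_args) from a signed-in ArcGIS Pro session would then submit the job, provided the input layer is time enabled with a time type of instant.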

Environments

Special cases

Output Coordinate System

The coordinate system that will be used for analysis. Analysis will be completed in the input coordinate system unless otherwise specified by this parameter. For GeoAnalytics Tools, final results will be stored in the spatiotemporal data store in WGS84.

Licensing information

  • Basic: Requires ArcGIS GeoAnalytics Server
  • Standard: Requires ArcGIS GeoAnalytics Server
  • Advanced: Requires ArcGIS GeoAnalytics Server
