Find Point Clusters (GeoAnalytics Desktop)—ArcGIS Pro

Summary

Finds clusters of point features in surrounding noise based on their spatial or spatiotemporal distribution.

Learn more about how Density-based Clustering works

Usage

The input for Find Point Clusters is a point layer. This tool extracts clusters from the Input Point Layer and identifies any surrounding noise.
Find Point Clusters requires that the Input Point Layer be in a projected coordinate system or that the output coordinate system be set to a projected coordinate system.
There are two Clustering Method parameter options. Defined distance (DBSCAN) uses the DBSCAN algorithm and finds clusters of points that are in close proximity based on a specified search distance. Self-adjusting (HDBSCAN) uses the HDBSCAN algorithm and, like DBSCAN, finds clusters of points, but it uses varying distances, allowing for clusters with varying densities based on cluster probability (or stability). If DBSCAN is chosen, clusters can be found either in two-dimensional space only or in both space and time. If you check Use Time to Find Clusters and the input layer is time enabled with a time type of instant, DBSCAN will discover spatiotemporal clusters of points that are in close proximity based on a specified search distance and search duration.
The Minimum Features per Cluster parameter is used differently, depending on the clustering method:
- Defined distance (DBSCAN)—Specifies the number of features that must be found within a search distance of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search distance is set using the Search Distance parameter. When using time to find clusters, Search Duration is required. When searching for cluster members, Minimum Features per Cluster must be found within Search Distance and Search Duration to form a cluster. Note that this distance and duration are not related to the diameter or time extent of the point clusters discovered.
- Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.
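The Defined distance (DBSCAN) rule described above can be sketched in plain Python. This is only an illustration of the neighbor-count test, not the tool's implementation: the function name `is_core_point`, the tuple layout, and the choice to count the point itself toward the minimum are assumptions made for the example.

```python
from datetime import datetime, timedelta
from math import hypot


def is_core_point(index, points, min_features, search_distance, search_duration=None):
    """Return True if points[index] has enough neighbors to start a cluster.

    points is a list of (x, y, time) tuples; x and y are in projected units
    and search_distance uses the same unit. search_duration is a timedelta,
    or None to ignore time.
    """
    x0, y0, t0 = points[index]
    count = 0
    for x, y, t in points:
        if hypot(x - x0, y - y0) > search_distance:
            continue  # too far away in space
        if search_duration is not None and abs(t - t0) > search_duration:
            continue  # too far away in time
        count += 1  # assumption: the point itself counts toward the minimum
    return count >= min_features


start = datetime(2024, 1, 1)
sightings = [
    (0.0, 0.0, start),
    (100.0, 0.0, start + timedelta(hours=1)),
    (0.0, 100.0, start + timedelta(hours=2)),
    (50.0, 50.0, start + timedelta(days=10)),  # near in space, far in time
    (5000.0, 5000.0, start),                   # far in space
]

# Same point, same distance: adding a search duration removes one neighbor.
spatial_only = is_core_point(0, sightings, 4, 200.0)
with_time = is_core_point(0, sightings, 4, 200.0, timedelta(days=1))
```

In this made-up data, the first sighting has enough neighbors when only distance is considered, but not when neighbors must also fall within one day of it.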
This tool produces an output feature class with a new integer field, CLUSTER_ID, that identifies the cluster where each feature is located. Default rendering is based on the COLOR_ID field. Multiple clusters will be assigned each color. Colors will be assigned and repeated so that each cluster is visually distinct from its neighboring clusters.
If the Defined distance (DBSCAN) clustering method is used with time to discover spatiotemporal clusters, results will also include the following fields:
- FEAT_TIME—The original instant time of each feature.
- START_DATETIME—The start time of the time extent of the cluster a feature belongs to.
- END_DATETIME—The end time of the time extent of the cluster a feature belongs to.
The result layer's time properties will be set as an interval on the START_DATETIME and END_DATETIME fields, ensuring that all cluster members are drawn together when visualizing spatiotemporal clusters with a time slider. These fields are used for visualization only. For noise features, START_DATETIME and END_DATETIME will be equal to FEAT_TIME.
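The relationship between these three fields can be sketched as follows. This is a minimal illustration and not the tool's code: it assumes a cluster's time extent runs from the earliest to the latest FEAT_TIME among its members, and it assumes noise features carry a CLUSTER_ID of -1. The function name `cluster_time_fields` is hypothetical.

```python
from datetime import datetime

NOISE_ID = -1  # assumption: the CLUSTER_ID value that marks noise features


def cluster_time_fields(features):
    """Add START_DATETIME and END_DATETIME to each feature dictionary.

    features is a list of dicts with CLUSTER_ID and FEAT_TIME keys.
    Cluster members share their cluster's time extent; noise features
    keep their own instant as both start and end.
    """
    extents = {}
    for feature in features:
        cluster_id = feature["CLUSTER_ID"]
        if cluster_id == NOISE_ID:
            continue
        t = feature["FEAT_TIME"]
        start, end = extents.get(cluster_id, (t, t))
        extents[cluster_id] = (min(start, t), max(end, t))

    result = []
    for feature in features:
        t = feature["FEAT_TIME"]
        start, end = extents.get(feature["CLUSTER_ID"], (t, t))
        result.append({**feature, "START_DATETIME": start, "END_DATETIME": end})
    return result


rows = cluster_time_fields([
    {"CLUSTER_ID": 1, "FEAT_TIME": datetime(2024, 1, 1, 8)},
    {"CLUSTER_ID": 1, "FEAT_TIME": datetime(2024, 1, 3, 17)},
    {"CLUSTER_ID": NOISE_ID, "FEAT_TIME": datetime(2024, 1, 2, 12)},
])
```

Because both members of cluster 1 receive the same interval, a time slider that draws features by START_DATETIME and END_DATETIME shows them together, while the noise feature appears only at its own instant.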
If Clustering Method is Self-adjusting (HDBSCAN), the output feature class will also contain the following fields:
- PROB—The probability that a feature belongs in its assigned cluster.
- OUTLIER—The likelihood that a feature is an outlier within its own cluster. A larger value indicates that the feature is more likely to be an outlier.
- EXEMPLAR—The features that are most representative of each cluster. These features are indicated by a value of 1.
- STABILITY—The persistence of each cluster across a range of scales. A larger value indicates that a cluster persists over a wider range of distance scales.
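One way to work with these fields after the tool has run is sketched below. The helper names `exemplars_by_cluster` and `weak_members`, the 0.5 threshold, and the OBJECTID key are assumptions for illustration. The rows are plain dictionaries here; in ArcGIS Pro they could be built from the output feature class with a cursor such as arcpy.da.SearchCursor.

```python
from collections import defaultdict


def exemplars_by_cluster(rows):
    """Group the IDs of exemplar features (EXEMPLAR == 1) by CLUSTER_ID."""
    grouped = defaultdict(list)
    for row in rows:
        if row["EXEMPLAR"] == 1:
            grouped[row["CLUSTER_ID"]].append(row["OBJECTID"])
    return dict(grouped)


def weak_members(rows, min_prob=0.5):
    """Return IDs of features whose PROB is below min_prob.

    The threshold is arbitrary and should be chosen for the data at hand.
    """
    return [row["OBJECTID"] for row in rows if row["PROB"] < min_prob]


# Made-up rows standing in for records read from the output feature class.
sample_rows = [
    {"OBJECTID": 1, "CLUSTER_ID": 1, "PROB": 1.0, "EXEMPLAR": 1},
    {"OBJECTID": 2, "CLUSTER_ID": 1, "PROB": 0.4, "EXEMPLAR": 0},
    {"OBJECTID": 3, "CLUSTER_ID": 2, "PROB": 0.9, "EXEMPLAR": 1},
]

exemplars = exemplars_by_cluster(sample_rows)
weak = weak_members(sample_rows)
```

Features flagged by `weak_members` are candidates for closer inspection, since a low PROB suggests they fit their assigned cluster poorly.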
You can improve the performance of the Find Point Clusters tool by using one or more of the following tips:
- Set the extent environment so you only analyze data of interest.
- Be selective in the search distance and duration. A narrower search distance or duration may perform better on the same data.
- Use data that is local to where the analysis is being run.
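The first tip can be applied in a script by setting the extent environment before the tool runs. The sketch below is hedged: `apply_analysis_settings` is a hypothetical helper, and it takes the environment object as an argument so the example runs without ArcGIS Pro. In a real script you would pass `arcpy.env`; the coordinate values shown are placeholders.

```python
from types import SimpleNamespace


def apply_analysis_settings(env, extent=None, spatial_reference=None):
    """Set the extent and output coordinate system on an environment object.

    env is expected to behave like arcpy.env. extent can be a string of
    space-delimited coordinates ("xmin ymin xmax ymax"). spatial_reference
    is whatever object the environment accepts, for example an
    arcpy.SpatialReference created by the caller.
    """
    if extent is not None:
        env.extent = extent
    if spatial_reference is not None:
        env.outputCoordinateSystem = spatial_reference
    return env


# Stand-in for arcpy.env so the sketch runs anywhere (assumption).
fake_env = SimpleNamespace(extent=None, outputCoordinateSystem=None)
apply_analysis_settings(fake_env, extent="500000 4100000 520000 4120000")
```

Restricting the extent to the area of interest means fewer points are read and compared, which is where the performance gain comes from.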
This geoprocessing tool is powered by Spark. Analysis is completed on your desktop machine using multiple cores in parallel. See Considerations for GeoAnalytics Desktop tools to learn more about running analysis.
For optimal performance, the data should be stored locally on your desktop machine; a tool takes longer to run when the data isn't local. If you are using a hosted feature layer, it is recommended that you use ArcGIS GeoAnalytics Server. To use your ArcGIS GeoAnalytics Server to perform analysis, see GeoAnalytics Tools.

Parameters

Label	Explanation	Data Type
Input Point Layer	The point features in which clusters will be found.	Feature Layer
Output Feature Class	A new feature class with the resulting point clusters.	Feature Class
Clustering Method	Specifies the method that will be used to define clusters. Defined distance (DBSCAN) — Uses a specified distance to separate dense clusters from sparser noise. DBSCAN is the fastest of the clustering methods but is only appropriate if there is a clear distance to use that works well to define all clusters that may be present. This results in clusters that have similar densities. This is the default. Self-adjusting (HDBSCAN) — Uses varying distances to separate clusters of varying densities from sparser noise. HDBSCAN is the most data-driven of the clustering methods and requires the least user input.	String
Minimum Features per Cluster	This parameter is used differently depending on the clustering method chosen as follows: Defined distance (DBSCAN)—Specifies the number of features that must be found within a certain distance of a point for that point to start to form a cluster. The distance is defined using the Search Distance parameter. Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.	Long
Search Distance	The maximum distance to be considered. The Minimum Features per Cluster specified must be found within this distance for cluster membership. Individual clusters will be separated by at least this distance. If a feature is located farther than this distance from the next closest feature in the cluster, it will not be included in the cluster.	Linear Unit
Use Time to Find Clusters (Optional)	Specifies whether or not time will be used to discover clusters with DBSCAN. Checked—Spatiotemporal clusters will be found using both a search distance and a search duration. Unchecked—Spatial clusters will be found using a search distance and time will be ignored. This is the default.	Boolean
Search Duration (Optional)	When searching for cluster members, the specified minimum number of points must be found within this time duration to form a cluster.	Time Unit

arcpy.gapro.FindPointClusters(input_points, out_feature_class, clustering_method, minimum_points, search_distance, {use_time}, {search_duration})

Name	Explanation	Data Type
input_points	The point features in which clusters will be found.	Feature Layer
out_feature_class	A new feature class with the resulting point clusters.	Feature Class
clustering_method	Specifies the method that will be used to define clusters. DBSCAN — Uses a specified distance to separate dense clusters from sparser noise. DBSCAN is the fastest of the clustering methods but is only appropriate if there is a clear distance to use that works well to define all clusters that may be present. This results in clusters that have similar densities. This is the default. HDBSCAN — Uses varying distances to separate clusters of varying densities from sparser noise. HDBSCAN is the most data-driven of the clustering methods and requires the least user input.	String
minimum_points	This parameter is used differently depending on the clustering method chosen, as follows: DBSCAN—Specifies the number of features that must be found within a certain distance of a point for that point to start to form a cluster. The distance is defined using the search_distance parameter. HDBSCAN—Specifies the number of features neighboring each point (including the point) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.	Long
search_distance	The maximum distance to be considered. The number of features specified by minimum_points must be found within this distance for cluster membership. Individual clusters will be separated by at least this distance. If a feature is located farther than this distance from the next closest feature in the cluster, it will not be included in the cluster.	Linear Unit
use_time (Optional)	Specifies whether or not time will be used to discover clusters with DBSCAN. TIME—Spatiotemporal clusters will be found using both a search distance and a search duration. NO_TIME—Spatial clusters will be found using a search distance and time will be ignored. This is the default.	Boolean
search_duration (Optional)	When searching for cluster members, the specified minimum number of points must be found within this time duration to form a cluster.	Time Unit

Code sample

FindPointClusters example (stand-alone script)

The following stand-alone script demonstrates how to use the FindPointClusters tool.

#-------------------------------------------------------------------------------
# Name: FindPointClusters.py
# Description: Finds Point Clusters of rodent infestations

# Import system modules
import arcpy

arcpy.env.workspace = "C:/data/CountyData.gdb"

# Set local variables
inputPoints = "rat_sightings"
minimumPoints = 10
outputName = "RodentClusters"
searchDistance = "1 Kilometers"
clusterMethod = "DBSCAN"

# Execute Find Point Clusters
arcpy.gapro.FindPointClusters(inputPoints, outputName, clusterMethod, 
                              minimumPoints, searchDistance)
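A spatiotemporal variant of the script above might look like the following sketch. The argument order follows the syntax shown earlier; the helper names, the output name "RodentClustersTime", and the duration value "1 Days" are assumptions for illustration, and the input layer must be time enabled with a time type of instant. The arcpy import is deferred so the argument-building part can be checked without ArcGIS Pro.

```python
def build_find_point_clusters_args(input_points, out_feature_class,
                                   minimum_points, search_distance,
                                   search_duration=None):
    """Build the positional arguments for a DBSCAN run of FindPointClusters.

    When search_duration is given, the TIME keyword is added so the tool
    looks for spatiotemporal clusters.
    """
    args = [input_points, out_feature_class, "DBSCAN",
            minimum_points, search_distance]
    if search_duration is not None:
        args += ["TIME", search_duration]
    return args


def run_find_point_clusters(args):
    """Run the tool; requires ArcGIS Pro with an Advanced license."""
    import arcpy  # imported here so the helper above works without arcpy
    return arcpy.gapro.FindPointClusters(*args)


# Hypothetical values modeled on the sample script above.
time_args = build_find_point_clusters_args(
    "rat_sightings", "RodentClustersTime", 10, "1 Kilometers", "1 Days")
# run_find_point_clusters(time_args)  # uncomment inside ArcGIS Pro
```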

Environments

Output Coordinate System, Extent, Current Workspace, Parallel Processing Factor

Licensing information

Basic: No
Standard: No
Advanced: Yes