Find Similar Locations (GeoAnalytics Desktop)—ArcGIS Pro

Summary

Identifies the candidate features that are most similar or dissimilar to one or more input features based on feature attributes.

Usage

Tabular, point, line, or area features can be used.
An input search (candidate) layer is required. The features in the search layer will be ranked by similarity to the input (reference) locations.
If there is more than one feature in the Input Layer, matching is based on averaged Input Layer values. For example, if there are two Input Layer features and one of the Analysis Fields attributes is a population variable, the tool will search for Search Layer features with populations that are similar to the average population value. If the population values are 100 and 102, for example, the tool will search for candidates with populations near 101.
Note:
If there is more than one Input Layer feature, choose Analysis Fields attributes with similar values. If, for example, the population value for one of the inputs is 100 and the other input is 100,000, the tool will search for matches with populations near the average of those two values: 50,050. Notice that this averaged value is far from the population value of either Input Layer feature.
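The averaging described above can be illustrated with a small sketch. It is plain Python, not part of the tool; the function names `averaged_target` and `distance_from_inputs` are hypothetical and exist only for this illustration.

```python
def averaged_target(reference_values):
    """Average one analysis field across all reference (Input Layer) features."""
    return sum(reference_values) / len(reference_values)


def distance_from_inputs(reference_values):
    """Show how far the averaged target lies from each original input value."""
    target = averaged_target(reference_values)
    return [abs(value - target) for value in reference_values]


# Inputs with similar values: the target stays close to both features.
print(averaged_target([100, 102]))           # 101.0
print(distance_from_inputs([100, 102]))      # [1.0, 1.0]

# Inputs with very different values: the target resembles neither feature.
print(averaged_target([100, 100_000]))       # 50050.0
print(distance_from_inputs([100, 100_000]))  # [49950.0, 49950.0]
```

A large distance between the averaged target and every input suggests that the reference features are too dissimilar on that field to be averaged meaningfully.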
Use the Most Or Least Similar parameter to search for features that are either most similar or least similar to the Input Layer features using the Most similar or Least similar option, respectively. In some cases, you may want to see both. If the Number of Results parameter value is 3 and the Most Or Least Similar parameter value is Both, for example, the tool will find the three most similar and the three least similar candidate features.
Any given solution match in Output Features will be either a solution that is most similar or a solution that is least similar to the target Input Layer; a single solution cannot be both (and solution matches won't be duplicated in Output Features). Consequently, when the Most Or Least Similar parameter value is Both, the maximum number of resulting matches possible (Number of Results) will be half the number of Search Layer features.
A maximum of 10,000 search layer features will be returned.
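The limits above can be sketched as a small helper. This is an illustration only: the function name `effective_number_of_results` is hypothetical, and the way the 10,000-feature cap interacts with the Both option (halving the capped count) is an assumption, not something this page states.

```python
MAX_RETURNED = 10_000  # maximum number of search layer features returned


def effective_number_of_results(requested, candidate_count, mode="MOST_SIMILAR"):
    """Estimate how many matches each ranking can contain.

    Assumption: with BOTH, a candidate appears in only one ranking, so each
    ranking holds at most half of the available candidates.
    """
    limit = min(candidate_count, MAX_RETURNED)
    if mode == "BOTH":
        limit //= 2
    # Zero (or a request beyond the limit) means "rank everything available".
    if requested <= 0 or requested > limit:
        return limit
    return requested


print(effective_number_of_results(3, 100, "BOTH"))            # 3
print(effective_number_of_results(80, 100, "BOTH"))           # 50
print(effective_number_of_results(0, 50, "MOST_SIMILAR"))     # 50
print(effective_number_of_results(0, 20_000, "MOST_SIMILAR")) # 10000
```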
The Match Method parameter has the following value options:
- Attribute values—The most similar candidates will have the smallest sum of squared differences for all Analysis Fields attributes. All values are standardized before differences are calculated.
- Attribute profiles—The cosine similarity is measured. Cosine similarity searches for the same relationships among standardized attribute values rather than trying to match magnitudes. For example, suppose there are three Analysis Fields attributes called A1, A2, and A3, where A2 is twice as large as A1 and A3 is almost equal to A2. If the Match Method parameter value is Attribute profiles, the tool will search for candidates with those same attribute relationships. Because this method finds relationships between attributes, you must specify a minimum of two Analysis Fields attributes. You could use the cosine similarity method (the Attribute profiles option) to find places that are similar to Los Angeles but at a different scale, for example, places with a similar profile of population compared to number of cars and number of residents younger than 20 years old. The cosine similarity index ranges from 1.0 (perfect similarity) to -1.0 (perfect dissimilarity) and is written to the Output Features cosimindex field.
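The two match methods can be sketched in plain Python as follows. This is a minimal illustration, not the tool's implementation: the z-score standardization in `standardize_columns` is an assumption about how values are standardized, and all function names are hypothetical. For clarity, the cosine example is applied to raw values to show that it ignores magnitude.

```python
import math
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each attribute column across all rows (assumed standardization)."""
    columns = list(zip(*rows))
    # Fall back to 1.0 when a column is constant to avoid division by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def sum_squared_differences(target, candidate):
    """Attribute values: a smaller result means a more similar candidate."""
    return sum((t - c) ** 2 for t, c in zip(target, candidate))


def cosine_similarity(target, candidate):
    """Attribute profiles: 1.0 is a matching profile, -1.0 the opposite."""
    dot = sum(t * c for t, c in zip(target, candidate))
    norm = math.sqrt(sum(t * t for t in target)) * math.sqrt(
        sum(c * c for c in candidate)
    )
    return dot / norm if norm else 0.0


# A1, A2, A3: A2 is twice A1 and A3 is almost equal to A2.
reference = [1.0, 2.0, 2.1]
same_profile_larger_scale = [10.0, 20.0, 21.0]

# The profile matches even though every magnitude differs by a factor of 10.
print(math.isclose(cosine_similarity(reference, same_profile_larger_scale), 1.0))
# The value-based measure, in contrast, sees a large difference.
print(sum_squared_differences(reference, same_profile_larger_scale) > 0)
```

The design difference is that the sum of squared differences penalizes any gap in magnitude, whereas cosine similarity only compares the direction of the attribute vectors.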
The Analysis Fields parameter values should be numeric and present, with the same field name and field type, in both the Input Layer and Search Layer datasets. If the tool doesn't find corresponding fields in the Search Layer, a warning appears indicating that the missing attributes were dropped from the analysis.
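A pre-check along these lines can flag fields that would be dropped before the tool runs. The sketch below is hypothetical: it works on plain dictionaries of field name to field type, which in ArcGIS Pro could be built from the `name` and `type` properties of the objects returned by `arcpy.ListFields`. The set of numeric type labels is an assumption.

```python
# Assumed numeric field type labels; adjust to match your data source.
NUMERIC_TYPES = {"SmallInteger", "Integer", "Single", "Double"}


def split_analysis_fields(analysis_fields, input_fields, search_fields):
    """Return (usable, dropped) analysis field names.

    input_fields and search_fields map field name -> field type.
    A field is usable when it is numeric and has the same name and type
    in both layers.
    """
    usable, dropped = [], []
    for name in analysis_fields:
        input_type = input_fields.get(name)
        search_type = search_fields.get(name)
        if input_type in NUMERIC_TYPES and input_type == search_type:
            usable.append(name)
        else:
            dropped.append(name)
    return usable, dropped


# Made-up field definitions for illustration.
input_fields = {"SickDays": "Integer", "TotalCustomers": "Integer", "Region": "String"}
search_fields = {"SickDays": "Integer", "TotalCustomers": "Double", "Region": "String"}

usable, dropped = split_analysis_fields(
    ["SickDays", "TotalCustomers", "Region", "AvgPurchaseAmount"],
    input_fields,
    search_fields,
)
print(usable)   # ['SickDays']
print(dropped)  # ['TotalCustomers', 'Region', 'AvgPurchaseAmount']
```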
All of the attributes used for matching are written to the output. The Append Fields parameter allows you to specify fields to add to the output table. By default, all fields are added. Use the Append Fields parameter to select specific fields from the Search Layer that you want to add.

All of the Input Layer features and solution matches are written to the output features along with the Analysis Fields and Append Fields parameter values. In addition, the following fields are included in the output features:


Field name	Description	Notes
location_type	A string indicating whether features are a reference layer (input) or candidate layer (search).
simrank	When you select Most similar or Both as the Most Or Least Similar parameter value, all of the solution matches are ranked from most similar to least similar. The most similar solution match has a rank value of 1.	This field is only included in the Output Features when you select Most similar or Both as the Most Or Least Similar parameter value.
dissimrank	When you select Least similar or Both for the Most Or Least Similar parameter value, all of the solution matches are ranked from least similar to most similar. The solution that is least similar has a rank value of 1.	This field is only included in the Output Features when you select Least similar or Both as the Most Or Least Similar parameter value.
simindex	This field quantifies how similar each solution match is to the target feature. When you specify Attribute values as the Match Method parameter value, this value represents the sum of squared value differences. For more information about how this index is computed, see How Similarity Search Works.	This field is only included in the Output Features when you select Attribute values as the Match Method parameter value.
cosimindex	This field quantifies how similar each solution match is to the target feature. When you specify Attribute profiles as the Match Method parameter value, this value represents the cosine similarity. For more information about how this index is computed, see How Similarity Search Works.	This field is only included in the Output Features when you select Attribute profiles as the Match Method parameter value.
labelrank	This field is for display purposes only. The tool uses this field to provide default rendering of the analysis results.
reference_id	A unique ID value for reference features. Search features are given a null value.
search_id	A unique ID value for search features. Reference features are given a null value.
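As a sketch of how these fields might be used together, the following plain-Python helper picks the top-ranked candidates from output rows. The rows are made-up dictionaries; in ArcGIS Pro they could be read from the output with a cursor such as `arcpy.da.SearchCursor`, which is not shown here. The function name `top_matches` is hypothetical.

```python
def top_matches(rows, rank_field="simrank", limit=3):
    """Return the best-ranked search (candidate) rows.

    Reference features have a null search_id, so they are skipped.
    Rows without a value in rank_field are skipped as well.
    """
    candidates = [
        row for row in rows
        if row.get("search_id") is not None and row.get(rank_field) is not None
    ]
    return sorted(candidates, key=lambda row: row[rank_field])[:limit]


# Made-up output rows for illustration only.
rows = [
    {"reference_id": 1, "search_id": None, "simrank": None, "simindex": None},
    {"reference_id": None, "search_id": 7, "simrank": 2, "simindex": 0.8},
    {"reference_id": None, "search_id": 4, "simrank": 1, "simindex": 0.1},
    {"reference_id": None, "search_id": 9, "simrank": 3, "simindex": 1.5},
]

for row in top_matches(rows, limit=2):
    print(row["search_id"], row["simrank"])
# 4 1
# 7 2
```

Passing `rank_field="dissimrank"` would apply the same logic to the least similar ranking when that field is present.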

The output is automatically added to the table of contents with default rendering applied to the labelrank field.
You can improve the performance of the Find Similar Locations tool by doing one or more of the following:
- Set the extent environment so that you only analyze data of interest.
- Select only a few features for the reference layer.
- Use data that is local to where the analysis is being run.
This geoprocessing tool is powered by Spark. Analysis is completed on your desktop machine using multiple cores in parallel. See Considerations for GeoAnalytics Desktop tools to learn more about running analysis.
When running GeoAnalytics Desktop tools, the analysis is completed on your desktop machine. For optimal performance, data should be available on your desktop. If you are using a hosted feature layer, it is recommended that you use ArcGIS GeoAnalytics Server. If your data isn't local, it will take longer to run a tool. To use your ArcGIS GeoAnalytics Server to perform analysis, see GeoAnalytics Tools.
Similar analysis can also be completed using the Similarity Search tool in the Spatial Statistics toolbox in ArcGIS Pro.

Syntax

arcpy.gapro.FindSimilarLocations(input_layer, search_layer, output, analysis_fields, most_or_least_similar, match_method, number_of_results, {append_fields})

Parameter	Explanation	Data Type
input_layer	The reference layer (or a selection on a layer) containing the features to be matched. The tool searches for other features similar to these features. When more than one feature is provided, matching is based on attribute averages.	Table View
search_layer	The candidate layer (or a selection on a layer) containing candidate-matching features. The tool searches for features most similar (or dissimilar) to the input_layer parameter among these candidates.	Table View
output	The output dataset containing a record for each of the input_layer features and for all the solution-matching features found.	Feature Class; Table
analysis_fields [analysis_fields,...]	A list of numeric attributes representing the matching criteria.	String
most_or_least_similar	Specifies whether the features to be found are most similar or least similar to the input_layer parameter. MOST_SIMILAR —Finds the features that are most similar. LEAST_SIMILAR —Finds the features that are least similar. BOTH —Finds the features that are most similar and the features that are least similar.	String
match_method	Specifies whether matches will be based on values or cosine relationships. ATTRIBUTE_VALUES —Similarity or dissimilarity will be based on the sum of squared standardized attribute value differences for all the analysis_fields attributes. ATTRIBUTE_PROFILES —Similarity or dissimilarity will be computed as a function of cosine similarity for all the analysis_fields attributes.	String
number_of_results	The number of solution matches to be found. Entering zero or a number larger than the total number of search_layer features will return rankings for all the candidate features, with a maximum of 10,000.	Long
append_fields [append_fields,...] (Optional)	An optional list of attributes to include with the output. You can include a name identifier, categorical field, or date field, for example. These fields are not used to determine similarity; they are only included in the output attributes for your reference. By default, all fields are added.	Field
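The parameter rules in the table above can be checked before the tool is called. The sketch below is one possible wrapper, not part of arcpy: `build_find_similar_args` and `find_similar_locations` are hypothetical helper names, and the default values they use are assumptions. Only `arcpy.gapro.FindSimilarLocations` and its keyword values come from this page; arcpy is imported lazily because it is available only in an ArcGIS Pro Python environment.

```python
MOST_OR_LEAST_OPTIONS = {"MOST_SIMILAR", "LEAST_SIMILAR", "BOTH"}
MATCH_METHODS = {"ATTRIBUTE_VALUES", "ATTRIBUTE_PROFILES"}


def build_find_similar_args(input_layer, search_layer, output, analysis_fields,
                            most_or_least_similar="MOST_SIMILAR",
                            match_method="ATTRIBUTE_VALUES",
                            number_of_results=10, append_fields=None):
    """Validate the documented keyword values and return the positional arguments."""
    if most_or_least_similar not in MOST_OR_LEAST_OPTIONS:
        raise ValueError(f"Unknown most_or_least_similar: {most_or_least_similar}")
    if match_method not in MATCH_METHODS:
        raise ValueError(f"Unknown match_method: {match_method}")
    if match_method == "ATTRIBUTE_PROFILES" and len(analysis_fields) < 2:
        raise ValueError("ATTRIBUTE_PROFILES needs at least two analysis fields")
    args = [input_layer, search_layer, output, list(analysis_fields),
            most_or_least_similar, match_method, number_of_results]
    if append_fields:
        args.append(list(append_fields))
    return args


def find_similar_locations(*args, **kwargs):
    """Run the tool with validated arguments (requires ArcGIS Pro)."""
    import arcpy  # imported here so the validation helper works without arcpy
    return arcpy.gapro.FindSimilarLocations(*build_find_similar_args(*args, **kwargs))


# Validation only; nothing is executed against ArcGIS here.
args = build_find_similar_args(
    "TopPerformer", "AllStores", "BestAndWorstStores",
    ["SickDays", "TotalCustomers"], "BOTH", "ATTRIBUTE_PROFILES", 3,
)
print(args)
```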

Code sample

FindSimilarLocations example (Python window)

The following Python window script demonstrates how to use the FindSimilarLocations tool.

#-------------------------------------------------------------------------------
# Name: FindSimilarLocations.py
# Description: Find stores similar to a top-performing store

# Import system modules
import arcpy

arcpy.env.workspace = "C:/data/SalesData.gdb"

# Set local variables
referenceStore = "TopPerformer"
candidateStores = "AllStores"
analysisFields = ["SickDays", "TotalCustomers", "AvgPurchaseAmount"]
outputName = "BestStores_10"

# Execute Find Similar Locations
arcpy.gapro.FindSimilarLocations(referenceStore, candidateStores, 
                                 outputName, analysisFields, 
                                 "MOST_SIMILAR", "ATTRIBUTE_VALUES", 10)

Environments

Output Coordinate System, Extent, Current Workspace, Parallel Processing Factor

Licensing information

Basic: No
Standard: No
Advanced: Yes