Find Similar Locations (GeoAnalytics)

Summary

Identifies the candidate features that are most similar or dissimilar to one or more input features based on feature attributes.

Illustration

Find Similar Locations

Usage

  • Tabular, point, line, or area features can be used.

  • An input search (candidate) layer is required. The features in the search layer will be ranked by similarity to the input (reference) locations.

  • If there is more than one feature in the Input Layer, matching is based on averaged Input Layer values. For example, if there are two Input Layer features and one of the Analysis Fields attributes is a population variable, the tool will search the Search Layer for features with populations similar to the average population value. If the population values are 100 and 102, for example, the tool will search for candidates with populations near 101.

    Note:

    If there is more than one Input Layer feature, choose Analysis Fields attributes with similar values. If, for example, the population value for one of the input features is 100 and for the other is 100,000, the tool will search for matches with populations near the average of those two values: 50,050. Notice that this averaged value is far from the population value of either input feature.

  • Use the Most Or Least Similar parameter to search for features that are either most similar or least similar to the Input Layer features using the Most similar or Least similar option, respectively. In some cases, you may want to see both. If the Number of Results parameter value is 3 and the Most Or Least Similar parameter value is Both, for example, the tool will find the three most similar and the three least similar candidate features.

  • Any given solution match in Output Features will be either a solution that is most similar or a solution that is least similar to the target Input Layer features; a single solution cannot be both (and solution matches won't be duplicated in Output Features). Consequently, when the Most Or Least Similar parameter value is Both, the maximum number of resulting matches possible (Number of Results) will be half the number of features in the Search Layer.

  • A maximum of 10,000 search layer features will be returned.
  • The Match Method parameter has the following value options:

    • Attribute values: The most similar candidates will have the smallest sum of squared differences for all Analysis Fields attributes. All values are standardized before differences are calculated.
    • Attribute profiles: Cosine similarity is measured. Cosine similarity looks for the same relationships among standardized attribute values rather than trying to match magnitudes. For example, suppose there are three Analysis Fields called A1, A2, and A3, where A2 is twice as large as A1 and A3 is almost equal to A2. If the Match Method parameter value is Attribute profiles, the tool will search for candidates with those same attribute relationships. Because this method finds relationships between attributes, you must specify a minimum of two Analysis Fields attributes. You could use this method to find places similar to Los Angeles but at a different scale, for example, places with a similar profile of population to number of cars to number of residents younger than 20 years old. The cosine similarity index ranges from 1.0 (perfect similarity) to -1.0 (perfect dissimilarity) and is written to the Output Features cosimindex field.

  • The fields specified in the Analysis Fields parameter should be numeric and present, with the same field name and field type, in both the Input Layer and Search Layer datasets. If the tool doesn't find a corresponding field in the Search Layer, a warning appears indicating that the missing attributes were dropped from the analysis.

  • All of the attributes used for matching are written to the output. Use the Append Fields parameter to select specific fields from the Search Layer to add to the output table. By default, all fields are added.

  • All of the Input Layer and solution matches are written to the output features along with the Analysis Fields and Append Fields parameters. In addition, the following fields are included in the output features:

    Field name: location_type
    Description: A string indicating whether features are a reference layer (input) or candidate layer (search).

    Field name: simrank
    Description: When you select Most similar or Both as the Most Or Least Similar parameter value, all of the solution matches are ranked from most similar to least similar. The most similar solution match has a rank value of 1.
    Notes: This field is only included in the Output Features when you select Most similar or Both as the Most Or Least Similar parameter value.

    Field name: dissimrank
    Description: When you select Least similar or Both as the Most Or Least Similar parameter value, all of the solution matches are ranked from least similar to most similar. The solution that is least similar has a rank value of 1.
    Notes: This field is only included in the Output Features when you select Least similar or Both as the Most Or Least Similar parameter value.

    Field name: simindex
    Description: This field quantifies how similar each solution match is to the target feature. When you specify Attribute values as the Match Method parameter value, this value represents the sum of squared value differences. For more information about how this index is computed, see How Similarity Search Works.
    Notes: This field is only included in the Output Features when you select Attribute values as the Match Method parameter value.

    Field name: cosimindex
    Description: This field quantifies how similar each solution match is to the target feature. When you specify Attribute profiles as the Match Method parameter value, this value represents the cosine similarity. For more information about how this index is computed, see How Similarity Search Works.
    Notes: This field is only included in the Output Features when you select Attribute profiles as the Match Method parameter value.

    Field name: labelrank
    Description: This field is for display purposes only. The tool uses this field to provide default rendering of the analysis results.

    Field name: reference_id
    Description: A unique ID value for reference features. Search features are given a null value.
    Notes: This field is available at ArcGIS Enterprise 10.6.1 or later.

    Field name: search_id
    Description: A unique ID value for search features. Reference features are given a null value.
    Notes: This field is available at ArcGIS Enterprise 10.6.1 or later.

  • The output is automatically added to the table of contents with default rendering applied to the labelrank field.

  • You can improve the performance of the Find Similar Locations tool by doing one or more of the following:

    • Set the extent environment so that you only analyze data of interest.
    • Select only a few features for the reference layer.
    • Use data that is local to where the analysis is being run.

  • This geoprocessing tool is powered by ArcGIS GeoAnalytics Server. Analysis is completed on your GeoAnalytics Server, and results are stored in your content in ArcGIS Enterprise.

  • When running GeoAnalytics Server Tools, the analysis is completed on the GeoAnalytics Server. For optimal performance, make data available to the GeoAnalytics Server through feature layers hosted on your ArcGIS Enterprise portal or through big data file shares. Data that is not local to your GeoAnalytics Server will be moved to your GeoAnalytics Server before analysis begins. This means that it will take longer to run a tool, and in some cases, moving the data from ArcGIS Pro to your GeoAnalytics Server may fail. The threshold for failure depends on your network speeds, as well as the size and complexity of the data. Therefore, it is recommended that you always share your data or create a big data file share.

    Learn more about sharing data to your portal

    Learn more about creating a big data file share through Server Manager

  • Similar analysis can also be completed using the Similarity Search tool in the Spatial Statistics toolbox in ArcGIS Pro.
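
The averaging of reference features and the two Match Method options described in the usage notes above can be sketched in plain Python. This is an illustration, not the tool's implementation: the z-score standardization (computed here over the averaged reference plus all candidates), the dictionary-based rows, and every function name are assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def average_reference(reference_rows, fields):
    """Average each analysis field across all reference (input) features."""
    return {f: mean(row[f] for row in reference_rows) for f in fields}


def standardize(rows, fields):
    """Z-score each field across the given rows (assumed standardization)."""
    stats = {}
    for f in fields:
        values = [row[f] for row in rows]
        # Fall back to 1.0 when a field is constant to avoid dividing by zero.
        stats[f] = (mean(values), pstdev(values) or 1.0)
    return [{f: (row[f] - stats[f][0]) / stats[f][1] for f in fields}
            for row in rows]


def sum_squared_diff(a, b, fields):
    """Attribute values: a smaller sum of squared differences is more similar."""
    return sum((a[f] - b[f]) ** 2 for f in fields)


def cosine_similarity(a, b, fields):
    """Attribute profiles: 1.0 is the same profile, -1.0 the opposite profile."""
    dot = sum(a[f] * b[f] for f in fields)
    norm_a = math.sqrt(sum(a[f] ** 2 for f in fields))
    norm_b = math.sqrt(sum(b[f] ** 2 for f in fields))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for a zero vector; 0.0 is a choice made here
    return dot / (norm_a * norm_b)


def rank_by_attribute_values(reference_rows, candidate_rows, fields):
    """Return (candidate, score) pairs ordered from most to least similar."""
    target = average_reference(reference_rows, fields)
    standardized = standardize([target] + candidate_rows, fields)
    z_target, z_candidates = standardized[0], standardized[1:]
    scores = [sum_squared_diff(z_target, z, fields) for z in z_candidates]
    order = sorted(range(len(candidate_rows)), key=lambda i: scores[i])
    return [(candidate_rows[i], scores[i]) for i in order]


reference = [{"population": 100}, {"population": 102}]
candidates = [{"population": 50}, {"population": 99}, {"population": 500}]
ranked = rank_by_attribute_values(reference, candidates, ["population"])
# The candidate with population 99 ranks first: it is nearest the average, 101.
```

Because cosine similarity depends only on the direction of the attribute vector, two features whose values differ by a constant factor (for example A1, A2, A3 of 1, 2, 2 and 10, 20, 20) score 1.0, which is why the Attribute profiles option can match places at different scales.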

Syntax

FindSimilarLocations(input_layer, search_layer, output_name, analysis_fields, most_or_least_similar, match_method, number_of_results, {append_fields}, {data_store})
Parameter: input_layer
Explanation: The reference layer (or a selection on a layer) containing the features to be matched. The tool searches for other features similar to these features. When more than one feature is provided, matching is based on attribute averages.
Data Type: Record Set

Parameter: search_layer
Explanation: The candidate layer (or a selection on a layer) containing candidate-matching features. The tool searches for features most similar (or dissimilar) to the input_layer features among these candidates.
Data Type: Record Set

Parameter: output_name
Explanation: The name of the output feature service. The output feature service contains a record for each of the input_layer features and for all the solution-matching features found.
Data Type: String

Parameter: analysis_fields [analysis_fields,...]
Explanation: A list of numeric attributes representing the matching criteria.
Data Type: String

Parameter: most_or_least_similar
Explanation: Specifies whether the features to be found are most similar or least similar to the input_layer features.

  • MOST_SIMILAR: Finds the features that are most similar.
  • LEAST_SIMILAR: Finds the features that are least similar.
  • BOTH: Finds the features that are most similar and the features that are least similar.
Data Type: String

Parameter: match_method
Explanation: Specifies whether matches will be based on values or cosine relationships.

  • ATTRIBUTE_VALUES: Similarity or dissimilarity will be based on the sum of squared standardized attribute value differences for all the analysis_fields attributes.
  • ATTRIBUTE_PROFILES: Similarity or dissimilarity will be computed as a function of cosine similarity for all the analysis_fields attributes.
Data Type: String

Parameter: number_of_results
Explanation: The number of solution matches to be found. Entering zero or a number larger than the total number of search_layer features will return rankings for all the candidate features, with a maximum of 10,000.
Data Type: Long

Parameter: append_fields [append_fields,...] (Optional)
Explanation: An optional list of attributes to include with the output. You can include a name identifier, categorical field, or date field, for example. These fields are not used to determine similarity; they are only included in the output attributes for your reference. By default, all fields are added.
Data Type: Field

Parameter: data_store (Optional)
Explanation: Specifies the ArcGIS Data Store where the output will be saved. The default is SPATIOTEMPORAL_DATA_STORE. All results stored in the SPATIOTEMPORAL_DATA_STORE will be stored in WGS84. Results stored in a RELATIONAL_DATA_STORE will maintain their coordinate system.

  • SPATIOTEMPORAL_DATA_STORE: Output will be stored in a spatiotemporal big data store. This is the default.
  • RELATIONAL_DATA_STORE: Output will be stored in a relational data store.
Data Type: String
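
The interaction between number_of_results, the 10,000-feature maximum, and the BOTH option can be sketched as a small helper. This is a reading of the parameter descriptions above, not the tool's code: the function name is hypothetical, and two details are assumptions, namely that the 10,000 cap is applied before the halving for BOTH, and that an odd candidate count is halved with floor division.

```python
MAX_RESULTS = 10_000  # documented maximum number of search layer features returned


def effective_result_count(number_of_results, candidate_count, most_or_least_similar):
    """Estimate how many matches per direction the request can yield."""
    # Zero, or more than the number of candidates, means "rank all candidates".
    if number_of_results == 0 or number_of_results > candidate_count:
        count = candidate_count
    else:
        count = number_of_results

    # Apply the documented overall maximum.
    count = min(count, MAX_RESULTS)

    # With BOTH, a candidate cannot be most similar and least similar at once,
    # so at most half of the candidates can appear in each direction.
    if most_or_least_similar == "BOTH":
        count = min(count, candidate_count // 2)
    return count
```

For example, requesting 3 results with BOTH against 100 candidates yields 3 most similar and 3 least similar matches, while requesting 10 with BOTH against only 8 candidates is limited to 4 in each direction.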

Derived Output

Name: output
Explanation: Features from the input and all the solution-matching features found.
Data Type: Record Set
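
Because the output mixes reference features with solution matches, a common first step is to separate the two and order the matches by rank. The sketch below assumes the output records have already been read into a list of dictionaries keyed by field name (how you read them is up to you); the function names and the StoreName field are hypothetical. It relies only on the documented behavior that reference_id is null for search features and search_id is null for reference features, which requires ArcGIS Enterprise 10.6.1 or later.

```python
def split_output(records):
    """Separate reference (input) records from solution-match records."""
    reference = [r for r in records if r.get("reference_id") is not None]
    matches = [r for r in records if r.get("search_id") is not None]
    return reference, matches


def top_matches(records, n):
    """Return the n most similar matches, ordered by simrank (1 = most similar)."""
    _, matches = split_output(records)
    # simrank is only present when Most similar or Both was requested.
    ranked = [m for m in matches if m.get("simrank") is not None]
    ranked.sort(key=lambda m: m["simrank"])
    return ranked[:n]


# Hypothetical records for illustration only.
records = [
    {"StoreName": "TopPerformer", "reference_id": 1, "search_id": None, "simrank": None},
    {"StoreName": "Store B", "reference_id": None, "search_id": 2, "simrank": 2},
    {"StoreName": "Store A", "reference_id": None, "search_id": 1, "simrank": 1},
]
best = top_matches(records, 1)
```

The same pattern applies to dissimrank when Least similar or Both is selected.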

Code sample

FindSimilarLocations (Python window)

The following Python window script demonstrates how to use the FindSimilarLocations tool.

#-------------------------------------------------------------------------------
# Name: FindSimilarLocations.py
# Description: Find stores similar to a top-performing store
#
# Requirements: ArcGIS GeoAnalytics Server
#-------------------------------------------------------------------------------

# Import system modules
import arcpy

# Set local variables
referenceStore = "https://MyGeoAnalyticsMachine.domain.com/geoanalytics/rest/services/DataStoreCatalogs/bigDataFileShares_Stores/BigDataCatalogServer/TopPerformer"
candidateStores = "https://MyGeoAnalyticsMachine.domain.com/geoanalytics/rest/services/DataStoreCatalogs/bigDataFileShares_Stores/BigDataCatalogServer/AllStores"
analysisFields = ["SickDays", "TotalCustomers", "AvgPurchaseAmount"]
outputName = "BestStores_10"
dataStore = "SPATIOTEMPORAL_DATA_STORE"

# Execute Find Similar Locations: find the 10 candidate stores most similar to
# the reference store, compared on standardized attribute values.
# No append fields are specified (None).
arcpy.geoanalytics.FindSimilarLocations(referenceStore, candidateStores,
                                        outputName, analysisFields,
                                        "MOST_SIMILAR", "ATTRIBUTE_VALUES", 10,
                                        None, dataStore)
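
A variation on the sample above checks the arguments before calling the tool, which can catch mistakes such as requesting ATTRIBUTE_PROFILES with fewer than two analysis fields. This is a sketch under assumptions: build_arguments and find_similar are hypothetical helpers, the default values in the helper are choices made for the example rather than documented tool defaults (except data_store), and the arcpy import is guarded so the validation logic can be exercised outside an ArcGIS Pro Python environment.

```python
try:
    import arcpy  # available only in an ArcGIS Pro Python environment
except ImportError:
    arcpy = None

VALID_SIMILARITY = {"MOST_SIMILAR", "LEAST_SIMILAR", "BOTH"}
VALID_METHODS = {"ATTRIBUTE_VALUES", "ATTRIBUTE_PROFILES"}


def build_arguments(input_layer, search_layer, output_name, analysis_fields,
                    most_or_least_similar="MOST_SIMILAR",
                    match_method="ATTRIBUTE_VALUES", number_of_results=10,
                    append_fields=None,
                    data_store="SPATIOTEMPORAL_DATA_STORE"):
    """Validate the choices and return positional arguments in syntax order."""
    if most_or_least_similar not in VALID_SIMILARITY:
        raise ValueError(f"Unknown most_or_least_similar: {most_or_least_similar}")
    if match_method not in VALID_METHODS:
        raise ValueError(f"Unknown match_method: {match_method}")
    # Attribute profiles compare relationships between attributes, so at
    # least two analysis fields are required.
    if match_method == "ATTRIBUTE_PROFILES" and len(analysis_fields) < 2:
        raise ValueError("ATTRIBUTE_PROFILES requires at least two analysis fields")
    return [input_layer, search_layer, output_name, list(analysis_fields),
            most_or_least_similar, match_method, number_of_results,
            append_fields, data_store]


def find_similar(*args, **kwargs):
    """Run the tool with validated arguments (requires arcpy)."""
    if arcpy is None:
        raise RuntimeError("arcpy is not available in this Python environment")
    return arcpy.geoanalytics.FindSimilarLocations(*build_arguments(*args, **kwargs))
```

With referenceStore, candidateStores, and analysisFields defined as in the sample above, a call such as find_similar(referenceStore, candidateStores, "StoreProfiles_3", analysisFields, "BOTH", "ATTRIBUTE_PROFILES", 3) would request the three most similar and the three least similar stores by attribute profile; the output name here is a made-up example.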

Environments

Output Coordinate System

The coordinate system that will be used for analysis. Analysis will be completed in the input coordinate system unless otherwise specified by this parameter. For GeoAnalytics Tools, final results will be stored in the spatiotemporal data store in WGS84.

Licensing information

  • Basic: Requires ArcGIS GeoAnalytics Server
  • Standard: Requires ArcGIS GeoAnalytics Server
  • Advanced: Requires ArcGIS GeoAnalytics Server

Related topics