An overview of the GeoAnalytics Desktop toolbox

GeoAnalytics Desktop tools provide a parallel processing framework for analysis on a desktop machine using Apache Spark. Through aggregation, regression, detection, and clustering, you can visualize, understand, and interact with big data. These tools work with big datasets and allow you to gain insight into your data through patterns, trends, and anomalies. The tools are integrated and run in ArcGIS Pro in the same way as other desktop geoprocessing tools.

GeoAnalytics Server tools are a better choice than GeoAnalytics Desktop tools in the following circumstances:

  • Data is stored in hosted feature layers.
  • Analysis output will be located in ArcGIS Enterprise.
  • More than one machine will be used to distribute analysis.
  • Linux, a web app, or a Server machine will be used to complete analysis.
  • A collection of files (such as delimited files or shapefiles) or a big data file share source (such as cloud stores, HDFS, or Hive) will be used.
Learn more about geoprocessing considerations for co-locating analysis processing and data

GeoAnalytics Desktop tools are designed for large datasets (hundreds of thousands or millions of records). Because the tools require an initial startup time to set up distributed processing, other desktop tools may be more appropriate for smaller datasets.

There are a few things to consider when using GeoAnalytics Desktop tools. Spark allocates your machine's memory and CPU cores when you run these tools. By default, it allocates 95 percent of the machine's memory and all of its CPU cores. The analysis begins only if these resources are available and are not in use by another service.

The resources are held for 30 seconds after the tool completes. If you run another GeoAnalytics Desktop tool in the same ArcGIS Pro project within those 30 seconds, it uses the same reserved resources. If you open a new ArcGIS Pro project and try to run another job simultaneously, it also uses those same resources.

You can change the amount of resources allocated to the job by modifying the parallel processing factor geoprocessing environment. The value can be specified as a number (the number of cores) or as a percentage (a percentage of total cores).
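The two forms of the parallel processing factor can be illustrated with a small sketch. The helper `resolve_cores` below is hypothetical and is not part of ArcGIS Pro; it only shows how a number and a percentage might translate into a core count. The rounding down and the minimum of one core are assumptions, since the text above does not state how fractional results are handled. In a Python script, the environment itself is set through `arcpy.env.parallelProcessingFactor`, shown here only as a comment because it requires an ArcGIS Pro installation.

```python
import os


def resolve_cores(factor, total_cores=None):
    """Translate a parallel processing factor into a core count.

    Hypothetical helper for illustration only. `factor` is either a
    number of cores (4 or "4") or a percentage of total cores ("50%").
    """
    if total_cores is None:
        total_cores = os.cpu_count() or 1

    text = str(factor).strip()
    if text.endswith("%"):
        percent = float(text[:-1])
        if not 0 <= percent <= 100:
            raise ValueError("percentage must be between 0 and 100")
        # Assumption: fractional cores are rounded down.
        cores = int(total_cores * percent / 100)
    else:
        cores = int(text)

    # Assumption: use at least one core and never more than the machine has.
    return max(1, min(cores, total_cores))


# With ArcGIS Pro's arcpy available, the environment would be set like this:
# import arcpy
# arcpy.env.parallelProcessingFactor = "50%"

if __name__ == "__main__":
    print(resolve_cores("50%", total_cores=8))
    print(resolve_cores(4, total_cores=8))
```

Using a percentage keeps a script portable across machines with different core counts, while a fixed number gives a predictable ceiling on a shared workstation.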

Toolset / Description

Analyze Patterns

The Analyze Patterns toolset contains tools that identify, quantify, and visualize spatial patterns in feature data.

Data Enrichment

The Data Enrichment toolset contains a tool for adding attributes to existing features for visualization, regression, and prediction.

Find Locations

The Find Locations toolset contains tools that identify areas that meet specified criteria. The criteria can be based on attribute queries (for example, parcels that are vacant) and spatial queries (for example, within 1 kilometer of a river). The areas that are found can be selected from existing features (such as existing land parcels), or new features can be created where all the requirements are met.

Manage Data

The Manage Data toolset contains tools used for the day-to-day management of geographic data.

Summarize Data

The Summarize Data toolset contains tools that calculate total counts, lengths, areas, and basic descriptive statistics of features and their attributes within areas or near other features.

Use Proximity

The Use Proximity toolset contains tools for answering the spatial analysis question, What is near what?

Utilities

The Utilities toolset contains tools that support the creation and modification of multifile feature connections.