Cached Parquet data

The data from an Apache Parquet file that you access from a local folder or cloud storage connection is cached locally when you do any of the following:

  • Add the data to a map or scene from a Parquet file.
  • Open the Fields view from the Parquet file in the Catalog pane.
  • Open the Properties dialog box from the Parquet file in the Catalog pane.
  • Add the Parquet file to a geoprocessing tool, or access it from an ArcPy script.

These local caches are created per user per machine. The caches improve performance when you query the data or when you pan or zoom around a map or scene that contains the data. They also provide the unique identifier field that ArcGIS requires, and they allow ArcGIS Pro to aggregate features into bins for improved display of numerous features.

Tip:

More information about caches is available in FAQs about using a Parquet file in ArcGIS Pro.

Cache types

The type of cache that ArcGIS Pro creates depends on the number of records in the Parquet file, as described in the sections below.

In-memory cache

An in-memory cache is created on the client machine if the Parquet file contains fewer than 500,000 records. It takes less time to create an in-memory cache than a persistent cache.

While ArcGIS Pro is open, it references the data in the in-memory cache. When you close ArcGIS Pro, the cache is deleted.

Persistent cache

A persistent cache file is created on the client machine if the Parquet file contains 500,000 or more records.
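
The record-count rule above can be summarized in a few lines of Python. This is only an illustrative sketch of the documented threshold, not ArcGIS Pro code; the function name expected_cache_type and the constant name are hypothetical.

```python
# Threshold stated in this topic: fewer than 500,000 records -> in-memory cache.
IN_MEMORY_RECORD_LIMIT = 500_000


def expected_cache_type(record_count: int) -> str:
    """Return the cache type this topic describes for a given record count.

    Hypothetical helper for illustration; ArcGIS Pro makes this decision internally.
    """
    if record_count < 0:
        raise ValueError("record_count must not be negative")
    if record_count < IN_MEMORY_RECORD_LIMIT:
        return "in-memory"
    return "persistent"


print(expected_cache_type(120_000))
print(expected_cache_type(500_000))
```

A file with exactly 500,000 records falls on the persistent side of the threshold, because the in-memory cache applies only to files with fewer than 500,000 records.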

The more data the Parquet file contains, the longer it takes to generate a persistent cache. To avoid waiting for ArcGIS Pro to generate the cache when you perform one of the tasks listed above, you can create the cache in advance by using the Create Parquet Cache geoprocessing tool or by running the CreateParquetCache ArcPy function in a Python window.

When the last modified date of the source Parquet file changes, ArcGIS Pro re-creates the local cache.
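
If you want to anticipate when a cache will be re-created, you can compare the file's current last modified time with a value you recorded earlier. The following is a minimal sketch using only the Python standard library; it assumes a locally accessible file path, and the helper name source_changed is hypothetical. It does not inspect or reproduce how ArcGIS Pro tracks the date internally.

```python
import os


def source_changed(parquet_path: str, recorded_mtime: float) -> bool:
    """Return True if the file's last modified time differs from a recorded value.

    recorded_mtime is a timestamp (seconds since the epoch) that you saved
    earlier, for example from os.path.getmtime(parquet_path).
    """
    return os.path.getmtime(parquet_path) != recorded_mtime
```

For example, record os.path.getmtime(path) after the cache is first built, and call source_changed(path, recorded) before opening a project to see whether a rebuild is likely.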

ArcGIS Pro deletes smaller persistent caches (1 GB or smaller) automatically if they have not been accessed in the last 30 days. In this case, access is recorded for the actions listed above, as well as the following:

  • Open a map or scene in which the data is saved.
  • Open the Fields view of the map layer by clicking Data Design > Fields on the layer’s context menu in the Contents pane.
  • Open the Properties dialog box for the map layer by clicking Properties on the layer’s context menu in the Contents pane.
  • Add the Parquet file to a geoprocessing tool, or access it from an ArcPy script.

Caches that are larger than 1 GB are retained regardless of when they were last accessed, because large persistent caches take a long time to build.
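
The cleanup rule for persistent caches can be expressed as a small decision function. This is a sketch of the rule as described in this topic, not ArcGIS Pro code. It assumes that 1 GB means 1024^3 bytes and that a cache idle for more than exactly 30 days counts as not accessed; neither detail is stated here. The function name eligible_for_cleanup is hypothetical.

```python
from datetime import datetime, timedelta

ONE_GB = 1024 ** 3               # assumption: binary gigabyte
MAX_IDLE = timedelta(days=30)    # idle period stated in this topic


def eligible_for_cleanup(size_bytes: int, last_accessed: datetime, now: datetime) -> bool:
    """Return True if a persistent cache matches the automatic deletion rule.

    Caches larger than 1 GB are always retained. Caches of 1 GB or smaller
    are eligible once they have gone more than 30 days without access.
    """
    if size_bytes > ONE_GB:
        return False
    return now - last_accessed > MAX_IDLE


now = datetime(2024, 6, 1)
print(eligible_for_cleanup(200 * 1024 ** 2, datetime(2024, 4, 1), now))  # small, idle 61 days
print(eligible_for_cleanup(5 * ONE_GB, datetime(2023, 1, 1), now))       # large, always kept
```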
