FAQs about using a Parquet file in ArcGIS Pro

The following questions and answers provide detailed information about using an Apache Parquet file from a local folder connection or cloud storage connection in ArcGIS Pro.

Caches

Cloud storage

Mapping

Sharing

Caches

How big are the local caches that are created for a Parquet file I use in ArcGIS Pro?

Because Parquet is a highly compressed storage format, the local cache files that ArcGIS Pro creates are typically much larger than the original file.

For example, a 20 MB Parquet file containing 1 million point records may result in a cache size of 250 MB. The difference in size depends on the data contained in the Parquet file, such as the number of columns and the data and entity types.

The size difference between the file and the cache is not linear.

Can I clear the local caches?

You can delete the files in the ParquetCache directory. The default location of this directory is C:\Users\<userprofile>\Documents\ArcGIS\ParquetCache. After you delete a cache file, ArcGIS Pro will re-create it the next time you access the Parquet file in a way that causes ArcGIS Pro to create a local cache as described in Cached Parquet data.

Alternatively, you can delete the local caches and re-create any that you need using the CreateParquetCache ArcPy function.
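If you prefer to clear the caches from a script, a minimal sketch follows. It assumes the default cache location described above; the function name and the choice to remove only files (not subdirectories) are illustrative, not part of the ArcGIS Pro API.

```python
from pathlib import Path

def clear_parquet_cache(cache_dir):
    """Delete the cache files in the given ParquetCache directory.

    The default location is assumed to be
    ~/Documents/ArcGIS/ParquetCache; pass a different path if your
    installation stores caches elsewhere. Returns the names of the
    files that were deleted.
    """
    cache_dir = Path(cache_dir)
    removed = []
    if not cache_dir.is_dir():
        return removed  # nothing to clear
    for item in cache_dir.iterdir():
        if item.is_file():
            item.unlink()           # delete this cache file
            removed.append(item.name)
    return removed
```

ArcGIS Pro re-creates any cache it needs the next time you access the Parquet file, so deleting stale caches this way only costs you the time to rebuild them.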

Cloud storage

Which cloud provider can I use to host the Parquet files I access individually to add to a map or scene?

You can create a cloud storage connection to an Amazon Simple Storage Service (S3) bucket.

What type of credentials can I use to create a cloud storage connection that accesses a Parquet file in an Amazon S3 bucket?

You can use an access key or a session token. If the bucket is configured for anonymous access, no credentials are required to access the file in it. See the Create Cloud Connection File tool documentation for a list of supported credential types.

What resource-based policy permissions must I configure for an IAM role to allow ArcGIS Pro to use a Parquet file in an Amazon S3 bucket?

At a minimum, the IAM role requires the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "<statement-id>",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::<cache-bucket-name>/*",
                "arn:aws:s3:::<cache-bucket-name>"
            ]
        }
    ]
}

Replace the values inside the angle brackets (<>) with values specific to your IAM role and bucket.

The policy document format version shown above is 2012-10-17. If you specify a different version date, the required document format may also differ.
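If you manage several buckets, generating the policy document from a template can help avoid copy-and-paste mistakes. The following sketch builds the minimal policy shown above for a given bucket name; the function name, the default statement ID, and the example bucket name are placeholders, not Esri or AWS conventions.

```python
import json

def minimal_s3_read_policy(bucket_name, statement_id="AllowParquetRead"):
    """Build the minimal resource-based policy for reading Parquet
    files from an S3 bucket, matching the document shown above.

    bucket_name and statement_id are substituted for the
    <cache-bucket-name> and <statement-id> placeholders.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": statement_id,
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }

# Serialize for pasting into the AWS console or an IaC template.
policy_json = json.dumps(minimal_s3_read_policy("my-parquet-bucket"), indent=4)
```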

Mapping

Is there a way to display features in a map or scene in ArcGIS Pro based on the information stored in x,y,z fields in a Parquet file?

Run the XY Table To Point geoprocessing tool with the Parquet map layer as the input table to create a feature class in a supported output format. Then add the output feature class to the map or scene.
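Conceptually, the tool reads coordinate columns from each table row and builds a point geometry from them. The sketch below illustrates that transformation in plain Python; it is not the ArcGIS API, and the field names are assumptions you would match to the column names in your Parquet file.

```python
def xy_rows_to_points(rows, x_field="x", y_field="y", z_field=None):
    """Convert table rows with coordinate fields into point tuples.

    rows is an iterable of dicts (one per table row). Rows missing an
    x or y value are skipped, since a null coordinate cannot become a
    point geometry. When z_field is given, 3D points are produced.
    """
    points = []
    for row in rows:
        x, y = row.get(x_field), row.get(y_field)
        if x is None or y is None:
            continue  # no geometry can be built for this row
        if z_field is not None:
            points.append((float(x), float(y), float(row.get(z_field, 0.0))))
        else:
            points.append((float(x), float(y)))
    return points
```

In practice the geoprocessing tool handles spatial reference, output format, and attribute transfer for you; this sketch only shows the core coordinate-to-geometry step.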

Can I aggregate features from a Parquet file into bins on the map?

Yes. If the Parquet file contains more than 10,000 rows, the feature layer that is added to the map will draw with geosquare bins. You can set a different scale threshold for the layer or disable binning. However, you cannot change to a different bin type, because only geosquare bins are supported.
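Square binning assigns each feature to a fixed-size grid cell and draws one symbol per cell, sized or colored by the count. The sketch below illustrates the idea in plain Python; it is an assumption-laden illustration of square binning in general, not ArcGIS Pro's internal geosquare algorithm.

```python
from collections import Counter

def square_bin_counts(points, cell_size):
    """Aggregate (x, y) points into square grid cells.

    Each point falls in the cell indexed by
    (floor(x / cell_size), floor(y / cell_size)); the returned
    Counter maps cell indices to feature counts.
    """
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts
```

At smaller scales many points collapse into few cells, which is why binned drawing stays responsive for large Parquet files; zooming in shrinks the effective cell coverage until individual features can be drawn.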

Sharing

Can I publish a web layer from the data in a Parquet file that I add to a map or scene from a folder or cloud storage connection in ArcGIS Pro?

No, not at this time.

Can I include cached Parquet file data in packages, such as map packages, or project packages?

No, not at this time.