The following questions and answers provide detailed information about using an Apache Parquet file from a local folder connection or cloud storage connection in ArcGIS Pro.
The questions are organized into the following categories:
- Caches
- Cloud storage
- Mapping
- Sharing
Caches
Because Parquet is a highly compressed storage format, the local cache files that ArcGIS Pro creates are typically much larger than the original file.
For example, a 20 MB Parquet file containing 1 million point records may result in a cache size of 250 MB. The difference in size depends on the data contained in the Parquet file, such as the number of columns and the data and entity types, and the relationship between file size and cache size is not linear.
You can delete the files in the ParquetCache directory. The default location of this directory is C:\Users\<userprofile>\Documents\ArcGIS\ParquetCache. After you delete a cache file, ArcGIS Pro will re-create it the next time you access the Parquet file in a way that causes ArcGIS Pro to create a local cache as described in Cached Parquet data.
Alternatively, you can delete the local caches and re-create any that you need using the CreateParquetCache ArcPy function.
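For example, the following sketch clears the default cache directory; the path assumes the default profile location described above, and the exact signature of the CreateParquetCache function is not shown here, so consult its documentation before rebuilding caches programmatically.

import shutil
from pathlib import Path

# Default local Parquet cache location; adjust if your profile stores
# documents elsewhere.
cache_dir = Path.home() / "Documents" / "ArcGIS" / "ParquetCache"

if cache_dir.exists():
    for entry in cache_dir.iterdir():
        # Remove individual cache files as well as any cache subfolders.
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()

# ArcGIS Pro re-creates a cache on demand the next time the Parquet file is
# accessed, or you can rebuild specific caches ahead of time with the
# CreateParquetCache ArcPy function.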
Cloud storage
Which cloud provider can I use to host the Parquet files I access individually to add to a map or scene?
You can create a cloud storage connection to an Amazon Simple Storage Service (S3) bucket.
What type of credentials can I use to create a cloud storage connection that accesses a Parquet file in an Amazon S3 bucket?
You can use an Access Key or a session token. If the bucket is configured for anonymous access, no credentials are required to access the file in it. See the Create Cloud Connection File tool documentation for a list of supported credential types.
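For illustration only, the credential options above look like this when expressed with the AWS SDK for Python (boto3); the angle-bracket values are placeholders, and the parameters you actually enter are those of the Create Cloud Connection File tool.

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Long-term access key credentials (access key ID and secret access key).
s3 = boto3.client(
    "s3",
    aws_access_key_id="<access-key-id>",
    aws_secret_access_key="<secret-access-key>",
)

# Temporary credentials that include a session token.
s3 = boto3.client(
    "s3",
    aws_access_key_id="<temporary-access-key-id>",
    aws_secret_access_key="<temporary-secret-access-key>",
    aws_session_token="<session-token>",
)

# Anonymous (unsigned) requests to a bucket that allows public reads.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))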
What resource-based policy permissions must I configure for an IAM role to allow ArcGIS Pro to use a Parquet file in an Amazon S3 bucket?
At a minimum, the IAM role requires the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "<statement-id>",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::<cache-bucket-name>/*",
        "arn:aws:s3:::<cache-bucket-name>"
      ]
    }
  ]
}

Replace the values inside the angle brackets (<>) with values specific to your IAM role and bucket.
The version of the policy document format shown above is 2012-10-17. If you change this version date, the document format may also need to change.
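To confirm the permissions are in place before connecting from ArcGIS Pro, you can exercise the same S3 actions directly. The following boto3 sketch is an assumption-laden illustration: the bucket name and object key are placeholders, and s3:GetObjectVersion only comes into play on versioned buckets.

import boto3

bucket = "<cache-bucket-name>"        # placeholder bucket name from the policy
key = "<path/to/file.parquet>"        # placeholder object key

s3 = boto3.client("s3")

# s3:ListBucket allows the bucket contents to be browsed.
listing = s3.list_objects_v2(Bucket=bucket, MaxKeys=10)
print([obj["Key"] for obj in listing.get("Contents", [])])

# s3:GetObject allows the Parquet file itself to be read.
response = s3.get_object(Bucket=bucket, Key=key)
print(response["ContentLength"], "bytes")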
Mapping
Is there a way to display features in a map or scene in ArcGIS Pro based on the information stored in x,y,z fields in a Parquet file?
Yes. Run the XY Table To Point geoprocessing tool with the Parquet map layer as the input table to create a feature class in a supported output format. Then add the output feature class to the map or scene.
If the Parquet file contains more than 10,000 rows, the feature layer that is added to the map draws with geosquare bins. You can set a different scale threshold for the layer or disable binning; however, you cannot change to a different bin type, because only geosquare bins are supported.
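A minimal ArcPy sketch of the XY Table To Point workflow follows; the layer name, field names, output path, and coordinate system are hypothetical, so substitute the values from your own Parquet layer.

import arcpy

# The Parquet map layer name, the x,y,z field names, and the output
# geodatabase path below are placeholders.
arcpy.management.XYTableToPoint(
    in_table="parquet_layer",
    out_feature_class=r"C:\Data\Example.gdb\points_from_parquet",
    x_field="x",
    y_field="y",
    z_field="z",                                     # optional; creates 3D points
    coordinate_system=arcpy.SpatialReference(4326),  # WGS 1984, as an example
)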