Apache Arrow in ArcGIS

Apache Arrow is an open-source, in-memory, columnar data representation that works across platforms and languages and allows you to transfer data efficiently between systems. Many big data projects interface with Arrow, making it a convenient option for reading and writing columnar file formats across languages and platforms. For more information, see the Apache Arrow documentation on use cases and on projects and products that use Apache Arrow.

The primary tabular data representation in Arrow is the Arrow table. The Arrow table is a two-dimensional tabular representation in which columns are Arrow chunked arrays. The interface for Arrow in Python is PyArrow. For more information, see the Apache Arrow and PyArrow library documentation.

Tables and feature data

You can convert tables and feature classes to an Arrow table using the TableToArrowTable function in the data access (arcpy.da) module.

Create an Arrow table from a feature class.

import arcpy

infc = r'C:\data\usa.gdb\USA\counties'
arrow_table = arcpy.da.TableToArrowTable(infc)

To convert an Arrow table to a table or feature class, use the Copy Rows or Copy Features tool.

Create an Arrow table from scratch and convert it to a geodatabase table.

import arcpy
import pyarrow

# Create fields for schema so that data type can be specified
fields = [
    pyarrow.field('name', pyarrow.string()),
    pyarrow.field('state', pyarrow.string()),
    pyarrow.field('area_sqmi', pyarrow.float32())
]

# Create data (largest and smallest US county by area)
arrays = [
    pyarrow.array(['San Bernardino', 'Arlington']),
    pyarrow.array(['California', 'Virginia']),
    pyarrow.array([20105.32, 25.99])
]

# Create Arrow table from data and schema
pyarrow_table = pyarrow.Table.from_arrays(
    arrays=arrays,
    schema=pyarrow.schema(fields)
)

# Convert Arrow table to geodatabase table
counties = arcpy.management.CopyRows(
    pyarrow_table, r'C:\data\usa.gdb\USA\smallest_largest_county')

Arrow tables can be used as input to any geoprocessing tool that accepts a table or feature class, with the exception of tools that modify the input, such as the Calculate Field tool. While a geoprocessing tool can accept an Arrow table as input, the output will not be an Arrow table and will instead be a table or feature class.

Type conversions

When converting a table or feature class to an Arrow table using the TableToArrowTable function, the data types of the created Arrow table's columns (pyarrow.ChunkedArray objects) are determined from the field types of the input table or feature class.

Field type    PyArrow data type
Short         int16
Long          int32
Float         float
Double        double
Text          string
Date          date64
Object ID     esri.oid (int64)
Geometry      esri.geometry (binary)

Field types not listed above, including raster and BLOB fields, are not converted and are dropped.

Note:

Text fields are trimmed at 5,000 characters when converted to an Arrow table.

When converting an Arrow table to a table or feature class using a geoprocessing tool, the field types of the output table or feature class are determined by the data types of the input Arrow table's columns. An Object ID field will automatically be added to the output table or feature class.

PyArrow data type         Field type
bool                      Short
int8                      Short
int16                     Short
int32                     Long
int64                     Double
uint8                     Short
uint16                    Long
uint32                    Double
uint64                    Double
float32                   Float
float64                   Double
string                    Text
utf8                      Text
date32                    Date
date64                    Date
esri.oid (int64)          Object ID
esri.geometry (binary)    Geometry

Any Arrow data types not listed above will not be converted and will be dropped.

Note:

To use an Arrow table as input to a geoprocessing tool that requires a feature class or feature layer, the Arrow table must include geometry. Use the schema property of an Arrow table to determine whether an esri.geometry field is present.

