Use big data connections

You can configure and visualize big data connections (BDCs) and use their datasets in analysis.

Use a BDC

Once you have structured your data, you can do the following:

  1. Configure a BDC
  2. Visualize a BDC dataset
  3. Use BDC datasets in analysis

Configure a BDC

To get started, you need to create a BDC. There are two ways to create a BDC:

  • Use the New Big Data Connection dialog box. To access the dialog box, on the Insert ribbon, click Connections, and select New Big Data Connection. The dialog box provides an interactive experience for creating a BDC and configuring properties on each dataset.
  • Use the Create Big Data Connection geoprocessing tool.

You may run into one of two issues when discovering datasets in your BDC:

  • Datasets that you expected are missing. In this case, verify that the source folder path you specified is correct, that the source folder contains a subfolder for each dataset, and that the data is a supported type.
  • One or more datasets fail to register. If datasets fail to register, you may note some of the following:

    Issue: The dataset is not in the expected format.
    Solution: Open the file to see if it looks as expected. If the data is structured incorrectly, update and try again.
    Example: A .csv file has a few lines and a summary of the data and then only empty lines.

    Issue: The schemas of datasets in a folder do not match.
    Solution: All files in a dataset folder must have the same schema. Open the files to compare the schemas. Resolve any mismatched schemas and try to register the dataset again.
    Example: You have one .csv file with 10 fields and another with 8.

    Issue: The file types of a dataset in a folder do not match.
    Solution: All files in a dataset folder must have the same extension (file type). Check the file types of the data source location and remove or relocate any misplaced files.
    Example: A shapefile dataset is in the same folder as a parquet file.

    Issue: You have an unrecognized field format.
    Solution: This is unlikely but may occur if ORC and parquet use an unexpected format. Ensure that you use valid field formats.
    Example: You have a parquet file with an unknown field format.
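Two of the issues above (mixed file types and mismatched schemas in one dataset folder) can be checked with a short script before you try to register the dataset again. The following is a minimal Python sketch that uses only the standard library and is not part of ArcGIS Pro. The function name check_dataset_folder is hypothetical, and the sketch assumes comma-delimited, UTF-8 encoded .csv files whose first row is the header. Because a shapefile consists of several files with different extensions, the file type check is only meaningful for single-file formats.

```python
import csv
from collections import defaultdict
from pathlib import Path


def check_dataset_folder(folder):
    """Return a list of human-readable problems found in one dataset folder.

    Hypothetical helper, not an ArcGIS Pro API. Assumes comma-delimited,
    UTF-8 encoded .csv files with the header in the first row.
    """
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    problems = []

    # All files in a dataset folder should share one extension.
    # Caveat: a shapefile is several files (.shp, .shx, .dbf, ...), so this
    # simple check only suits single-file formats such as .csv.
    extensions = sorted({p.suffix.lower() for p in files})
    if len(extensions) > 1:
        problems.append(f"mixed file types: {', '.join(extensions)}")

    # All delimited files should have the same header row.
    headers = defaultdict(list)
    for path in files:
        if path.suffix.lower() == ".csv":
            with path.open(newline="", encoding="utf-8") as handle:
                header = tuple(next(csv.reader(handle), []))
            headers[header].append(path.name)
    if len(headers) > 1:
        summary = "; ".join(
            f"{len(header)} fields in {', '.join(names)}"
            for header, names in headers.items()
        )
        problems.append(f"mismatched schemas: {summary}")

    return problems
```

Calling check_dataset_folder on a dataset subfolder returns an empty list when neither problem is found. The sketch compares header rows only; it does not inspect field types or detect trailing summary lines.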

If you create a BDC from a delimited file and the header row is not detected, the header row may be invalid. Ensure that every field has a header and that none are empty. If you're using the dialog box to create the BDC, you can update the field headers on the Fields pane. You can also update field names using the Update Big Data Connection Dataset Properties tool.
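To find empty headers before creating the BDC, you can inspect the first row of the file. This is a minimal Python sketch using the standard csv module, not an ArcGIS Pro function. The name find_empty_headers is hypothetical, and the sketch assumes a UTF-8 encoded file whose first row is the header.

```python
import csv


def find_empty_headers(csv_path, delimiter=","):
    """Return the 1-based positions of empty field names in the first row.

    Hypothetical helper, not an ArcGIS Pro API. Assumes a UTF-8 encoded
    delimited file whose first row is the header row.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle, delimiter=delimiter), [])
    # A field name that is missing or only whitespace counts as empty.
    return [i for i, name in enumerate(header, start=1) if not name.strip()]
```

An empty result means every field in the first row has a name. Pass a different delimiter (for example, a tab) for files that are not comma-delimited.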

When you create a BDC, the schema, geometry, and time are discovered for each of your datasets. You can often adjust how a dataset represents those values. To verify that each dataset correctly represents the geometry, time, and fields, use the Describe Dataset geoprocessing tool. After reviewing your datasets, you may want to make one or more of the following changes to one or more datasets in your BDC:

  • Change the field names of delimited datasets.
  • Modify which fields are visible for analysis.
  • Change the fields used to represent geometry or time.
  • Add a filter to a dataset.
  • Add an alias to a dataset.
  • Remove datasets from the BDC that you aren't interested in analyzing.
  • Refresh the BDC to include a newly added dataset (a new subfolder under the source folder).

To make these optional changes, you can use the New Big Data Connection dialog box or geoprocessing tools such as the Update Big Data Connection Dataset Properties tool.

Visualize a BDC dataset

You can visualize delimited- and shapefile-based BDC datasets on a map.

Note:
You cannot visualize BDC datasets that use parquet and ORC source files.

To add your dataset to the map, locate the BDC item in the Catalog pane, click to expand the datasets, and add the dataset to the map.

BDC datasets have a simplified experience in your map and have the following limitations:

  • When visualizing BDC datasets, the time properties in the BDC dataset properties are not automatically set in the new layer. To visualize the dataset with time, set the layer's time properties after adding the dataset to the map.
  • Drawing delimited files will zoom to the full extent of the BDC dataset's spatial reference.
  • If you add new records to an existing BDC dataset (for example, new rows in a CSV file that is already part of the BDC), the new records will not draw until you restart ArcGIS Pro.
  • If you add new files to an existing BDC dataset (for example, a new CSV file in an existing dataset folder), the records in those files will not draw until you restart ArcGIS Pro.

Use BDC datasets in analysis

When BDC datasets are used as input to GeoAnalytics Desktop tools, analysis is optimized to read the data and run in parallel across the cores of your machine. For all other geoprocessing tools, BDC dataset reading and processing is not optimized to run in parallel; it is sequential and single-threaded.

You can use BDC datasets based on delimited files or shapefiles in most geoprocessing tools.

Note:
BDC datasets using parquet and ORC source files can only be used in GeoAnalytics Desktop tools.

You cannot apply a selection to a BDC dataset when it's used as input to a GeoAnalytics Desktop tool.

To use a BDC dataset in a geoprocessing tool, add the BDC dataset to a map and select the layer name from the parameter choice list, or use the browse button to browse to a BDC workspace and select the input dataset. The following tools do not support BDC datasets as input:

  • Service-based tools, including GeoAnalytics Server, standard feature analysis, and ArcGIS Online analysis tools
  • Tools that modify the input dataset, such as Calculate Field and Near

