Summary
Refreshes an existing big data connection (BDC) and registers any new datasets that have been added to the source location.
Usage
This tool requires a BDC. To create a BDC, use the Create Big Data Connection tool.
Use this tool to add one or more new datasets to an existing big data connection. Additionally, the tool will reregister any datasets that have been removed using the Remove Dataset From Big Data Connection tool. The following are examples of when to use this tool:
- You copied a folder of data to your existing BDC source folder and want it represented as a dataset in your BDC.
- You used the Remove Dataset From Big Data Connection tool and you want to add the removed datasets back to the BDC.
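If the new data starts outside the source folder, you can script the copy step before running this tool. The following is a minimal sketch. It assumes that each subfolder of the BDC source folder is discovered as one dataset. `stage_dataset_folder` and any folder paths you pass to it are hypothetical and are not part of ArcPy.

```python
import shutil
from pathlib import Path


def stage_dataset_folder(data_folder, bdc_source_folder):
    """Copy a folder of data into the BDC source folder (hypothetical helper).

    Assumption: each subfolder of the source folder is discovered as one
    dataset the next time the BDC is refreshed.
    """
    data_folder = Path(data_folder)
    target = Path(bdc_source_folder) / data_folder.name
    if target.exists():
        # New files added to an existing dataset folder need no refresh,
        # so this helper only stages brand-new dataset folders.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(data_folder, target)
    return target
```

After the copy, run `arcpy.gapro.RefreshBDC` on the .bdc file to register the new folder as a dataset. The helper will not overwrite an existing folder because files added to an existing dataset are included in analysis without a refresh.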
This tool does not refresh existing dataset properties that have been edited using the Update Big Data Connection Dataset Properties tool. All modified properties will be maintained. The following scenarios include the recommended workflows:
- You modified the schema of an existing source dataset—Use the Update Big Data Connection Dataset Properties tool to modify the fields. If the BDC dataset has no modifications that you want to keep, you can instead remove the dataset using the Remove Dataset From Big Data Connection tool and then run the Refresh Big Data Connection tool to register it again.
- You added new files to an existing dataset—No additional steps are required. When you run a geoprocessing tool to analyze your BDC data, all files in the BDC dataset will be included for analysis.
- You deleted an existing dataset—Use the Remove Dataset From Big Data Connection tool to remove the dataset from the BDC.
The tool messages will include the following information on the datasets discovered and their status:
- Skipped—All existing datasets are skipped during refresh and remain as is.
- Succeeded—New datasets that have been discovered and added to the BDC.
- Failed—Datasets that were not successfully added to the BDC.
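You can think of the three statuses as a set comparison between the datasets already registered in the BDC and the dataset folders found in the source location. The sketch below is a conceptual model only, not the tool's implementation. `classify_refresh`, `RefreshReport`, and the `is_valid` callback are hypothetical names. A dataset removed with the Remove Dataset From Big Data Connection tool is treated as not registered, which is why a refresh adds it back.

```python
from dataclasses import dataclass, field


@dataclass
class RefreshReport:
    """Hypothetical container for the three statuses reported by the tool."""
    skipped: list = field(default_factory=list)
    succeeded: list = field(default_factory=list)
    failed: list = field(default_factory=list)


def classify_refresh(registered, discovered, is_valid):
    """Sort discovered dataset names into Skipped, Succeeded, and Failed.

    registered: names of datasets currently in the BDC
    discovered: names of dataset folders found in the source location
    is_valid:   callable deciding whether a new dataset can be registered
    """
    report = RefreshReport()
    registered = set(registered)
    for name in sorted(discovered):
        if name in registered:
            report.skipped.append(name)    # existing dataset, left as is
        elif is_valid(name):
            report.succeeded.append(name)  # new or previously removed dataset
        else:
            report.failed.append(name)     # could not be registered
    return report
```

For example, `classify_refresh({"trips"}, ["trips", "stations", "bad_csv"], lambda n: n != "bad_csv")` reports `trips` as skipped, `stations` as succeeded, and `bad_csv` as failed.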
You may run into one of two issues when discovering datasets in your BDC:
- Datasets that you expected are missing. In this case, verify that the source folder path you specified is correct, that the folder contains a subfolder for each dataset, and that the data is a supported data type.
- One or more datasets fail to register. If datasets fail to register, check for the following issues:
Issue | Solution | Example |
The dataset is not in the expected format. | Open the file to see whether it looks as expected. If the data is structured incorrectly, correct it and try again. | A .csv file has a few lines and a summary of the data, followed only by empty lines. |
The schemas of the files in a dataset folder do not match. | All files in a dataset folder must have the same schema. Open the files to compare their schemas, resolve any mismatches, and try to register the dataset again. | One .csv file has 10 fields and another has 8. |
The file types in a dataset folder do not match. | All files in a dataset folder must have the same extension (file type). Check the file types in the data source location and remove or relocate any misplaced files. | A shapefile dataset is in the same folder as a parquet file. |
A field format is not recognized. | This is unlikely but may occur if ORC or parquet files use an unexpected field format. Ensure that you use valid field formats. | A parquet file has a field with an unknown format. |
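Before you run the tool again, you can screen a source folder for two of the problems in the table above: mixed file types and mismatched .csv schemas. The sketch below is a hypothetical pre-check, not part of ArcGIS. `check_source_folder`, `check_dataset_folder`, and `SHAPEFILE_PARTS` are illustrative names. The sketch assumes the following:

- Each subfolder of the source folder is one dataset.
- The first row of each .csv file is a header.
- The listed sidecar extensions all belong to one shapefile dataset.

```python
import csv
from pathlib import Path

# Assumption: these extensions are parts of one shapefile dataset, so they
# count as a single file type rather than as mixed types.
SHAPEFILE_PARTS = {".shp", ".shx", ".dbf", ".prj", ".cpg", ".sbn", ".sbx"}


def file_type(path):
    """Return the file type used for the same-extension check."""
    ext = path.suffix.lower()
    return "shapefile" if ext in SHAPEFILE_PARTS else ext


def check_dataset_folder(folder):
    """List problems that could make one dataset folder fail to register."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    if not files:
        return ["no files found"]
    problems = []
    types = {file_type(p) for p in files}
    if len(types) > 1:
        problems.append("mixed file types: " + ", ".join(sorted(types)))
    # Compare .csv headers; assumes the first row of each file is a header.
    headers = {}
    for p in files:
        if p.suffix.lower() == ".csv":
            with p.open(newline="", encoding="utf-8", errors="replace") as f:
                headers[p.name] = tuple(next(csv.reader(f), []))
    if len(set(headers.values())) > 1:
        counts = ", ".join(f"{name} ({len(h)} fields)" for name, h in headers.items())
        problems.append("mismatched .csv headers: " + counts)
    return problems


def check_source_folder(source):
    """Check every dataset subfolder of a BDC source folder."""
    source = Path(source)
    if not source.is_dir():
        return {str(source): ["source folder not found"]}
    subfolders = sorted(p for p in source.iterdir() if p.is_dir())
    if not subfolders:
        # Likely cause of missing datasets: no dataset subfolders to discover.
        return {str(source): ["no dataset subfolders found"]}
    return {p.name: check_dataset_folder(p) for p in subfolders}
```

The function returns a dictionary that maps each subfolder name to a list of problems. An empty list means only that neither check found an issue. It does not guarantee that registration will succeed, because the sketch does not check field formats or supported data types.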
Once you refresh a BDC, use the Describe Dataset tool to verify that the updated dataset looks as expected.
The Refresh Big Data Connection tool identifies new datasets. The following tools can also be used to work with a BDC:
- Copy Dataset From Big Data Connection—Copies a dataset from a BDC to a feature class.
- Duplicate Dataset From Big Data Connection—Creates a view of an existing BDC dataset.
- Remove Dataset From Big Data Connection—Removes a dataset from the BDC.
- Update Big Data Connection Dataset Properties—Modifies the properties of an individual BDC dataset.
- Preview Dataset From Big Data Connection—Previews the first ten features in your dataset to verify they are correctly registered.
- Describe Dataset—Describes a dataset so you can verify that it looks as expected.
This geoprocessing tool is powered by Spark. See Big data connections to learn more about big data connections and how to use them.
Syntax
arcpy.gapro.RefreshBDC(bdc_file, {visible_geometry}, {visible_time})
Parameter | Explanation | Data Type |
bdc_file | The BDC file to refresh. | File |
visible_geometry (Optional) | Specifies whether the fields used to identify the geometry will be included (visible) as fields for analysis when the BDC file is used in other geoprocessing tools. When geometry fields are not visible, geometry is still applied to the dataset. The geometry visibility setting can be modified in the BDC. | Boolean |
visible_time (Optional) | Specifies whether the fields used to indicate the time will be included (visible) as fields for analysis when the BDC file is used in other geoprocessing tools. When time fields are not visible, time is still applied to the dataset. The time visibility setting can be modified in the BDC. | Boolean |
Derived Output
Name | Explanation | Data Type |
updated_bdc | The input .bdc file with updated datasets. | File |
Code sample
The following Python script demonstrates how to use the RefreshBDC function.
# Name: RefreshBDC.py
# Description: Refreshes a big data connection to automatically discover datasets that
# have been added.
#
# Requirements: ArcGIS Pro Advanced License
# Import system modules
import arcpy
# Set local variables
bdcFile = r"c:\Projects\MyProjectFolder\my_BigDataConnection.bdc"
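# Execute Refresh Big Data Connection (the function name is case-sensitive)
arcpy.gapro.RefreshBDC(bdcFile)
# Print the tool messages, which list skipped, succeeded, and failed datasets
print(arcpy.GetMessages())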
Environments
Licensing information
- Basic: No
- Standard: No
- Advanced: Yes