Update Big Data Connection Dataset Properties (GeoAnalytics Desktop)—ArcGIS Pro

Summary

Updates the properties of a big data connection (BDC) dataset. This tool modifies field, geometry, time, and file settings for a specified BDC dataset.

Usage

This tool requires a BDC. To create a BDC, use the Create Big Data Connection tool.
Use this tool to modify the schema, geometry, or time settings of a BDC dataset for analysis or visualization, in scenarios such as the following:
- Your CSV dataset was registered with all fields as type string, and you want to change some of them to numeric types for use in analysis.
- Your BDC dataset has attribute values for two separate locations, such as taxi pick-up and taxi drop-off spots, and you want to change the geometry you use for analysis.
- Your workflow requires that time is set on the input layer.
- You want to share a BDC dataset with a colleague who is only interested in a subset of features, so you add a definition query expression and hide some unused fields.
You can modify the following properties:
- Definition query—An expression used to limit the features used in analysis.
- Fields—The field name, field type, and visibility.
- Geometry—How geometry is represented. Geometry properties cannot be edited for shapefiles.
- Time—How time is represented.
- File—The file properties used to read the dataset.
Specify the BDC dataset with the properties you want to modify using the Big Data Connection Dataset parameter. You can browse to the dataset or specify it using a path such as c:\<path>\MyBDC.bdc\<dataset_name>, for example, c:\MyBDCFolder\MyBDC.bdc\earthquakes_dataset.
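If you build the dataset path in a script instead of browsing to it, the path is the .bdc file followed by the dataset name, as in the example above. The following Python sketch assembles such a path with pathlib. bdc_dataset_path is a hypothetical helper written for illustration; it only builds the string and does not check that the BDC or the dataset exists.

```python
from pathlib import PureWindowsPath


def bdc_dataset_path(folder: str, bdc_name: str, dataset_name: str) -> str:
    """Join a folder, a BDC name, and a dataset name into a BDC dataset path.

    Hypothetical helper: it builds the string only and performs no existence checks.
    """
    # Accept the BDC name with or without its .bdc extension.
    bdc_file = bdc_name if bdc_name.lower().endswith(".bdc") else bdc_name + ".bdc"
    return str(PureWindowsPath(folder) / bdc_file / dataset_name)


print(bdc_dataset_path(r"c:\MyBDCFolder", "MyBDC", "earthquakes_dataset"))
```

PureWindowsPath is used so the backslash separators match the Windows paths shown on this page even if the script itself runs elsewhere.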
Define an expression to limit the features used in analysis using the Expression parameter. Adding a filter to a BDC dataset is similar to applying a definition query to a dataset in your map: specify a SQL expression to filter features of interest.
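Because the expression works like a SQL WHERE clause, one way to sanity-check a simple expression before running the tool is to try it against a few sample rows locally. The sketch below does this with Python's built-in sqlite3 module. It is an approximation under stated assumptions: preview_filter and the sample rows are hypothetical, and SQLite's SQL dialect is not the one the tool evaluates (the tool is powered by Spark), so only simple comparisons such as COUNT > 500 are likely to behave the same way.

```python
import sqlite3


def preview_filter(rows, expression):
    """Return the ids of sample rows that satisfy a SQL filter expression."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE sample (id TEXT, "COUNT" INTEGER)')
        con.executemany("INSERT INTO sample VALUES (?, ?)", rows)
        # The expression is spliced in as a WHERE clause, so only use text you trust.
        cursor = con.execute(f"SELECT id FROM sample WHERE {expression} ORDER BY id")
        return [row[0] for row in cursor.fetchall()]
    finally:
        con.close()


# Hypothetical sample rows: (id, COUNT)
sample_rows = [("a", 620), ("b", 410), ("c", 501)]
print(preview_filter(sample_rows, "COUNT > 500"))  # ['a', 'c']
```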
You can update the field type for delimited files. You cannot update the field type for other data sources (such as shapefiles, ORC files, or parquet files).
You can modify the geometry for delimited files, ORC, and parquet files. You cannot modify the geometry for a shapefile-sourced dataset.
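With the XYZ geometry format, the fields you set as x_field and y_field decide which location each feature gets. The sketch below illustrates this for a hypothetical taxi trip record with pick-up and drop-off coordinates by writing out the WKT point that each choice of fields corresponds to. point_wkt and the field names are assumptions for illustration, not part of the tool.

```python
def point_wkt(x, y, z=None):
    """Return the WKT point for separate x, y, and optional z values."""
    if z is None:
        return f"POINT ({x} {y})"
    return f"POINT Z ({x} {y} {z})"


# Hypothetical taxi trip record with two candidate locations.
trip = {
    "pickup_long": -73.99, "pickup_lat": 40.75,
    "dropoff_long": -73.97, "dropoff_lat": 40.78,
}

# x_field/y_field set to the pick-up fields versus the drop-off fields.
print(point_wkt(trip["pickup_long"], trip["pickup_lat"]))    # POINT (-73.99 40.75)
print(point_wkt(trip["dropoff_long"], trip["dropoff_lat"]))  # POINT (-73.97 40.78)
```

Note that WKT lists x (longitude) before y (latitude).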

The following table outlines how to specify time formats for the Start Time and End Time parameters when you edit a BDC dataset. The examples show how to represent the time January 2, 2016, at 9:45:02.05 PM.

Time formats in big data connections


Symbol	Meaning	Example
yy	The year, represented by two digits.	16
yyyy	The year, represented by four digits.	2016
MM	The month, represented numerically.	01 or 1
MMM	The month, represented using three letters.	Jan
MMMM	The month, represented using the complete spelling.	January
dd	The day.	02 or 2
HH	The hour when using a 24-hour day; values range from 0-23.	21
hh	The hour when using a 12-hour day; values range from 1-12.	9
mm	The minute; values range from 0-59.	45
ss	The second; values range from 0-59.	02
SSS	The millisecond; values range from 0-999.	050
a	The AM/PM marker.	PM
epoch_millis	The time in milliseconds from epoch (January 1, 1970, 00:00:00 UTC).	1451771102050 (assuming UTC)
epoch_seconds	The time in seconds from epoch.	1451771102 (assuming UTC)
Z	The time zone offset expressed in hours.	-0100 or -01:00
ZZZ	The time zone offset expressed using IDs.	America/Los_Angeles
''	Use single quotes to add text that doesn't represent a value outlined in this table.	'T'
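The epoch symbols expect a number rather than a formatted date string. As a sketch, assuming the sample time January 2, 2016, at 9:45:02.05 PM is in UTC (the epoch value depends on the time zone you assume), the following Python computes both values. It uses integer timedelta division, which avoids floating-point rounding in the millisecond value.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: the sample time is interpreted as UTC.
sample = datetime(2016, 1, 2, 21, 45, 2, 50_000, tzinfo=timezone.utc)

# Dividing a timedelta by a timedelta returns an exact integer count.
epoch_seconds = (sample - EPOCH) // timedelta(seconds=1)
epoch_millis = (sample - EPOCH) // timedelta(milliseconds=1)

print(epoch_seconds)  # 1451771102
print(epoch_millis)   # 1451771102050

# Converting back from epoch_millis recovers the original moment.
assert EPOCH + timedelta(milliseconds=epoch_millis) == sample
```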

The following table shows examples for different formats of the same date, January 2, 2016, at 9:45:02.05 PM:

Time format examples


Input date	Date format
01/02/2016 9:45:02PM	MM/dd/yyyy hh:mm:ssa
Jan02-16 21:45:02	MMMdd-yy HH:mm:ss
January 02 2016 9:45:02.050PM	MMMM dd yyyy hh:mm:ss.SSSa
01/02/2016T21:45:02-0000	MM/dd/yyyy'T'HH:mm:ssZ
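To check a format string against sample values before running the tool, you can translate the symbols from the first table into Python strptime directives. The following is a sketch under several assumptions: bdc_pattern_to_strptime is a hypothetical helper that covers only the symbols listed above (it rejects ZZZ and does not handle the epoch symbols), Python's parser is not the one the tool uses, and the month-name and AM/PM directives assume an English (C) locale.

```python
from datetime import datetime

# Pattern symbol -> strptime directive. Longer symbols come first so that
# "yyyy" is matched before "yy" and "MMMM" before "MMM" and "MM".
_TOKENS = [
    ("ZZZ", None),  # time zone IDs have no strptime equivalent
    ("yyyy", "%Y"), ("MMMM", "%B"), ("MMM", "%b"), ("SSS", "%f"),
    ("yy", "%y"), ("MM", "%m"), ("dd", "%d"), ("HH", "%H"), ("hh", "%I"),
    ("mm", "%M"), ("ss", "%S"), ("a", "%p"), ("Z", "%z"),
]


def bdc_pattern_to_strptime(pattern: str) -> str:
    """Translate a subset of the BDC time format symbols into a strptime format."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "'":
            end = pattern.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated quoted text in pattern")
            literal = pattern[i + 1:end] or "'"  # '' stands for a single quote
            out.append(literal.replace("%", "%%"))
            i = end + 1
            continue
        for token, directive in _TOKENS:
            if pattern.startswith(token, i):
                if directive is None:
                    raise NotImplementedError(f"{token} has no strptime equivalent")
                out.append(directive)
                i += len(token)
                break
        else:
            out.append(pattern[i].replace("%", "%%"))  # ordinary character
            i += 1
    return "".join(out)


# Sample strings for January 2, 2016, at 9:45:02.05 PM.
samples = [
    ("01/02/2016 9:45:02PM", "MM/dd/yyyy hh:mm:ssa"),
    ("Jan02-16 21:45:02", "MMMdd-yy HH:mm:ss"),
    ("January 02 2016 9:45:02.050PM", "MMMM dd yyyy hh:mm:ss.SSSa"),
    ("01/02/2016T21:45:02-0000", "MM/dd/yyyy'T'HH:mm:ssZ"),
]
for text, pattern in samples:
    print(pattern, "->", datetime.strptime(text, bdc_pattern_to_strptime(pattern)))
```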

You can specify the time zone using one of the following:
- The full name of the time zone, such as Pacific Standard Time
- The time zone offset expressed in hours, such as -0100 or -01:00
- The UTC or GMT abbreviation
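The offset forms -0100 and -01:00 describe the same offset. The sketch below normalizes the offset and UTC/GMT forms to Python timezone objects, which can help when validating time_zone values in a script. parse_time_zone is a hypothetical helper; it does not handle full time zone names such as Pacific Standard Time, which would need a separate name-to-zone mapping.

```python
import re
from datetime import timedelta, timezone

# Matches +HHMM, -HHMM, +HH:MM, or -HH:MM.
_OFFSET = re.compile(r"^([+-])(\d{2}):?(\d{2})$")


def parse_time_zone(value: str) -> timezone:
    """Convert 'UTC', 'GMT', or an hour offset such as '-01:00' to a timezone."""
    text = value.strip()
    if text.upper() in ("UTC", "GMT"):
        return timezone.utc
    match = _OFFSET.match(text)
    if match is None:
        raise ValueError(f"not an abbreviation or hour offset: {value!r}")
    sign, hours, minutes = match.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes))
    return timezone(-delta if sign == "-" else delta)


print(parse_time_zone("-0100") == parse_time_zone("-01:00"))  # True
```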

You can modify the following properties of a delimited file:
- Field Delimiter—The delimiter for each field. Common delimiters are commas (,) and semicolons (;).
- Record Terminator—The terminator for each row of data. Common terminators are \n (line feed) and \r\n (carriage return and line feed).
- Quote Character—The character used for quotes in the source dataset.
- Has Header Row—A true or false value indicating whether the source dataset includes headers. If a header row is included in the dataset, the headers will be used for the field names.
- Encoding—The encoding type used by the source dataset. The default is UTF-8.
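To see what each of these properties controls, the following sketch reads a small semicolon-delimited sample with Python's csv module. It illustrates the concepts only and is not how the tool reads files: read_delimited is a hypothetical helper, the field1, field2, ... names used when there is no header row are placeholders, and Python's csv reader recognizes \n, \r, and \r\n line endings on its own rather than accepting a custom record terminator. The quoted value "Smith; Jones" shows the quote character protecting a delimiter, and every value comes back as a string, similar to a CSV dataset registered with all string fields.

```python
import csv
import io


def read_delimited(raw, field_delimiter=",", quote_character='"',
                   has_header_row=True, encoding="utf-8"):
    """Decode and split delimited bytes into a list of dicts keyed by field name."""
    text = io.StringIO(raw.decode(encoding), newline="")
    rows = list(csv.reader(text, delimiter=field_delimiter, quotechar=quote_character))
    if not rows:
        return []
    if has_header_row:
        header, body = rows[0], rows[1:]  # the header row supplies the field names
    else:
        # Hypothetical placeholder names for a file without a header row.
        header, body = [f"field{i + 1}" for i in range(len(rows[0]))], rows
    return [dict(zip(header, row)) for row in body]


sample = 'id;name;COUNT\r\n1;"Smith; Jones";620\r\n2;Lee;410\r\n'.encode("utf-8")
for record in read_delimited(sample, field_delimiter=";"):
    print(record)
```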
The Update Big Data Connection Dataset Properties tool updates the properties of an individual dataset. Use the following tools to work with a BDC and its datasets:
- Copy Dataset From Big Data Connection—Copies a dataset from a BDC to a feature class.
- Duplicate Dataset From Big Data Connection—Creates a view of an existing BDC dataset.
- Refresh Big Data Connection—Checks for any new datasets and adds them to the BDC.
- Remove Dataset From Big Data Connection—Removes a dataset from the BDC.
- Update Big Data Connection Dataset Properties—Modifies the properties of an individual BDC dataset.
- Preview Dataset From Big Data Connection—Previews the first ten features in your dataset to verify they are correctly registered.
- Describe Dataset—Allows you to confirm that the dataset displays as expected.
You can also edit the .bdc file manually. You must edit the .bdc file manually in the following situations:
- You have more than one field representing the x-, y-, or z-location.
- You want to update the source path.
Learn more about big data connection file formatting.
This geoprocessing tool is powered by Spark. See Big data connections to learn more about big data connections and how to use them.

Syntax

UpdateBDCDatasetProperties(bdc_dataset, {expression}, {field_properties}, {geometry_type}, {spatial_reference}, {geometry_format_type}, {geometry_field}, {x_field}, {y_field}, {z_field}, {time_type}, {time_zone}, {start_time_format}, {end_time_format}, {file_extension}, {field_delimiter}, {record_terminator}, {quote_character}, {has_header_row}, {encoding})

Parameter	Explanation	Data Type
bdc_dataset	The BDC dataset to update. The options for editing will differ depending on the source data (shapefile, delimited file, ORC, or parquet file).	Table View
expression (Optional)	An expression used to limit the features that will be used in analysis.	SQL Expression
field_properties [field_properties,...] (Optional)	Specifies the field names and properties to modify. Each entry contains a field name, a field type, and a visibility value. Field type options: SHORT —The field will be type short. LONG —The field will be type long. DOUBLE —The field will be type double. FLOAT —The field will be type float. STRING —The field will be type string. DATE —The field will be type date. BLOB —The field will be type BLOB. Visibility options: TRUE —The field will be visible and available for use in geoprocessing tools. This is the default. FALSE —The field will be hidden and cannot be used as input to geoprocessing tools.	Value Table
geometry_type (Optional)	Specifies the type of geometry that will be used to spatially represent the dataset. The geometry cannot be modified for shapefile-sourced datasets. POINT —The geometry type is point. LINE —The geometry type is polyline. POLYGON —The geometry type is polygon. NONE —No geometry type.	String
spatial_reference (Optional)	The WKID value or WKT string that will be used for the spatial reference of the dataset. The default is WKID 4326 (WGS84). The spatial reference cannot be modified for shapefile-sourced data.	String
geometry_format_type (Optional)	Specifies how the geometry will be formatted. The geometry cannot be modified for shapefile-sourced data. XYZ —Two or more fields will represent x, y, and optionally z. WKT —The geometry will be represented by a single field in well-known text (WKT) format. WKB —The geometry will be represented by a single field in well-known binary (WKB) format. GEOJSON —The geometry will be represented by a single field in GeoJSON format. ESRIJSON —The geometry will be represented by a single field in EsriJSON format.	String
geometry_field (Optional)	A single field used to represent the geometry. This field is used when the geometry format is WKT, WKB, GeoJSON, or EsriJSON.	String
x_field (Optional)	The field used to represent the x-location. If you have more than one field representing the x-location, modify the .bdc file manually.	String
y_field (Optional)	The field used to represent the y-location. If you have more than one field representing the y-location, modify the .bdc file manually.	String
z_field (Optional)	The field used to represent the z-location. If you have more than one field representing the z-location, modify the .bdc file manually.	String
time_type (Optional)	Specifies the time type used to temporally represent the dataset. INTERVAL —The time type will represent a duration of time with a start and end time. INSTANT —The time type will represent an instant in time. NONE —Time is not enabled.	String
time_zone (Optional)	The time zone of the dataset.	String
start_time_format [start_time_format,...] (Optional)	The fields used to define the start time and the time formatting.	Value Table
end_time_format [end_time_format,...] (Optional)	The fields used to define the end time and the time formatting.	Value Table
file_extension (Optional)	The file extension of the source dataset. The parameter value cannot be modified.	String
field_delimiter (Optional)	The field delimiter used in the source dataset.	String
record_terminator (Optional)	The record terminator used in the source dataset.	String
quote_character (Optional)	The quote character used in the source dataset.	String
has_header_row (Optional)	Specifies whether the source dataset includes a header row. HAS_HEADER —The source dataset includes a header row. NO_HEADER —The source dataset does not include a header row.	Boolean
encoding (Optional)	The type of encoding used by the source dataset. By default, UTF-8 is used.	String
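The field_properties value in the code sample below is a semicolon-separated list in which each entry is a field name, a field type, and a visibility value. The following sketch builds that string from Python tuples and rejects field types not listed in the table above. field_properties_value is a hypothetical helper; the layout is inferred from the code sample rather than from a formal specification, and names containing spaces or semicolons are rejected because the quoting rules for them are not covered here.

```python
VALID_FIELD_TYPES = {"SHORT", "LONG", "DOUBLE", "FLOAT", "STRING", "DATE", "BLOB"}


def field_properties_value(fields):
    """Build a field_properties string from (name, field_type, visible) tuples."""
    parts = []
    for name, field_type, visible in fields:
        field_type = field_type.upper()
        if field_type not in VALID_FIELD_TYPES:
            raise ValueError(f"unsupported field type: {field_type}")
        if " " in name or ";" in name:
            raise ValueError(f"field name would need quoting: {name!r}")
        parts.append(f"{name} {field_type} {'true' if visible else 'false'}")
    return ";".join(parts)


print(field_properties_value([
    ("Field1", "FLOAT", True),
    ("Field2", "STRING", True),
    ("Field3", "double", False),
]))
# Field1 FLOAT true;Field2 STRING true;Field3 DOUBLE false
```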

Derived Output

Name	Explanation	Data Type
updated_bdc	The updated BDC file with edited properties applied to the specified dataset.	File

Code sample

UpdateBDCDatasetProperties example (stand-alone script)

The following Python script demonstrates how to use the UpdateBDCDatasetProperties function.

# Name: UpdateBDCDatasetProperties.py
# Description: Add a filter and modify the schema, time, and geometry for a BDC dataset
# Requirements: ArcGIS Pro Advanced License

# Import system modules
import arcpy

# Set local variables
dataset = r"c:\Projects\MyProjectFolder\my_BigDataConnection.bdc\myBigDataset"
expression = "COUNT > 500"
field_properties = "Field1 FLOAT true;Field2 STRING true;Field3 DOUBLE true"
geometry_type = "POINT"
sref = "4326"
geometry_format = "XYZ"
geometry_field = ""  # Not used when the geometry format is XYZ
x_field = "Long"
y_field = "Lat"
z_field = ""
time_type = "INSTANT"
time_zone = "UTC"
start_time_format = "Year yyyy"  # Field name followed by its time format
end_time_format = None  # An instant in time has no end time
file_extension = "csv"
field_delimiter = ","
record_terminator = r"\n"
quote_character = '"'
has_header_row = True
encoding = "UTF-8"

# Execute Update BDC Dataset Properties
arcpy.gapro.UpdateBDCDatasetProperties(
    dataset, expression, field_properties, geometry_type, sref, geometry_format,
    geometry_field, x_field, y_field, z_field, time_type, time_zone,
    start_time_format, end_time_format, file_extension, field_delimiter,
    record_terminator, quote_character, has_header_row, encoding)

Environments

This tool does not use any geoprocessing environments.

Licensing information

Basic: No
Standard: No
Advanced: Yes