Update Big Data Connection Dataset Properties (GeoAnalytics Desktop)

Summary

Updates the properties of a big data connection (BDC) dataset. This tool modifies field, geometry, time, and file settings for a specified BDC dataset.

Usage

  • This tool requires a BDC. To create a BDC, use the Create Big Data Connection tool.

  • Use this tool to modify BDC dataset schema, geometry, or time for use in analysis or visualization in scenarios such as the following:

    • Your CSV dataset was registered with all string type fields and you want to set the fields as numeric for use in analysis.
    • Your BDC dataset has attribute values for two separate locations, such as taxi pick-up and taxi drop-off spots, and you want to change the geometry you use for analysis.
    • Your workflow requires that time is set on the input layer.
    • You want to share a BDC dataset with a colleague who is only interested in a subset of features, so you add a definition query expression and hide some unused fields.

  • You can modify the following properties:

    • Definition query—An expression used to limit the features used in analysis.
    • Fields—The field name, field type, and visibility.
    • Geometry—How geometry is represented. Geometry is not editable for shapefile-sourced datasets.
    • Time—How time is represented.
    • File—The file properties used to read the dataset.

  • Specify the BDC dataset with the properties you want to modify using the Big Data Connection Dataset parameter. You can browse to the dataset or specify it using a path such as c:\<path>\MyBDC.bdc\<dataset_name>, for example, c:\MyBDCFolder\MyBDC.bdc\earthquakes_dataset.
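
As a small illustration of the path form described above, the following sketch joins a .bdc file path and a dataset name. The helper name bdc_dataset_path is hypothetical and is not part of arcpy; ntpath is used only so that the Windows-style separators behave the same on any platform.

```python
import ntpath


def bdc_dataset_path(bdc_file, dataset_name):
    """Return <bdc_file>\\<dataset_name> (hypothetical helper, not part of arcpy)."""
    # Guard against passing a folder or some other file type by mistake.
    if not bdc_file.lower().endswith(".bdc"):
        raise ValueError("expected a path to a .bdc file: " + bdc_file)
    return ntpath.join(bdc_file, dataset_name)


print(bdc_dataset_path(r"c:\MyBDCFolder\MyBDC.bdc", "earthquakes_dataset"))
```

The resulting string is the kind of value that could be passed as the bdc_dataset argument.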

  • Define an expression to limit the features used in analysis using the Expression parameter. Adding a filter to a BDC dataset is similar to applying a definition query to a dataset in your map: specify a SQL expression to filter features of interest.

  • You can update the field type for delimited files. You cannot update the field type for other data sources (such as shapefiles, ORC files, or parquet files).

  • You can modify the geometry for delimited files, ORC, and parquet files. You cannot modify the geometry for a shapefile-sourced dataset.

  • The following table outlines how to specify time formats for the Start Time and End Time parameters when you edit a BDC dataset. The examples show how to represent the time January 2, 2016, at 9:45:02.05 PM.

    Time formats in big data connections

    Symbol | Meaning | Example
    yy | The year, represented by two digits. | 16
    yyyy | The year, represented by four digits. | 2016
    MM | The month, represented numerically. | 01 or 1
    MMM | The month, represented using three letters. | Jan
    MMMM | The month, represented using the complete spelling. | January
    dd | The day. | 02 or 2
    HH | The hour when using a 24-hour day; values range from 0-23. | 21
    hh | The hour when using a 12-hour day; values range from 1-12. | 9
    mm | The minute; values range from 0-59. | 45
    ss | The second; values range from 0-59. | 02
    SSS | The millisecond; values range from 0-999. | 50
    a | The AM/PM marker. | PM
    epoch_millis | The time in milliseconds from epoch. | 1509581781000
    epoch_seconds | The time in seconds from epoch. | 1509747601
    Z | The time zone offset expressed in hours. | -0100 or -01:00
    ZZZ | The time zone offset expressed using IDs. | America/Los_Angeles
    '' | Use single quotes to add text that doesn't represent a value outlined in this table. | 'T'

    The following table shows examples for different formats of the same date, January 2, 2016, at 9:45:02.05 PM:

    Time format examples

    Input date | Date format
    01/02/2016 9:45:02PM | MM/dd/yyyy hh:mm:ssa
    Jan02-16 21:45:02 | MMMdd-yy HH:mm:ss
    January 02 2016 9:45:02.050PM | MMMM dd yyyy hh:mm:ss.SSSa
    01/02/2016T21:45:02-0000 | MM/dd/yyyy'T'HH:mm:ssZ

    You can specify the time zone using one of the following:
    • The full name of the time zone: Pacific Standard Time
    • The time zone offset expressed in hours: -0100 or -01:00
    • The UTC or GMT abbreviation
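
To make the format symbols above more concrete, here is a rough sketch that translates a subset of them into Python strptime directives and parses three of the example inputs. This is only an illustration of what the symbols mean, not how the tool parses time. The function names to_strptime and parse_bdc_time are hypothetical, and ZZZ, epoch_millis, and epoch_seconds are not handled. Month names and the AM/PM marker are assumed to be in English.

```python
import re
from datetime import datetime

# Subset of the BDC time symbols mapped to Python strptime directives.
_SYMBOLS = {
    "yyyy": "%Y", "yy": "%y",
    "MMMM": "%B", "MMM": "%b", "MM": "%m",
    "dd": "%d", "HH": "%H", "hh": "%I",
    "mm": "%M", "ss": "%S", "SSS": "%f",
    "a": "%p", "Z": "%z",
}

# Quoted literal text first, then symbols with longer ones before shorter ones.
_TOKEN = re.compile(r"'([^']*)'|yyyy|yy|MMMM|MMM|MM|dd|HH|hh|mm|ss|SSS|a|Z")


def to_strptime(pattern):
    """Translate a BDC-style time format into a strptime format string."""
    def swap(match):
        if match.group(1) is not None:
            # Text inside single quotes is copied through as literal text.
            return match.group(1).replace("%", "%%")
        return _SYMBOLS[match.group(0)]
    return _TOKEN.sub(swap, pattern)


def parse_bdc_time(value, pattern):
    """Parse a time string using a BDC-style format (illustration only)."""
    return datetime.strptime(value, to_strptime(pattern))


print(to_strptime("MM/dd/yyyy hh:mm:ssa"))
print(parse_bdc_time("01/02/2016 9:45:02PM", "MM/dd/yyyy hh:mm:ssa"))
print(parse_bdc_time("Jan02-16 21:45:02", "MMMdd-yy HH:mm:ss"))
print(parse_bdc_time("January 02 2016 9:45:02.050PM", "MMMM dd yyyy hh:mm:ss.SSSa"))
```

Any characters that are not symbols, such as / and :, pass through unchanged, which is why separators do not need quoting but letters such as T do.
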
  • You can modify the following properties of a delimited file:

    • Field Delimiter—The delimiter for each field. Common delimiters are , and ;.
    • Record Terminator—The terminator for each row of data. Common terminators are \n and \t.
    • Quote Character—The character used for quotes in the source dataset.
    • Has Header Row—A true or false value indicating whether the source dataset includes headers. If a header row is included in the dataset, the headers will be used for the field names.
    • Encoding—The encoding type used by the source dataset. The default is UTF-8.
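
The role of these file properties can be sketched with Python's standard csv module. This is an analogy only and does not show how the tool reads data. The function read_delimited and the sample bytes are hypothetical. Note that csv.reader detects record terminators on its own, so that property has no counterpart in this sketch.

```python
import csv
import io


def read_delimited(raw_bytes, field_delimiter=",", quote_character='"',
                   has_header_row=True, encoding="UTF-8"):
    """Parse delimited bytes into (field_names, records); illustration only."""
    text = raw_bytes.decode(encoding)
    # newline="" leaves record terminators untouched so csv.reader splits rows itself.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=field_delimiter, quotechar=quote_character)
    rows = list(reader)
    if has_header_row:
        # The first row supplies the field names.
        return rows[0], rows[1:]
    return None, rows


# Hypothetical sample: semicolon-delimited, quoted text, header row, UTF-8.
sample = 'id;place;magnitude\n1;"San José; CA";4.5\n2;"Reno";3.1\n'.encode("utf-8")
field_names, records = read_delimited(sample, field_delimiter=";")
print(field_names)
print(records)
```

Every value comes back as a string here, which mirrors the scenario above where a CSV dataset is registered with all string fields and the field types then need to be updated.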

  • The Update Big Data Connection Dataset Properties tool updates the properties of an individual dataset. Use the following tools to modify a BDC:

  • You can optionally edit your BDC file manually. Always modify the .bdc file manually in the following situations:

    • You have more than one field used to represent the x-, y-, or z-location.
    • You want to update the source path.

    Learn more about big data connection file formatting.

  • This geoprocessing tool is powered by Spark. See Big data connections to learn more about big data connections and how to use them.

Syntax

arcpy.gapro.UpdateBDCDatasetProperties(bdc_dataset, {expression}, {field_properties}, {geometry_type}, {spatial_reference}, {geometry_format_type}, {geometry_field}, {x_field}, {y_field}, {z_field}, {time_type}, {time_zone}, {start_time_format}, {end_time_format}, {file_extension}, {field_delimiter}, {record_terminator}, {quote_character}, {has_header_row}, {encoding})
Parameter | Explanation | Data Type
bdc_dataset

The BDC dataset to update. The options for editing will differ depending on the source data (shapefile, delimited file, ORC, or parquet file).

Table View
expression
(Optional)

An expression used to limit the features that will be used in analysis.

SQL Expression
field_properties
[field_properties,...]
(Optional)

Specifies the field names and the properties to modify for each field: the field type and the field visibility.

The field type options are as follows:

  • SHORT: The field will be type short.
  • LONG: The field will be type long.
  • DOUBLE: The field will be type double.
  • FLOAT: The field will be type float.
  • STRING: The field will be type string.
  • DATE: The field will be type date.
  • BLOB: The field will be type BLOB.

The visibility options, which specify whether fields will be visible or hidden, are as follows:

  • TRUE: The fields will be visible and available for use in geoprocessing tools. This is the default.
  • FALSE: The fields will be hidden and cannot be used as input to geoprocessing tools.
Value Table
geometry_type
(Optional)

Specifies the type of geometry that will be used to spatially represent the dataset. The geometry cannot be modified for shapefile-sourced datasets.

  • POINT: The geometry type is point.
  • LINE: The geometry type is polyline.
  • POLYGON: The geometry type is polygon.
  • NONE: No geometry type.
String
spatial_reference
(Optional)

The WKID value or WKT string that will be used for the spatial reference of the dataset. The default is WKID 4326 (WGS84). The spatial reference cannot be modified for shapefile-sourced data.

String
geometry_format_type
(Optional)

Specifies how the geometry will be formatted. The geometry cannot be modified for shapefile-sourced data.

  • XYZ: Two or more fields will represent x, y, and optionally z.
  • WKT: The geometry will be represented by a single field in well-known text format.
  • WKB: The geometry will be represented by a single field in well-known binary format.
  • GEOJSON: The geometry will be represented by a single field in GeoJSON format.
  • ESRIJSON: The geometry will be represented by a single field in EsriJSON format.
String
geometry_field
(Optional)

A single field used to represent the geometry. This field is used when the geometry format is WKT, WKB, GeoJSON, or EsriJSON.

String
x_field
(Optional)

The field used to represent the x-location. If you have more than one field representing the x-location, modify the .bdc file manually.

String
y_field
(Optional)

The field used to represent the y-location. If you have more than one field representing the y-location, modify the .bdc file manually.

String
z_field
(Optional)

The field used to represent the z-location. If you have more than one field representing the z-location, modify the .bdc file manually.

String
time_type
(Optional)

Specifies the time type used to temporally represent the dataset.

  • INTERVAL: The time type will represent a duration of time with a start and end time.
  • INSTANT: The time type will represent an instant in time.
  • NONE: Time is not enabled.
String
time_zone
(Optional)

The time zone of the dataset.

String
start_time_format
[start_time_format,...]
(Optional)

The fields used to define the start time and the time formatting.

Value Table
end_time_format
[end_time_format,...]
(Optional)

The fields used to define the end time and the time formatting.

Value Table
file_extension
(Optional)

The file extension of the source dataset. The parameter value cannot be modified.

String
field_delimiter
(Optional)

The field delimiter used in the source dataset.

String
record_terminator
(Optional)

The record terminator used in the source dataset.

String
quote_character
(Optional)

The quote character used in the source dataset.

String
has_header_row
(Optional)

Specifies whether the source dataset includes a header row.

  • HAS_HEADER: The source dataset includes a header row.
  • NO_HEADER: The source dataset does not include a header row.
Boolean
encoding
(Optional)

The type of encoding used by the source dataset. The default is UTF-8.

String

Derived Output

Name | Explanation | Data Type
updated_bdc

The updated BDC file with edited properties applied to the specified dataset.

File

Code sample

UpdateBDCDatasetProperties example (stand-alone script)

The following Python script demonstrates how to use the UpdateBDCDatasetProperties function.

# Name: UpdateBDCDatasetProperties.py
# Description: Add a filter and modify the schema, time, and geometry for a BDC dataset
# Requirements: ArcGIS Pro Advanced License

# Import system modules
import arcpy

# Set local variables
dataset = r"c:\Projects\MyProjectFolder\my_BigDataConnection.bdc\myBigDataset"
expression = "COUNT > 500"  # definition query that limits the features used in analysis
field_properties = "Field1 FLOAT true;Field2 STRING true;Field3 DOUBLE true"
geometry_type = "POINT"
sref = "4326"
geometry_format = "XYZ"
geometry_field = ""  # only used with the WKT, WKB, GeoJSON, and EsriJSON formats
x_field = "Long"
y_field = "Lat"
z_field = ""
time_type = "INSTANT"
time_zone = "UTC"
start_time_format = "Year yyyy"  # field name followed by its time format
end_time_format = None  # no end time is set for the INSTANT time type
file_extension = "csv"
field_delimiter = ","
record_terminator = r"\n"
quote_character = '"'
has_header_row = True
encoding = "UTF-8"

# Execute Update BDC Dataset Properties
arcpy.gapro.UpdateBDCDatasetProperties(
    dataset, expression, field_properties, geometry_type, sref, geometry_format,
    geometry_field, x_field, y_field, z_field, time_type, time_zone,
    start_time_format, end_time_format, file_extension, field_delimiter,
    record_terminator, quote_character, has_header_row, encoding)
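
The field_properties value in the script above is a semicolon-separated string of "name type visibility" entries. As a minimal sketch, a string of that shape could be assembled from a list of tuples as shown below. The helper build_field_properties is hypothetical and not part of arcpy, and writing hidden fields as "false" is an assumption inferred from the TRUE and FALSE options listed for the parameter.

```python
# Field types listed for the field_properties parameter.
FIELD_TYPES = {"SHORT", "LONG", "DOUBLE", "FLOAT", "STRING", "DATE", "BLOB"}


def build_field_properties(fields):
    """Build a 'name TYPE visibility;...' string (hypothetical helper, not part of arcpy)."""
    parts = []
    for name, field_type, visible in fields:
        field_type = field_type.upper()
        if field_type not in FIELD_TYPES:
            raise ValueError("unsupported field type: " + field_type)
        # Assumption: visibility is written as lowercase true/false, as in the sample script.
        parts.append("{} {} {}".format(name, field_type, "true" if visible else "false"))
    return ";".join(parts)


print(build_field_properties([
    ("Field1", "FLOAT", True),
    ("Field2", "STRING", True),
    ("Field3", "DOUBLE", True),
]))
```

Validating the type names before calling the tool catches typos early, since the tool accepts only the listed field types.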

Environments

This tool does not use any geoprocessing environments.

Licensing information

  • Basic: No
  • Standard: No
  • Advanced: Yes
