Big data connection specification

Big data connections (BDC) are created using the New Big Data Connection dialog box or the Create Big Data Connection tool. The New Big Data Connection dialog box and tool generate a BDC item that you can browse to and use in geoprocessing tools. BDC item details are stored in a .bdc file. BDC details include the location of the data and information about each dataset.

It is recommended that you inspect and modify your BDC datasets to ensure that they accurately represent your data. To modify a BDC, use the dialog box or BDC tools. In some cases, it may be appropriate to modify the file manually, for example, to update the source data path or to add complex geometry formatting. When modifying the file, the following is recommended:

  • Back up your current .bdc file in case you want to revert your changes.
  • Validate your updated .bdc file using a JSON validator. Free validators are available online.
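Both recommendations can be scripted. The following is a minimal sketch that uses only the Python standard library. It assumes the .bdc file is plain JSON text; the helper names backup_bdc and is_valid_json and the .bak suffix are illustrative choices, not part of the BDC tools.

```python
import json
import shutil
from pathlib import Path


def backup_bdc(bdc_path):
    """Copy the .bdc file next to itself with a .bak suffix (run before editing)."""
    source = Path(bdc_path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)
    return backup


def is_valid_json(bdc_path):
    """Return True if the file parses as JSON (run after editing)."""
    try:
        with open(bdc_path, encoding="utf-8") as handle:
            json.load(handle)
    except json.JSONDecodeError as error:
        # Report where the parser stopped so the edit can be corrected
        print(f"Invalid JSON at line {error.lineno}, column {error.colno}: {error.msg}")
        return False
    return True
```

Call backup_bdc before opening the file in a text editor and is_valid_json after saving it. This checks JSON syntax only, not whether the content follows the specification below.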

Use a text editor to modify a .bdc file. The specification of the file is outlined below:

"connection" : {},
"datasets" : []

The connection object includes the type and properties. The properties object specifies the path to the source folder. If you move your data, update the path property to match the new location.

"connection" : {
  "type": "filesystem",
  "properties":{
      "path": <path to source folder>
      }
}
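Updating the source path can also be done programmatically. The following sketch assumes the .bdc content has already been parsed into a Python dictionary with the connection structure shown above. The function name set_source_path and both folder paths are hypothetical.

```python
import json


def set_source_path(bdc, new_path):
    """Return a copy of the parsed .bdc content with an updated source path."""
    updated = json.loads(json.dumps(bdc))  # deep copy by round-tripping through JSON
    updated["connection"]["properties"]["path"] = new_path
    return updated


# Hypothetical connection; the paths are placeholders
bdc = {
    "connection": {"type": "filesystem", "properties": {"path": "C:/data/old"}},
    "datasets": [],
}
moved = set_source_path(bdc, "D:/data/new")
print(json.dumps(moved["connection"], indent=2))
```

Returning a copy leaves the original dictionary untouched, so the previous path remains available if the change needs to be reverted.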

The datasets include one or more datasets in your BDC. The number of datasets depends on the number of folders your BDC contains. In the following example, there are five datasets:

"datasets":[
  {.. dataset1 ..},
  {.. dataset2 ..},
  {.. dataset3 ..},
  {.. dataset4 ..},
  {.. dataset5 ..}
]

In each dataset, there are eight top-level objects that may be applicable. Of these objects, name, properties, and fields are required.

{
 "name": <dataset name>,
 "alias": <alias name>,
 "sourceName": <source name>,
 "filter": <where clause>,
 "properties": {},
 "fields": {},
 "geometry": {},
 "time": {}
}
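A quick check for the required objects can be scripted. This sketch assumes a dataset entry has been parsed into a Python dictionary and treats name, properties, and fields as required, following the sections below that describe each object. The helper missing_required is illustrative, not part of any Esri API.

```python
REQUIRED_KEYS = ("name", "properties", "fields")


def missing_required(dataset):
    """List the required top-level keys that a dataset entry lacks."""
    return [key for key in REQUIRED_KEYS if key not in dataset]


# Hypothetical dataset entry that omits the properties object
dataset = {"name": "taxis", "alias": "NYC taxis", "fields": []}
print(missing_required(dataset))  # ['properties']
```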

Name

The name object is required and defines the name of the dataset. The name must be unique within the .bdc file.

Alias

The alias object is optional and is an alternative name for the dataset that is more descriptive and can contain characters that are restricted in the name object. The alias must be either the same as the dataset name or unique within the .bdc file.

Source name

The sourceName object is optional when the name object matches the source folder name. When the name is not the same as the source folder, the sourceName object must be included and match the folder name exactly. The sourceName object allows you to create multiple datasets with unique names using the same source folder. For example, if you have a dataset named taxis with sourceName set to taxis, you can also have datasets named taxi_pickup and taxi_dropoff with different geometries and time formatting. These datasets have the same source name and different names. All three datasets represent the same original dataset, taxis.
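The taxis example can be sketched in a few lines. The code below assumes dataset entries are Python dictionaries parsed from the .bdc file. The helpers derive_dataset and names_are_unique are hypothetical, and the parquet format is chosen only for illustration.

```python
import copy


def derive_dataset(dataset, new_name):
    """Copy a dataset entry under a new name that reads the same source folder."""
    derived = copy.deepcopy(dataset)
    # When sourceName was omitted, the original name is the folder name
    derived["sourceName"] = dataset.get("sourceName", dataset["name"])
    derived["name"] = new_name
    return derived


def names_are_unique(datasets):
    """Check that no two dataset entries share a name."""
    names = [dataset["name"] for dataset in datasets]
    return len(names) == len(set(names))


taxis = {"name": "taxis", "properties": {"fileformat": "parquet"}, "fields": []}
datasets = [
    taxis,
    derive_dataset(taxis, "taxi_pickup"),
    derive_dataset(taxis, "taxi_dropoff"),
]
print(names_are_unique(datasets))  # True
```

Each derived entry can then be given its own geometry and time objects while still pointing to the taxis folder.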

Filter

The filter object is optional and applies a SQL expression to the dataset. Only features that meet the filter condition are used. The filter determines which features are drawn, appear in the layer's attribute table, and can be selected, labeled, identified, and processed by geoprocessing tools. For example, the filter "X IS NOT NULL AND Y IS NOT NULL" uses only features where the x and y fields are not null.

Properties

The properties object is required and defines the dataset type and its format.

Syntax

"properties" : {
 "fileformat" : "< delimited | shapefile | orc | parquet >",
 "delimited.extension" : "< csv | tsv | txt | other >",
 "delimited.fieldDelimiter" : "< delimiter >",
 "delimited.recordTerminator" : "< terminator >",
 "delimited.quoteChar" : "< character for quotes >",
 "delimited.escapeChar" : "< character for escape >",
 "delimited.hasHeaderRow" : < true | false >,
 "delimited.encoding" : "< encoding format >"
}

Examples

The following is an example using a shapefile:

"properties" : {
 "fileformat": "shapefile"
}

The following is an example using a delimited file:

"properties" : {
 "fileformat": "delimited",
 "delimited.extension": "csv",
 "delimited.fieldDelimiter": ",",
 "delimited.recordTerminator": "\n",
 "delimited.quoteChar": "\"",
 "delimited.escapeChar": "\"",
 "delimited.hasHeaderRow": true,
 "delimited.encoding": "UTF-8"
}

Description

  • fileformat—A required property that defines the format of the source data. This can be delimited, shapefile, parquet, or orc.
  • The remaining properties are specified only for delimited files, where they are required:
    • delimited.extension—A required property that denotes the file extension (for example, csv or tsv).
    • delimited.quoteChar—Denotes how quotes are specified in the delimited file.
    • delimited.escapeChar—Denotes how backslashes are specified in the delimited file.
    • delimited.encoding—Specifies the type of encoding used.
    • delimited.recordTerminator—Specifies what terminates features in the delimited file.
    • delimited.fieldDelimiter—Denotes what separates fields in the delimited file.
    • delimited.hasHeaderRow—Specifies whether the first row in a delimited file will be treated as a header or as the first feature.
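The rules in this list can be checked before a file is saved. The following is a rough sketch that uses the key names from the Syntax block above and assumes the properties object has been parsed into a Python dictionary. The function check_properties is illustrative and is not an Esri validation routine.

```python
FILE_FORMATS = {"delimited", "shapefile", "orc", "parquet"}
DELIMITED_KEYS = (
    "delimited.extension",
    "delimited.fieldDelimiter",
    "delimited.recordTerminator",
    "delimited.quoteChar",
    "delimited.escapeChar",
    "delimited.hasHeaderRow",
    "delimited.encoding",
)


def check_properties(properties):
    """Return a list of problems found in a dataset's properties object."""
    problems = []
    file_format = properties.get("fileformat")
    if file_format not in FILE_FORMATS:
        problems.append(f"unsupported fileformat: {file_format!r}")
    if file_format == "delimited":
        # The delimited.* properties are required only for delimited files
        problems += [f"missing {key}" for key in DELIMITED_KEYS if key not in properties]
    return problems


print(check_properties({"fileformat": "shapefile"}))  # []
```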

Fields

The fields object is required; it defines the dataset fields, field types, and visibility.

Syntax

"fields" : [{
  "name": <fieldName>,
  "sourceName": <field name in source>,
  "type" : < Int8 | Int16 | Int32 | Int64 | Float32 | Float64 | String | 
     Binary | Date >,
  "visible" : <true | false>
 },
 {...field 2...},
 {...field 3...},
 ...
 {...field n...}
]

Example

"fields" : [
  {
   "name": "trackid",
   "type": "String"
  },
  {
   "name": "x",
   "type": "Float32",
   "visible" : false
  },
  {
   "name": "y",
   "type": "Float32",
   "visible" : false
  },
  {
   "name": "time",
   "type": "Int64",
   "visible" : false
  },
  {
   "name": "value",
   "type": "Float64"
  }
]

Description

  • name—A required property that denotes the field name. The field name must be unique to the dataset and can only contain alphanumeric characters and underscores.
  • sourceName—An optional property that denotes the field name in the source dataset. This is only required if the name doesn't match the field name in the source.
  • visible—An optional property that denotes whether the field will be visible in geoprocessing tools. By default, fields that are initially set as time and geometry fields using the Create Big Data Connection or Refresh Big Data Connection tool have visibility set to false. All other fields are set to true by default.
  • type—A required property that denotes the type of field. Options include the following:
    • Int8—Represented in ArcGIS Pro as a short field.
    • Int16—Represented in ArcGIS Pro as a short field.
    • Int32—Represented in ArcGIS Pro as a long field.
    • Int64—Represented in ArcGIS Pro as a double field.
    • Float32—Represented in ArcGIS Pro as a float field.
    • Float64—Represented in ArcGIS Pro as a double field.
    • String—Represented in ArcGIS Pro as a string field.
    • Binary—Represented in ArcGIS Pro as a BLOB field. Only parquet and ORC inputs can include binary values.
    • Date—Represented in ArcGIS Pro as a date field. Only shapefile, ORC, and parquet datasets can have date fields.
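The type mapping and naming rule above can be captured in a small lookup. This sketch assumes field entries are Python dictionaries and interprets "alphanumeric" as ASCII letters and digits, which is an assumption. The names PRO_FIELD_TYPES and check_field are illustrative.

```python
import re

# BDC field type -> ArcGIS Pro field type, as listed above
PRO_FIELD_TYPES = {
    "Int8": "short",
    "Int16": "short",
    "Int32": "long",
    "Int64": "double",
    "Float32": "float",
    "Float64": "double",
    "String": "string",
    "Binary": "BLOB",
    "Date": "date",
}
# Assumption: only ASCII letters, digits, and underscores are allowed
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def check_field(field):
    """Return a list of problems found in one entry of the fields array."""
    problems = []
    name = field.get("name", "")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid field name: {name!r}")
    if field.get("type") not in PRO_FIELD_TYPES:
        problems.append(f"unknown field type: {field.get('type')!r}")
    return problems


print(check_field({"name": "trackid", "type": "String"}))  # []
```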

Geometry

The geometry object is optional; however, it's required if a dataset includes a spatial representation, such as a point, polyline, or polygon.

Syntax

"geometry" : {
 "geometryType" : "< esriGeometryType >",
 "spatialReference" : {
 <spatial reference JSON>
  },
 "fields": [
 {
  "name": "<fieldName1>",
  "formats": ["<fieldFormat1>"]
 },
 {
  "name": "<fieldName2>",
  "formats": ["<fieldFormat2>"]
 }
 ]
}

Examples

The following is an example using a delimited file with x- and y-values:

"geometry" : {
 "geometryType" : "esriGeometryPoint",
 "spatialReference" : {
  "wkid" : 3369
 },
 "fields": [
 {
  "name": "Longitude",
  "formats": ["x"]
 },
 {
  "name": "Latitude",
  "formats": ["y"]
 }
 ]
}

The following is an example using a delimited file with x-, y-, and z-values:

"geometry" : {
 "geometryType" : "esriGeometryPoint",
 "spatialReference" : {
  "wkt" : "GEOGCS[\"GCS_WGS_1984_Perfect_Sphere\",DATUM[\"D_Sphere\",SPHEROID[\"Sphere\",6371000.0,0.0]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]]"
 },
 "fields": [
 {
  "name": "Longitude",
  "formats": ["x"]
 },
 {
  "name": "Latitude",
  "formats": ["y"]
 },
 {
  "name": "Height",
  "formats": ["z"]
 }
 ]
}

The following is an example using a .tsv file:

"geometry" : {
 "geometryType" : "esriGeometryPolygon",
 "spatialReference" : {
  "wkid": 4326
 },
 "fields": [
 {
  "name": "Shapelocation",
  "formats": ["WKT"]
 }
 ]
}

The following is an example using a delimited file in which x-values are in a single formatted field and y-values are spread across multiple fields:

"geometry" : {
 "geometryType" : "esriGeometryPoint",
 "spatialReference" : {
  "wkid": 3857
 },
 "fields": [
 {
  "name": "XValue",
  "formats": ["{x:degrees}° {x:minutes}' {x:seconds}" ]
 },
 {
  "name": "YDegrees",
  "formats": ["{y:degrees}"]
 },
 {
  "name": "YMinutes",
  "formats": ["{y:minutes}"]
 },
 {
  "name": "YSeconds",
  "formats": ["{y:seconds}"]
 }
 ]
}

Description

Since the geometry object is optional, the following properties are listed as required or optional, assuming that geometry is used:

  • geometryType—A required property that denotes the geometry type. Options include the following:
    • esriGeometryPoint—The geometry type is point.
    • esriGeometryPolyline—The geometry type is polyline.
    • esriGeometryPolygon—The geometry type is polygon.
  • spatialReference—A required property that denotes the spatial reference of the dataset. If the dataset has geometry, specify either the WKT or one or both of the WKID values (WKID and latest WKID).
    • wkid—The spatial reference using a WKID, for example, 4326.
    • latestWkid—The spatial reference at a given software release.
    • wkt—The spatial reference using a well-known text string.
  • fields—A required property for delimited datasets with a spatial representation. This denotes the field name or names and formats of the geometry.
    • name—A required property for delimited datasets with a spatial representation. This denotes the name of the geometry field. There can be multiple instances of this.
    • formats—A required property for delimited datasets with a spatial representation. This denotes the format of the field used to represent the geometry. There can be multiple instances of this. If the location is spread across multiple fields or is formatted in a string, use degrees, minutes, and seconds to specify the units and direction to specify the direction (N, S, W, E).
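The requirements in this list can be expressed as a short check. The sketch below assumes the geometry object has been parsed into a Python dictionary. Because fields is required only for delimited datasets, the check inspects field entries only when they are present. The function check_geometry is illustrative.

```python
GEOMETRY_TYPES = {"esriGeometryPoint", "esriGeometryPolyline", "esriGeometryPolygon"}


def check_geometry(geometry):
    """Return a list of problems found in a dataset's geometry object."""
    problems = []
    if geometry.get("geometryType") not in GEOMETRY_TYPES:
        problems.append("geometryType must be one of " + ", ".join(sorted(GEOMETRY_TYPES)))
    reference = geometry.get("spatialReference", {})
    # A WKID, a latest WKID, or a WKT string must identify the spatial reference
    if not any(key in reference for key in ("wkid", "latestWkid", "wkt")):
        problems.append("spatialReference needs wkid, latestWkid, or wkt")
    for field in geometry.get("fields", []):
        if "name" not in field or not field.get("formats"):
            problems.append("each geometry field needs a name and formats")
    return problems


geometry = {
    "geometryType": "esriGeometryPoint",
    "spatialReference": {"wkid": 4326},
    "fields": [
        {"name": "Longitude", "formats": ["x"]},
        {"name": "Latitude", "formats": ["y"]},
    ],
}
print(check_geometry(geometry))  # []
```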

Time

The time object is optional; however, it is required if a dataset has a temporal representation.

Syntax

"time" : {
 "timeType" : "< instant | interval >",
 "timeReference" : {
  "timeZone" : "<timeZone>"
  },
  "fields": [
  {
   "name": "<fieldName1>",
   "formats": ["<fieldFormat1>"],
   "role": "< start | end >"
  }
 ]
}

Examples

The following is an example using an instant with multiple formats in the time fields:

"time": {
 "timeType": "instant",
 "timeReference": {"timeZone": "UTC"},
 "fields": [
 {
  "name": "iso_time",
  "formats": [
   "yyyy-MM-dd HH:mm:ss",
   "MM/dd/yyyy HH:mm"
   ]
  }
 ]
}

The following is an example using an interval with multiple startTime fields:

"time": {
 "timeType": "interval",
 "timeReference": {"timeZone": "-0900"},
 "dropSourceFields" : true,
 "fields": [
 {
  "name": "time_start",
  "formats": ["HH:mm:ss"],
  "role" : "start"
  },
 {
  "name": "date_start",
  "formats": ["yyyy-MM-dd"],
  "role" : "start"
  },
 {
  "name": "datetime_ending",
  "formats": ["yyyy-MM-dd HH:mm:ss"],
  "role" : "end"
  }
 ]
}

Description

Since the time object is optional, the following properties are listed as required or optional, assuming that time is used:

  • timeType—A required property if there is time included in the dataset. Options include the following:
    • instant—A single moment in time
    • interval—A time interval with a start and end time
  • timeReference—A required property if the dataset is time enabled, denoting the time zone (timeZone).
    • timeZone—A required property that denotes the time zone format of the data. Time zones are based on Joda-Time. To learn about Joda-Time formats, see Joda-Time Available Time Zones. The timeZone property can be formatted as follows:
      • The full name of the time zone: Pacific Standard Time.
      • The time zone offset expressed in hours: -0100 or -01:00.
      • Time zone abbreviations: UTC or GMT only.
  • fields—A required property that denotes the field names and formats of the time. The required properties of fields are as follows:
    • name—A required property that denotes the name of the field used to represent time. There can be multiple instances of this object.
    • formats—A required property that denotes the format of the field used to represent the time. There can be multiple formats for a single field (as shown above) as well as multiple instances of this object. To learn how time fields can be formatted, see Time formats. When the time format includes the time reference, set the timeReference property to UTC.
    • role—A required property when timeType is interval. It denotes whether the field represents the start or the end of the time interval (start or end).
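These rules can be checked in the same way as the other objects. The following sketch assumes the time object has been parsed into a Python dictionary. It does not verify that the time zone or format strings are valid, only that the required properties are present. The function check_time is illustrative.

```python
def check_time(time):
    """Return a list of problems found in a dataset's time object."""
    problems = []
    time_type = time.get("timeType")
    if time_type not in ("instant", "interval"):
        problems.append("timeType must be instant or interval")
    if "timeZone" not in time.get("timeReference", {}):
        problems.append("timeReference.timeZone is required")
    fields = time.get("fields", [])
    if not fields:
        problems.append("at least one time field is required")
    for field in fields:
        if "name" not in field or not field.get("formats"):
            problems.append("each time field needs a name and formats")
        # The role property is required only for interval data
        if time_type == "interval" and field.get("role") not in ("start", "end"):
            problems.append(f"field {field.get('name')!r} needs role start or end")
    return problems


instant = {
    "timeType": "instant",
    "timeReference": {"timeZone": "UTC"},
    "fields": [{"name": "iso_time", "formats": ["yyyy-MM-dd HH:mm:ss"]}],
}
print(check_time(instant))  # []
```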
