Describe Dataset (GeoAnalytics Desktop)

Summary

Summarizes features into calculated field statistics, sample features, and extent boundaries.

Illustration

Describe Dataset workflow diagram

Usage

  • The following are examples of what you can do with the Describe Dataset tool:

    • Verify that you have correctly registered time and geometry with your big data file share.
    • Understand attribute values with summarized field statistics.
    • Visualize your big data with a sample layer. Instead of drawing a million features, draw a sample.
    • Run workflows using a sample of the data before scaling out for longer and larger processing.
    • Determine where a dataset is by calculating the geographical extent.

  • By default, the tool outputs a table containing the summary statistics for each of the fields in the input layer. In addition, a table is printed to the geoprocessing window describing any geometry or time properties of the input layer.

    If the input layer has geometry, the tool prints a table describing the following geometry properties of the input layer:

    • Geometry type—The geometry type of the input layer. This value is point, line, or polygon.
    • Spatial reference—The spatial reference of the input layer.
    • Count of non-empty features—The number of features that have a valid geometry within the extent of the spatial reference of the input layer.
    • Count of empty features—The number of features that do not have a valid geometry. These features may have an empty geometry, or the geometry may be outside of the extent of the spatial reference being used.
    • Spatial extent—The spatial extent of the features in the input layer.

    If the input layer has time enabled, the tool prints a table describing the following time properties of the input layer:

    • Time type—The time type of the input layer. This value is instant or interval.
    • Count of non-empty features—The number of features that have a valid time value.
    • Count of empty features—The number of features that have a null or invalid time value.
    • Temporal extent—The temporal extent of the features in the input layer. This value contains a start time and end time.

  • Use the Number of Sample Features parameter to specify the number of features to sample. If you leave it blank or specify 0, no sample is created. The output subset has the same schema, geometry, and time settings as the input features. Use the subset to see how your dataset appears when added to a map or viewed in an attribute table. You can also run analysis on the subset to determine the best inputs for a larger analysis.

  • If you specify a sample size greater than the total number of input features, all features will be returned.

  • The sample layer does not represent a truly random geographic selection and should not be used to understand the geographic extent or distribution of your data. For example, if you specify 230 features for Number of Sample Features, the result can contain 230 input features in any order or location.

  • Use the Extent Layer output parameter to create a boundary feature that describes the extent of your input dataset. The output includes a single polygon feature representing the geographic extent of the input features. You can use the extent layer to determine where your data is located, or as an input elsewhere in your workflow. For example, use it as the polygon layer that features are clipped to in the GeoAnalytics Clip Layer tool.

  • An extent layer can be created only for point, line, and polygon features. An extent layer is not created for tabular features.

  • Optionally use environment settings to specify how features will be output.

    For example, use the Extent environment to output an extent layer representing the area of interest, or sample features from the defined study area.

    Additionally, use the Output Coordinate System environment to project outputs to the desired spatial reference.

  • You can improve the performance of the Describe Dataset tool by doing the following:

    • Set the extent of the data so you only analyze the data of interest.
    • Generate fewer sample features.
    • Use data that is local to where the analysis is being run.

  • This geoprocessing tool is powered by Spark. Analysis is completed on your desktop machine using multiple cores in parallel. See Considerations for GeoAnalytics Desktop tools to learn more about running analysis.

  • For optimal performance, data should be available on your desktop; a tool takes longer to run when the data isn't local. If you are using a hosted feature layer, it is recommended that you use ArcGIS GeoAnalytics Server. To use your ArcGIS GeoAnalytics Server to perform analysis, see GeoAnalytics Tools.
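
The geometry summary and sampling rules in the usage notes above can be illustrated with a small pure-Python sketch. It does not call the tool or reproduce its Spark-based implementation. The function names `summarize_geometry` and `take_sample`, and the representation of features as (x, y) tuples with None for an empty geometry, are assumptions made for illustration only. The sketch also ignores geometries that fall outside the extent of the spatial reference, which the tool counts as empty, and taking the first n features is an arbitrary choice because the tool makes no guarantee about which features are sampled.

```python
from typing import Optional, Sequence, Tuple

Point = Optional[Tuple[float, float]]  # None stands in for an empty geometry


def summarize_geometry(points: Sequence[Point]) -> dict:
    """Count empty and non-empty geometries and compute the bounding extent."""
    valid = [p for p in points if p is not None]
    summary = {
        "non_empty": len(valid),
        "empty": len(points) - len(valid),
        "extent": None,  # stays None when there is no valid geometry
    }
    if valid:
        xs = [x for x, _ in valid]
        ys = [y for _, y in valid]
        summary["extent"] = (min(xs), min(ys), max(xs), max(ys))
    return summary


def take_sample(features: Sequence, n: Optional[int]) -> Optional[list]:
    """Mimic the documented sampling rules (not the tool's selection logic)."""
    if not n:  # blank (None) or 0: no sample is created
        return None
    if n >= len(features):  # more requested than available: return everything
        return list(features)
    return list(features[:n])  # arbitrary subset; the tool's choice may differ


points = [(1.0, 5.0), None, (4.0, 2.0), (3.0, 9.0)]
summary = summarize_geometry(points)
print(summary)
print(take_sample(points, 10))
```

Returning None when no sample is requested keeps "no sample layer" distinct from "an empty sample layer", which mirrors the tool's behavior of creating no output in that case.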

Syntax

arcpy.gapro.DescribeDataset(input_layer, output, {sample_features}, {sample_layer}, {extent_layer})
Parameter | Explanation | Data Type
input_layer | The point, line, polygon, or tabular features to be described. | Table View
output | A new table with the summary information. | Table
sample_features (Optional) | The number of features that will be included in the output sample layer. No sample is returned if you select 0 features or don't provide a number. By default, no sample layer is returned. | Long
sample_layer (Optional) | A new feature class with a sample of the input data. | Table; Feature Class
extent_layer (Optional) | A new feature class with the spatial and temporal extent of the input data. | Feature Class
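
Because three of the five parameters are optional, calling the tool with keyword arguments can make scripts easier to read. The sketch below assembles the arguments from the parameter names in the table above. `describe_kwargs` and `run_describe` are hypothetical helper names, not part of ArcPy, and the dataset names are placeholders. `arcpy` is imported inside `run_describe` so that the argument-building step can be checked without an ArcGIS Pro installation.

```python
def describe_kwargs(input_layer, output, sample_features=None,
                    sample_layer=None, extent_layer=None):
    """Build keyword arguments for arcpy.gapro.DescribeDataset.

    Optional parameters are left out unless a value is supplied.
    """
    kwargs = {"input_layer": input_layer, "output": output}
    if sample_features:  # None or 0 means no sample layer is requested
        kwargs["sample_features"] = int(sample_features)
        if sample_layer is not None:
            kwargs["sample_layer"] = sample_layer
    if extent_layer is not None:
        kwargs["extent_layer"] = extent_layer
    return kwargs


def run_describe(**kwargs):
    """Run the tool; requires ArcGIS Pro with an Advanced license."""
    import arcpy  # imported lazily so the helper above works without ArcPy
    return arcpy.gapro.DescribeDataset(**kwargs)


# Placeholder dataset names, assumed for illustration.
args = describe_kwargs("WaterSample", "WSample_summary",
                       sample_features=2500,
                       sample_layer="WSample_sample2500")
print(args)
# run_describe(**args)  # uncomment in an ArcGIS Pro Python environment
```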

Code sample

DescribeDataset example (Python window)

The following Python window script demonstrates how to use the DescribeDataset tool.

In this script, the WaterSample features are described and a sample layer of 2,500 features is created.

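#-------------------------------------------------------------------------------
# Name: DescribeDataset.py
# Description: Describe the WaterSample dataset and create a sample layer
#              of 2,500 features.
#-------------------------------------------------------------------------------

# Import system modules
import arcpy

# Set the workspace that contains the input and receives the outputs
arcpy.env.workspace = "C:/data/RedRiver_basin.gdb"

# Set local variables
inputDataset = "WaterSample"
output = "WSample_summary"
sample = "WSample_sample2500"

# Run Describe Dataset: write the summary table and a 2,500-feature sample
arcpy.gapro.DescribeDataset(inputDataset, output, 2500, sample)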
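
DescribeDataset example with an extent layer (illustrative sketch)

As a further sketch, the extent layer and the environment settings described in the usage notes could be combined as follows. The study-area bounds, the WKID 4326, the output name `WSample_extent`, and the helper names `extent_string` and `describe_study_area` are assumptions for illustration, not values from this page. `arcpy.EnvManager` sets the Workspace, Extent, and Output Coordinate System environments only for the duration of the `with` block, and `arcpy` is imported inside the function so the bounds formatting can be checked without ArcGIS Pro.

```python
def extent_string(xmin, ymin, xmax, ymax):
    """Format bounds as the space-delimited 'xmin ymin xmax ymax' string."""
    if xmin > xmax or ymin > ymax:
        raise ValueError("minimum bounds must not exceed maximum bounds")
    return f"{xmin} {ymin} {xmax} {ymax}"


def describe_study_area(workspace, input_layer, output, extent_layer,
                        bounds, wkid):
    """Describe only the features inside bounds and write an extent layer."""
    import arcpy  # requires ArcGIS Pro; imported lazily

    with arcpy.EnvManager(workspace=workspace,
                          extent=extent_string(*bounds),
                          outputCoordinateSystem=arcpy.SpatialReference(wkid)):
        return arcpy.gapro.DescribeDataset(input_layer=input_layer,
                                           output=output,
                                           extent_layer=extent_layer)


# Hypothetical study-area bounds, assumed for illustration.
study_area = (-98.0, 45.5, -96.0, 49.0)
print(extent_string(*study_area))
# describe_study_area("C:/data/RedRiver_basin.gdb", "WaterSample",
#                     "WSample_summary", "WSample_extent", study_area, 4326)
```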

Licensing information

  • Basic: No
  • Standard: No
  • Advanced: Yes
