Summary
Summarizes features into calculated field statistics, sample features, and extent boundaries.
Usage
The following are examples of what you can do with the Describe Dataset tool:
- Verify that you have correctly registered time and geometry with your big data file share.
- Understand attribute values with summarized field statistics.
- Visualize your big data with a sample layer. Instead of drawing a million features, draw a sample.
- Run workflows using a sample of the data before scaling out for longer and larger processing.
- Determine where a dataset is by calculating the geographical extent.
By default, the tool outputs a table containing summary statistics for each field and a JSON description of the input layer's properties.
Use the Number of Sample Features parameter to specify the number of features to sample. If you leave it blank or specify 0, no sample is created. The output subset has the same schema, geometry, and time settings as the input features. Use the subset to understand how your dataset appears when added to a map or viewed in an attribute table. You can also run analysis on the subset to determine the best inputs for a larger analysis.
If you specify a sample size greater than the total number of input features, all features will be returned.
The sample layer does not represent a truly random geographic selection and should not be used to understand the geographic extent or distribution of your data. For example, if you specify 230 features for Number of Sample Features, the result can contain 230 input features in any order or location.
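The sampling rules above can be sketched in Python. This is a minimal illustration, not the tool's implementation: effective_sample_count and describe_with_sample are hypothetical helper names, and the total feature count is assumed to be known by the caller. The arcpy import is deferred so the first helper can be checked without an ArcGIS Pro installation.

```python
def effective_sample_count(requested, total_features):
    """Number of features the sample layer would hold under the documented rules."""
    if not requested:
        # Blank (None) or 0: no sample layer is created.
        return 0
    # Asking for more features than exist returns all features.
    return min(requested, total_features)


def describe_with_sample(input_layer, summary_table, sample_size, sample_layer):
    """Run Describe Dataset and request a sample layer (requires ArcGIS Pro)."""
    import arcpy  # deferred: only available in an ArcGIS Pro Python environment

    return arcpy.gapro.DescribeDataset(
        input_layer, summary_table, sample_size, sample_layer
    )
```

Because the sample is not a random geographic selection, a count computed this way says how many features to expect, not where they fall.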
Use the Extent Layer output parameter to create a boundary feature that describes the extent of your input dataset. The output contains a single polygon feature representing the geographic extent of the input features. You can use the extent layer to determine where your data is located or as an input elsewhere in your workflow. For example, use it as the clip layer in the GeoAnalytics Clip Layer tool.
You only have the option to create an extent layer for point, line, and polygon features. An extent layer will not be created for tabular features.
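The extent workflow described above might look like the following sketch. It assumes the GeoAnalytics Desktop Clip Layer tool is available as arcpy.gapro.ClipLayer and takes the input layer, clip layer, and output feature class in that order; confirm this against your ArcGIS Pro version. The helper names and the geometry-type strings are hypothetical and used only for illustration.

```python
# Geometry types for which the documentation says an extent layer can be created.
EXTENT_GEOMETRY_TYPES = {"point", "line", "polygon"}


def can_create_extent_layer(geometry_type):
    """Return True for point, line, or polygon input; False for tabular input."""
    return (geometry_type or "").lower() in EXTENT_GEOMETRY_TYPES


def describe_and_clip(input_layer, summary_table, extent_fc, layer_to_clip, clipped_fc):
    """Create an extent layer, then use it to clip another layer (requires ArcGIS Pro)."""
    import arcpy  # deferred: only available in an ArcGIS Pro Python environment

    # Skip the sample outputs and request only the extent layer by keyword.
    arcpy.gapro.DescribeDataset(input_layer, summary_table, extent_layer=extent_fc)
    # Assumed signature: ClipLayer(input_layer, clip_layer, out_feature_class).
    arcpy.gapro.ClipLayer(layer_to_clip, extent_fc, clipped_fc)
```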
Optionally, use environment settings to specify how features will be output.
For example, use the Extent environment to output an extent layer representing the area of interest, or to sample features from the defined study area.
Additionally, use the Output Coordinate System environment to project outputs to the desired spatial reference.
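As a hedged sketch of the environment settings described above, the following wraps the tool call in arcpy.EnvManager so that the Extent and Output Coordinate System environments apply only to that call. The function names, the bounds tuple, and the WKID argument are illustrative assumptions; the bounds must be expressed in coordinates appropriate for your data.

```python
def validate_extent(xmin, ymin, xmax, ymax):
    """Check that the bounds describe a non-empty rectangle and return them."""
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("extent must satisfy xmin < xmax and ymin < ymax")
    return (xmin, ymin, xmax, ymax)


def describe_study_area(input_layer, summary_table, extent_fc, bounds, wkid):
    """Describe only the study area and project the outputs (requires ArcGIS Pro)."""
    import arcpy  # deferred: only available in an ArcGIS Pro Python environment

    xmin, ymin, xmax, ymax = validate_extent(*bounds)
    # Environment settings are restored when the with block exits.
    with arcpy.EnvManager(
        extent=arcpy.Extent(xmin, ymin, xmax, ymax),
        outputCoordinateSystem=arcpy.SpatialReference(wkid),
    ):
        return arcpy.gapro.DescribeDataset(
            input_layer, summary_table, extent_layer=extent_fc
        )
```

Limiting the extent this way also reduces the amount of data analyzed, which is one of the performance suggestions for this tool.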
You can improve the performance of the Describe Dataset tool by doing the following:
- Set the extent of the data so you only analyze the data of interest.
- Generate fewer sample features.
- Use data that is local to where the analysis is being run.
This geoprocessing tool is powered by Spark. Analysis is completed on your desktop machine using multiple cores in parallel. See Considerations for GeoAnalytics Desktop tools to learn more about running analysis.
For optimal performance, data should be available on your desktop machine; tools take longer to run when the data isn't local. If you are using a hosted feature layer, it is recommended that you use ArcGIS GeoAnalytics Server. To use ArcGIS GeoAnalytics Server to perform analysis, see GeoAnalytics Tools.
Syntax
arcpy.gapro.DescribeDataset(input_layer, output, {sample_features}, {sample_layer}, {extent_layer})
Parameter | Explanation | Data Type |
input_layer | The point, line, polygon, or tabular features to be described. | Table View |
output | A new table with the summary information. | Table |
sample_features (Optional) | The number of features that will be included in the output sample layer. No sample is returned if you select 0 features or don't provide a number. By default, no sample layer is returned. | Long |
sample_layer (Optional) | A new feature class with a sample of the input data. | Table; Feature Class |
extent_layer (Optional) | A new feature class with the spatial and temporal extent of the input data. | Feature Class |
Code sample
The following Python script demonstrates how to use the DescribeDataset tool.
In this script, the WaterSample features are described and a sample layer of 2,500 features is created.
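#-------------------------------------------------------------------------------
# Name: DescribeDataset.py
# Description: Describe the WaterSample features and create a sample layer
#              of 2500 features.
#-------------------------------------------------------------------------------

# Import system modules
import arcpy

# Set the workspace that contains the input and receives the outputs
arcpy.env.workspace = "C:/data/RedRiver_basin.gdb"

# Set local variables
inputDataset = "WaterSample"
output = "WSample_summary"
sample = "WSample_sample2500"

# Run Describe Dataset
arcpy.gapro.DescribeDataset(inputDataset, output, 2500, sample)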
Licensing information
- Basic: No
- Standard: No
- Advanced: Yes