Select Random Sample (Data Reviewer)

Available with Data Reviewer license.

Summary

Selects a random sample of the input features or rows based on the specified sampling method.

The output is a selection made on the input layer in the map frame. The tool can also create a .json file that records the selected object IDs (OIDs) and the SQL expression used for the selection. The selection can be used in workflows with the Browse Features visual review tool and the Run Data Checks tool.

Usage

  • The Sample Method parameter has the following options:

    • Fixed Number—The number of records selected will be based on the Number of Records parameter value.
    • Percentage—The number of records selected will be based on the Percentage of Records parameter value.
    • Auto Calculate—The number of records selected will be based on a calculation using the Confidence Level and Margin of Error parameter values.

  • The Sample Method parameter's Auto Calculate option uses the following variables to calculate the number of records:

    z=scipy.stats.norm.ppf(1-(1-confidence_level)/2)
    n=((z/m)^2)*(p*(1-p))
    n'=(n*N)/(n+(N-1))
    • The z-statistic for the desired confidence level (z). The z-statistic is calculated from the confidence level using the scipy.stats module: z=scipy.stats.norm.ppf(1-(1-confidence_level)/2).
    • The acceptable margin of error in the confidence interval (m).
    • The probability (p) is set to 0.5 because there is no prior knowledge of what percentage of records will pass or fail. Treating a pass and a fail as equally likely makes 0.5 the most conservative value to use in the variance equation, since it maximizes p*(1-p).
    • The population size (N) is the total number of records in a feature layer or table.

  • Random OIDs are selected using the random.sample(population, k) function from the Python random module, where population is the list of OID values and k is the size of the sample.

  • The output of this tool is a random selection of records from the Input Rows parameter value based on the Sample Method parameter value.

  • Use the optional Output File parameter to create a .json file that includes the following:

    • The date and time the tool was run
    • The workspace the input is sourced from
    • The name of the input feature layers or tables
    • The total number of selected records
    • The OIDs of the selected records
    • The SQL expression that was used to make the selection

  • Existing selections on the Input Rows parameter value will be honored whether the Use the selected records toggle button is turned on or off.

  • The feature layer or table must have an ObjectID field before running this tool.

  • If the Use the selected records toggle button is turned off, the Output File parameter value will record a random selection of features based on the entire dataset. However, if there is a definition query applied, only the features or rows matching the query will be selected in the map frame.
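
The Auto Calculate arithmetic and the random OID draw described in the usage notes above can be sketched in plain Python. This is a minimal illustration, not the tool's implementation. It substitutes the standard library's statistics.NormalDist().inv_cdf for scipy.stats.norm.ppf (both return the standard normal quantile), assumes the percentage inputs are converted to fractions before use, and assumes the result is rounded up; the tool's actual rounding is not documented here. The function names sample_size and pick_oids are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_size(confidence_level, margin_of_error, population_size, p=0.5):
    """Estimate the population-adjusted sample size n'.

    confidence_level and margin_of_error are percentages (for example 95 and 5).
    Converting them to fractions and rounding up are assumptions of this sketch.
    """
    c = confidence_level / 100.0
    m = margin_of_error / 100.0
    # Same quantile as scipy.stats.norm.ppf(1 - (1 - c) / 2)
    z = NormalDist().inv_cdf(1 - (1 - c) / 2)
    # n = ((z/m)^2) * (p*(1-p))
    n = ((z / m) ** 2) * (p * (1 - p))
    # n' = (n*N) / (n + (N-1))
    n_adj = (n * population_size) / (n + (population_size - 1))
    # Never ask for more records than exist
    return min(population_size, math.ceil(n_adj))


def pick_oids(oids, k, seed=None):
    """Draw k distinct OIDs without replacement using random.sample."""
    rng = random.Random(seed)  # a seed is only useful for repeatable demos
    return rng.sample(list(oids), k)


if __name__ == "__main__":
    oids = range(1, 1001)  # hypothetical OIDs 1..1000
    k = sample_size(95, 5, len(oids))
    print(k, sorted(pick_oids(oids, k, seed=0))[:5])
```

Because random.sample draws without replacement, no OID can appear twice in the result, and k must not exceed the number of OIDs available.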

Parameters

Label | Explanation | Data Type
Input Rows

The data to which the selection will be applied.

Feature Layer; Table View
Sample Method

Specifies the sampling method that will be used.

  • Fixed Number: The number of records selected will be based on the Number of Records parameter value.
  • Percentage: The number of records selected will be based on the Percentage of Records parameter value.
  • Auto Calculate: The number of records selected will be based on a calculation using the Confidence Level and Margin of Error parameter values.
String
Number of Records
(Optional)

The number of records that will be selected.

This parameter is active when the Sample Method parameter value is Fixed Number.

Long
Percentage of Records
(Optional)

The percentage of records in the input that will be selected.

This parameter is active when the Sample Method parameter value is Percentage.

Long
Confidence Level
(Optional)

The level of confidence is the likelihood that a sample size is statistically significant, entered as a percentage such as 98 or 95.

This parameter will be used to calculate the z-statistic (z).

The z-statistic can be calculated using the scipy.stats module: z=scipy.stats.norm.ppf(1-(1-confidence_level)/2).

This parameter is active when the Sample Method parameter value is Auto Calculate.

Long
Margin of Error
(Optional)

The acceptable margin of error (m) in the confidence interval, entered as a percentage such as 8 or 5.

This parameter is used with the calculated z-statistic (z) to compute the initial sample size, n=((z/m)^2)*(p*(1-p)), which is then adjusted for the population size (N) to give the actual sample size, n'=(n*N)/(n+(N-1)).

This parameter is active when the Sample Method parameter value is Auto Calculate.

Long
Output File
(Optional)

The output .json file that will contain a record of the selected data.

File

Derived Output

Label | Explanation | Data Type
Updated Rows

The updated input with the selections applied.

Feature Layer; Table View

arcpy.Reviewer.SelectRandomSample(in_layer_or_view, sample_method, {number_of_records}, {percentage_of_records}, {confidence_level}, {margin_of_error}, {out_file})
Name | Explanation | Data Type
in_layer_or_view

The data to which the selection will be applied.

Feature Layer; Table View
sample_method

Specifies the sampling method that will be used.

  • FIXED_NUMBER: The number of records selected will be based on the number of records parameter value.
  • PERCENTAGE: The number of records selected will be based on the percentage of records parameter value.
  • AUTO_CALCULATE: The number of records selected will be based on a calculation using the confidence level and margin of error parameter values.
String
number_of_records
(Optional)

The number of records that will be selected.

This parameter is enabled when the sample_method parameter value is FIXED_NUMBER.

Long
percentage_of_records
(Optional)

The percentage of records in the input that will be selected.

This parameter is enabled when the sample_method parameter value is PERCENTAGE.

Long
confidence_level
(Optional)

The level of confidence is the likelihood that a sample size is statistically significant, entered as a percentage such as 98 or 95.

This parameter will be used to calculate the z-statistic (z).

The z-statistic can be calculated using the scipy.stats module: z=scipy.stats.norm.ppf(1-(1-confidence_level)/2).

This parameter is enabled when the sample_method parameter value is AUTO_CALCULATE.

Long
margin_of_error
(Optional)

The acceptable margin of error (m) in the confidence interval, entered as a percentage such as 8 or 5.

This parameter is used with the calculated z-statistic (z) to compute the initial sample size, n=((z/m)^2)*(p*(1-p)), which is then adjusted for the population size (N) to give the actual sample size, n'=(n*N)/(n+(N-1)).

This parameter is enabled when the sample_method parameter value is AUTO_CALCULATE.

Long
out_file
(Optional)

The output .json file that will contain a record of the selected data.

File

Derived Output

Name | Explanation | Data Type
out_layer_or_view

The updated input with the selections applied.

Feature Layer; Table View

Code sample

SelectRandomSample example 1 (Python window)

The following Python window script demonstrates how to use the SelectRandomSample function.

import arcpy
arcpy.env.workspace = r"C:\USAData\Data.gdb"
arcpy.SelectRandomSample_Reviewer("Cities", "FIXED_NUMBER", number_of_records = 35, out_file = "C:\\USAData\\Cities_Sample.json")

SelectRandomSample example 2 (stand-alone script)

The following stand-alone script creates a random selection of features within the Cities feature layer.

# Name: SelectRandomSample_Example.py
# Description: Use the SelectRandomSample tool in ArcGIS Pro to select a random sample of features from a feature class.

# Import system modules
import arcpy

# Set environment workspace
arcpy.env.workspace = r"C:\USAData\Data.gdb"

# Set local variables
in_layer_or_view = "Cities"
sampling_method = "AUTO_CALCULATE"
confidence_level = 98
margin_of_error = 5
out_file = r"C:\USAData\Cities_Sample.json"

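# Generate a random sample of features.
# The optional parameters are passed by keyword: number_of_records and
# percentage_of_records come before confidence_level in the signature, so
# positional arguments would be assigned to the wrong parameters.
arcpy.SelectRandomSample_Reviewer(in_layer_or_view, sampling_method,
                                  confidence_level=confidence_level,
                                  margin_of_error=margin_of_error,
                                  out_file=out_file)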
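
After the tool runs, the optional .json file can be inspected with the Python standard library. The sketch below only lists the top-level keys and their value types, because the key names used in the file are not specified in this topic. The function name describe_json is hypothetical, and UTF-8 encoding is an assumption.

```python
import json


def describe_json(path):
    """Return {top-level key: type name} for the JSON document stored at path."""
    # UTF-8 is assumed here; adjust if the file uses another encoding.
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        # The root is not an object, so report its type only
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


# Example use with the output path from the script above:
# for key, kind in describe_json(r"C:\USAData\Cities_Sample.json").items():
#     print(f"{key}: {kind}")
```

Listing the keys first avoids guessing at the file's structure before reading the recorded OIDs or SQL expression from it.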

Licensing information

  • Basic: Requires Data Reviewer
  • Standard: Requires Data Reviewer
  • Advanced: Requires Data Reviewer