Summary
Analyzes two variables for statistically significant relationships using local entropy. Each feature is classified into one of six categories based on the type of relationship. The output can be used to visualize areas where the variables are related and explore how their relationship changes across the study area.
Usage
This tool accepts points and polygons as input and should be used with continuous variables. It is not appropriate for binary or categorical data.
It is recommended to store your Output Features in a geodatabase rather than as a shapefile (.shp). Shapefiles cannot store null values in attributes and cannot store charts in their pop-up dialogs.
Each input feature will be classified into one of the following relationship categories based on how reliably the Explanatory Variable parameter can predict the Dependent Variable parameter:
- Not Significant—The relationship between the variables is not statistically significant.
- Positive Linear—The dependent variable increases linearly as the explanatory variable increases.
- Negative Linear—The dependent variable decreases linearly as the explanatory variable increases.
- Concave—The dependent variable changes by a concave curve as the explanatory variable increases.
- Convex—The dependent variable changes by a convex curve as the explanatory variable increases.
- Undefined Complex—The variables are significantly related, but the type of relationship cannot be reliably described by any of the other categories.
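Once the tool has run, tallying the categories gives a quick impression of which relationship types dominate the study area. The following is a minimal sketch in plain Python. It assumes the category labels have already been read from the output features into a list and that they are stored with exactly the names listed above; neither assumption is confirmed here. The helper `tally_relationship_types` is hypothetical and not part of arcpy.

```python
from collections import Counter

# The six relationship categories listed above (assumed to match the stored labels).
CATEGORIES = (
    "Not Significant",
    "Positive Linear",
    "Negative Linear",
    "Concave",
    "Convex",
    "Undefined Complex",
)


def tally_relationship_types(labels):
    """Count features per category. Hypothetical helper, not part of arcpy."""
    counts = Counter(labels)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unexpected category labels: {sorted(unknown)}")
    # Report every category, including those with zero features.
    return {name: counts.get(name, 0) for name in CATEGORIES}
```

Returning all six categories, including empty ones, keeps summaries from different runs directly comparable.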
Whether two variables are related does not depend on which is labeled the explanatory variable and which is labeled the dependent variable. For example, if diabetes is related to obesity, obesity is similarly related to diabetes. However, the classification of the type of relationship may change when the labels are swapped, because one variable may accurately predict a second variable even though the second cannot accurately predict the first. If you are unsure which variable should be explanatory and which should be dependent, run the tool twice, once with each assignment, and compare the results.
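Running the tool with both role assignments can be scripted. The sketch below builds the positional arguments for the two runs, following the argument order in the tool syntax (in_features, dependent_variable, explanatory_variable, output_features). The helper names `both_role_assignments` and `run_both` and the output naming scheme are hypothetical; the dataset and field names are borrowed from the code samples on this page.

```python
def both_role_assignments(in_features, var_a, var_b, out_prefix):
    """Return argument tuples for two runs with the variable roles swapped.

    Hypothetical helper, not part of arcpy. Each tuple is ordered as
    (in_features, dependent_variable, explanatory_variable, output_features).
    """
    return [
        (in_features, var_a, var_b, f"{out_prefix}_{var_a}_dep"),
        (in_features, var_b, var_a, f"{out_prefix}_{var_b}_dep"),
    ]


def run_both(arg_sets):
    """Run the tool once per argument tuple (requires an ArcGIS Python environment)."""
    import arcpy  # imported here so the helper above works without ArcGIS

    for args in arg_sets:
        arcpy.LocalBivariateRelationships_stats(*args)


runs = both_role_assignments("ObesityDiabetes", "DiabetesRate", "ObesityRate", "LBR")
# run_both(runs)  # uncomment after setting arcpy.env.workspace
```

Writing each run to its own output keeps both classifications available for side-by-side comparison.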
This tool supports parallel processing and uses 50 percent of available processors by default. The number of processors can be increased or decreased using the Parallel Processing Factor environment.
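In a script, the Parallel Processing Factor environment is set through `arcpy.env.parallelProcessingFactor`, which accepts a percentage string such as "75%". The sketch below shows one way to wrap that setting; the helper names `parallel_factor` and `set_parallel_factor` are hypothetical, and the 75 percent value is only an example.

```python
def parallel_factor(percent):
    """Format a whole-number percentage for the Parallel Processing Factor environment.

    Hypothetical helper, not part of arcpy.
    """
    if not isinstance(percent, int) or percent < 0:
        raise ValueError("percent must be a non-negative integer")
    return f"{percent}%"


def set_parallel_factor(percent):
    """Apply the setting (requires an ArcGIS Python environment)."""
    import arcpy  # imported here so parallel_factor works without ArcGIS

    arcpy.env.parallelProcessingFactor = parallel_factor(percent)


# Example: ask for 75 percent of available processors instead of the default 50.
# set_parallel_factor(75)
```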
Syntax
LocalBivariateRelationships(in_features, dependent_variable, explanatory_variable, output_features, {number_of_neighbors}, {number_of_permutations}, {enable_local_scatterplot_popups}, {level_of_confidence}, {apply_false_discovery_rate_fdr_correction}, {scaling_factor})
Parameter | Explanation | Data Type |
in_features | The feature class containing fields representing the dependent_variable and explanatory_variable. | Feature Layer |
dependent_variable | The numeric field representing the values of the dependent variable. When categorizing the relationships, the explanatory_variable is used to predict the dependent_variable. | Field |
explanatory_variable | The numeric field representing the values of the explanatory variable. When categorizing the relationships, the explanatory_variable is used to predict the dependent_variable. | Field |
output_features | The output feature class containing all input features with fields representing the dependent_variable, explanatory_variable, entropy score, pseudo p-value, level of significance, type of categorized relationship, and diagnostics related to the categorization. | Feature Class |
number_of_neighbors (Optional) | The number of neighbors around each feature (including the feature) that will be used to test for a local relationship between the variables. The number of neighbors must be between 30 and 1000, and the default is 30. The provided value should be large enough to detect the relationship between features, but small enough to still identify local patterns. | Long |
number_of_permutations (Optional) | Specifies the number of permutations used to calculate the pseudo p-value for each feature. Choosing a number of permutations is a balance between precision in the pseudo p-value and increased processing time. | Long |
enable_local_scatterplot_popups (Optional) | Specifies whether scatterplot pop-ups will be generated for each output feature. Each scatterplot displays the values of the explanatory (horizontal axis) and dependent (vertical axis) variables in the local neighborhood along with a fitted line or curve visualizing the form of the relationship. Scatterplot charts are not supported for shapefile outputs. | Boolean |
level_of_confidence (Optional) | Specifies a confidence level of the hypothesis test for significant relationships. | String |
apply_false_discovery_rate_fdr_correction (Optional) | Specifies whether False Discovery Rate (FDR) correction will be applied to the pseudo p-values. | Boolean |
scaling_factor (Optional) | Controls the sensitivity to subtle relationships between the variables. Larger values (closer to one) can detect relatively weak relationships, while smaller values (closer to zero) will only detect strong relationships. Smaller values are also more robust to outliers. The value must be between 0.01 and 1, and the default is 0.5. | Double |
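The numeric ranges stated in the table can be checked before the tool is called, so that an out-of-range value is caught early. The following is a minimal sketch; `check_lbr_parameters` is a hypothetical helper, not part of arcpy, and it covers only the two ranges documented above (number_of_neighbors from 30 to 1000, scaling_factor from 0.01 to 1).

```python
def check_lbr_parameters(number_of_neighbors=30, scaling_factor=0.5):
    """Return a list of problems with the given values (empty if none).

    Hypothetical helper, not part of arcpy. Defaults mirror the documented
    tool defaults.
    """
    problems = []
    if not 30 <= number_of_neighbors <= 1000:
        problems.append(
            f"number_of_neighbors must be between 30 and 1000, got {number_of_neighbors}"
        )
    if not 0.01 <= scaling_factor <= 1:
        problems.append(
            f"scaling_factor must be between 0.01 and 1, got {scaling_factor}"
        )
    return problems


# Example: 20 neighbors is below the documented minimum of 30.
for message in check_lbr_parameters(number_of_neighbors=20, scaling_factor=0.5):
    print(message)
```

Returning a list rather than raising on the first failure lets a script report every invalid value in one pass.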
Code sample
The following Python window script demonstrates how to use the LocalBivariateRelationships function.
import arcpy
arcpy.env.workspace = 'C:\\LBR\\MyData.gdb'
arcpy.LocalBivariateRelationships_stats('ObesityDiabetes', 'ObesityRate',
                                        'DiabetesRate', 'LBR_Results', 30,
                                        '199', 'CREATE_POPUP', '95%',
                                        'APPLY_FDR', 0.5)
The following stand-alone Python script demonstrates how to use the LocalBivariateRelationships function.
# Use the Local Bivariate Relationships tool to study the relationship between
# obesity and diabetes.
# Import system modules.
import arcpy
import os
# Set property to overwrite existing output by default.
arcpy.env.overwriteOutput = True
try:
    # Set the workspace and input features.
    arcpy.env.workspace = r"C:\LBR\MyData.gdb"
    inputFeatures = 'ObesityDiabetes'
    # Set the output workspace and output name.
    outws = r"C:\LBR\outputs.gdb"
    outputName = 'LBR_Results'
    # Set the dependent variable and explanatory variable.
    depVar = 'DiabetesRate'
    explVar = 'ObesityRate'
    # Set number of neighbors and permutations.
    numNeighbors = 50
    numPerms = '999'
    # Choose to create popups.
    popUps = 'CREATE_POPUP'
    # Choose confidence level and apply False Discovery Rate correction.
    confLevel = '95%'
    fdr = 'APPLY_FDR'
    # Set the scaling factor.
    scaleFactor = 0.5
    # Run Local Bivariate Relationships.
    arcpy.LocalBivariateRelationships_stats(inputFeatures, depVar, explVar,
                                            os.path.join(outws, outputName),
                                            numNeighbors, numPerms, popUps,
                                            confLevel, fdr, scaleFactor)
except arcpy.ExecuteError:
    # If an error occurred when running the tool, print out the error message.
    print(arcpy.GetMessages())
Licensing information
- Basic: Yes
- Standard: Yes
- Advanced: Yes