The Build Balanced Zones tool uses a genetic algorithm to create spatially contiguous zones in your study area based on criteria that you specify. You can create zones that contain an equal number of features, zones that are similar based on a set of attribute values, or both. There are also options to select zones that have approximately equal areas, that are as compact as possible, and that maintain consistent summary statistics (such as averages and proportions) of other variables.
Example scenarios
This tool can be used in the following types of scenarios:
- A retail company wants to create districts in which each manager is responsible for an equal sales volume and number of employees regardless of the number of stores in each district.
- Climate change is contributing to an increase in the number of wildfires per year in many areas, which increases costs in those regions. Local and national governments can use this tool to equalize both the workload and costs of fighting fires by creating administrative districts.
- Police patrol districts can be created to balance the workload and calls among officers. Overstaffing or understaffing in some areas can be relieved by balancing the crime index of each block group and ensuring the timeliness and effectiveness of police response.
Defining criteria for zone building and selection
For the tool to build optimally balanced zones, you must provide the criteria that define what it means for a zone to be optimal. There are two types of criteria that you can provide: zone building criteria and zone selection criteria. The tool balances zones by alternating between zone building and zone selection, and the criteria that you specify for each step determine the final balanced zones recommended by the tool.
The zone building step builds many zones that randomly grow until the zone building criteria are met. The criteria for zone building should be considered requirements for your zones, and all zones will grow in such a way as to honor these criteria (more detail on how zones grow is provided in the Grow zones with the Genetic Algorithm section). The zone selection step then ranks each set of zones and selects the best zones based on how closely they honor the zone selection criteria. In general, the zone selection criteria should be considered preferences rather than requirements, and they are used to select from the zones that already meet the zone building criteria.
Choosing the criteria to use for zone building and zone selection depends on your particular circumstance, and the following options for the Zone Creation Method parameter are provided to help you define your criteria correctly:
- Attribute target—Each zone will have approximately the same total sum of an attribute, and this total sum must be specified. The number of zones that are created depends on the overall total of the attribute. For example, this option can be used to create service areas that all contain approximately 1,000 customers. If there are 5,000 total customers among all the input features, approximately 5 zones will be created, each with approximately 1,000 customers. With two million total customers, approximately 2,000 zones will be created, each containing approximately 1,000 customers.
- Defined number of zones—The number of zones must be equal to a specified number, and each zone is composed of approximately the same number of features. This option is useful when you know how many zones you need, and you need each zone to contain the same number of input features.
- Number of zones and attribute target—Combines the two previous options by balancing the sum of an attribute among a particular number of zones. For example, this option can be used to create exactly 20 service areas that all have approximately the same sales volume within each zone. For this option, you do not provide the desired attribute sum because it is determined by dividing the total sum of the attribute by the number of zones. This option does not equalize the number of features in each zone (however, preference for an equal number of features can be provided as zone selection criteria, as described later in this topic).
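The arithmetic behind these options can be sketched in a few lines. This is only an illustration of the relationships described above; the function names and the total sales volume of 1,000,000 are hypothetical and are not part of the tool.

```python
def zones_from_target(total, target):
    """Approximate zone count when each zone should sum to `target`."""
    return round(total / target)


def target_from_zone_count(total, number_of_zones):
    """Per-zone sum implied by a fixed number of zones."""
    return total / number_of_zones


# Attribute target: the zone count follows from the overall total.
print(zones_from_target(5_000, 1_000))        # 5
print(zones_from_target(2_000_000, 1_000))    # 2000

# Number of zones and attribute target: the target follows from the total.
# The total sales volume of 1,000,000 is a made-up value.
print(target_from_zone_count(1_000_000, 20))  # 50000.0
```

In practice the resulting zones only approximate these values, because zones are assembled from whole features.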
Zone building criteria
There are different required zone building criteria for each of the Zone Creation Method parameter options, and all zones grow until they satisfy these criteria.
- Attribute target—You must provide the attribute that you want to balance, and provide a sum in the Zone Building Criteria With Target parameter. Optionally, you can provide multiple attributes with different sums, and you can provide weights to each of the attributes so that some attributes are prioritized over others.
- Defined number of zones—You must provide the number of zones using the Target Number of Zones parameter.
- Number of zones and attribute target—You must provide the number of zones in the Target Number of Zones parameter and provide the attribute to balance in the Zone Building Criteria parameter. You can again provide multiple attributes and provide weights to prioritize them.
Zone selection criteria
The zone selection criteria are additional options that allow you to specify your preferences among the zones created in the zone building step. These criteria are used with the zone building criteria to determine which zones will ultimately be recommended by the tool. Most zone selection criteria are applicable to all options for the Zone Creation Method parameter. The sections below describe each of the zone selection options and when they are applicable.
Zone characteristics criteria
The Zone Characteristics parameter options generally relate to the size and shape of the zones. You can use any or all of the following options:
- Equal area—Preference is given to zones that are close to equal in area. This option only applies when the input features are polygons.
- Compactness—Preference is given to zones that are close to circular in shape. This option is always applicable.
- Equal number of features—Preference is given to zones that are composed of approximately the same number of features. This option does not apply when the Defined number of zones option is used because that zone creation method already ensures an equal number of features with zone building criteria.
Attribute consideration criteria
The Attribute to Consider parameter allows you to specify additional attributes to take into consideration that were not used as zone building criteria. This allows you to include zones that maintain a consistent sum, average, variance, or median of an attribute. You can also provide multiple attributes, so, for example, you can include zones that have the same total population (sum) and that have approximately the same median income (median).
These selection criteria apply to all zone creation methods. The attributes must be continuous rather than categorical to be used in this parameter.
Maintaining proportions of categorical variables
The Categorical Variable to Maintain Proportions parameter allows you to specify a categorical variable that will be used to balance the proportions of the categories in the zones. You must also choose a Proportion Method to specify how the proportions should be balanced. The following proportion methods are provided:
- Maintain within proportion—Preference is given to zones that maintain the same relative proportions of the categories within each individual zone. For example, if the categorical variable you are using to maintain proportions represents a binary land cover classification of forest and nonforest, and 60 percent of the features have a land cover type of forest and 40 percent have a land cover type of nonforest, this option will favor zones where each individual zone is composed of 60 percent forest and 40 percent nonforest.
- Maintain overall proportion—Zones will be created so that the overall proportions of category predominance match the proportions of the overall categories. For example, if the categorical variable represents whether the feature is on land or water, and 60 percent of features are on land, this option will favor zones where approximately 60 percent of the zones are predominantly on land and 40 percent of the zones are predominantly on water.
These selection criteria apply to all zone creation methods. The variable must be categorical rather than continuous to be used in this parameter.
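The difference between the two proportion methods can be illustrated with a small sketch. The function names and the sample zones are hypothetical, and the code only computes the quantities each method compares; it does not reproduce how the tool scores them.

```python
from collections import Counter


def within_zone_proportions(zone_categories):
    """Share of each category among the features of a single zone."""
    counts = Counter(zone_categories)
    return {cat: n / len(zone_categories) for cat, n in counts.items()}


def predominance_proportions(zones):
    """Share of zones in which each category is the most common one."""
    winners = Counter(Counter(zone).most_common(1)[0][0] for zone in zones)
    return {cat: n / len(zones) for cat, n in winners.items()}


# Hypothetical solution: 25 features, 15 on land (60 percent), 10 on water.
zones = [
    ["land"] * 4 + ["water"] * 1,
    ["land"] * 4 + ["water"] * 1,
    ["land"] * 3 + ["water"] * 2,
    ["land"] * 2 + ["water"] * 3,
    ["land"] * 2 + ["water"] * 3,
]

# Maintain within proportion compares each zone to the 60/40 split.
print(within_zone_proportions(zones[2]))   # {'land': 0.6, 'water': 0.4}

# Maintain overall proportion compares the share of predominantly-land zones.
print(predominance_proportions(zones))     # {'land': 0.6, 'water': 0.4}
```

In this sample, only the third zone matches the within-zone proportions, while the set of zones as a whole matches the overall proportions (three of five zones are predominantly land).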
Distance-based criteria
The Distance to Consider parameter allows you to favor zones that are close to a different set of features (or multiple sets of features). For example, if you are building health administration districts, you can favor zones that are close to existing hospitals. The distance between a zone and a feature is defined by the median distance from all features in the zone to the closest feature provided as a distance to consider.
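The distance definition above can be sketched as follows. The sketch assumes point coordinates in a planar (projected) coordinate system and straight-line distances; the function name and sample coordinates are hypothetical.

```python
from math import hypot
from statistics import median


def zone_distance(zone_points, target_points):
    """Median, over the zone's features, of the distance to the closest target."""
    nearest = [
        min(hypot(x - tx, y - ty) for tx, ty in target_points)
        for x, y in zone_points
    ]
    return median(nearest)


zone = [(0, 0), (3, 4), (6, 8)]       # features in one zone
hospitals = [(0, 0), (100, 100)]      # features provided as a distance to consider

# Nearest-hospital distances are 0, 5, and 10, so the median is 5.0.
print(zone_distance(zone, hospitals))  # 5.0
```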
Scenarios for choosing zone criteria
There are many available options for zone building and zone selection criteria, and it may not be immediately obvious which parameters to use based on your requirements and preferences for zones. Below are a few scenarios along with the parameters that should be used for these scenarios.
Scenario 1: Assigning property listings to Realtors
You are an analyst in a real estate agency. You have a feature class with all the available property listings and their costs in the area. There are 12 Realtors in your agency, and you want to assign the property listings to each of them such that every Realtor gets an equal number of listings and the total cost of the properties in each zone is approximately the same. You also want the zones to be close to existing real estate branches.
To achieve this, you can use the Number of zones and attribute target option in the Zone Creation Method parameter. Specify 12 for the Target Number of Zones parameter (one for each Realtor) and choose the field representing the cost of each property in the Zone Building Criteria parameter. Choose the Equal number of features option in the Zone Characteristics parameter to prefer zones where each Realtor is assigned an approximately equal number of properties. To take into account the distance to the closest real estate branch, provide a feature class representing the locations of the branches in the Distance to Consider parameter.
Scenario 2: Creating new district boundaries
Creating new district boundaries that balance the number of people within each district is a complicated and difficult task that must be frequently performed at nearly every level of government. Using population and demographic data gathered in small neighborhoods, you want to create districts that each have approximately 10,000 people. Additionally, approximately 75 percent of the population live in urban areas and 25 percent live in rural areas. You want approximately 75 percent of the zones to be majority urban and 25 percent to be majority rural to properly represent each group.
To achieve this, you can use the Attribute target option in the Zone Creation Method parameter. Choose the field representing the number of people in each neighborhood and specify 10,000 in the columns of the Zone Building Criteria With Target parameter. Specify a field representing whether the neighborhood is urban or rural in the Categorical Variable to Maintain Proportions parameter, and choose the Maintain overall proportion option in the Proportion Method parameter.
Scenario 3: Allocating workload to parole officers
You are a GIS analyst for the law enforcement division and have been tasked to balance the caseloads of 25 parole officers. You have the locations of all the offenders in the city along with a numerical risk assessment of each offender, and you want to divide them evenly between each of the parole officers while keeping the total risk of the offenders approximately the same. However, the balance of risk is not as essential as having the same number of offenders assigned to each parole officer. Additionally, you want the zones as compact as possible to reduce travel cost for the parole officers.
To achieve this, you can use the Defined number of zones option in the Zone Creation Method parameter and specify 25 for the Target Number of Zones parameter (one for each parole officer). Specify the field representing the risk of each offender in the Attribute to Consider parameter, and choose the Compactness option in the Zone Characteristics parameter to create compact zones.
Grow zones with the Genetic Algorithm
Using the criteria that you defined for zone building and zone selection, the Build Balanced Zones tool grows optimal zones using the Genetic Algorithm (GA), given the spatial constraints of the input features.
GA is based on the evolutionary theory of natural selection and genetics, as first explained by Charles Darwin. According to the Darwinian principle of survival of the fittest, the organisms that are more fit in a population tend to survive and produce more offspring.
Because the number of possible solutions is usually very large, GA searches for an optimized solution by starting with random searches and driving the searches in more promising directions. GA optimization is an abstraction of natural biological evolution in which each possible solution (in this case, each possible arrangement of zones) is analogous to an individual organism in a population. As the generations advance, only the fittest individuals continue to survive, and only the most productive searches are allowed to continue.
The algorithm starts by creating a random population in which each individual in the population is a possible solution. A fitness score is computed for each solution, the individuals with the lowest fitness score (best solutions) in the existing population are passed on to the next generation, and the remaining unfit solutions are eliminated. The fittest individuals are designated as parents and are allowed to create offspring in pairs using genetic operators such as crossover and mutation. Each new generation is a combination of the fittest individuals from the previous generation and their offspring. Sometimes, individuals (neither parents nor offspring, called aliens) are randomly introduced in the next generation to broaden the range of possible solutions discovered by the algorithm. The fitness score is calculated for all individuals in each new generation, and the process repeats for a given number of generations (50 generations, by default). The fittest individual of the final generation corresponds to the balanced zones that are returned by the tool.
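The generational loop described above can be sketched in generic form. This is a minimal illustration, not the tool's implementation: the function names are hypothetical, the toy problem (four zone totals that should each equal 100) stands in for real zone solutions, aliens are omitted, and the population size is assumed to be divisible by four so that the parents pair up evenly.

```python
import random


def evolve(new_individual, fitness, crossover, mutate, rng,
           population_size=100, generations=50, mutation_factor=0.1):
    """Return the fittest individual and the best score of each generation."""
    population = [new_individual(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)                  # lowest score first
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # fittest half survives
        pairing = parents[:]
        rng.shuffle(pairing)                          # random pairs of parents
        offspring = []
        for a, b in zip(pairing[0::2], pairing[1::2]):
            offspring.extend(crossover(a, b, rng))    # two children per pair
        offspring = [mutate(c, rng) if rng.random() < mutation_factor else c
                     for c in offspring]
        population = parents + offspring
    return min(population, key=fitness), history


# Toy stand-ins for zone solutions (assumptions for illustration only).
def new_individual(rng):
    return [rng.uniform(0, 200) for _ in range(4)]


def fitness(individual):
    return sum(abs(v - 100) / 100 for v in individual)


def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(individual, rng):
    child = list(individual)
    child[rng.randrange(len(child))] = rng.uniform(0, 200)
    return child


best, history = evolve(new_individual, fitness, crossover, mutate,
                       random.Random(0))
print(history[0], history[-1], fitness(best))
```

Because the fittest half of each generation is carried forward unchanged, the best score recorded in `history` never increases from one generation to the next.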
Choosing the initial population
The initial population for the algorithm is chosen by randomly selecting features in the study area. By default, the population will be composed of 100 individuals, but the number can be changed in the Population Size parameter. Each randomly selected feature is a starting location (called a seed) from which the zone grows by aggregating nearby features. A zone will continue to aggregate and grow until the total value reaches the zone building criteria threshold. For example, if you provide a population of 100,000 and number of households of 50,000 as zone building criteria, values 100,000 and 50,000 are threshold values of population and households, respectively, and the zone stops growing once all thresholds are reached. For the next zone, a new seed is selected outside of the first zone and is allowed to randomly grow. The process continues until all the features are assigned to a zone.
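The seed-and-grow process can be sketched on a small grid. This is a simplified illustration under stated assumptions: a single attribute with one threshold, edge-sharing (rook) neighbors on a regular grid, and hypothetical function names. A zone that runs out of unassigned neighbors simply stops early.

```python
import random


def grid_neighbors(rows, cols):
    """Edge-sharing neighbors for each cell of a rows x cols grid."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(i, j) for i, j in candidates
                            if 0 <= i < rows and 0 <= j < cols]
    return nbrs


def grow_zones(neighbors, values, threshold, rng):
    """Grow zones from random seeds until every feature is assigned."""
    unassigned = set(neighbors)
    zones = []
    while unassigned:
        seed = rng.choice(sorted(unassigned))       # new seed outside all zones
        zone, total = [seed], values[seed]
        unassigned.remove(seed)
        while total < threshold:
            frontier = sorted(
                {n for f in zone for n in neighbors[f]} & unassigned)
            if not frontier:                        # boxed in: stop growing
                break
            nxt = rng.choice(frontier)              # aggregate a random neighbor
            zone.append(nxt)
            total += values[nxt]
            unassigned.remove(nxt)
        zones.append(zone)
    return zones


neighbors = grid_neighbors(4, 4)
values = {cell: 1 for cell in neighbors}            # e.g. one customer per cell
zones = grow_zones(neighbors, values, threshold=4, rng=random.Random(1))
for zone in zones:
    print(len(zone), zone)
```

Zones that stop early because they are boxed in fall short of the threshold, which is one reason a single random solution is rarely well balanced and many candidate solutions are evaluated.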
In biology, the genetic material of an individual is contained in chromosomes. Chromosomes are composed of genes, the hereditary unit of life. For this tool, a possible solution is analogous to a chromosome, and seeds are analogous to the genes. These genes are the hereditary units transferred from parents to their offspring and thus are used to create future generations. The following image shows an example of a possible solution along with the Object ID values of the seeds of each zone in the solution:
Calculating the fitness score
The fitness score of each possible solution is a measure of how closely the resulting zones honor the various zone building and zone selection criteria provided in the tool, and lower fitness scores indicate better fits to the criteria. Thus, the final goal is to find a solution that provides a low fitness score (ideally, it will be the solution that provides the lowest possible fitness score, which is called the global minimum). The fitness score for a possible solution is calculated using the following formula:
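One form that is consistent with the symbol definitions and the explanation in this section is a sum of relative deviations over all zones and criteria. It is stated here as an assumption; in particular, the use of an absolute difference (rather than, for example, a squared difference) is not confirmed by this topic.

```latex
F = \sum_{i=1}^{n} \sum_{j=1}^{c} \frac{\lvert V_{ij} - V_{j} \rvert}{V_{j}}
```

The symbols have the following meanings: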
- n—The total number of zones in the solution
- c—The total number of criteria used to build zones and select solutions.
- Vj—The threshold value for the jth criterion.
- Vij—The sum of the jth criteria of the ith zone.
The formula for the fitness score should be understood as adding up how much the value of each criterion differs from its ideal value for each zone in the solution. In a perfect solution (which is generally not possible), the criteria values of all zones equal their ideal values, and the fitness score is zero. Dividing by the threshold value (Vj) ensures that the fitness score is unitless. This matters because the choice of units, for example square meters or square feet for area measurements, then does not affect the fitness score.
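The following sketch computes a score of this kind. It assumes that each deviation is measured as an absolute difference divided by the threshold, which matches the description above but is not confirmed as the tool's exact formula. The function name and sample values are hypothetical.

```python
def fitness_score(zone_values, thresholds):
    """Sum of relative deviations; zone_values[i][j] is Vij, thresholds[j] is Vj."""
    return sum(
        abs(v_ij - v_j) / v_j
        for zone in zone_values
        for v_ij, v_j in zip(zone, thresholds)
    )


thresholds = [100_000, 50_000]       # population and households
zones = [
    [100_000, 50_000],               # perfect zone: contributes 0
    [110_000, 45_000],               # 10 percent off on both criteria
    [90_000, 55_000],                # 10 percent off on both criteria
]
print(round(fitness_score(zones, thresholds), 6))   # 0.4

# Rescaling every value by the same unit factor leaves the score unchanged
# (up to floating-point rounding).
factor = 10.7639
scaled_zones = [[v * factor for v in zone] for zone in zones]
scaled_thresholds = [t * factor for t in thresholds]
print(round(fitness_score(scaled_zones, scaled_thresholds), 6))   # 0.4
```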
The Output Convergence Table parameter provides the total fitness score of the best solution in each generation along with the scores of each individual criterion. A convergence chart is created with the output table to show how these fitness scores change over the generations. The algorithm tries to find a better solution than the existing one, so the total fitness line in the chart generally decreases as the generations progress, until it eventually flattens out. This flattening indicates that the fitness score has attained a local minimum (though likely not the global minimum), and continuing with further generations will likely not improve the solution.
Creating new generations with crossover
Once the initial population is generated by the algorithm, half of the individuals are allowed to participate in the next generation and create offspring of new possible solutions. These individuals are chosen based on their fitness (determined by the lowest fitness scores) and are randomly paired to produce offspring using a process called crossover. Crossover (sometimes called recombination) is a genetic operator that combines the information from two parents to generate offspring. The following image shows an example of the seeds of two parents being crossed over to create two new offspring:
As a result, half of the individuals of each new generation were parents in the previous generation, and half are their offspring. For this new generation, again the fitness score is calculated, and the top half advance to the next generation, driving the search toward a better solution.
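Crossover of seed lists can be sketched as follows. The sketch uses single-point crossover, which is one common form of the operator; this topic does not state which form the tool uses. The function name and Object ID values are hypothetical, both parents are assumed to have the same number of seeds, and a child may receive the same seed twice if the parents share seeds, which the sketch does not handle.

```python
def single_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two seed lists at position `cut`."""
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


# Seeds are identified by Object ID values (made-up values).
parent_a = [12, 45, 78, 101]
parent_b = [3, 56, 90, 134]

child_1, child_2 = single_point_crossover(parent_a, parent_b, cut=2)
print(child_1)   # [12, 45, 90, 134]
print(child_2)   # [3, 56, 78, 101]
```

Each child inherits some seeds from each parent, so the zones grown from its seeds combine starting locations from both parent solutions.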
Expanding possible solutions with mutations and alien species
Diversity is very important in an evolving population, and one way diversity is maintained is through mutation. Mutations are small changes or alterations introduced in a sequence of genes to create individuals with different genetic codes. In this tool, any individual that undergoes a mutation will have its seeds randomly rearranged (permuted). The following image shows one possible solution going through a mutation and having its seeds rearranged:
The probability that an offspring will undergo a mutation can be controlled with the Mutation Factor parameter, and the default value is 0.1. Introducing mutations expands the possible solutions and often allows the algorithm to quickly converge to a locally optimal solution. However, very high mutation factors will introduce so many changes that the algorithm will lose efficiency and converge slowly (or not at all).
Another way to expand the possible solutions is to introduce new individuals to the population (called aliens) that are neither a part of previous generations nor their offspring. The introduction of alien individuals increases the probability of obtaining a global minimum (rather than a local minimum) while maintaining a high rate of convergence. The mutation factor controls the proportion of offspring of each generation that will be replaced by an alien composed of randomly generated seeds.
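Mutation and alien replacement can be sketched as follows. How the tool applies the mutation factor to each operation is an assumption here: the sketch treats it as an independent probability for each offspring, once for mutation and once for alien replacement. All function names and values are hypothetical.

```python
import random


def mutate(seeds, rng):
    """Return the same seeds in a random new order (a permutation)."""
    shuffled = list(seeds)
    rng.shuffle(shuffled)
    return shuffled


def make_alien(feature_ids, n_seeds, rng):
    """Return a brand-new individual made of randomly chosen seeds."""
    return rng.sample(feature_ids, n_seeds)


def diversify(offspring, feature_ids, mutation_factor, rng):
    """Mutate some offspring and replace some with aliens."""
    result = []
    for seeds in offspring:
        if rng.random() < mutation_factor:
            seeds = mutate(seeds, rng)
        if rng.random() < mutation_factor:
            seeds = make_alien(feature_ids, len(seeds), rng)
        result.append(seeds)
    return result


rng = random.Random(7)
feature_ids = list(range(1, 201))            # Object IDs of 200 input features
offspring = [[12, 45, 90, 134], [3, 56, 78, 101]]

print(mutate(offspring[0], rng))             # same seeds, new order
print(make_alien(feature_ids, 4, rng))       # four new random seeds
print(diversify(offspring, feature_ids, 0.1, rng))
```

A mutated individual keeps the same seeds in a different order, whereas an alien shares nothing with previous generations, which is why aliens can move the search to unexplored parts of the solution space.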
Disconnected groups
Due to spatial constraints, sometimes you can have disconnected groups where the features in the group aren't neighbors of any features in the larger study area. This is most common when the Input Features are polygons that are not contiguous, such as islands. Zones can only grow by aggregating spatial neighbors of the existing features in the zone. To resolve this, the tool generates a link between each disconnected group and the closest feature outside the group to establish a neighborhood and allow the zone to continue growing. The Disconnected Group ID field is added to the attribute table of the Output Features that allows you to visualize which feature or group of features were disconnected in your study area.
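The linking of disconnected groups can be sketched as follows. This is an illustration only: it assumes that distance is measured between feature centroids in planar coordinates and that the largest connected group is treated as the main study area. The function names and sample features are hypothetical.

```python
from math import dist


def connected_groups(neighbors):
    """Split features into groups that are connected through neighbors."""
    seen, groups = set(), []
    for start in neighbors:
        if start in seen:
            continue
        group, stack = [], [start]
        seen.add(start)
        while stack:
            feature = stack.pop()
            group.append(feature)
            for n in neighbors[feature]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        groups.append(group)
    return groups


def link_disconnected(neighbors, centroids):
    """Link each smaller group to the closest feature outside that group."""
    groups = sorted(connected_groups(neighbors), key=len, reverse=True)
    links = []
    for group in groups[1:]:                 # every group except the largest
        members = set(group)
        outside = [f for f in neighbors if f not in members]
        pair = min(
            ((m, o) for m in group for o in outside),
            key=lambda p: dist(centroids[p[0]], centroids[p[1]]),
        )
        links.append(pair)
    return links


# Three mainland polygons (A, B, C) and two island polygons (I1, I2).
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"],
             "I1": ["I2"], "I2": ["I1"]}
centroids = {"A": (0, 0), "B": (1, 0), "C": (2, 0),
             "I1": (4, 0), "I2": (5, 0)}

print(connected_groups(neighbors))             # [['A', 'B', 'C'], ['I1', 'I2']]
print(link_disconnected(neighbors, centroids))  # [('I1', 'C')]
```

Once the link between I1 and C is treated as a neighbor relationship, a zone that starts on the mainland can grow onto the islands, and the reverse.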
Additional resources
- Coley, D. A. (1999). An introduction to genetic algorithms for scientists and engineers. World Scientific Publishing Company.
- Lorena, L. A. N., & Furtado, J. C. (2001). Constructive genetic algorithm for clustering problems. Evolutionary Computation, 9(3), 309-327.
- Patel, N., & Padhiyar, N. (2010, October). Alien Genetic Algorithm for Exploration of Search Space. AIP Conference Proceedings (Vol. 1298, No. 1, pp. 325-330). AIP.