Aggregate values into related features

Typically, features are symbolized in a map based on their attribute values. Some examples include cities symbolized by population or roads symbolized by road classifications. Sometimes it is necessary to symbolize features by computed aggregated values instead. Examples of computed values include minimums, maximums, averages, or totals. They are values that are computed from the attributes of an aggregated set of values.

Temporal data scenarios

There are two main scenarios where symbolizing data with aggregated values is useful.

Station data with a time series

This is data of changing quantity that is fixed in location over time. Features are observed, collected, or forecast at these locations over time at regular or irregular intervals. Examples include temperature, rainfall, water level, salinity, and discharge measured at weather stations or at monitoring stations along streams.

You may want to symbolize weather station features by total July rainfall, or by highest or lowest annual temperature. To visualize the data in this way, model the data so that the fixed locations are stored in a feature class and the observed data is stored in a related table, along with date-time attribution.

Sparse data with time stamps

These are features that represent events that occurred at a single point in time. Examples include the locations of tornadoes, customer addresses, patients admitted at hospitals, and crime locations.

You may want to show total tornado loss, or total purchases, or total number of patients aggregated to polygon features such as provinces, districts, or other administrative boundaries. To visualize the data in this way, model the data so that the locations are stored in a feature class with date-time attribution.

Prepare data for aggregated display

To display features with aggregated results, consider the following:

  • Aggregation type—Determine whether to show totals, minimums, maximums, averages, or some other metric.
  • Display features—Determine which features display the aggregated results. These can be the same or different than the features with measured observations. For example, you can display total annual rainfall at each weather station, or you can display the total annual rainfall (as measured at individual weather station point features) based on county or watershed boundaries.
  • Query parameters:
    • Time—Determine the time extent and time interval in which to display aggregated results. Use the time slider to set these dynamically. For example, for daily aggregated results, set both the time extent and the time interval on the time slider to one day.
    • Range—You may want to aggregate results for only a subset of features. For example, to aggregate salinity measures between 0 and 50 meters and between 50 and 75 meters, use a range parameter and the range slider.
    • Query filters—To see patterns in crime data by aggregating the type of crime in each precinct, use a discrete parameter to establish a query filter.
    Query parameters are optional in this workflow. See Define parameters in a query layer to learn how to add range parameters and add filters using discrete parameters.

Aggregated results example

Consider a feature class, named weather_stations that stores the locations of weather stations. Each station feature is uniquely identified by the value stored in the station_id field. This field is used to link the stations to related observation records in a table named observed_rainfall. This table holds records of observed rainfall collected over time at the same location. It includes a station_id field to link to the feature class of stations, a rainfall_cm field that stores daily rainfall in centimeters, and a collection_date field that stores the date of rainfall events. The goal is to compute total rainfall for any given period (for example, a month, a year, or any custom time extent) at each station, and symbolize the stations by total rainfall computed for that time extent. There are two approaches to accomplish this.

Compute aggregated results dynamically with query layer parameters

This is the more flexible approach of the two and ensures that the most current results are always displayed. You use database SQL aggregation functions and join capabilities. In ArcGIS Pro, you must create a query layer to do this.

There are disadvantages to this approach, however. They include the following:

  • Additional load on the database since the aggregated results are calculated every time you navigate the map, use the time slider, or open the attribute table.
  • It requires that the data is stored in an enterprise database. This approach cannot be used with file geodatabases or shapefiles.

If your data is updated frequently, consider moving the data to an enterprise database and accepting the additional load on the database.

  1. On the Map tab, in the Layer group, open the Add Data menu and click Query Layer.
  2. In the Query box, construct an SQL query to return the aggregated result.

    Select the total rainfall for each station from the observed_rainfall table.

    SELECT   station_id, SUM(rainfall_cm) AS total_rainfall
    FROM     observed_rainfall
    GROUP BY station_id

    Join the results to the weather_stations feature class. This returns total rainfall using all records from the observed_rainfall table.

    SELECT  w.*, r.total_rainfall
    FROM    weather_stations AS w 
    INNER JOIN
            (SELECT   station_id, SUM(rainfall_cm) AS total_rainfall
             FROM     observed_rainfall
             GROUP BY station_id) AS r 
    ON w.station_id = r. station_id

    Set the time extent in the where clause to compute aggregated results between June 1, 2016 and June 30, 2016.

    SELECT  w.*, r.total_rainfall
    FROM    weather_stations AS w 
    INNER JOIN
            (SELECT   station_id, SUM(rainfall_cm) AS total_rainfall
             FROM     observed_rainfall
             WHERE    collection_date >= '6/1/2016' AND collection_date <= '6/30/2016'
             GROUP BY station_id) AS r 
    ON w.station_id = r.station_id
  3. Click Next to compete the remaining instructions in the New Query Layer dialog box and add the query layer to the current map.
  4. Symbolize the layer to display weather stations with total rainfall at each location for June 2016. For example, you may size features according to the total rainfall using graduated or proportional symbols, or you may use variations in color for each station instead, using graduated or unclassed colored symbols.

If you want to change the time extent, for example, to see the same visualization but from August 1 to August 15 instead, you need to open the source of the query layer and change the where clause. This is cumbersome, so using the time slider to make this change is preferable. To use the time slider to modify the where clause, you need to use parameters in the SQL statement.

  1. Right-click the layer in the Contents pane and click Properties Properties.
  2. On the Source tab, in the Query row of the table, click Edit Query Edit Metadata. The query is displayed in the Query text box.
    Tip:

    To make the query text easier to read, you can enlarge it by pressing the Ctrl key while spinning the mouse wheel forward.

  3. Replace WHERE collection_date >= '6/1/2016' AND collection_date <= '6/30/2016' with a range parameter. Range parameters are declared with the ::r: prefix.

    The where clause is replaced by a range parameter named TimeVar.

    SELECT  w.*, r.total_rainfall
    FROM    weather_stations AS w 
    INNER JOIN
            (SELECT   station_id, SUM(rainfall_cm) AS total_rainfall
             FROM     observed_rainfall
             WHERE    ::r:TimeVar
             GROUP BY station_id) AS r 
    ON w.station_id = r.station_id
  4. Parameters appear in the SQL query in bold and include an icon Edit Metadata next to them. Click the icon to set properties of the parameter as follows:
    • Field or Expression: collection_date
    • Data Type: Date
    • Default values: Checked
    • Default value range: 6/1/2016 to 6/30/2016
  5. Click Done.
  6. On the Edit Query dialog box, click Validate to ensure there are no errors.

The layer is now time aware because the range parameter references a date-time field. Time-aware layers can be controlled with the time slider. For example, use the time slider to choose a time extent, and the weather stations are displayed with the dynamically computed total rainfall for that time extent.

  • For monthly total, set both Current Time Span and Step Interval to one month.
  • For a rolling average, set Current Time Span to one month and Step Interval to one day. In this case, every time you click through an interval on the time slider, it moves to the next day and shows results for one month from that day.

Tip:

In the attribute table, on the View tab, enable both the Time and Extent filters to show only features that are visible on the map. Otherwise, the results are computed for all features and from the entire table, which may be slow if the table contains many records.

Precompute the results

This approach is ideal when you work with data that is updated infrequently and a predefined time extent is acceptable. In this case, precompute the aggregated results to diminish the load on the database every time the map is refreshed, an attribute table is opened, or the layer is queried.

The disadvantages to this approach include the following:

  • You need to perform these steps each time data is updated.
  • It is not flexible. To visualize features with weekly and yearly results, for example, you need two separate tables: one for storing aggregated results for one week and the other for storing the yearly results.
  • You can't use the time slider with this approach.

There are two ways to precompute aggregated results.

Use geoprocessing

The following example outlines steps to precompute the monthly average rainfall for each weather station using geoprocessing tools.

  1. Add a new integer field in the feature class named month to represent the month.
  2. Use the Calculate Field tool to update records.

    Set the Expression parameter to month = getMonth(!DateTime!).

    Enter the following into the Codeblock parameter. In this example, v is the value of the datetime attribute.

    from datetime import datetime
    def getMonth(v):
        d = datetime.strptime(v, "%m/%d/%Y")
    return d.month
  3. Use the Summary Statistics tool to create a table with summary results. In this example, the summary is the monthly total rainfall for each station. Enter the parameters as follows:
    • Statistics Field: Choose the field that you want to summarize. In this example, it is the rainfall field.
    • Statistic Type: Choose SUM to compute the total rainfall.
    • Case Field: Choose the station_id, and month fields.
  4. Use the Join Field tool or the Add Join tool to join records from the summary table to the feature layer.
  5. Display the layer in the map with a definition query to show results for a particular month. For example, to show results from May only, use a definition query where Month = 5.

Use database functions

If your data is stored in an enterprise database, you can use an SQL query to precompute aggregated or summary results and save them in a separate table.

  1. Execute the following SQL query in a database client. It computes the monthly average for each weather station. The query assumes that there is a field storing integers to represent a month.
    SELECT   station_id, month, SUM(rainfall_cm) AS total_rainfall
    INTO     avg_monthly_rainfall
    FROM     observed_rainfall
    GROUP BY station_id, month
  2. If the database is a geodatabase, register the table to the geodatabase for optimal performance and join the table to the feature layer.

    Select the feature layer in the Contents pane. Under the Feature Layer tab, on the Data tab, in the Relationship group, click Joins and click Add Join.

  3. If it is not a geodatabase, you must choose one of the following strategies instead:
    • Create a view in the database that joins the table that holds geometries with the newly create summary table.
    • Create a query layer and define the join in the source of the query layer.
  4. Display the layer in the map with a definition query to show results for a particular month. For example, to show results from May only, use a definition query where Month = 5.
Tip:

You can use Windows scheduler to schedule a task to run at a regular interval to precompute results.

Related topics