Aggregate values into related features

Typically, features are symbolized in a map based on their attribute values. Some examples include cities symbolized by population or roads symbolized by road classifications. Sometimes it is necessary to symbolize features by computed aggregated values instead. Examples of computed values include minimums, maximums, averages, or totals. They are values that are computed from the attributes of an aggregated set of values.

Temporal data scenarios

There are two main scenarios where symbolizing data with aggregated values is useful.

Station data with a time series

This is data of changing quantity that is fixed in location over time. Features are observed, collected, or forecast at these locations over time at regular or irregular intervals. Examples include temperature, rainfall, water level, salinity, and discharge measured at weather stations or at monitoring stations along streams.

You may want to symbolize weather station features by total July rainfall, or by highest or lowest annual temperature. To visualize the data in this way, model the data so that the fixed locations are stored in a feature class and the observed data is stored in a related table, along with date-time attribution.

Sparse data with time stamps

These are features that represent events that occurred at a single point in time. Examples include the locations of tornadoes, customer addresses, patients admitted at hospitals, and crime locations.

You may want to show total tornado loss, or total purchases, or total number of patients aggregated to polygon features such as provinces, districts, or other administrative boundaries. To visualize the data in this way, model the data so that the locations are stored in a feature class with date-time attribution.

Prepare data for aggregated display

To display features with aggregated results, consider the following:

  • Aggregation type—Determine whether to show totals, minimums, maximums, averages, or some other metric.
  • Display features—Determine which features display the aggregated results. These can be the same or different than the features with measured observations. For example, you can display total annual rainfall at each weather station, or you can display the total annual rainfall (as measured at individual weather station point features) based on county or watershed boundaries.
  • Query parameters:
    • Time—Determine the time extent and time interval in which to display aggregated results. Use the time slider to set these dynamically. For example, for daily aggregated results, set both the time extent and the time interval on the time slider to one day.
    • Range—You may want to aggregate results for only a subset of features. For example, to aggregate salinity measures between 0 and 50 meters and between 50 and 75 meters, use a range parameter and the range slider.
    • Query filters—To see patterns in crime data by aggregating the type of crime in each precinct, use a discrete parameter to establish a query filter.
    Query parameters are optional in this workflow. See Define parameters in a query layer to learn how to add range parameters and add filters using discrete parameters.

Aggregated results example

Consider a feature class, named weather_stations that stores the locations of weather stations. Each station feature is uniquely identified by the value stored in the station_id field. This field is used to link the stations to related observation records in a table named observed_rainfall. This table holds records of observed rainfall collected over time at the same location. It includes a station_id field to link to the feature class of stations, a rainfall_cm field that stores daily rainfall in centimeters, and a collection_date field that stores the date of rainfall events.

Compute aggregated results dynamically with query layer parameters

The goal is to compute total rainfall for any given period (for example, a month, a year, or any custom time extent) at each station, and symbolize the stations by total rainfall computed for that time extent. There are two approaches to accomplish this.

The first approach is to compute the aggregated results dynamically, using the simple or advanced methods below, by defining query layer parameters. The second approach involves precomputing the results by using geoprocessing tools or database functions.

Run the Make Aggregation Query Layer geoprocessing tool

You can run the Make Aggregation Query Layer tool to build a query layer and define query parameters to display your results. This is the more flexible approach of the two and ensures the most current results are always displayed. This tool uses database SQL aggregation functions and join capabilities to summarize, aggregate, and filter information.

The Make Aggregation Query Layer tool allows you to enable time on the query layer to dynamically view different time periods instead of needing to predefine and calculate a result for each desired time range.

  1. With the map open, on the Analysis tab, in the Geoprocessing group, click Tools Tools.
  2. In the Geoprocessing pane, on the Toolboxes tab, click Data Management tools > Layers and Table Views > Make Aggregation Query Layer. Click the tool to open it.
  3. Following the example scenario, set the tool's properties as follows:
    • On the Parameters tab, set the Target Feature Class parameter to the weather_stations feature class.
    • Set observed_rainfall as the Summary Table parameter.
    • Set the target and related join fields to station_id, and enter an output layer name.
    • Choose SUM as the summary field aggregate type. Under Field Name(s), choose rainfall_cm.
    • Under Parameter Definitions, choose Range as the parameter type, and specify a name, for example, timeWindow. This setting enables time on the query layer.
    • Define the Field or Expression parameter as the collection_date field.
    • Specify the Data Type parameter as Date.
    • Review and set any other parameters as needed. Based on the above choices, some are set automatically.
  4. Run the tool. The time-aware query layer is added to the map and the Contents pane.
  5. Open the query layer's attribute table. An additional field named SUM_rainfall_cm shows the total rainfall using all records.

You can enable the time slider and navigate to any time extent. As you navigate, the values are recomputed for the map's current time extent. You can also optionally use attributes to symbolize and label weather station point features. To learn more about this tool, see Make Aggregation Query Layer.

Build a query layer manually

In some cases, you may need to build the query layer manually if the query uses advanced keywords or parameters. Some examples includethe following:

  • Predefined discrete parameters, such as view_scale and view_extent
  • SQL queries with more than one join, or an outer join

There are disadvantages to this approach. They include the following:

  • There may be an additional load on the database, since the aggregated results are calculated every time you navigate the map, use the time slider, or open the attribute table. This additional load is negligible for smaller datasets or when the time window returns relatively smaller subsets, even with large datasets.
  • This approach requires that the data be stored in an enterprise database. This approach cannot be used with file geodatabases or shapefiles.

If your data is updated frequently, consider moving the data to an enterprise database and accepting the additional load on the database. Ensure that best practices for database maintenance, such as creating indexes, are followed.

Follow these steps to build a query layer manually:

  1. On the Map tab, in the Layer group, open the Add Data menu and click Query Layer.
  2. In the Query box, construct an SQL query to return the aggregated result.

    Select the total rainfall for each station from the observed_rainfall table.

    SELECT   station_id, SUM(rainfall_cm) AS total_rainfall
    FROM     observed_rainfall
    GROUP BY station_id

    Join the results to the weather_stations feature class. This returns total rainfall using all records from the observed_rainfall table.

    SELECT  w.*, r.total_rainfall
    FROM    weather_stations AS w 
    INNER JOIN
            (SELECT   station_id, SUM(rainfall_cm) AS total_rainfall
             FROM     observed_rainfall
             GROUP BY station_id) AS r 
    ON w.station_id = r. station_id

    Set the time extent in the WHERE clause to compute aggregated results between June 1, 2016, and June 30, 2016.

    SELECT  w.*, r.total_rainfall
    FROM    weather_stations AS w 
    INNER JOIN
            (SELECT   station_id, SUM(rainfall_cm) AS total_rainfall
             FROM     observed_rainfall
             WHERE    collection_date >= '6/1/2016' AND collection_date <= '6/30/2016'
             GROUP BY station_id) AS r 
    ON w.station_id = r.station_id
  3. Click Next to complete the remaining instructions on the New Query Layer dialog box and add the query layer to the current map.
  4. Symbolize the layer to display weather stations with total rainfall at each location for June 2016. For example, you can size features according to the total rainfall using graduated or proportional symbols, or you can use variations in color for each station instead, using graduated or unclassed colored symbols.

To change the time extent, for example, to see the same visualization but from August 1 to August 15 instead, you must open the source of the query layer and change the WHERE clause. But using the time slider to make this change is preferable. To use the time slider to modify the WHERE clause, you must use parameters in the SQL statement.

Follow these steps to change the time extent:

  1. Right-click the layer in the Contents pane and click Properties Properties.
  2. On the Source tab, in the Query row of the table, click Edit Query Edit Metadata. The query is displayed in the Query text box.
    Tip:

    To make the query text easier to read, you can enlarge it by pressing the Ctrl key while rotating the mouse wheel forward.

  3. Replace WHERE collection_date >= '6/1/2016' AND collection_date <= '6/30/2016' with a range parameter. Range parameters are declared with the ::r: prefix.

    The WHERE clause is replaced by a range parameter named TimeVar.

    SELECT  w.*, r.total_rainfall
    FROM    weather_stations AS w 
    INNER JOIN
            (SELECT   station_id, SUM(rainfall_cm) AS total_rainfall
             FROM     observed_rainfall
             WHERE    ::r:TimeVar
             GROUP BY station_id) AS r 
    ON w.station_id = r.station_id
  4. Parameters appear in the SQL query in bold and include an edit button Edit Metadata next to them. Click the button to set properties of the parameter as follows:
    • For Field or Expression, choose collection_date.
    • For Data Type, choose Date.
    • Check Default values.
    • Set the default value range to 6/1/2016 to 6/30/2016.
  5. Click Done.
  6. On the Edit Query dialog box, click Validate to ensure there are no errors.

The layer is now time aware because the range parameter references a date-time field. Time-aware layers can be controlled with the time slider. For example, use the time slider to choose a time extent, and the weather stations are displayed with the dynamically computed total rainfall for that time extent.

  • For a monthly total, set both Current Time Span and Step Interval to one month.
  • For a rolling average, set Current Time Span to one month and Step Interval to one day. In this case, every time you click through an interval on the time slider, it moves to the next day and shows results for one month from that day.

Tip:

In the attribute table, on the View tab, enable both the Time and Extent filters to show only features that are visible on the map. Otherwise, the results are computed for all features and from the entire table, which may be slow if the table contains many records.

Precompute the results

This approach is ideal when you work with data that is updated infrequently and a predefined time extent is acceptable. In this case, precompute the aggregated results to diminish the load on the database every time the map is refreshed, an attribute table is opened, or the layer is queried.

The disadvantages to this approach include the following:

  • You need to perform these steps each time data is updated.
  • It is not flexible. To visualize features with weekly and yearly results, for example, you need two separate tables: one for storing aggregated results for one week and the other for storing the yearly results.
  • You can't use the time slider with this approach.

There are two ways to precompute aggregated results.

Use geoprocessing

The following example outlines steps to precompute the monthly average rainfall for each weather station using geoprocessing tools.

  1. Add a new integer field in the feature class named month to represent the month.
  2. Use the Calculate Field tool to update records.

    Set the Expression parameter to month = getMonth(!DateTime!).

    Enter the following into the Codeblock parameter. In this example, v is the value of the datetime attribute.

    from datetime import datetime
    def getMonth(v):
        d = datetime.strptime(v, "%m/%d/%Y")
    return d.month
  3. Use the Summary Statistics tool to create a table with summary results. In this example, the summary is the monthly total rainfall for each station. Enter the parameters as follows:
    • Statistics Field: Choose the field that you want to summarize. In this example, it is the rainfall field.
    • Statistic Type: Choose SUM to compute the total rainfall.
    • Case Field: Choose the station_id, and month fields.
  4. Use the Join Field tool or the Add Join tool to join records from the summary table to the feature layer.
  5. Display the layer in the map with a definition query to show results for a particular month. For example, to show results from May only, use a definition query where Month = 5.

Use database functions

If your data is stored in an enterprise database, you can use an SQL query to precompute aggregated or summary results and save them in a separate table.

  1. Execute the following SQL query in a database client. It computes the monthly average for each weather station. The query assumes that there is a field storing integers to represent a month.
    SELECT   station_id, month, SUM(rainfall_cm) AS total_rainfall
    INTO     avg_monthly_rainfall
    FROM     observed_rainfall
    GROUP BY station_id, month
  2. If the database is a geodatabase, register the table to the geodatabase for optimal performance and join the table to the feature layer.

    Click the feature layer in the Contents pane. Under the Feature Layer tab, on the Data tab, in the Relationship group, click Joins and click Add Join.

  3. If it is not a geodatabase, you must choose one of the following strategies instead:
    • Create a view in the database that joins the table that holds geometries with the newly create summary table.
    • Create a query layer and define the join in the source of the query layer.
  4. Display the layer in the map with a definition query to show results for a particular month. For example, to show results from May only, use a definition query where Month = 5.
Tip:

You can use Windows scheduler to schedule a task to run at a regular interval to precompute results.

Related topics