Creating a space-time cube allows you to visualize and analyze your spatiotemporal data through time-series analysis, integrated spatial and temporal pattern analysis, and powerful 2D and 3D visualization techniques. There are two tools that create a space-time cube for analysis: Create Space Time Cube By Aggregating Points and Create Space Time Cube From Defined Locations. Both tools take timestamped features and structure them into a netCDF data cube by generating space-time bins with either aggregated incident points or defined features with associated spatiotemporal attributes.
If you have timestamped point features that you want to aggregate spatially to understand spatiotemporal patterns at locations throughout your study area, you will use the Create Space Time Cube By Aggregating Points tool. This will result in either a grid cube (fishnet or hexagon) or a cube structured by the defined locations you provide as aggregation polygons. Within each bin of the cube, the points are counted, any Summary Field statistics are calculated, and the trend for bin values across time at each location is measured using the Mann-Kendall statistic. When you choose to aggregate using a fishnet or hexagon grid, you will create a grid cube. When you choose to aggregate using a set of defined locations as aggregation polygons, you will create a defined locations cube. Creating a space time cube by aggregating points is most common when the point data represents incidents, such as crimes or customer sales, and you want to aggregate those incidents into either a grid or a set of polygons representing police beats or sales territories.
If you have feature locations that do not change over time and attributes or measurements that have been collected over time, such as panel data or station data, you will use the Create Space Time Cube From Defined Locations tool. This will result in a cube that is structured using those defined locations, with either one set of attributes per time period (if no temporal aggregation is chosen) or summary statistics at each time period for the chosen attributes (if temporal aggregation is chosen). Within each bin of the defined locations cube, the count of observations for that bin at that time period and any Variables or Summary Field statistics are calculated, and the trend for bin values across time at each location is measured using the Mann-Kendall statistic.
Setting the structure of the cube
In most cases, you will know how you want to define the cube bin dimensions, but it is still worth considering which dimensions are appropriate for the particular questions you are trying to answer. If you are looking at crime events, for example, you may decide to aggregate points into 400-meter or 0.25-mile bins because that is your city block size. If you have data covering an entire year, you may decide to look at trends in terms of monthly or weekly event aggregation.
Grid cube
The grid cube structure will have rows, columns, and time steps. If you multiply the number of rows by the number of columns by the number of time steps, you will obtain the total number of bins in the cube. The rows and columns determine the spatial extent of the cube, while the time steps determine the temporal extent.
Defined locations cube
The cube structure will have features and time steps. If you multiply the number of features by the number of time steps, you will obtain the total number of bins in the cube. The features determine the spatial extent of the cube, while the time steps determine the temporal extent.
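As a quick illustration of the bin arithmetic for both cube types, the following Python sketch multiplies out the dimensions. The function names and the example dimensions are hypothetical, chosen only to show the calculation.

```python
def grid_cube_bins(rows: int, columns: int, time_steps: int) -> int:
    """Total bins in a grid cube: rows x columns x time steps."""
    return rows * columns * time_steps


def defined_locations_cube_bins(features: int, time_steps: int) -> int:
    """Total bins in a defined locations cube: features x time steps."""
    return features * time_steps


# Hypothetical examples: a 40 x 25 fishnet with 52 weekly time steps,
# and 120 police beats with 12 monthly time steps.
print(grid_cube_bins(40, 25, 52))            # 52000
print(defined_locations_cube_bins(120, 12))  # 1440
```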
Spatial defaults for the grid cube
In the case where you do not have strong justification for any particular grid size for your grid cube, you can leave the Distance Interval parameter blank and let the tool calculate default values for you.
The default bin distance is calculated by first determining the length of the longest side of the Input Features extent (the maximum extent). The bin distance is then set to the larger of two values: the maximum extent divided by 100, or a distance computed by an algorithm based on the spatial distribution of the Input Features.
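The default bin distance rule can be sketched in Python as follows. This is a minimal sketch: the algorithm based on the spatial distribution of the Input Features is not described here, so its result is passed in as a hypothetical `distribution_distance` argument rather than computed.

```python
def default_bin_distance(extent_width: float, extent_height: float,
                         distribution_distance: float) -> float:
    """Larger of (longest extent side / 100) and a distribution-based distance.

    distribution_distance is a hypothetical input standing in for the
    algorithm based on the spatial distribution of the Input Features.
    """
    max_extent = max(extent_width, extent_height)  # longest side of the extent
    return max(max_extent / 100.0, distribution_distance)


# A 25,000 m x 18,000 m extent gives max_extent / 100 = 250 m.
print(default_bin_distance(25_000, 18_000, 180.0))  # 250.0
print(default_bin_distance(25_000, 18_000, 400.0))  # 400.0
```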
Spatial structure for the defined locations cube
The spatial structure of the defined locations cube is simply the set of locations you provide.
Temporal defaults for the grid cube
In the case where you do not have strong justification for any particular time-step interval, you can leave the Time Step Interval parameter blank and let the tool calculate default values for you. The default time-step interval is based on two different algorithms used to determine the optimal number and width of time-step intervals. Of the results from these algorithms that are larger than ten, the smallest is used as the default number of time-step intervals. When both results are less than ten, ten becomes the default number of time-step intervals.
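The selection rule for the default number of time-step intervals can be sketched as follows. The two algorithms are not named here, so their outputs are hypothetical inputs (`result_a`, `result_b`); the sketch only shows how the default is chosen from them.

```python
def default_time_step_count(result_a: int, result_b: int, minimum: int = 10) -> int:
    """Smallest algorithm result larger than `minimum`, else `minimum` itself.

    result_a and result_b are hypothetical stand-ins for the outputs of the
    two (unnamed) bin-count algorithms.
    """
    candidates = [r for r in (result_a, result_b) if r > minimum]
    return min(candidates) if candidates else minimum


print(default_time_step_count(24, 31))  # 24: both exceed ten, take the smaller
print(default_time_step_count(8, 15))   # 15: only one result exceeds ten
print(default_time_step_count(6, 9))    # 10: neither exceeds ten
```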
Temporal structure of the defined locations cube
The temporal structure of the defined locations cube must be specified by the user. If your data was collected every 5 years, you would specify that in the Time Step Interval parameter.
You may also choose to aggregate temporally in your defined locations cube. If you have stations that are recording moisture readings every 5 minutes, it may make sense to use Temporal Aggregation to combine those readings into hourly averages.
If temporal aggregation is chosen, you can assess the aggregation by mapping the number of features aggregated into each bin. For instance, if you have data collected every 5 minutes and you are aggregating into hourly averages, you would expect to see 12 features aggregated into each hour in each bin. If you use the Visualize Space Time Cube in 3D tool to map the Temporal Aggregation Count Cube Variable and see several bins with values less than 12, some of the moisture readings were missing. This is not necessarily a problem, but it is valuable to know, because it may mean that a sensor malfunctioned or that a location has too much missing data over time to be included in the analysis.
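To check temporal aggregation outside the Visualize Space Time Cube in 3D tool, you could reproduce the idea in plain Python. This sketch assumes readings are simple `(station, timestamp, value)` tuples and uses calendar-hour bins, a simplification of how the tool applies Time Step Interval and Time Step Alignment; all names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate_hourly(readings):
    """Group (station, timestamp, value) readings into calendar-hour bins.

    Returns {(station, hour_start): (mean_value, reading_count)}.
    """
    bins = defaultdict(list)
    for station, ts, value in readings:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        bins[(station, hour_start)].append(value)
    return {key: (mean(values), len(values)) for key, values in bins.items()}


def incomplete_bins(aggregated, expected=12):
    """Bins with fewer readings than expected (12 for 5-minute data per hour)."""
    return sorted(key for key, (_, count) in aggregated.items() if count < expected)


start = datetime(2024, 6, 1, 8, 0)
# Station "A" reports every 5 minutes; station "B" is missing three readings.
readings = [("A", start + timedelta(minutes=5 * i), 20.0 + i) for i in range(12)]
readings += [("B", start + timedelta(minutes=5 * i), 30.0)
             for i in range(12) if i not in (4, 5, 6)]

hourly = aggregate_hourly(readings)
print(hourly[("A", start)])     # (25.5, 12)
print(incomplete_bins(hourly))  # only station B's 08:00 bin is flagged
```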
Time Step Alignment
When creating a defined locations cube with no temporal aggregation, the only consideration is choosing a Time Step Interval, Time Step Alignment, and Reference Time that ensure exactly one record falls within each bin. Temporal bias is not an issue in this case.
Suppose you are not aggregating, you want to create monthly time-step intervals, and, due to collection procedures, your data falls anywhere between the 1st and the 6th of each month. In this case, best practice is to choose the Reference Time option for Time Step Alignment and choose a date such that stepping 1 month forward and backward from it will include each data point. For example, if you have data on 1/1, 2/3, 3/2, 4/1, and 5/3, choosing a reference time on the first of any of the months in your dataset will ensure that all data is appropriately included in the resulting cube.
When you are aggregating your data into a space-time cube, Time Step Alignment is an important parameter to consider, because it determines where the aggregation will begin and end. See the following example:
The illustration above represents a dataset spanning from September 3, 2015 to September 12, 2015. We'll use this dataset to explore the implications of the different parameter options.
If an End time Time Step Alignment is chosen with a Time Step Interval of 3 days, for instance, the binning will initiate with the last data point and go back in 3-day increments until all data points fall within a time step.
It is important to note that, depending on the Time Step Interval that you choose, it is possible to create a time step at the beginning of the space-time cube that does not have data across the entire span of time. In the example above, you'll notice that 9/1 and 9/2 are included in the first time step even though no data exists until 9/3. These empty days are part of the time step but have no data associated with them. This can bias your results, because the temporally biased time step will appear to have significantly fewer points than other time steps, which is in fact an artificial result of the aggregation scheme. The report indicates whether there is temporal bias in the first or last time step. In this case, two out of the three days in the first time step have no data, so the temporal bias would be two-thirds (about 67 percent).
End time is the default option for Time Step Alignment because many analyses are focused on what has happened most recently, so putting this bias toward the beginning of the cube is preferable. Another solution, which gets rid of the temporal bias altogether, is to provide data that is divided evenly by the Time Step Interval so that no time steps are biased. You can do this by creating a selection set that excludes the part of the point dataset that falls outside of what you want to be the first time step. In this example, selecting all data except for data that falls before 9/4 would solve the problem. The report shows the time span of the first and last time steps, and you can use that information to determine where to make the cutoff.
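The day-level arithmetic in this example can be reproduced with a short Python sketch. It assumes every timestamp is treated as a whole calendar day, as in the illustration, and ignores the instant-level boundary rule; the function names are hypothetical and not part of any Esri API.

```python
from datetime import date, timedelta


def end_aligned_steps(first: date, last: date, interval_days: int):
    """Whole-day time steps built backward from the last day (End time)."""
    steps = []
    end = last
    while True:
        start = end - timedelta(days=interval_days - 1)
        steps.append((start, end))
        if start <= first:  # the first day is now covered
            break
        end = start - timedelta(days=1)
    return steps[::-1]


def first_step_bias(first: date, steps) -> float:
    """Fraction of the first time step that falls before the first data day."""
    start, end = steps[0]
    return (first - start).days / ((end - start).days + 1)


def end_aligned_cutoff(first: date, last: date, interval_days: int) -> date:
    """Earliest day to keep so the remaining span divides evenly into steps."""
    span_days = (last - first).days + 1
    return first + timedelta(days=span_days % interval_days)


first, last = date(2015, 9, 3), date(2015, 9, 12)
steps = end_aligned_steps(first, last, 3)
print(steps[0])                            # (datetime.date(2015, 9, 1), datetime.date(2015, 9, 3))
print(first_step_bias(first, steps))       # about 0.667: two of three days are empty
print(end_aligned_cutoff(first, last, 3))  # 2015-09-04
```

Rerunning `end_aligned_steps` with the data starting on 9/4 yields three full time steps and a first-step bias of 0.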
It is also important to note that if, in the process of moving back in time, the final bin happened to land exactly on the first data point as its start, that final data point would not be included in that bin. This is because, with an End time Time Step Alignment, each bin includes the last date in a given bin, yet goes back to but does not include the first date in that bin. So, in this case, an additional bin would have to be added to ensure that the first data point was included.
If a Start time Time Step Alignment is chosen with a Time Step Interval of 3 days, for instance, then binning will start at the first data point and go in 3-day increments until the last data point falls within the final time step.
There are a few things to note. First, with a Start time Time Step Alignment, depending on the Time Step Interval that you choose, it is possible to create a time step at the end of the space-time cube that does not have data across the entire span of time. In the example above, you'll notice that 9/13 and 9/14 are included in the last time step even though no data exists after 9/12. These empty days are part of the time step but have no data associated with them. This can bias your results, because the temporally biased time step will appear to have significantly fewer points than other time steps, which is in fact an artificial result of the aggregation scheme. The report indicates whether there is temporal bias in the first or last time step. In this case, two out of the three days in the last time step have no data, so the temporal bias would be two-thirds (about 67 percent). This is particularly problematic with a Start time Time Step Alignment, because analyses that are focused on the most recent data can be significantly impacted. The solution is to provide data that is divided evenly by the Time Step Interval so that no time steps are biased. You can do this by creating a selection set that excludes the part of the point dataset that falls outside of what you want to be the last time step. In this example, selecting all data except for data that falls after 9/11 would solve the problem. You could also choose to cut one day (9/3) from the beginning of the dataset, which would also lead to the data falling evenly within the time steps. The report shows the time span of the first and last time steps, and you can use that information to determine where to make the cutoff.
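The Start time case can be sketched the same way, again treating each timestamp as a whole calendar day. The helper names are hypothetical; the sketch shows the bias in the last time step and both trimming options.

```python
from datetime import date, timedelta


def start_aligned_steps(first: date, last: date, interval_days: int):
    """Whole-day time steps built forward from the first day (Start time)."""
    steps = []
    start = first
    while True:
        end = start + timedelta(days=interval_days - 1)
        steps.append((start, end))
        if end >= last:  # the last day is now covered
            break
        start = end + timedelta(days=1)
    return steps


def last_step_bias(last: date, steps) -> float:
    """Fraction of the last time step that falls after the last data day."""
    start, end = steps[-1]
    return (end - last).days / ((end - start).days + 1)


first, last = date(2015, 9, 3), date(2015, 9, 12)
steps = start_aligned_steps(first, last, 3)
print(steps[-1])                    # (datetime.date(2015, 9, 12), datetime.date(2015, 9, 14))
print(last_step_bias(last, steps))  # about 0.667

# Option 1: drop data after 9/11.  Option 2: drop the first day (9/3).
span_days = (last - first).days + 1
extra = span_days % 3  # a 10-day span leaves 1 day over
trim_end = start_aligned_steps(first, last - timedelta(days=extra), 3)
trim_start = start_aligned_steps(first + timedelta(days=extra), last, 3)
print(last_step_bias(last - timedelta(days=extra), trim_end))  # 0.0
print(last_step_bias(last, trim_start))                        # 0.0
```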
It is also important to note that if, in the process of moving forward in time, the final time step happened to land exactly on the last data point as its end, that final data point would not be included in that bin. This is because with a Start time Time Step Alignment each bin includes the first date in a given bin, yet goes forward to but does not include the last date in that bin. So, in this case, an additional bin would have to be added to ensure that the last data point was included.
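The boundary rules for both alignments come down to half-open intervals: (start, end] for End time and [start, end) for Start time. The following sketch works with exact timestamps instead of whole days to show when the extra bin is needed; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def end_time_bins(first: datetime, last: datetime, step: timedelta):
    """End time alignment: bins are (start, end], built backward from `last`."""
    bins = []
    end = last
    while True:
        start = end - step
        bins.append((start, end))
        if start < first:  # `first` lies strictly inside (start, end]
            break
        end = start        # `first` not covered yet (including first == start)
    return bins[::-1]


def start_time_bins(first: datetime, last: datetime, step: timedelta):
    """Start time alignment: bins are [start, end), built forward from `first`."""
    bins = []
    start = first
    while True:
        end = start + step
        bins.append((start, end))
        if end > last:     # `last` lies strictly inside [start, end)
            break
        start = end        # `last` not covered yet (including last == end)
    return bins


step = timedelta(days=3)
first, last = datetime(2015, 9, 3), datetime(2015, 9, 12)
print(len(end_time_bins(first, last, step)))    # 4: extra bin (8/31, 9/3] holds 9/3
print(len(start_time_bins(first, last, step)))  # 4: extra bin [9/12, 9/15) holds 9/12
```

If the first point were at noon on 9/3 instead of midnight, `end_time_bins` would need only three bins, because the point would no longer sit exactly on a bin's excluded start.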
A Reference time Time Step Alignment allows you to ensure that a specific date marks the beginning or end of one of the time steps in the cube.
When you choose a Reference time that falls after the extent of the dataset, at the last data point, or in the middle of the dataset, it will be treated as the last data point of a time step, and all other time steps on either side will be created using an End time Time Step Alignment until all of the data is covered, as illustrated below.
When you choose a Reference time that falls before the extent of the dataset or at the first data point, it will be treated as the first data point of a time step, and all other time steps on either side will be created using a Start time Time Step Alignment until all of the data is covered as illustrated below.
Note that choosing a Reference time before or after the extent of your data has the potential to create empty or partially empty bins, which can bias your analysis.
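One way to reason about a Reference time is to compute, for any timestamp, which time step it falls in relative to the reference. This sketch assumes the rules described above (a reference after the first data point closes a time step; otherwise it opens one) and uses exact integer arithmetic on `timedelta` values; the names are hypothetical.

```python
from datetime import datetime, timedelta


def reference_closes_step(reference: datetime, first: datetime) -> bool:
    """True if the Reference time is treated as the end of a time step."""
    # Before or at the first data point: start of a step. Otherwise: end.
    return reference > first


def step_bounds(t: datetime, reference: datetime, step: timedelta, closes: bool):
    """Return the (start, end) of the time step containing timestamp t."""
    k, remainder = divmod(t - reference, step)  # floor division on timedelta
    if closes:
        # Steps are (reference + (k-1)*step, reference + k*step].
        if remainder:
            k += 1
        return reference + (k - 1) * step, reference + k * step
    # Steps are [reference + k*step, reference + (k+1)*step).
    return reference + k * step, reference + (k + 1) * step


first, last = datetime(2015, 9, 3), datetime(2015, 9, 12)
step = timedelta(days=3)

mid = datetime(2015, 9, 6)  # Reference time in the middle of the data
print(step_bounds(first, mid, step, reference_closes_step(mid, first)))
# 9/3 falls in the step (8/31, 9/3]

early = datetime(2015, 9, 1)  # Reference time before the data
print(step_bounds(last, early, step, reference_closes_step(early, first)))
# 9/12 falls in the step [9/10, 9/13)
```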
Template cubes for grid cubes
A Template Cube cannot be used with defined locations cubes. They are only applicable to grid cubes.
Choosing a Template Cube has implications for the Time Step Alignment. Let's take a look at a few examples. When you choose a Template Cube that falls before or after the time span of the Input Features, time steps will be added until all of the data is covered by a time step, using the Time Step Alignment of the Template Cube. The resulting space-time cube will have empty bins wherever the Template Cube did not overlap the Input Features in time. This can bias the results of analysis. If the Template Cube overlaps the Input Features, the resulting space-time cube will cover the temporal extent of the Template Cube and extend until all Input Features are covered, using the Time Step Alignment of the Template Cube. The illustration below shows template cubes in blue and the resulting space-time cubes in orange.
It is important to note that when creating a new space-time cube using a Template Cube, the temporal extent of the Template Cube will be extended until all data is covered. This will allow you to use last year's cube to create a new cube that includes both last year's data and this year's data. The spatial extent of the Template Cube is treated differently. Any data falling outside of the spatial extent of the Template Cube will be dropped from the analysis. The Template Cube and the resulting space-time cube will have identical spatial extents. The only changes that can occur are within the spatial extent where locations that previously had no data can become locations with data if new features have appeared that were not present when the Template Cube was created.
When creating a cube by aggregating points, whether a grid cube or a defined locations cube, a COUNT of the number of points in each bin is always calculated. In addition to the COUNT, you can also summarize attributes within each bin. Multiple statistic and field combinations can be specified. Null values are excluded from all statistical calculations. When choosing Summary Fields, each location must have a value for each attribute at every time step. You can choose how the tool fills empty bins (bins that have no points, and thus no attribute values) using the Fill Empty Bins with parameter. Multiple options are available, and you can choose a different fill type for each field being summarized. Any bins that cannot be filled based on the estimation criteria will result in the whole location being excluded from the analysis. A minimum of 4 neighbors is required to fill empty bins using the average value of spatial neighbors, and a minimum of 13 neighbors is required to fill empty bins using the average value of space-time neighbors.
When creating a cube from defined locations with no temporal aggregation, you choose the variables from your data that you want to include in the cube, and a Fill Empty Bins with option that is most appropriate if there are null values or missing features at particular time periods in the dataset and you do not want the locations to be dropped.
When creating a cube from defined locations with temporal aggregation, you must choose the Summary Fields that you want to include in the resulting cube and the Statistic type that will be used to summarize them. Because each location must have a value at every time step, in addition to choosing a Statistic type, you must also choose how to complete the time series using the Fill Empty Bins with parameter. Multiple options are available, and you can choose a different fill type for each field being summarized.
Statistic types (for all cubes)
The available statistic types are as follows:
- SUM—Adds the total value for the specified field within each bin.
- MEAN—Calculates the average for the specified field within each bin.
- MIN—Finds the smallest value for all records of the specified field within each bin.
- MAX—Finds the largest value for all records of the specified field within each bin.
- STD—Finds the standard deviation on values in the specified field within each bin.
- MEDIAN—Finds the sorted middle value of all records of the specified field within each bin.
Null values present in any of the Summary Fields will result in those features being excluded from analysis. If having the count of points in each bin is part of your analysis strategy, you may want to consider creating separate cubes, one for the count (without summary fields) and one for summary fields. If the set of null values is different for each summary field, you may also consider creating a separate cube for each summary field.
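The statistic types and the null-handling rule can be sketched in Python with the standard `statistics` module. This is a sketch under stated assumptions: features are plain dictionaries, and STD is computed here as a population standard deviation (`statistics.pstdev`), because the documentation does not say which form the tool uses.

```python
import statistics

# Statistic types mapped to Python functions.
# Assumption: STD is the population standard deviation.
STATISTICS = {
    "SUM": sum,
    "MEAN": statistics.mean,
    "MIN": min,
    "MAX": max,
    "STD": statistics.pstdev,
    "MEDIAN": statistics.median,
}


def summarize_bin(features, summary_fields):
    """Summarize the points that fall in one bin.

    features: list of dicts, one per point in the bin.
    summary_fields: list of (statistic, field) pairs, e.g. [("SUM", "sales")].
    """
    fields = {field for _, field in summary_fields}
    # A null in any Summary Field excludes that feature from the bin.
    kept = [f for f in features if all(f.get(name) is not None for name in fields)]
    result = {"COUNT": len(kept)}
    for stat, field in summary_fields:
        values = [f[field] for f in kept]
        # An empty bin has no statistic; it is left for Fill Empty Bins with.
        result[f"{stat}_{field}"] = STATISTICS[stat](values) if values else None
    return result


points = [{"sales": 10, "items": 2}, {"sales": 30, "items": None}, {"sales": 20, "items": 4}]
print(summarize_bin(points, []))  # {'COUNT': 3}
print(summarize_bin(points, [("SUM", "sales"), ("MEAN", "items")]))
# {'COUNT': 2, 'SUM_sales': 30, 'MEAN_items': 3}
```

The COUNT drops from 3 to 2 once the `items` field is summarized, which is why a separate count-only cube can be useful.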
Fill Empty Bins with (for all cubes)
The available fill types are as follows:
- ZEROS—Fills empty bins with zeros.
- SPATIAL_NEIGHBORS—Fills empty bins with the average value of spatial neighbors.
- SPACE_TIME_NEIGHBORS—Fills empty bins with the average value of space time neighbors.
- TEMPORAL_TREND—Fills empty bins using an interpolated univariate spline algorithm.
When using the Create Space Time Cube From Defined Locations tool, you can also choose DROP_LOCATIONS, which drops locations that do not have a complete time series rather than filling them using one of the above options.
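The fill options can be sketched for a single location's time series. The definition of spatial and space-time neighbors is not given here, so this sketch assumes the caller supplies the neighbor values for each empty bin through a hypothetical `neighbor_values` mapping; TEMPORAL_TREND is omitted because its spline settings are not specified.

```python
# Minimum neighbor counts stated for the neighbor-based fill types.
MIN_NEIGHBORS = {"SPATIAL_NEIGHBORS": 4, "SPACE_TIME_NEIGHBORS": 13}


def fill_empty_bins(series, method, neighbor_values=None):
    """Fill one location's time series (None marks an empty bin).

    neighbor_values: hypothetical {time_index: [neighbor bin values]} mapping.
    Returns the filled series, or None if the location is excluded.
    """
    if method == "DROP_LOCATIONS":
        return None if any(v is None for v in series) else list(series)
    filled = []
    for t, value in enumerate(series):
        if value is not None:
            filled.append(value)
        elif method == "ZEROS":
            filled.append(0)
        elif method in MIN_NEIGHBORS:
            values = [v for v in (neighbor_values or {}).get(t, []) if v is not None]
            if len(values) < MIN_NEIGHBORS[method]:
                return None  # bin cannot be filled, so the location is excluded
            filled.append(sum(values) / len(values))
        else:
            raise ValueError(f"fill type not covered by this sketch: {method}")
    return filled


series = [5, None, 7]
print(fill_empty_bins(series, "ZEROS"))                                 # [5, 0, 7]
print(fill_empty_bins(series, "SPATIAL_NEIGHBORS", {1: [4, 6, 5, 5]}))  # [5, 5.0, 7]
print(fill_empty_bins(series, "SPATIAL_NEIGHBORS", {1: [4, 6, 5]}))     # None
print(fill_empty_bins(series, "DROP_LOCATIONS"))                        # None
```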
In addition to the netCDF file, messages summarizing the space-time cube dimensions and contents are written at the bottom of the Geoprocessing pane during tool execution. You can access the messages by hovering over the progress bar, clicking the pop-out button, or expanding the messages section in the Geoprocessing pane. You can also access the messages for a previously run tool via the Geoprocessing History.
For grid cubes, only locations with data for at least one time-step interval will be included in the analysis, but they will be analyzed across all time steps. When computing point counts in a grid cube, zero counts are assumed for any bin where there are no points, but the associated location has had at least one point for at least one time-step interval. Information about the percentage of zeros associated with locations that have data for at least one time-step interval is reported in the messages as sparseness.
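Sparseness can be approximated as the share of zero-count bins among locations that have at least one point. This is an assumed formula based on the description above, not the tool's documented calculation, and the function name is hypothetical.

```python
def sparseness(counts_by_location):
    """Percentage of zero-count bins among locations with at least one point.

    counts_by_location: {location_id: [point count per time step]}.
    """
    # Locations with no points in any time step are left out entirely.
    active = [c for c in counts_by_location.values() if any(n > 0 for n in c)]
    total_bins = sum(len(c) for c in active)
    zero_bins = sum(c.count(0) for c in active)
    return 100.0 * zero_bins / total_bins if total_bins else 0.0


counts = {"a": [0, 2, 0, 1], "b": [0, 0, 0, 0], "c": [3, 1, 0, 2]}
print(sparseness(counts))  # 37.5: location "b" is ignored, 3 of 8 bins are zero
```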
For defined locations, any location that has a complete time series will be included in the defined locations cube even if that time series is made up entirely of zeros. This is particularly important to consider if you have aggregated points into defined locations.
At the end of the output message, there is information about the Overall Data Trend. This trend is based on an aspatial time-series analysis. The question it answers is, overall, are the events represented by the input increasing or decreasing over time? To obtain the answer, all locations in each time-step interval are analyzed together as a time series using the Mann-Kendall statistic.
The Mann-Kendall trend test is performed on every location with data as an independent bin time-series test. The Mann-Kendall statistic is a rank correlation analysis for the bin count or value and their time sequence. The bin values for each pair of time periods are compared; for example, the bin value for the first time period is compared to the bin value for the second. If the first is smaller than the second, the result is +1. If the first is larger than the second, the result is -1. If the two values are tied, the result is zero. The results for all pairs of time periods compared are summed. The expected sum is zero, indicating no trend in the values over time. Based on the variance for the values in the bin time series, the number of ties, and the number of time periods, the observed sum is compared to the expected sum (zero) to determine whether the difference is statistically significant. The trend for each bin time series is recorded as a z-score and a p-value. A small p-value indicates the trend is statistically significant. The sign associated with the z-score determines whether the trend is an increase in bin values (positive z-score) or a decrease in bin values (negative z-score). Strategies for visualizing the trend results are provided in Visualizing the Space Time Cube.
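A textbook version of the Mann-Kendall test can be written in a few lines of Python. This sketch compares every pair of time periods, applies the standard tie correction to the variance and a continuity correction to the z-score, and derives a two-sided p-value from the normal distribution. These are common textbook choices and may differ in detail from the tool's implementation.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Return (S, z, p) for one bin time series."""
    n = len(values)
    # S: sum of sign(x_j - x_i) over every pair of time periods i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under "no trend", corrected for groups of tied values.
    ties = Counter(values).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    if var_s == 0:
        return s, 0.0, 1.0
    # Continuity correction moves S one unit toward zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


s, z, p = mann_kendall([1, 2, 3, 4, 5])
print(s)  # 10 (z is positive and p is below 0.05)
```

For the Overall Data Trend, one hypothetical approach is to total the bin values of all locations for each time step and pass that single series to `mann_kendall`.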
You can visualize the space-time cube data in either 2D or 3D using the tools in the Utilities toolset or by downloading the Space Time Cube Explorer. The Space Time Cube Explorer was created to make visualizing and exploring your three-dimensional Space Time Pattern Mining analysis results fast and easy. The add-in will take your space-time cube as input and create layers that can be visualized in a number of useful ways. There are many display options available, all with preset symbology and range and time sliders that make the exploration of the space-time cube and analysis results intuitive. The add-in is available to download at www.esriurl.com/SpaceTimeCubeExplorer. Three-dimensional visualizations of the space-time cube can also be displayed as web scenes and shared in story maps.
The creation, visualization, and analysis of the space-time cube take advantage of netCDF software developed by UCAR/Unidata. You can learn more about Unidata and the Network Common Data Form (netCDF) project on the Unidata website.
For information on histogram bin-width optimization, see the following:
- Shimazaki, H. and Shinomoto, S., A method for selecting the bin size of a time histogram in Neural Computation (2007) Vol. 19(6), 1503–1527.
- Terrell, G. and Scott, D., Oversmoothed nonparametric density estimates in Journal of the American Statistical Association (1985) Vol. 80(389), 209–214.
- Online Statistics Education: A Multimedia Course of Study (http://onlinestatbook.com/). Project leader: David M. Lane, Rice University (chapter 2, "Graphing Distributions, Histograms").
For information on the Mann-Kendall trend test, see the following:
- Hamed, K. H., Exact distribution of the Mann-Kendall trend test statistic for persistent data in Journal of Hydrology (2009), 86–94.
- Kendall, M. G., Gibbons, J. D., Rank correlation methods, fifth ed., (1990) Griffin, London.
- Mann, H. B., Nonparametric tests against trend in Econometrica (1945) Vol. 13, 245–259.