Analyze the data spatially

You'll examine the check-in data and analyze it to determine spatial trends.

Open the project

First, you'll download and open an ArcGIS Pro project package that contains a map of the check-in data. Then, you'll become familiar with the data's attributes.

  1. Download the Bay Area Popular Places project package.
  2. Locate the downloaded Bay_Area_Popular_Places project package on your computer and double-click it to open it in ArcGIS Pro. If prompted, sign in using your licensed ArcGIS account or ArcGIS Enterprise account.
    Note:

    If you don't have access to ArcGIS Pro or an ArcGIS organizational account, see options for software access.

    Default map and data

    The project contains a map with point data in the San Francisco Bay Area. The data was collected via the Gowalla social media platform, which was active between 2007 and 2012. Gowalla allowed users to check in at locations they visited. Each point represents a location where a Gowalla user checked in.

    Based on the map, answer the following questions:

    • Do certain places contain more check-ins than others?
    • How might you define an area to be popular using these check-ins?
    • The data is densely clustered. How much insight can you gain just from looking at the map?

    Next, you'll investigate the data's attributes.

  3. In the Contents pane, right-click the Bay Area Gowalla Check-ins layer and choose Attribute Table.

    Attribute Table option for the Bay Area Gowalla Check-ins layer

    The table appears.

    Attribute table for the Bay Area Gowalla Check-ins layer

    The User ID and Location ID fields contain unique IDs for users and locations. You don't have access to a key for these IDs, so these fields aren't useful for determining popularity. The Check-in Latitude and Check-in Longitude fields provide the data's spatial information, while the Check-in Time field provides its temporal information.

  4. Close the table.

Change the coordinate system

When analyzing the spatial relationships between features, it's important to ensure that you're using a coordinate system that is appropriate for the data. A projected coordinate system relies on a projection, a mathematical process that transforms the three-dimensional world into a two-dimensional map. Because there is no perfect way to make this transformation, all projected coordinate systems contain some form of distortion. This distortion not only affects the map's appearance but can also alter the results of spatial analysis.

To reduce distortion and ensure the highest accuracy of results, you'll project the data to a projected coordinate system that is focused around the San Francisco area. This coordinate system minimizes distortion near San Francisco, at the cost of increasing distortion in other areas. Because you're not focused on areas outside of San Francisco, this coordinate system is appropriate for your map and data.
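
To get a feel for how distortion depends on location, the following minimal Python sketch computes the scale factor of a spherical Mercator projection at a few latitudes. Spherical Mercator is used here only because its distortion formula is simple; it is an assumption for illustration and is not the coordinate system used in this tutorial. The latitudes are approximate.

```python
import math


def mercator_scale_factor(latitude_deg):
    """Linear scale factor of a spherical Mercator projection at a latitude.

    A value of 1.0 means lengths are preserved; larger values mean stretching.
    """
    return 1.0 / math.cos(math.radians(latitude_deg))


# Approximate latitudes, chosen only for illustration.
places = [("Equator", 0.0), ("San Francisco", 37.77), ("Anchorage", 61.22)]

for name, latitude in places:
    k = mercator_scale_factor(latitude)
    # Areas are stretched by the square of the linear scale factor.
    print(f"{name}: lengths x{k:.3f}, areas x{k * k:.3f}")
```

In this sketch, distortion grows with distance from the projection's line of true scale (the equator). A coordinate system focused on San Francisco applies the same idea locally, keeping distortion low near the city at the cost of higher distortion elsewhere.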

  1. On the ribbon, click the Analysis tab. In the Geoprocessing group, click Tools.

    Tools button on the Analysis tab

    The Geoprocessing pane appears.

  2. In the Geoprocessing pane, on the search bar, type Project. In the list of results, click the Project tool to open it.

    Project tool in the Geoprocessing pane

  3. In the Project tool pane, for Input Dataset or Feature Class, choose Bay Area Gowalla Check-ins. For Output Dataset or Feature Class, type Check_ins_Projected.
  4. For Output Coordinate System, click the Select coordinate system button.

    Select coordinate system button

  5. In the Coordinate System window, in the search box, type San Francisco and press Enter.
  6. Expand Projected Coordinate System and County Systems. Click NAD 1983 (2011) San Francisco CS13 (US Feet).

    Coordinate System window with new coordinate system selected

  7. Click OK. In the Geoprocessing pane, click Run.

    The output layer named Bay Area Gowalla Check-ins is added to the map.

  8. In the Contents pane, right-click the second Bay Area Gowalla Check-ins layer (the original one) and choose Remove.

    Remove option for the original check-ins layer

    The layer is removed. Although you've projected the layer, the map's appearance hasn't changed. The map is still using the original projected coordinate system, which is focused on the United States as a whole (meaning California, on the edge of the United States, is somewhat distorted). You'll update the map's projection.

  9. In the Contents pane, double-click Map.

    The Map Properties window appears.

  10. In the Map Properties window, click Coordinate Systems. Search for San Francisco. Expand County Systems and choose the NAD 1983 (2011) San Francisco CS13 (US Feet) coordinate system.
  11. Click OK.

    The map changes to use the selected coordinate system.

    Map updates with different coordinate system

Aggregate check-ins

It's difficult to determine which areas are popular by looking at the map because almost every populated place in the San Francisco Bay Area is covered with check-in points. To gain more meaningful insight, you'll count the number of check-ins in each area. You'll create a grid of hexagon bins that covers the San Francisco Bay Area and use this grid to aggregate check-ins. Then, you'll symbolize the result layer to determine which areas have the most check-ins.
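
The idea behind hexagon binning can be sketched in a few lines of Python. This is a minimal illustration, not what the geoprocessing tools do internally: it assumes planar coordinates in miles, a pointy-top hexagon layout with one hexagon centered on the origin, and hypothetical function names and sample points. It derives the side length of a hexagon from its area and counts the points that fall in each hexagon.

```python
import math
from collections import Counter


def hexagon_side_from_area(area):
    """Side length of a regular hexagon, from A = (3 * sqrt(3) / 2) * s**2."""
    return math.sqrt(2.0 * area / (3.0 * math.sqrt(3.0)))


def hexagon_area(side):
    """Area of a regular hexagon with the given side length."""
    return 1.5 * math.sqrt(3.0) * side ** 2


def point_to_hex(x, y, side):
    """Axial (q, r) index of the pointy-top hexagon that contains (x, y)."""
    q = (math.sqrt(3.0) / 3.0 * x - y / 3.0) / side
    r = (2.0 / 3.0 * y) / side
    # Round in cube coordinates (cx + cy + cz == 0), then repair the
    # component with the largest rounding error so the sum stays zero.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def count_points_per_hexagon(points, side):
    """Count how many (x, y) points fall in each hexagon."""
    return Counter(point_to_hex(x, y, side) for x, y in points)


side = hexagon_side_from_area(12.0)  # 12 square miles, as in this tutorial
# Hypothetical check-in coordinates in miles.
sample_points = [(0.0, 0.0), (0.1, 0.2), (side * math.sqrt(3.0), 0.0)]
counts = count_points_per_hexagon(sample_points, side)
print(round(side, 3), dict(counts))
```

Hexagons that contain no points never appear in the counter, which mirrors the decision later in this section to keep only the bins that intersect at least one check-in.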

  1. In the Geoprocessing pane, click the Back button.

    Back button

  2. Search for and open the Generate Tessellation tool.

    This tool creates a grid of regular polygon features, such as hexagons, squares, or triangles, to cover a specified extent.

  3. For Output Feature Class, type Hexagon_Tessellation. For Extent, choose Bay Area Gowalla Check-ins.

    Extent parameter for the Generate Tessellation tool

  4. For Size, type 12 and choose Square Statute Miles. For Spatial Reference, confirm that NAD_1983_2011_San_Francisco_CS13_ftUS is chosen.

    Parameters for the Generate Tessellation tool

  5. Click Run.

    The tool runs and a hexagon grid is added to the map. (Its default symbology is random and may differ from the example image.)

    Hexagon tessellation on map

    Next, you'll count the number of check-ins within each hexagon bin. You're not interested in areas where no check-ins were made or where no data was collected, so first you'll select the bins that intersect at least one check-in.

    When running a geoprocessing tool on a layer that has an active selection, such as your hexagon grid, the tool only uses the selected features for analysis. Features that aren't selected won't be used in the analysis.

  6. On the ribbon, click the Map tab. In the Selection group, click Select By Location.

    Select By Location button

    The Select Layer By Location window appears.

  7. In the Select Layer By Location window, enter the following parameters:
    • For Input Features, confirm Hexagon_Tessellation is selected.
    • For Relationship, confirm Intersect is selected.
    • For Selecting Features, choose Bay Area Gowalla Check-ins.

    Parameters for the Select Layer By Location tool

  8. Click OK. In the Contents pane, uncheck Bay Area Gowalla Check-ins to turn it off.

    On the map, hexagon bins that intersect at least one check-in are selected.

    Map showing selected hexagon bins

    Next, you'll join the check-in features to the selected hexagons. The join will add an attribute field to the hexagon grid that includes the number of check-ins within each hexagon.

  9. In the Geoprocessing pane, click the Back button. Search for and open the Spatial Join tool.
  10. In the Spatial Join tool, enter the following parameters:
    • For Target Features, choose Hexagon_Tessellation.
    • For Join Features, choose Bay Area Gowalla Check-ins.
    • For Output Feature Class, type Check_in_Counts.

    Parameters for the Spatial Join tool

  11. Click Run.

    The tool runs and a new layer, containing only the selected hexagon bins, is added to the map. The check-in counts for each bin are contained in an attribute field for the layer. To visualize the counts on the map, you'll change the layer symbology.

  12. In the Contents pane, right-click Hexagon_Tessellation and choose Remove. If necessary, turn off the Bay Area Gowalla Check-ins layer.
  13. Right-click Check_in_Counts and click Symbology.

    The Symbology pane appears.

  14. In the Symbology pane, for Primary symbology, choose Graduated Colors.
  15. For Classes, choose 10. For Color scheme, choose Cyan to Purple.

    Parameters for the Symbology pane

    The symbology is applied to the map.

    Map with symbolized hexagon bins

    On the map, pink hexagon bins have a higher number of check-ins, while blue bins have a lower number. The bins with more check-ins tend to be grouped around San Francisco and San Jose, the largest cities in the area.

  16. Close the Symbology pane. On the Quick Access Toolbar, click the Save button.

    Save button on the Quick Access Toolbar

    Note:

    A message may appear warning you that saving this project file with the current ArcGIS Pro version will prevent you from opening it again in an earlier version. If you see this message, click Yes to proceed.

Quantify the significance of the aggregations

Your aggregated check-ins demonstrate some patterns. But are these patterns statistically significant, or could they be caused by random variance or sampling error? To find out, you'll quantify the statistical significance of the aggregated check-ins. You'll use the Global Moran's I statistic to test whether the patterns in your results are clustered, dispersed, or random.

Global Moran's I quantifies the spatial patterns of an attribute. Because your original check-in data has no attributes that you can use to determine the density of check-ins, it was necessary to aggregate the check-ins before running the statistic. The hexagon bins have the Join_Count field, which Global Moran's I can quantify.

Note:

To learn more about the math behind Global Moran's I, see How Spatial Autocorrelation (Global Moran's I) works.
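
As a rough illustration of what the statistic measures, the following Python sketch computes Moran's I for a hypothetical row of six bins, using the standard formula I = (n / S0) * sum over i and j of w_ij * (x_i - mean) * (x_j - mean), divided by the sum over i of (x_i - mean)^2, where S0 is the sum of all weights. The binary neighbor weights and the sample counts are assumptions for illustration. The tool builds its weights from the feature geometry and also reports a z-score and p-value, which this sketch does not compute.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)  # S0
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / total_weight) * (cross_products / squared_deviations)


def chain_weights(n):
    """Binary weights for n bins in a row: each bin neighbors the next one."""
    weights = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        weights[i][i + 1] = 1
        weights[i + 1][i] = 1
    return weights


weights = chain_weights(6)
clustered = morans_i([1, 1, 1, 5, 5, 5], weights)    # similar counts adjoin
alternating = morans_i([1, 5, 1, 5, 1, 5], weights)  # dissimilar counts adjoin
expected_if_random = -1 / (6 - 1)                    # expected index, -1 / (n - 1)

print(clustered, alternating, expected_if_random)
```

The clustered arrangement gives an index of about 0.6 and the alternating arrangement gives about -1, compared with an expected value of -0.2 for six bins with no spatial pattern. Positive values indicate that similar counts tend to be neighbors, which is the pattern you are testing for in the hexagon bins.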

  1. In the Geoprocessing pane, click the Back button. Search for and open the Spatial Autocorrelation (Global Moran's I) tool.
  2. In the Spatial Autocorrelation (Global Moran's I) tool, for Input Feature Class, choose Check_in_Counts, and for Input Field, choose Join_Count.
  3. Check Generate Report.

    Parameters for the Spatial Autocorrelation (Global Moran's I) tool

  4. Click Run.

    The tool runs, but no layer is added to the map. Instead, a report file is created. You can find the path to this report file by viewing information about the tool.

  5. At the bottom of the Geoprocessing pane, click View Details.

    View Details

    The Spatial Autocorrelation (Global Moran's I) window appears. This window lists the tool's run time, the parameters you used to run the tool, and any warning messages.

  6. In the Spatial Autocorrelation (Global Moran's I) window, click the Parameters tab. For Report File, click the path to the report file.

    Report File path link in the Parameters tab

    The report file appears on a new browser tab.

    Report file

    The report includes the Moran's Index, the z-score, and the p-value. For determining statistical significance, the z-score is the most important of these values.

    The z-score indicates the number of standard deviations a value is from the average value. Positive z-scores are values above the average, while negative z-scores are values below the average. In this case, the value being measured is the amount of spatial autocorrelation that exists between features in your dataset.

    Your data's z-score is over 7, meaning that your data has significantly more spatial autocorrelation when compared to a hypothetical collection of randomly distributed data. The report also contains a chart that plots the z-score on the far right end of a bell curve. The chart indicates that there is statistical significance to the distribution of your data, and that it is clustered (meaning similar values in the data are closer together).

  7. Close the report. In ArcGIS Pro, close the Spatial Autocorrelation (Global Moran's I) window.
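
The relationship between the z-score and the p-value in the report can be sketched in Python. This is a minimal illustration that assumes the statistic follows a normal distribution under the null hypothesis of spatial randomness; the function names and the sample numbers are hypothetical and are not taken from your report.

```python
import math


def z_score(observed, expected, standard_deviation):
    """Number of standard deviations between an observed and expected value."""
    return (observed - expected) / standard_deviation


def two_sided_p_value(z):
    """Chance of a value at least this extreme under a standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical index, expected index, and standard deviation.
z = z_score(observed=0.5, expected=-0.1, standard_deviation=0.2)
print(z, two_sided_p_value(z))

# A z-score near 1.96 corresponds to a p-value near 0.05; a z-score of 7
# corresponds to a p-value far smaller than any common significance level.
print(two_sided_p_value(1.96), two_sided_p_value(7.0))
```

The larger the absolute z-score, the smaller the p-value, which is why a z-score over 7 is strong evidence that the clustering in your data is not the result of random chance.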

Detect spatial clusters

By aggregating the data and determining its statistical significance, you know with confidence that check-ins are not randomly distributed, but clustered. Next, you'll perform spatial cluster analysis to detect areas of high popularity.

  1. In the Geoprocessing pane, click the Back button. Search for and open the Density-based Clustering tool.

    This tool provides three methods for spatial clustering, with each method requiring a different definition of what is considered dense and not dense. You'll run the tool three times, once for each method, and weigh the advantages and disadvantages of each.

    First, you'll use the defined distance method, also known as DBSCAN, which is the simplest density-based clustering method. In this method, density is defined by having a specified number of points within a specified distance. At every point, it checks whether the point satisfies the minimum number of features within a set search distance. If a point satisfies this criterion, it is marked as a clustered point. To run the tool, you must define the minimum number of features. You can also define the search distance, but if you do not set a search distance, the tool uses an optimized value.

    The minimum number of features per cluster depends on your data and the problem you want to solve. You want to identify popular places in the Bay Area. You don't know an exact number of check-ins that makes a place popular, but you can define a number based on your business's circumstances. For instance, assume you want to open a night club in the Bay Area and plan to charge fees that will require at least 500 customers per day to make a profit. In this example, you can define the minimum number of features per cluster to be 500. You can set the search distance to about 0.1 miles, roughly the size of a city block.

  2. In the Density-based Clustering tool, enter the following parameters:
    • For Input Point Features, choose Bay Area Gowalla Check-ins.
    • For Output Features, type DBSCAN_500.
    • For Clustering Method, choose Defined distance (DBSCAN).
    • For Minimum Features per Cluster, type 500.
    • For Search Distance, type 0.1 and choose US Survey Miles.

    Parameters for the Density-based Clustering tool using the DBSCAN method

  3. Click Run.

    The tool runs and the result layer is added to the map.

  4. In the Contents pane, turn off the Check_in_Counts layer.

    Map showing the DBSCAN result layer

    On the map, the colored points represent dense clusters of check-in points. The gray points represent noise, or any location that did not fit your definition of dense.

    The legend provides insight on the symbology:

    Legend for the DBSCAN layer

    Density-based clustering can find hundreds of clusters in a dataset. Rather than symbolize each cluster with a different color, eight distinct colors are used. The results are displayed so that clusters with similar colors are not close together, making the distinctions between clusters clearer on the map. The colors do not correspond to any attribute in the data.

    On the map, clusters are primarily located in San Francisco and the South Bay, with a few clusters located in other places. You'll change the basemap and zoom in to learn more.

  5. On the ribbon, on the Map tab, in the Layer group, click Basemap and choose Imagery Hybrid.

    Imagery Hybrid basemap

  6. Zoom in to San Francisco.

    Clusters in San Francisco

    San Francisco contains several clusters, including a particularly large blue cluster to the northeast. This cluster is downtown San Francisco.

  7. Pan to the northeast, across the bay, until you see Berkeley.

    Clusters in Berkeley

    Berkeley contains a single cluster, located in the center of the city.

  8. Pan to the south of the bay until you see Palo Alto.

    Clusters in Palo Alto

    Palo Alto and the surrounding area contain a few clusters. The Stanford Shopping Center (orange) and downtown Palo Alto (pink) are detected as clusters.

  9. Pan to the southeast until you see San Jose.

    Clusters in San Jose

    San Jose is the most populous city in the Bay Area, with an even larger population than San Francisco. However, it contains fewer clusters than San Francisco.

  10. In the Contents pane, right-click Bay Area Gowalla Check-ins and choose Zoom To Layer.

    The map extent returns to show the entire Bay Area.

    Overall, there are only a few clusters outside of San Francisco. One of the limitations of the DBSCAN method of clustering is that it uses a fixed distance for determining density. (When you ran the tool, you set this distance to 0.1 miles.) The distance chosen can significantly influence results. While a smaller distance may be appropriate for areas such as downtown San Francisco, where stores and other points of interest are close together, it may not be appropriate for suburban or rural areas where stores are more spread out.

    Your study area encompasses cities, suburbs, and rural areas, so using a single fixed distance may not provide the best results. Next, you'll perform density-based clustering using the self-adjusting method, also called HDBSCAN.

    HDBSCAN detects clusters at multiple search distances, as if DBSCAN were run multiple times. At each search distance, it detects different clusters in different locations. Then, HDBSCAN attempts to merge these clusters to create larger clusters that all have a similar point density. The resulting clusters are not defined by a single search distance.

  11. In the Density-based Clustering tool pane, for Output Features, type HDBSCAN_500. For Clustering Method, choose Self-adjusting (HDBSCAN).

    Parameters for the Density-based Clustering tool using the HDBSCAN method

    The tool no longer requires a search distance.

  12. Click Run. After the tool finishes running (it may take 10 minutes or so), turn off the DBSCAN_500 layer.

    Map showing the HDBSCAN result layer

    Compared to the DBSCAN method, the HDBSCAN method detects more clusters. Clusters appear all over the Bay Area, including in rural areas, and some of these clusters are large enough to cover entire cities, such as the clusters in Santa Rosa or Vallejo. While these results indicate more popular locations in the Bay Area, the results may not be enough to pinpoint the best place to open a new business.

    Next, you'll use the third spatial clustering method, multi-scale (also called OPTICS).

    The OPTICS method records the distance between the first feature in a dataset (Order ID 0) and its nearest neighbor. This distance is called the reachability distance. Then, the method records the reachability distance between the nearest neighbor and its nearest neighbor. This process repeats continually until the entire dataset has been covered. No nearest neighbor is repeated; if one feature's nearest neighbor was also the nearest neighbor of a previous feature, the next nearest neighbor is used instead.

    Then, the OPTICS method charts all of the reachability distances and looks for peaks and valleys in the chart. A valley, or a group of features with relatively low reachability distances, is a cluster of points that are close together. Once all of the points in a cluster are charted, the next point, which is not part of the cluster, will have a relatively high reachability distance, corresponding to a peak in the chart.

    The following diagram shows an example reachability chart and the corresponding clusters of points:

    Diagram showing the OPTICS clustering method

    In this example, all of the blue points are close together, so there is a low reachability distance between them. (The red lines represent the reachability distance from point to point.) On the chart, these points correspond to the blue valley. Then, there is a relatively large distance between the last blue point and its next unique nearest neighbor, corresponding to a sharp increase in reachability distance on the chart.

    In the green valley, there is a peak that is relatively small compared to the two larger peaks on either side of the valley. Depending on the OPTICS algorithm's cluster sensitivity, this small peak may divide the valley in two or it may still be considered part of the valley.

  13. In the Geoprocessing pane, for Output Features, type OPTICS_500. For Clustering Method, choose Multi-scale (OPTICS).

    Parameters for the Density-based Clustering tool using the OPTICS method

    This method requires a search distance. By default, the search distance is set to the previous distance you used, 0.1 miles. This method also has an optional parameter, Cluster Sensitivity. You'll learn more about this parameter later. For now, you'll leave it blank.

  14. Click Run. After the tool finishes running, turn off the HDBSCAN_500 layer.
    Tip:

    Now that you've added a few layers to your map, it may be useful to collapse the legends of layers you're not using to make them easier to find in the Contents pane. To collapse a legend, click the arrow next to the layer name.

    Map showing the OPTICS result layer

    The results of this clustering method resemble the results of the DBSCAN method. The two methods work similarly, but the OPTICS method accounts for clusters of varying densities by relying on relative peaks and valleys rather than absolute distances.

    What the method considers a peak and a valley relies on its cluster sensitivity. You didn't set a cluster sensitivity, so the tool uses a sensitivity value based on the statistical spread of the data. You'll view the tool details to see what sensitivity was used.

  15. At the bottom of the Geoprocessing pane, click View Details.

    The Density-based Clustering window appears with information about the cluster sensitivity value used.

    Parameters for the Density-based Clustering tool

    The tool used a cluster sensitivity of 28. (The sensitivity value is always an integer between 0 and 100.) You'll run the tool again with different cluster sensitivities and see how the results change.

  16. Close the Density-based Clustering window. In the Density-based Clustering tool pane, change Output Features to OPTICS_500_Sensitivity_0 and for Cluster Sensitivity, type 0.

    Density-based Clustering pane updated to 0 Cluster Sensitivity

  17. Click Run. After the tool finishes running, turn off OPTICS_500 and zoom to San Francisco.
    Tip:

    To better view the resulting clusters, on the Contents pane, uncheck Hybrid Reference Layer.

    OPTICS results with 0 sensitivity

    At this sensitivity, the clusters are relatively large.

  18. In the Density-based Clustering tool pane, change Output Features to OPTICS_500_Sensitivity_100 and change Cluster Sensitivity to 100. Click Run.
  19. After the tool finishes, turn off the OPTICS_500_Sensitivity_0 layer.

    The OPTICS_500_Sensitivity_100 layer, with a higher sensitivity, resulted in smaller, more compact clusters.

    OPTICS results with 100 sensitivity

    For your problem, locating a popular place where you can open a business, using a higher sensitivity is probably more useful. While a lower sensitivity may help you delineate broader areas of popularity, higher sensitivities indicate places where high numbers of check-ins are occurring—in other words, where people are actually going.

  20. Turn off the OPTICS_500_Sensitivity_100 layer, turn on the Bay Area Gowalla Check-ins layer, and zoom to the full extent of the data. Change the basemap back to Topographic.
  21. Save the project.
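
As a rough illustration of the defined distance (DBSCAN) method you ran first in this section, the following Python sketch marks a point as clustered when enough points lie within the search distance, and labels everything else as noise. It is a minimal sketch, not the tool's implementation: it compares every pair of points, it assumes that a point counts toward its own minimum number of features, and it uses small hypothetical values in place of the 500 features and 0.1 miles used above.

```python
from math import dist

NOISE = -1


def dbscan(points, search_distance, min_features):
    """Return a cluster label for each point; NOISE marks unclustered points."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within the search distance, including point i itself.
        return [
            j for j, other in enumerate(points)
            if dist(points[i], other) <= search_distance
        ]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_features:
            labels[i] = NOISE  # may later be claimed as a cluster's border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_features:
                queue.extend(reachable)  # dense point: keep growing the cluster
        cluster_id += 1
    return labels


# Two hypothetical groups of check-ins and one isolated check-in.
group = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
points = group + [(x + 10, y + 10) for x, y in group] + [(5, 50)]
labels = dbscan(points, search_distance=1.5, min_features=4)
print(labels)
```

Because the search distance is fixed, a group of points that is only slightly more spread out than the threshold is labeled as noise, which is the limitation that the self-adjusting and multi-scale methods address.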

You've analyzed your data spatially. Through aggregation and spatial clustering, you've determined locations where there are particularly high densities of check-ins and learned some of the ways to adjust your analysis results depending on your specific objectives.

Your data has another component, which you haven't yet looked at: time. Next, you'll analyze your data temporally to determine popular places in the Bay Area.


Analyze the data temporally

Your data has both a spatial and temporal component. Analyzing spatial trends is useful, but it doesn't tell the entire story. After all, which places are popular can change over time, especially in dense urban downtowns where new stores open and close frequently. It would be much better to open your business in a location that is gaining popularity rather than losing it.

Convert the time field

The Check-in Time field contains the date and time when a check-in was created. However, the field contains a concatenated string of text that ArcGIS Pro does not automatically recognize as a time stamp. To use this field for temporal analysis, you'll convert it to a recognized date field format.

  1. If necessary, open your Bay Area Popular Places project in ArcGIS Pro.
  2. In the Geoprocessing pane, search for and open the Convert Time Field tool.

    This tool converts time and date values from a text string to a date field.

  3. In the Convert Time Field tool pane, for Input Table, choose Bay Area Gowalla Check-ins. For Input Time Field, choose Check-in Time.

    Next, you'll set the input time format (the format that the field currently uses). The format is written using letters to represent different units of time, such as y for year and H for hour. The format used in the table is yyyy-MM-ddTHH:mm:ssZ, with the T and the Z being constants that do not reflect any units of time.

  4. For Input Time Format, type yyyy-MM-ddTHH:mm:ssZ.
    Tip:

    To set the parameter, you can either type the format or click the Set Format button and choose from a list of formats. The format used by the Check-in Time field isn't one of the listed formats, so in this instance, typing the format is required.

    Parameters for the Convert Time Field tool

    You'll leave the other parameters unchanged.

  5. Click Run.

    The tool runs.

  6. In the Contents pane, right-click Bay Area Gowalla Check-ins and click Attribute Table.

    The Check_in_Time_Converted field has been added to the end of the table with the converted check-in times.

    Converted check-in time field

  7. Close the table.
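
The same kind of conversion can be sketched outside of ArcGIS Pro with Python's standard library. The format codes below are the Python equivalents of the yyyy-MM-ddTHH:mm:ssZ format, with T and Z treated as literal characters, as in the tool. The sample value and the function name are hypothetical.

```python
from datetime import datetime

# ArcGIS Pro format: yyyy-MM-ddTHH:mm:ssZ
# Python equivalent: %Y = year, %m = month, %d = day,
#                    %H = hour (24-hour), %M = minute, %S = second
PYTHON_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def convert_check_in_time(text):
    """Parse a check-in time string into a datetime value."""
    return datetime.strptime(text, PYTHON_TIME_FORMAT)


sample = "2010-08-14T19:30:05Z"  # hypothetical value in the field's format
converted = convert_check_in_time(sample)
print(converted.year, converted.month, converted.hour)
```

Once the text is parsed into a date value, its parts (year, month, day of the week, hour) can be extracted directly, which is what makes the charts in the next section possible.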

Chart the temporal data

Your feature class contains time data that ArcGIS Pro can process and analyze. Next, you'll create a data clock. A data clock is a type of chart that summarizes temporal data. You'll use this chart to find patterns in the times people checked in.

  1. In the Contents pane, right-click Bay Area Gowalla Check-ins, point to Create Chart, and choose Data Clock.

    Data Clock chart type in the list of charts

    The Bay Area Gowalla Check-ins - Data Clock 1 view and the Chart Properties pane appear. To create the chart, you'll change parameters in the pane. You'll create a chart that visualizes the total number of check-ins by year and month.

  2. In the Chart Properties pane, for Date, choose Check_in_Time_Converted. Confirm that Rings is set to Years, Wedges is set to Months, and Aggregation is set to Count.

    Date variable in the Chart Properties pane

    The data clock is created.

    Default data clock chart

    In this data clock, each concentric circle (ring) represents a year, while each circle segment (wedge) represents a month. The color of each wedge represents the total number of check-ins made during that month, with darker blue colors corresponding to more check-ins. Gray wedges have no data.

    Your data clock has two rings: 2009 and 2010. Check-in data was first collected in March 2009 and last collected in October 2010. The number of check-ins was low until late 2009, as the Gowalla service was still gaining users. The months with the highest number of check-ins were March, April, August, and September 2010.

  3. In the Chart Properties pane, for Rings, choose Weeks. For Wedges, choose Days of the Week.

    The data clock updates.

    Data clock chart with weeks and days of the week

    The data clock contains significantly more rings, but only seven wedges in each ring, one for each day of the week. Based on this data clock, the weekend days (Saturday and Sunday) have the highest number of check-ins. This pattern makes sense, as most people don't have to work on the weekends and thus have more leisure time to visit places.

    Depending on the type of business you plan to start, you may also be interested in the times of day that check-ins occurred. Visualizing a year's worth of hourly data would be difficult, so you'll create a feature class that only contains a subset of the data and create a chart for it.

  4. In the Chart Properties pane, change Rings to Years and Wedges to Months. On the data clock, press Ctrl while clicking the August 2010 and September 2010 wedges to select them.

    Data clock with August and September 2010 wedges selected

    Tip:

    Another way to select multiple wedges is to draw a box around them.

    All check-ins made during the selected dates are also selected on the map.

    Selected check-ins on the map

    In ArcGIS Pro, any geoprocessing tools run on a dataset will only be run on the selected features, if a selection has been made. Next, you'll copy the selected features to a new dataset.

  5. Open the Geoprocessing pane and click the Back button. Search for and open the Copy Features tool.
  6. In the Copy Features tool pane, for Input Features, choose Bay Area Gowalla Check-ins. For Output Feature Class, type Check_ins_Aug_Sep_2010.

    Parameters for the Copy Features tool

  7. Click Run.

    The copied feature class is added to the map.

  8. In the Contents pane, right-click Check_ins_Aug_Sep_2010, point to Create Chart, and choose Data Clock.

    A new data clock is created.

  9. In the Chart Properties pane, for Date, choose Check_in_Time_Converted. For Rings, choose Days, and for Wedges, choose Hours.

    The data clock automatically updates with 24 wedges, one for each hour of the day.

    Data clock chart with days and hours

    Few people checked in during early business hours, with especially low counts in the hours between 6 a.m. and 2 p.m. The highest volume of check-ins occurred between 7 p.m. and 9 p.m. and between 1 a.m. and 2 a.m. These trends may be indicative of a high influx of customers to restaurants during the evening or night clubs late at night.

  10. Close the Counts of Check_in_Time_Converted by Hours over Days data clock. In the Contents pane, right-click Check_ins_Aug_Sep_2010 and choose Remove.

    For your subsequent analysis, you'll only work with the check-in data between December 2009 and September 2010, the 10 months when check-ins were at their highest. Using this subset of the data in subsequent analysis will remove records from when the social media app was still gaining users. These periods of low use may skew results.

  11. In the Counts of Check_in_Time_Converted by Months over Years data clock, press Ctrl while selecting the months from December 2009 through September 2010.

    Data clock with months from December 2009 through September 2010 selected

  12. Close the data clock. Save the project.
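
Each data clock you created in this section amounts to counting records by a pair of time units, one for the rings and one for the wedges. The following Python sketch shows that counting for the three ring and wedge combinations used above. The function name and the sample check-in times are hypothetical, and days of the week are represented as numbers from 0 (Monday) to 6 (Sunday).

```python
from collections import Counter
from datetime import datetime


def data_clock_counts(timestamps, ring, wedge):
    """Count timestamps by (ring, wedge), like the cells of a data clock."""
    return Counter((ring(t), wedge(t)) for t in timestamps)


# Hypothetical check-in times.
check_ins = [
    datetime(2009, 12, 31, 23, 10),
    datetime(2010, 8, 7, 20, 15),
    datetime(2010, 8, 7, 1, 30),
    datetime(2010, 8, 9, 12, 0),
    datetime(2010, 9, 4, 19, 45),
]

# Rings: years. Wedges: months.
by_month = data_clock_counts(
    check_ins, ring=lambda t: t.year, wedge=lambda t: t.month
)

# Rings: weeks (ISO year and week number). Wedges: days of the week.
by_weekday = data_clock_counts(
    check_ins,
    ring=lambda t: tuple(t.isocalendar()[:2]),
    wedge=lambda t: t.weekday(),
)

# Rings: days. Wedges: hours.
by_hour = data_clock_counts(
    check_ins, ring=lambda t: t.date(), wedge=lambda t: t.hour
)

saturday_total = sum(
    count for (_, weekday), count in by_weekday.items() if weekday == 5
)
print(by_month[(2010, 8)], saturday_total)
```

Any cell that is missing from a counter corresponds to a gray wedge in the chart: a time period with no data.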

Analyze trends with a space time cube

The charts you created helped you understand trends in the number of check-ins throughout the entire dataset. But what if you wanted to analyze trends that were both temporal and spatial? Which neighborhoods have the highest number of check-ins? Are certain neighborhoods becoming more or less popular over time? Answering these questions can be vital when deciding where to open a new business.

To analyze the spatial and temporal elements of your data together, you'll need to create a spatiotemporal data structure (a data structure that accounts for both space and time). This data structure will summarize the check-in points by a fixed area and a fixed increment of time.

You'll use the Create Space Time Cube tool to define a spatiotemporal data structure for your data. The resulting dataset can be thought of as a cube because it has three dimensions: two for area (x and y) and a third for time (t).
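
As a rough sketch of what such a data structure holds, the following Python counts points per (column, row, month) bin. It assumes a square grid for simplicity, whereas this tutorial uses hexagons, and the coordinates, cell size, and function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def cube_bin(x, y, when, origin, cell_size, start):
    """Return the (column, row, month_step) bin that a point falls in."""
    col = int((x - origin[0]) // cell_size)
    row = int((y - origin[1]) // cell_size)
    step = (when.year - start.year) * 12 + (when.month - start.month)
    return col, row, step

def build_cube(points, origin, cell_size, start):
    """Aggregate (x, y, time) points into counts per space-time bin."""
    return Counter(cube_bin(x, y, t, origin, cell_size, start)
                   for x, y, t in points)

# Hypothetical points: x and y in miles from an arbitrary origin.
points = [
    (0.2, 0.3, datetime(2009, 12, 5)),
    (0.8, 0.1, datetime(2009, 12, 20)),
    (0.5, 0.5, datetime(2010, 1, 3)),
    (1.4, 0.2, datetime(2010, 1, 9)),
]

cube = build_cube(points, origin=(0.0, 0.0), cell_size=1.0,
                  start=datetime(2009, 12, 1))
for key, count in sorted(cube.items()):
    print(key, count)
```

Each key is one bin of the cube: two indexes for location and one for the time step.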

  1. In the Geoprocessing pane, click the Back button. Search for Create Space Time Cube.

    The search returns three results for Create Space Time Cube.

    Search results for Create Space Time Cube

    The tool that you choose depends on your data. Your check-in data comes from a variety of point locations across space, so you want to aggregate points. If your data instead relied on stations or other locations with fixed geographies (such as traffic cameras or toll booths), you would create a space time cube from defined locations. If your data came from a multidimensional raster layer, you would choose the appropriate tool.

  2. Click Create Space Time Cube By Aggregating Points.
  3. For Input Features, choose Bay Area Gowalla Check-ins. For Output Space Time Cube, type Check_ins_STC.

    After you type the output name, the .nc extension is automatically added to the end. This extension stands for netCDF, which is the file type used by space time cubes.

  4. For Time Field, choose Check_in_Time_Converted.

    Input and output parameters for the Create Space Time Cube tool

    Next, you'll choose the interval of time to aggregate points, or the time bin. The time bin interval should be appropriate for the time scale relevant to your analysis. You want to know if there have been any long-term trends in neighborhood popularity, so an hourly or daily bin would not be useful. Instead, you'll use a monthly interval. (If you planned to open a business that saw increased activity during specific hours of the day, such as a coffee shop, you might be more interested in an hourly bin to see which places are more popular during those times.)

  5. For Time Step Interval, type 1 and choose Months.

    You'll also choose the shape of the area for spatial aggregation. You'll use a hexagonal aggregation area, because hexagons have the highest number of edge-sharing neighbors (6) of the available shapes. Additionally, in a hexagon grid, the centers of all six neighboring hexagons are the same distance away. Later, you'll define spatiotemporal neighborhoods by distance, so hexagons will have an advantage over a fishnet (square) grid, where diagonal neighbors are farther away than edge neighbors.

    Diagram showing square and hexagon grid neighbors

    You'll set these hexagons to be 1 mile wide.

  6. For Aggregation Shape Type, choose Hexagon grid. For Distance Interval, type 1 and choose US Survey Miles.

    Time and distance interval parameters for the Create Space Time Cube tool

  7. Click Run.

    The tool runs and creates a space time cube file. No outputs are added to the map. To visualize the space time cube, you'll run another tool.

  8. Click the Back button. Search for and open the Visualize Space Time Cube in 2D tool.

    This tool creates a 2D layer based on an .nc file.

  9. In the Visualize Space Time Cube in 2D tool, for Input Space Time Cube, click the Browse button.

    Browse button

  10. In the Input Space Time Cube window, open the p20 folder. Double-click Check_ins_STC.nc.
  11. Change the following parameters:
    • For Cube Variable, choose COUNT.
    • For Display Theme, choose Trends.
    • Check Enable Time Series Pop-ups.
    • For Output Features, type Check_ins_STC_2D.

    Parameters for the Visualize Space Time Cube in 2D tool

    These parameters will map the trends in monthly check-in counts. By enabling time series pop-ups, you can view a time series for each bin showing counts over time.

  12. Click Run.

    The tool runs and the layer is added to the map.

  13. In the Contents pane, turn off the Bay Area Gowalla Check-ins layer. On the map, zoom in to San Francisco and click a purple hexagon bin.

    Time series pop-up

    The pop-up contains a time series chart showing the number of check-ins over time at that location. Although there may be some decreases over time, generally there is a strong increasing trend in the purple bins.

    The numbers on the vertical axis of the time series chart indicate the number of check-ins. The hexagon in the example image has gone from about 160 check-ins per month to about 360.

  14. Click a green hexagon.

    Time series chart for a hexagon with a downward trend

    Green hexagons are those where a downward trend was detected. Many of these hexagons have low counts of check-ins overall. In the example image, the area decreased from a high of over 900 check-ins to a low of under 600. Although the trend is decreasing, the lowest values for this area are still higher than the highest values for the area where the trend was increasing.

    White hexagons are areas where no trend was detected, either up or down. These hexagons may have either stable numbers of check-ins per month or highly erratic numbers.

  15. Close the pop-up and return to the full extent of the data.

    When you analyzed the data spatially, you found that downtown San Francisco was the most popular area. However, much of downtown San Francisco shows no upward or downward trend in popularity. On the other hand, areas in San Jose or the East Bay are growing in popularity. It may be worthwhile to consider these areas as places to open your business.

    Next, you'll visualize the space time cube in 3D, which will make it easier to see changes in time on the map. (Time is the third dimension in a space time cube.) First, you'll insert a new scene.

  16. On the ribbon, on the Insert tab, in the Project group, click the New Map drop-down arrow and choose New Local Scene.

    New Local Scene option

    A scene view is added to the project.

  17. In the Geoprocessing pane, click the Back button. Search for and open the Visualize Space Time Cube in 3D tool.
  18. In the Visualize Space Time Cube in 3D tool, change the following parameters:
    • For Input Space Time Cube, browse to the Check_ins_STC.nc file.
    • For Cube Variable, choose COUNT.
    • For Display Theme, choose Value.
    • For Output Features, type Check_ins_STC_3D.

    Parameters for the Visualize Space Time Cube in 3D tool

  19. Click Run.

    The tool runs and the result layer is added to the scene.

  20. Pan, zoom, and tilt the scene to explore the results.
    Tip:

    To tilt, press V and drag the map. To pan, press C and drag the map.

    Visualization of the space time cube in 3D

    In this visualization, each hexagon bin has a height composed of segments, with each segment corresponding to a different month. The color of each segment indicates the number of check-ins in that area during that month.

    Unlike the 2D visualization, each segment is symbolized by total count of check-ins, not by increasing or decreasing trends. As you saw in your spatial analysis, downtown San Francisco has the highest count of check-ins, even if it's not an area that is gaining popularity. Most bins in other locations have few check-ins and are symbolized white.

  21. Save the project.
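
The upward, downward, and no-trend labels in the 2D visualization come from a per-bin trend test. Esri's documentation describes this as the Mann-Kendall statistic; the sketch below is a simplified version of that test (no correction for tied values, and a fixed 1.96 cutoff chosen here as an assumption). The three series are hypothetical, loosely shaped like the pop-ups described above.

```python
import math

def mann_kendall(series):
    """Return (S, z) for the Mann-Kendall trend test, without tie correction."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # +1 for a rise, -1 for a fall
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

def trend_label(series, z_crit=1.96):
    """Label a series 'up', 'down', or 'none' using an assumed z cutoff."""
    _, z = mann_kendall(series)
    if z > z_crit:
        return "up"
    if z < -z_crit:
        return "down"
    return "none"

# Hypothetical monthly counts for three bins.
rising = [160, 180, 175, 220, 240, 260, 300, 310, 340, 360]
falling = [900, 880, 860, 790, 800, 720, 700, 650, 610, 590]
erratic = [50, 90, 40, 95, 45, 85, 55, 92, 48, 88]

for name, series in [("rising", rising), ("falling", falling), ("erratic", erratic)]:
    print(name, trend_label(series))
```

Because the test only compares the order of values, a bin with high counts can still be labeled as trending down, and a bin with erratic counts can show no trend at all.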

Detect temporal clusters

Next, you'll detect temporal clusters of check-ins in your space time cube. Temporal clustering is similar to spatial clustering in that it identifies locations where features are densely grouped. The only difference is that temporal clustering groups features by temporal proximity instead of spatial proximity.

  1. Above the scene, click the Map tab.

    Map view tab

    You return to the Map view.

  2. In the Geoprocessing pane, click the Back button. Search for and open the Time Series Clustering tool.
  3. In the Time Series Clustering tool, for Input Space Time Cube, browse to and choose Check_ins_STC.nc. For Analysis Variable, choose COUNT, and for Output Features, type Check_ins_Monthly_Time_Clusters.

    You can also cluster the data by one of three characteristics of interest. You'll learn about the other characteristics later, but for now, you'll cluster so that locations with similar values across time are clustered together.

  4. For Characteristic of Interest, choose Value.

    You can also set the number of clusters that the tool creates. If left unchanged, the tool will use an optimal number based on the data. You'll create three clusters, corresponding to groups of high, medium, and low popularity.

  5. For Number of Clusters, type 3. Check Enable Time Series Pop-ups.

    You'll also create an output table so you can chart the results.

  6. For Output Table for Charts, type Clustering_Tables.

    Parameters for the Time Series Clustering tool

  7. Click Run. After the tool finishes, turn off the Check_ins_STC_2D layer.

    The clusters layer appears in the map.

    Map showing temporal clusters

    The hexagon bins are clustered into three groups: blue, red, and green. To find out what these clusters mean, you'll open the chart you created with the tool.

  8. In the Contents pane, under Standalone Tables, double-click Average Time Series per Cluster. (You may need to scroll to see it.)

    Average Time Series per Cluster chart option

    The chart appears.

    Chart showing the average time series per cluster

    Note:

    The colors are assigned to the clusters randomly, so yours may differ from the example images. Regardless of color, the numbers are the same and the data tells the same story.

    In the Average Time Series per Cluster chart shown above, the blue hexagons are locations that historically have few check-ins. (They all have had at least one check-in, or they wouldn't be included at all.) The green hexagons are locations with more check-ins. However, although the check-in counts are high, the number of check-ins fluctuates significantly from month to month. These fluctuations may be due to seasonal variations in tourism. On the map, only one green hexagon was identified (in downtown San Francisco). The red cluster contains downtown locations that may be frequented by locals, resulting in relatively consistent popularity throughout the year.

  9. On the map, zoom to downtown San Francisco and click the green hexagon.
    Note:

    The hexagon color may differ on your screen. Click the hexagon that is a different color from the others around it.

    Green hexagon in San Francisco

    The pop-up shows the time series chart at that location. The dotted green line shows the average number of check-ins for hexagons in the green cluster.

  10. Close the pop-up and the chart.

    You identified clusters of locations with similar numbers of check-ins over time. You can also identify clusters of areas with similar temporal trends. For instance, say that two areas encounter similar increases and decreases in check-ins over time due to seasonal changes in tourism. However, one of these areas has a significantly higher total number of check-ins than the other. When clustering based on value, these areas are not clustered together. But when clustering based on profile, they are.

    Clustering locations by profile is useful for businesses that intend to target a specific seasonal crowd. Profile clustering can be accomplished by one of two methods. You'll use the Fourier Family-based time series clustering method. The Fourier method identifies areas with different changes to popularity throughout the year.

  11. In the Time Series Clustering tool, for Output Features, type Check_ins_Monthly_Time_Clusters_Fourier. For Characteristic of Interest, choose Profile (Fourier).

    You can ignore certain characteristics of your time series when running the tool. You'll ignore the Range characteristic (in this case, the count of check-ins). This way, you'll identify locations with similar popularity trends regardless of the absolute number of check-ins. As before, you'll create three clusters.

  12. For Time Series Characteristics to Ignore, check Range. For Number of Clusters, type 3.
  13. Check Enable Time Series Pop-ups.
  14. For Output Table for Charts, type Clustering_Tables_Fourier.

    Parameters for the Time Series Clustering tool using the Fourier method

  15. Click Run. When the tool finishes running, turn off the Check_ins_Monthly_Time_Clusters layer.

    The clusters layer appears in the map.

    Map showing the results of the Fourier method

    There are many more hexagons of each color when using Profile (Fourier).

  16. In the Contents pane, under Clustering_Tables_Fourier, double-click Average Time Series per Cluster.

    Chart showing the results of the Fourier method

    In this chart, red corresponds to hexagons with more check-ins, especially during the spring. Blue corresponds to hexagons with fewer check-ins throughout the year. Green corresponds to hexagons with increasing check-ins. Each cluster type can be found throughout the Bay Area, rather than being tied to areas that have more check-ins in general (such as downtown San Francisco).

  17. Close the chart and save the project.

You've analyzed temporal trends in your data to find locations that are becoming more popular over time and locations with seasonal cycles in popularity. You're one step closer to fully understanding your data and being able to make an informed decision about where to open your new business.
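
The difference between clustering by value and clustering by profile can be illustrated with a simple distance comparison. This sketch does not reproduce the Fourier method used by the tool; it standardizes each series (an assumed stand-in for ignoring the range) so that only the shape remains. All series are hypothetical.

```python
import math
from statistics import mean, pstdev

def euclidean(a, b):
    """Straight-line distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(series):
    """Rescale to mean 0 and standard deviation 1 so only the shape remains."""
    m, s = mean(series), pstdev(series)
    return [(v - m) / s for v in series]

# Hypothetical monthly counts: two areas with the same seasonal shape at
# different magnitudes, and one small area that steadily declines.
small_seasonal = [10, 12, 20, 30, 20, 12, 10, 12, 20, 30]
large_seasonal = [100, 120, 200, 300, 200, 120, 100, 120, 200, 300]
small_declining = [30, 28, 25, 22, 20, 18, 15, 12, 10, 8]

# By value, the two small areas are closest to each other.
value_to_large = euclidean(small_seasonal, large_seasonal)
value_to_declining = euclidean(small_seasonal, small_declining)

# By profile, the two seasonal areas are closest to each other.
profile_to_large = euclidean(standardize(small_seasonal),
                             standardize(large_seasonal))
profile_to_declining = euclidean(standardize(small_seasonal),
                                 standardize(small_declining))

print(value_to_large, value_to_declining)
print(profile_to_large, profile_to_declining)
```

With raw values, the small seasonal area sits nearer to the small declining area. After standardizing, it sits nearer to the large seasonal area, which is the grouping a seasonal business would care about.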


Complete your analysis

Over the course of this tutorial, you've analyzed your data spatially and temporally. Depending on which statistical method you choose to detect clusters in your data, your results may change significantly. Next, you'll combine your results and reach a decision about where to open your business.

Detect spatial and temporal hot spots

Your final analysis will examine the data spatially and temporally at the same time. Using the Emerging Hot Spot Analysis (EHSA) tool, you'll classify patterns in your space time cube into one of 17 possible categories.

Unlike time-series clustering, EHSA determines whether a space time cube bin's neighbors contain a number of check-ins that is significantly higher than (hot spot) or lower than (cold spot) the global average. Once every location in the space time cube has been designated a hot spot, cold spot, or neither, EHSA examines variations in each location's z-score over time to determine whether the location is a consecutive, intensifying, diminishing, or sporadic hot or cold spot.

The final result accounts for both spatial and temporal variations in the data.
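
The hot spot z-score mentioned above can be sketched with the Getis-Ord Gi* statistic using binary weights. This is a spatial-only simplification: the actual tool also considers neighbors in time, and the counts, the single row of bins, and the adjacency rule here are hypothetical.

```python
import math

def gi_star(values, neighbors):
    """Getis-Ord Gi* z-score per location, using binary weights.

    neighbors[i] lists the indexes near location i; i itself is always included.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        hood = set(neighbors[i]) | {i}
        w_sum = len(hood)                      # sum of binary weights
        local = sum(values[j] for j in hood)   # weighted sum of values
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local - mean * w_sum) / denom)
    return scores

# Hypothetical check-in counts for a single row of 12 bins.
counts = [2, 3, 1, 2, 40, 55, 48, 3, 2, 1, 2, 3]

# Assumed neighborhood: the bins immediately to the left and right.
adjacent = [[j for j in (i - 1, i + 1) if 0 <= j < len(counts)]
            for i in range(len(counts))]

scores = gi_star(counts, adjacent)
for i, z in enumerate(scores):
    label = "hot" if z > 1.96 else "cold" if z < -1.96 else "not significant"
    print(i, round(z, 2), label)
```

A bin scores high when it and its neighbors together hold far more check-ins than the overall average would predict, so a single busy bin surrounded by quiet ones scores lower than a busy bin in a busy neighborhood.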

  1. If necessary, open your Bay Area Popular Places project in ArcGIS Pro.
  2. In the Geoprocessing pane, search for and open the Emerging Hot Spot Analysis tool. Enter the following parameters:
    • For Input Space Time Cube, browse to and choose Check_ins_STC.nc.
    • For Analysis Variable, choose COUNT.
    • For Output Features, type Check_ins_Emerging_Hot_Spots.
    • For Neighborhood Distance, type 1 and choose US Survey Miles.

    Parameters for the Emerging Hot Spot Analysis tool

    For each location, EHSA will examine every neighboring location within a mile to perform its analysis. You previously created a space time cube with a hexagon grid, which is ideal for neighborhood analysis because each hexagon is equidistant from all of its neighbors.

  3. Click Run. When the tool finishes, turn off the Check_ins_Monthly_Time_Clusters_Fourier layer.

    Emerging hot spots on the map

    Hot spots are located in downtown San Francisco, as well as several smaller cities south of the bay, such as Palo Alto, Mountain View, and San Jose. Most of the hot spots in downtown San Francisco are persistent hot spots, meaning they have been hot spots consistently across time. The other areas are mostly either new hot spots, meaning they were hot spots only at the end of the time series, or sporadic hot spots, meaning they have been hot spots some times but not others.

    Note that areas that time series clustering characterized as high- and medium-count clusters show up as consecutive hot spots. This implies that the number of check-ins in the vicinity of these areas is higher than the Bay Area average for most of the time steps. In other words, these areas were more popular than the rest of the Bay Area for most of the time steps in the space time cube. Unlike San Francisco, these areas appear to be increasing in popularity over time.

    You can also visualize the results in 3D.

  4. In the Contents pane, right-click the Check_ins_Emerging_Hot_Spots layer and choose Copy. Above the map, click the Scene tab to return to your scene.
  5. In the Contents pane, right-click Scene and choose Paste.

    The hot spots layer appears in the scene.

    Emerging hot spot results in scene

    Now that you've run EHSA on your space time cube, you can visualize based on the results of the analysis.

  6. In the Geoprocessing pane, click the Back button. Search for and open the Visualize Space Time Cube in 3D tool and enter the following parameters:
    • For Input Space Time Cube, browse to and choose Check_ins_STC.nc.
    • For Cube Variable, choose COUNT.
    • For Display Theme, choose Hot and cold spot results.
    • For Output Features, type Check_ins_STC_Hot_Spots.
  7. Click Run.
  8. Turn off the Check_ins_STC_3D layer. Explore the scene.

    Emerging hot spots on the scene

    In areas considered new hot spots, only the most recent month (the uppermost hexagon bin on the column) is considered a hot spot. Sporadic hot spots alternate between being hot spots and not being hot spots. In downtown San Francisco, areas are hot spots during every month, making them persistent hot spots.

  9. Click the Map tab to return to the Map view.

    When you ran EHSA, you chose a neighborhood distance of 1 mile. Changing the neighborhood distance also changes your results.

  10. In the Geoprocessing pane, click the Back button. Search for and open the Emerging Hot Spot Analysis tool and enter the following parameters:
    • For Input Space Time Cube, browse to and choose Check_ins_STC.nc.
    • For Analysis Variable, choose COUNT.
    • For Output Features, type Check_ins_Emerging_Hot_Spots_5mi.
    • For Neighborhood Distance, type 5 and choose US Survey Miles.
  11. Click Run. When the tool finishes running, turn off the Check_ins_Emerging_Hot_Spots layer.

    Map showing hot spots with a 5-mile neighborhood

    When a larger neighborhood size is used, larger areas are considered hot spots.
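
The geometry behind the neighborhood distance can be sketched as follows. The code assumes hexagon centers spaced 1 unit apart (standing in for 1 mile) and uses axial coordinates, a common convention for hexagon grids; the function names are hypothetical.

```python
import math

def hex_center(q, r, spacing=1.0):
    """Center of the hexagon at axial coordinates (q, r).

    spacing is the center-to-center distance between adjacent hexagons.
    """
    return (spacing * (q + r / 2), spacing * (math.sqrt(3) / 2) * r)

HEX_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
SQUARE_STEPS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0)]

# Distinct center-to-center distances to the immediate neighbors.
hex_dists = sorted({round(math.dist(hex_center(0, 0), hex_center(q, r)), 6)
                    for q, r in HEX_STEPS})
square_dists = sorted({round(math.hypot(dx, dy), 6) for dx, dy in SQUARE_STEPS})

def bins_within(radius, spacing=1.0, extent=8):
    """Count hexagon centers within radius of the central hexagon, excluding it."""
    origin = hex_center(0, 0, spacing)
    count = 0
    for q in range(-extent, extent + 1):
        for r in range(-extent, extent + 1):
            if (q, r) == (0, 0):
                continue
            if math.dist(origin, hex_center(q, r, spacing)) <= radius + 1e-9:
                count += 1
    return count

print("hexagon neighbor distances:", hex_dists)
print("square neighbor distances:", square_dists)
print("bins within 1 unit:", bins_within(1))
print("bins within 5 units:", bins_within(5))
```

A hexagon grid yields a single neighbor distance, while a square grid yields two. Raising the radius from 1 to 5 pulls many more bins into each neighborhood, which smooths local differences and helps explain why larger areas are flagged as hot spots.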

Decide where to open your business

Next, you'll determine the best location to open your new business. To do so, you'll overlay your spatial clusters, temporal clusters, and emerging hot spots. The criteria for how you combine these layers will depend on what you consider the ideal conditions for your business.

First, you'll select areas with dense spatial clusters of check-ins. These areas indicate high foot traffic, which is good for a new business. You performed spatial cluster analysis using three different methods: DBSCAN, HDBSCAN, and OPTICS. Of the three, HDBSCAN was the most appropriate to your study area, as it accounted for differences in population between the Bay Area's urban, suburban, and rural locations.

  1. On the ribbon, on the Map tab, in the Selection group, click Select By Attributes.

    When you performed cluster analysis, the result layers included the Cluster ID attribute field. In this field, any feature with a value of -1 was not assigned to a cluster. You'll select all features that were assigned to a cluster.

  2. In the Select By Attributes window, for Input Rows, choose HDBSCAN_500. Under Expression, create the expression Cluster ID is not equal to -1.

    Clause parameters

  3. Click Apply. Turn off the Check_ins_Emerging_Hot_Spots_5mi layer and turn on the HDBSCAN_500 layer.

    All areas indicated as clusters are selected.

    Selected clusters

    Next, you'll remove the clause you just executed and select locations that are a new, consecutive, or persistent hot spot.

  4. In the Select By Attributes window, click Remove Clause.

    Remove Clause button

  5. For Input Rows, choose Check_ins_Emerging_Hot_Spots.
  6. Create the expression Where Pattern Type COUNT includes the value(s) Consecutive Hot Spot, New Hot Spot, Persistent Hot Spot.

    Parameters to select hot spots in the EHSA layer

  7. Click Apply. Turn off the HDBSCAN_500 layer and turn on the Check_ins_Emerging_Hot_Spots layer.

    The hot spots are selected.

    Selected hot spots on the map

    Next, you'll select monthly time clusters that see an increase in traffic during a specific season. Depending on the kind of business you plan to open, areas with more traffic in different seasons may be ideal. For the purposes of this exercise, you'll select areas with more traffic during the summer.

  8. In the Select By Attributes window, remove the expression. For Input Rows, choose Check_ins_Monthly_Time_Clusters_Fourier.

    In this layer, the temporal cluster that corresponds to high traffic patterns in the summer months is the green cluster, which has an ID of 3.

  9. Create the expression Time-Series Cluster ID is equal to 3.

    Parameters to select areas that are more popular during the summer

  10. Click OK. Turn off the Check_ins_Emerging_Hot_Spots layer and turn on the Check_ins_Monthly_Time_Clusters_Fourier layer.

    Selected clusters

    You've selected areas based on three criteria. Next, you'll create a layer that contains only hexagon bins that are selected in all three layers (meaning they adhere to all three criteria). You could adjust the criteria, add more criteria, or remove criteria, depending on the specific needs of your business. For the purposes of this exercise, three criteria are enough.

  11. In the Geoprocessing pane, click the Back button. Search for and open the Intersect tool.
    Note:

    Depending on your version of ArcGIS Pro, you may get a message to use the Pairwise Intersect tool for enhanced functionality. In this case, you cannot use that tool as it takes a maximum of two inputs and you have three.

  12. For Input Features, choose HDBSCAN_500. In the next row, choose Check_ins_Emerging_Hot_Spots, and in the next row, choose Check_ins_Monthly_Time_Clusters_Fourier.
    Note:

    To choose more than two input features, you must have an ArcGIS Pro Advanced license.

    Messages appear below each input feature, explaining that these layers have active selections.

  13. For Output Feature Class, type Ideal_Locations. For Attributes To Join, choose Only feature IDs.

    Parameters for the Intersect tool

  14. Click Run. After the tool finishes running, turn off the Check_ins_Monthly_Time_Clusters_Fourier layer.

    Ideal locations on the map

    The ideal locations can be found in San Francisco, Mountain View, and San Jose.

  15. Zoom to the various points in the map.

    Your analysis has identified some areas in San Francisco that would be ideal locations to open a business.

    While many points in Mountain View have been identified, these points are all grouped around a single area: Mountain View's downtown. If you wanted an alternative to San Francisco (perhaps because costs are too high), this area would be ideal.

  16. Return to the full extent of the data. Save the project.
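
The logic of combining the three selections can be sketched as a set intersection. In the tutorial the Intersect tool works on geometry, so this attribute-only version is a simplification; the location names and attribute values are hypothetical.

```python
# Hypothetical attributes per location, one value per criterion.
locations = {
    "A": {"cluster_id": 4, "pattern": "Persistent Hot Spot", "ts_cluster": 3},
    "B": {"cluster_id": -1, "pattern": "New Hot Spot", "ts_cluster": 3},
    "C": {"cluster_id": 7, "pattern": "Sporadic Hot Spot", "ts_cluster": 3},
    "D": {"cluster_id": 2, "pattern": "Consecutive Hot Spot", "ts_cluster": 1},
    "E": {"cluster_id": 9, "pattern": "New Hot Spot", "ts_cluster": 3},
}

WANTED_PATTERNS = {"Consecutive Hot Spot", "New Hot Spot", "Persistent Hot Spot"}

# Criterion 1: part of a spatial cluster (Cluster ID is not equal to -1).
in_cluster = {k for k, v in locations.items() if v["cluster_id"] != -1}

# Criterion 2: a new, consecutive, or persistent hot spot.
hot = {k for k, v in locations.items() if v["pattern"] in WANTED_PATTERNS}

# Criterion 3: the time series cluster with more summer traffic (ID 3).
summer = {k for k, v in locations.items() if v["ts_cluster"] == 3}

# Only locations that meet all three criteria remain.
ideal = in_cluster & hot & summer
print(sorted(ideal))
```

Adding or removing a criterion amounts to adding or removing one set from the intersection.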

In this tutorial, you performed spatiotemporal data science to identify popular places in the Bay Area across space and time. Based on your results, you determined several ideal locations to open your business, as well as the advantages and limitations of various methods of spatial and temporal aggregation.

You can find more tutorials in the tutorial gallery.