Finding Local Groupings of Time Series

Marco Trincavelli
4 min readSep 30, 2022

This blog post is a short summary of a paper submitted at ECML PKDD 2022, that arose from a collaboration between H&M AI Research and Stockholm University. If you find this interesting and would like to read the full text of the paper, with all the formulas and mathematical formalisms, here is a link to the article in the conference proceedings.

Introduction

Finding groupings of time series is relevant in several application domains like retail [1] or investing portfolio management [2], where multiple time series instances are collected and monitored. These groupings comprise sets of time series of high similarity over a time period (e.g., in terms of concurrently similar values or trends). Often, groupings do not span the whole time series (global groupings), but shorter time periods, defining local groupings. Furthermore, a subset of time series may persist being grouped together in consecutive local groupings over a longer time period, possibly separated by short time gaps, hence forming associations of local groupings, as shown in Figure 1. Note that local groupings and associations have different lengths and member instances.

Figure 1: Four time series with six associations denoted by different colors. Each association contains several local groupings separated by short time gaps denoted by darker/lighter colors.

Depending on the application and on the quantity measured by the time series groupings and associations can have different meaning like for example products that exhibit similar selling patterns in certain groups of stores at certain time of the year, if we consider retail related application.

Methodology

Z-Grouping, the algorithm we propose in this paper, is a four-step algorithm to first detect local groupings maximize the time span and the number of time series in the grouping and then to detect associations maximizing the number of common local groupings and time series in the association.

Figure 2: An illustration of the four steps of the Z-Grouping algorithm.

The four steps, displayed in Figure 2 are briefly described in the following:

  • Step 1 - Event Sequence Matrix Generation converts a collection time series into an event sequence matrix by applying a temporal abstraction function like SAX [3]. Event sequences are less sensitive to noise or outliers, insensitive to minor fluctuations, favoring the formation of patterns over time. Step 1 in Figure 3 generates a matrix with three types of events {a,b,c}.
  • Step 2 - Local Grouping Generation finds local groupings for applying semigeometric tiling [4] to each event label (channel). Roughly, semigeometric tiling maximizes the length and the number of time series in the local grouping, while allowing a maximum rate of impurity, i.e. time slots that do not belong to the selected event label. Step 2 in Figure 3 generates six local goruping candidates allowing maximum 25% impurity in each candidate.
  • Step 3 - Association Generation finds associations that maximize the number of common local groupings and time series members while keeping the pairwise distance between the raw time series below a predefined threshold. Step 3 in Figure 3 finds two asociations aggregating the six local groupings found in Step 2, namely γ1 with range [1,10] that contains {t2, t3, t4}, and γ2 with range [6, 10] that contains {t5, t6}.
  • Step 4 - Validation validates the obtained local groupings and associations by comparing them to a set of global groupings on the same time series collection T , obtained either by time series clustering or provided by a domain expert. The goal is to assess how related the global groupings are to local groupings and how can local groupings help us assess the similarity of these global groupings. To do so, a validity score based on the proportion of time series in each local grouping/association beloning to each global grouping is calculated (details in the paper).

Experiments & Results

We first tested Z-Grouping on a syntethic dataset for providing an extensive parameter investigation, then we benchmarked it on three real-world dataset coming from diverse domains (fashion retail, stock market, and COVID pandemics) and on 128 UCR datasets [5], excluding the cases where at least one algorithm cannot find any valid groupings (17 cases) and where thedataset length is shorter than the smallest window size parameter (6 cases). To the best of our knowledge, there is no other algorithm addressing the same problem solved by Z-Grouping, and therefore we benchmark on similar algorithms, namely semigeometric tiling [4], k-means with fixed time range, k-means with flexible time range, and k-NearestNeighbors.

Our experiments showed that Z-Grouping could achieve lower error rates than its competitors while successfully retrieving local groupings without size constraints on time ranges, which is infeasible by using traditional methods.

For a detailed description of the experimental protocol and the benckmark algorithms please refer to the original paper, an accurate description would be too lengthy for a blog post :)

The code of this paper including the synthetic data generator and few of the datasets are publicly available online [6].

References

[1] Jiang, Y., Liu, Y., Wang, H., Shang, J., Ding, S.: Online pricing with bundling and coupon discounts. IJPR 56(5), 1773–1788 (2018).

[2] Huang, C.F.: A hybrid stock selection model using genetic algorithms and support vector regression. Applied Soft Computing 12(2), 807–818 (2012).

[3] Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing sax: a novel symbolic representation of time series. DMKD 15(2), 107–144 (2007).

[4] Henelius, A., Karlsson, I., Papapetrou, P., Ukkonen, A., Puolamäki, K.: Semigeometric tiling of event sequences. In: ECML-PKDD. pp. 329–344. Springer (2016).

[5]Dau, H.A., Bagnall, A., Kamgar, K., Yeh, C.C.M., Zhu, Y., Gharghabi, S., Ratanamahatana, C.A., Keogh, E.: The ucr time series archive. JAS 6(6), 1293 1305 (2019).

[6] Z-Grouping repository, https://github.com/zedshape/zgrouping/

--

--