Margin-based Greedy Shapelet Search for Robust Time Series Classification of Imbalanced Data

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021, 2022, 00, pp. 5266-5274
Issue Date: 2022-01-01
Many real-world big data applications in domains such as finance, telecommunications and manufacturing rely on detecting exceedingly rare patterns in large time series data sets. In principle, machine learning models can be trained to detect and classify such patterns. However, these models often lack the robustness required for practical applications and do not generalize well in production. Additionally, their opaque decision-making hampers systematic debugging and improvement. Time series shapelets are a popular data mining primitive that can be used to extract a shape-based feature representation from the data. However, existing algorithms do not adequately consider the robustness and redundancy of these features. While these drawbacks can be compensated for if sufficient labeled data is available, this is not possible for highly imbalanced data. We propose alterations to the current state-of-the-art shapelet algorithm that consider the margin of separation and the multivariate dependencies between the extracted features. This results in more robust and diverse features, which in turn translates to higher classification accuracy. We compare our algorithm to the current state of the art using a public benchmark data set. Additionally, we showcase its applicability to highly imbalanced data using a suitable data set from the manufacturing domain.
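To make the two central notions in the abstract concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes the common definition of a shapelet feature as the minimum Euclidean distance between a shapelet and any equally long window of a series, and it uses a deliberately simple, hypothetical notion of margin: the gap between the largest distance in the positive class and the smallest distance in the negative class. The function names `shapelet_distance` and `separation_margin` are placeholders chosen for this example.

```python
import numpy as np


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    same-length window of the series (assumed feature definition)."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    # All contiguous windows of the series with the shapelet's length.
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min())


def separation_margin(distances, labels):
    """Hypothetical margin: smallest negative-class distance minus
    largest positive-class distance. A positive value means a single
    threshold on this feature separates the two classes."""
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    pos = distances[labels == 1]
    neg = distances[labels == 0]
    return float(neg.min() - pos.max())


# Toy data: one series contains the bump [1, 2, 1], the other is flat.
shapelet = [1, 2, 1]
with_pattern = [0, 0, 1, 2, 1, 0, 0]
without_pattern = [0, 0, 0, 0, 0, 0, 0]

features = [
    shapelet_distance(with_pattern, shapelet),
    shapelet_distance(without_pattern, shapelet),
]
margin = separation_margin(features, labels=[1, 0])
```

In this toy case the series containing the pattern has distance 0 and the flat series has distance sqrt(6), so the margin is positive. Under these assumptions, a shapelet with a wider margin leaves more room for noise before a threshold-based decision flips, which is the intuition behind preferring it when few positive examples are available.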