//en:iot-reloaded:introduction_to_time_series_analysis. Created 2023/10/31 by margus; last modified 2025/05/13 (current) by pczekalski.//
====== Introduction to Time Series Analysis ======
As discussed in the data preparation chapter, time series usually represent the dynamics of some process; therefore, the order of the data entries has to be preserved. As emphasised, a time series is simply a set of data, usually events, arranged by a time marker. Typically, time series are placed in the order in which the events occur or are recorded.

In the context of IoT systems, there might be several reasons why time series analysis is needed. The most widely used ones are the following:
  * **Process dynamics forecasting** for higher-performing decision support systems. An IoT system, coupled with appropriate cloud computing or other computing infrastructure, can forecast the future dynamics of the observed process and feed those forecasts into decision support.
  * **Anomaly detection** is a highly valued feature of IoT systems. In essence, anomaly detection is a set of methods enabling the recognition of unwanted or abnormal behaviour of the system over a specific time period. Anomalies might be expressed in data differently:
    * **A certain event in time:** for instance, a measurement jumps over a defined threshold value. This is the simplest type of anomaly, and most control systems cope with it by setting appropriate threshold values and alerting mechanisms.
    * **Change of a data fragment shape:** this might happen in technical systems where the typical response to control inputs changes to a shape that is not anticipated or planned. A simple example is an engine whose response to a control command gradually drifts away from its usual shape.
    * **Event density:** many technical systems produce events at a characteristic rate during normal operation, so a noticeable change in how densely events occur over time may indicate abnormal behaviour.
    * **Event value distribution:** a change in the statistical distribution of the measured values over a time window may indicate an anomaly even when no individual value crosses a threshold.
Due to this diversity, various algorithms might be used in anomaly detection, including those covered in previous chapters: for instance, clustering to establish typical response clusters, or regression to estimate normal future states and measure the distance between forecast and actual measurements.
  * **Understanding of system dynamics**, where the system owner is interested in having insightful information on the system's functioning to make good decisions on its control or further development. Typical applications are system monitoring, the production of dashboards, different industrial research, and the study of system prototypes.

While most of the methods covered here might be employed in time series analysis, this chapter outlines anomaly detection and classification cases through an industrial cooling system example.

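As a minimal illustration of the simplest anomaly type listed above, a measurement jumping over a threshold, the following Python sketch flags out-of-band readings. The band limits and the sample values are hypothetical, chosen only for illustration:

```python
# Threshold-based anomaly detection: the simplest anomaly type.
# The band limits and the readings below are illustrative assumptions.

def threshold_anomalies(series, low, high):
    """Return the indices of measurements outside the [low, high] band."""
    return [i for i, v in enumerate(series) if v < low or v > high]

# A hypothetical freezer temperature log (degrees Celsius)
readings = [-18.2, -18.0, -17.5, -5.0, -18.1, -30.2, -18.3]
print(threshold_anomalies(readings, low=-25.0, high=-10.0))  # [3, 5]
```

In practice, the thresholds would be set from the known operating range of the equipment, and the alerting mechanism would act on the returned indices.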
===== A cooling system case =====

A given industrial cooling system has to maintain a specific temperature mode of around -18 °C. Due to the specifics of the technology, it goes through a defrost cycle every few hours to avoid ice deposits, which would lead to inefficiency and potential malfunction. However, at some point a relatively short power supply interruption was noticed, which needs to be recognised in the future for appropriate reporting. The logged data series is depicted in figure {{ref>Cooling_system}}.

<figure Cooling_system>
{{ : }}
<caption>Cooling system temperature log</caption>
</figure>

It is easy to notice two standard behaviour patterns, defrost (the small spikes) and temperature maintenance (the data between the spikes), and one anomaly: the high spike.

One possible alternative for building a classification model is to use K-nearest neighbours (KNN). Whenever a new data fragment is collected, it is compared to the closest known samples, and a majority vote among them determines its class. In this example, three behaviour patterns are recognised; therefore, a sample collection must be composed for each pattern. This might be done by hand since, in this case, the time series is relatively short.

Examples of the collected patterns (defrost on the left and temperature maintenance on the right) are presented in figure {{ref>Example_patterns}}.

<figure Example_patterns>
{{ : }}
<caption>Example patterns: defrost (left) and temperature maintenance (right)</caption>
</figure>

Unfortunately, only one example of the anomaly pattern is available in the collected data; it is shown in figure {{ref>Anomaly_pattern}}.

<figure Anomaly_pattern>
{{ : }}
<caption>The anomaly pattern</caption>
</figure>

A data augmentation technique might be applied to overcome data scarcity, where several other samples are produced from the given data sample. This is done by applying Gaussian noise and randomly changing the sample's length. The resulting data collection is shown in figure {{ref>Data_collection}}.

<figure Data_collection>
{{ : }}
<caption>The augmented data collection</caption>
</figure>

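The augmentation step described above can be sketched in Python. The noise level, the length jitter, and the sample values below are assumptions made for illustration, not parameters taken from the original case:

```python
import random

def augment(sample, n_copies=5, noise_sd=0.2, max_trim=2, seed=1):
    """Produce n_copies variants of a 1-D sample by adding Gaussian noise
    and randomly trimming its length from either end."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_copies):
        noisy = [v + rng.gauss(0.0, noise_sd) for v in sample]
        trim = rng.randint(0, max_trim)   # random length change
        if rng.random() < 0.5:
            noisy = noisy[trim:]          # trim the head
        elif trim:
            noisy = noisy[:-trim]         # trim the tail
        variants.append(noisy)
    return variants

# A hypothetical defrost spike fragment (degrees Celsius)
defrost = [-18.0, -16.5, -12.0, -8.0, -12.5, -16.8, -18.1]
copies = augment(defrost)
print([len(c) for c in copies])  # lengths vary between 5 and 7
```

Trimming from a random end also shifts the spike's position within the sample, which mimics the misalignment visible in the collected fragments.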
One might notice that:
  * Samples of different patterns are different in length.
  * Samples of the same pattern are of different lengths.
  * The interesting phenomena (spikes) are located at different positions within the samples and are slightly different.
The abovementioned issues expose the problem of calculating distances from one example to another, since comparing data points one-to-one will produce misleading distance values. To avoid this, a Dynamic Time Warping (DTW) distance is used, which finds the best alignment between two sequences before measuring how far apart they are.

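A minimal dynamic-programming implementation of the DTW distance, written as a plain Python sketch, shows how sequences of different lengths can still be compared:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences,
    computed with the classic O(len(a) * len(b)) dynamic programme."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Two spikes of the same shape at different positions: a point-wise
# comparison would report a large difference, but DTW aligns them.
a = [0, 0, 5, 0, 0]
b = [0, 0, 0, 5, 0, 0]
print(dtw_distance(a, b))  # 0.0
```

For production use, optimised libraries exist for this computation, but the quadratic sketch above is sufficient for short fragments like the ones in this case.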
Once the distance metric is selected and the initial dataset is produced, the KNN classifier might be implemented. Using DTW, the samples closest to a given query fragment can be determined, as demonstrated in figure {{ref>Single_query}}.

<figure Single_query>
{{ : }}
<caption>A single query and its closest samples</caption>
</figure>

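The KNN query with a DTW distance can be sketched as follows, assuming a small hand-labelled sample collection; the fragments, labels, and the value of k below are made up for illustration:

```python
from collections import Counter

def dtw(a, b):
    """Compact DTW distance computed by dynamic programming."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[-1][-1]

def knn_classify(query, labelled, k=3):
    """labelled is a list of (sample, label) pairs; the class is decided
    by a majority vote among the k DTW-nearest samples."""
    nearest = sorted(labelled, key=lambda sl: dtw(query, sl[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# A hand-labelled collection (values invented for this sketch)
collection = [
    ([0, 0, 3, 0], "defrost"), ([0, 3, 0, 0, 0], "defrost"),
    ([0, 0, 0, 0], "maintenance"), ([0, 0, 0], "maintenance"),
    ([0, 9, 9, 0], "anomaly"), ([0, 0, 9, 9, 0], "anomaly"),
]
print(knn_classify([0, 8, 9, 0], collection))  # anomaly
```

Note that the query and the stored samples have different lengths and spike positions; the DTW distance handles both, which is exactly why a plain Euclidean comparison was rejected above.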
For practical implementation, the procedure is repeated for a set of test queries of each class; the results are demonstrated in figure {{ref>Multiple_test_queries}}.

<figure Multiple_test_queries>
{{ : }}
<caption>Multiple test queries and their closest samples</caption>
</figure>

As might be noticed, the query (black) samples are somewhat different from the ones found to be the "closest" by DTW, yet they are still assigned to the correct pattern class.
The same idea demonstrated here might be used for unknown anomalies by setting a similarity threshold for the DTW distance, for classifying known anomalies as shown here, or even for simple forecasting.
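The similarity-threshold idea for unknown anomalies can be sketched as follows; the distances and the threshold value are hypothetical and would need tuning on validation data:

```python
def is_unknown_anomaly(distances_to_known, threshold=5.0):
    """Given DTW distances from a query fragment to all known reference
    patterns, flag the fragment as an unknown anomaly when even its best
    match is farther away than the similarity threshold."""
    return min(distances_to_known) > threshold

print(is_unknown_anomaly([7.2, 12.5, 9.1]))  # True: nothing known is close
print(is_unknown_anomaly([1.3, 12.5, 9.1]))  # False: matches a known pattern
```

This complements the KNN classifier: fragments close to a known pattern are classified, while fragments far from everything seen so far are reported as new, unknown behaviour.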