====== Decision Trees and Random Forests ======
Classification is used in almost all domains of modern data analysis, including medicine, signal processing, pattern recognition, and many others.
===== Interpretation of the model output =====
The classification process consists of two steps: first, an existing data sample is used to train the classification model, and then, in the second step, the model is used to classify unseen objects, thereby predicting to which class each object belongs. As with any other prediction, in classification, the predicted class may or may not match the object's actual class.
Depending on a particular output, several cases might be identified:
  * True positive (TP) – the object belongs to the class and is classified as a class member.
**Example:** a patient who has the disease is classified as having the disease.
  * False positive (FP) – the object that does not belong to the class is classified as a class member.
  * False negative (FN) – the object that belongs to the class is classified as not a class member.
  * True negative (TN) – the object that does not belong to the class is classified as not a class member.
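These four outcome counts are the basis of the usual accuracy statistics. Below is a minimal sketch that derives accuracy, precision, and recall from them; the two label vectors are hypothetical examples, not data from this chapter.

<code python>
# Counting TP, FP, FN, TN for a single class of interest (label 1)
# and deriving common accuracy statistics from the counts.

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual class membership (1 = member)
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # labels predicted by the model

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # share of correct predictions
precision = tp / (tp + fp)                   # predicted members that are real members
recall    = tp / (tp + fn)                   # real members that were found

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
</code>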
The classification model is trained using the initial sample data, which is split into training and testing subsamples. Usually, the training is done using the following steps:
  - The sample is split into training and testing subsamples.
  - The training subsample is used to train the model.
  - The test subsample is used to acquire accuracy statistics as described earlier.
  - Steps 1 – 3 are repeated several times (usually at least 10 – 25) to acquire average model statistics.
The average statistics are used to describe the model, as the sketch below illustrates.
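A minimal sketch of steps 1 – 4, assuming scikit-learn is available; the synthetic data set and the decision tree classifier are illustrative stand-ins rather than this chapter's data.

<code python>
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the initial sample
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

scores = []
for i in range(25):                                    # step 4: repeat steps 1-3
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=i)           # step 1: split the sample
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)                        # step 2: train the model
    scores.append(model.score(X_test, y_test))         # step 3: test accuracy

print(f"average accuracy over 25 runs: {sum(scores) / len(scores):.3f}")
</code>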
The model's estimated performance depends on how the initial sample is split into the training and testing subsamples. Unfortunately, in most practical applications the amount of available data is limited, so several sampling strategies have been developed to use it efficiently. The most widely used ones are described below.
===== Random sample =====
<figure Random_sample>
{{ :en:iot-reloaded:random_sample.png?600 |}}
<caption>Random sampling: a few randomly selected samples are held out for testing.</caption>
</figure>
Most of the data is used for training in random sample cases (figure {{ref>Random_sample}}), and only a few randomly selected samples are used to test the model. The procedure is repeated many times to ensure that the model's average performance statistics are stable.
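The repeated random sampling described above can also be expressed with scikit-learn's ShuffleSplit helper; this sketch reuses the same synthetic data and classifier as in the previous example.

<code python>
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 25 repetitions; 10% of the data is randomly held out for testing each time
cv = ShuffleSplit(n_splits=25, test_size=0.1, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print(f"average accuracy: {scores.mean():.3f}")
</code>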
===== K-folds =====
<figure K-folds>
{{ :en:iot-reloaded:k_folds.png?600 |}}
<caption>K-folds cross-validation: within each split (row), a different fold is used for testing.</caption>
</figure>
This approach splits the training set into smaller sets called splits (in figure {{ref>K-folds}}, there are three splits). Then, for each split, the following steps are performed:
  * The model is trained using k-1 folds; in figure {{ref>K-folds}}, every split (row) is divided into k folds, where, split by split, the i-th fold is used for testing while the remaining k-1 folds are used for training.
  * The model's accuracy is assessed iteratively using the remaining fold of each split.
The overall performance for the k-fold cross-validation is the average of the individual performances computed for each split. It requires extra computing but respects data scarcity, which is why it is used in practical applications.
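A minimal k-fold sketch, assuming scikit-learn is available; the synthetic data and decision tree classifier are again illustrative stand-ins, and a single KFold pass corresponds to one split (row) of the figure.

<code python>
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# k = 5 folds; each fold is used exactly once for testing
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print(f"per-fold accuracy: {scores.round(3)}")
print(f"average accuracy : {scores.mean():.3f}")
</code>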
===== One out =====
<figure One_out>
{{ :en:iot-reloaded:one_out.png?600 |}}
<caption>One-out cross-validation: a single sample is used for testing in each iteration.</caption>
</figure>
This approach splits the training set into smaller sets called splits in the same way as the previous methods described here (in figure {{ref>One_out}}, there are three splits). Then, for each split, the following steps are performed:
  * The model is trained using n-1 samples, and only one sample is used for testing the model's performance.
  * The overall performance for the one-out cross-validation is the average of the individual performances computed for each split. It requires extra computing but respects data scarcity, which is why it is used in practical applications.
This method requires many iterations because each iteration tests the model on only a single sample.
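A minimal one-out (leave-one-out) sketch under the same assumptions as the previous examples; note that cross_val_score runs exactly n iterations, one per held-out sample.

<code python>
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Each iteration trains on n-1 samples and tests on the remaining one,
# so the number of runs equals the sample size n.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=LeaveOneOut())
print(f"average accuracy over {len(scores)} runs: {scores.mean():.3f}")
</code>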
+ | |||
<WRAP excludefrompdf>
Within the following sub-chapters, two very widely used algorithm groups are discussed:

  * [[en:iot-reloaded:decision_trees|Decision trees]]
  * [[en:iot-reloaded:random_forests|Random forests]]
</WRAP>