Differences

This shows you the differences between two versions of the page.

en:iot-reloaded:regression_models [2024/12/10 17:01] blanka
en:iot-reloaded:regression_models [2024/12/10 21:33] (current) pczekalski
Line 32: Line 32:
   * β0 and β1 - the y-axis intercept and the slope coefficient of the linear function, respectively   * β0 and β1 - the y-axis intercept and the slope coefficient of the linear function, respectively
  
-Unfortunately, in the context of the given example, finding such a function is not possible for all x-y pairs at once since x and y values differ from pair to pair. However, finding a linear function that minimises the distance of the given y to the y' produced by the function or model for all x-y pairs is possible. In this case, y' is an estimated or forecasted y value. At the same time, the distance between each y-y' pair is called an error. Since the error might be positive or negative, a squared error is used to estimate the error. +Unfortunately, in the context of the given example, no single linear function passes through all x-y pairs at once, since the x and y values differ from pair to pair. However, it is possible to find a linear function that minimises the distance between the given y and the y' produced by the function or model over all x-y pairs. In this case, y' is an estimated or forecasted y value, and the distance between each y-y' pair is called an error. Since the error might be positive or negative, the squared error is used to measure it. 
 It means that the following equation might describe the model: It means that the following equation might describe the model:
  
Line 83: Line 83:
   * ei - error  of the model's ith output   * ei - error  of the model's ith output
  
-Since an error for a given yith might be positive or negative and the model itself minimises the overall error, one might expect that the error is normally distributed around the model, with a mean value of 0 and its sum close to or equal to 0. Examples of the error for a few randomly selected data points are depicted in the following figure {{ref>Galton's_data_set_errors}} in red colour:+Since the error for a given i-th y value might be positive or negative and the model itself minimises the overall error, one might expect that the error is normally distributed around the model, with a mean value of 0 and its sum close to or equal to 0. Examples of the error for a few randomly selected data points are depicted in the following figure {{ref>Galton's_data_set_errors}} in red colour:
  
 <figure Galton's_data_set_errors> <figure Galton's_data_set_errors>
Line 107: Line 107:
 From this discussion, a few essential notes have to be taken: From this discussion, a few essential notes have to be taken:
   * Error distributions (around 0) should be treated as carefully as the models themselves;   * Error distributions (around 0) should be treated as carefully as the models themselves;
-  * In most cases, error distribution is hard to notice even if the errors are illustrated;+  * In most cases, the error distribution is difficult to notice even if the errors are illustrated;
   * It is essential to look into the distribution to ensure that there are no regularities.    * It is essential to look into the distribution to ensure that there are no regularities. 
 If any regularities are noticed, whether a simple variance increase or cyclic nature, they point to something the model does not consider. It might point to a lack of data, i.e., other factors that influence the modelled process, but they are not part of the model, which is therefore exposed through the nature of the error distribution. It also might point to an oversimplified look at the problem, and more complex models should be considered. In any of the mentioned cases, a deeper analysis should be considered.  If any regularities are noticed, whether a simple variance increase or cyclic nature, they point to something the model does not consider. It might point to a lack of data, i.e., other factors that influence the modelled process, but they are not part of the model, which is therefore exposed through the nature of the error distribution. It also might point to an oversimplified look at the problem, and more complex models should be considered. In any of the mentioned cases, a deeper analysis should be considered. 
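
The checks described above can be sketched numerically. The following is a minimal illustration, not part of the original page: it assumes synthetic data (not Galton's data set), fits a simple linear model with NumPy, and then looks at the errors for a zero mean, a variance that grows with x, and a cyclic pattern. The variable names and the choice of checks are assumptions made for this example.

```python
import numpy as np

# Synthetic data: an assumption for illustration, not Galton's data set.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)

# Least-squares fit of y' = b0 + b1 * x (polyfit returns the slope first).
b1, b0 = np.polyfit(x, y, deg=1)
errors = y - (b0 + b1 * x)

# With an intercept in the model, the errors average to (numerically) zero.
error_mean = errors.mean()

# Crude check for a variance increase: compare the error variance
# in the upper half of the x range with that in the lower half.
half = x.size // 2
variance_ratio = errors[half:].var() / errors[:half].var()

# Crude check for a cyclic pattern: correlation between neighbouring errors.
lag1_correlation = np.corrcoef(errors[:-1], errors[1:])[0, 1]

print(f"mean error:       {error_mean:.2e}")
print(f"variance ratio:   {variance_ratio:.2f}")
print(f"lag-1 correlation: {lag1_correlation:.2f}")
```

A variance ratio far from 1 or a lag-1 correlation far from 0 would hint at the kind of regularity discussed above; what counts as "far" depends on the data and is left open here.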
Line 117: Line 117:
 </figure> </figure>
  
-Here, the error is considered to be normally distributed around 0, with its standard deviation sigma and variance sigma squared. Variance provides at least a numerical insight into the error distribution; therefore, it should be considered as an indicator for further analysis. Unfortunately, the true value of sigma is not known; therefore, its estimated value should be used:+Here, the error is considered to be normally distributed around 0, with its standard deviation sigma and variance sigma squared. Variance provides at least a numerical insight into the error distribution; therefore, it should be considered an indicator for further analysis. Unfortunately, the true value of sigma is not known; thus, its estimated value should be used:
  
 <figure Sigma> <figure Sigma>
Line 134: Line 134:
 ===== Multiple linear regression ===== ===== Multiple linear regression =====
  
-In many practical problems, the target variable Y might depend on more than one independent variable X, for instance, wine quality, which depends on its level of serenity, amount of sugars, acidity and other factors. In the case of applying a linear regression model, it seems much complicated, but it is still a linear model of  the following form:+In many practical problems, the target variable Y might depend on more than one independent variable X, for instance, wine quality, which depends on its level of serenity, amount of sugars, acidity and other factors. Applying a linear regression model to such a problem may look more complicated, but it is still a linear model of the following form:
  
 <figure Multiple linear model> <figure Multiple linear model>
Line 148: Line 148:
 </figure> </figure>
  
-Unfortunately, the results of multiple linear regression cannot be visualised in the same way as for a single linear regression due to the number of factors (dimensions). Therefore, numerical analysis and interpretation of the model should be done. In many situations, numerical analysis is complicated and requires a semantic interpretation of the data and model. To do it, visualisations reflecting the relation between the dependent variable and independent variables result in multiple graphs. Otherwise, the quality of the model is hardly assessable or even unassessable. +Unfortunately, due to the number of factors (dimensions), the results of multiple linear regression cannot be visualised in the same way as those of a single linear regression. Therefore, numerical analysis and interpretation of the model should be done. In many situations, numerical analysis is complicated and requires a semantic interpretation of the data and model. To support it, the relation between the dependent variable and each independent variable can be visualised separately, which results in multiple graphs. Otherwise, the quality of the model is hard or even impossible to assess. 
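
As a rough illustration of the numerical analysis mentioned above, the sketch below fits a multiple linear model with NumPy's least-squares solver. It is not taken from the original page: the data are synthetic, the three independent variables and the coefficient values in `true_beta` are hypothetical, and the coefficient of determination is used here only as one possible numerical summary.

```python
import numpy as np

# Synthetic data: three hypothetical independent variables and a target
# generated from known coefficients plus a small amount of noise.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
true_beta = np.array([1.0, 0.8, -0.5, 0.3])  # intercept first, then slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0.0, 0.1, size=n)

# Add a column of ones so that the intercept is estimated with the slopes.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Errors of the fitted model and the coefficient of determination.
errors = y - A @ beta
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("estimated coefficients:", np.round(beta, 3))
print("R squared:", round(r_squared, 4))
```

Plotting `errors` against each column of `X` in turn would give the per-variable graphs described above, one graph per independent variable.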
  
 ===== Piecewise linear models ===== ===== Piecewise linear models =====
en/iot-reloaded/regression_models.1733850091.txt.gz · Last modified: 2024/12/10 17:01 by blanka
CC Attribution-Share Alike 4.0 International