Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
en:iot-reloaded:dbscan [2024/09/25 12:32] – created agrisniken:iot-reloaded:dbscan [2024/12/10 20:44] (current) pczekalski
Line 6: Line 6:
 One of the essential concepts is the point's p neighbourhood, which is the set of points reachable within the user-defined distance eps (epsilon): One of the essential concepts is the point's p neighbourhood, which is the set of points reachable within the user-defined distance eps (epsilon):
  
-<figure Point's neighbourhood> +{{ :en:iot-reloaded:ClusterEq3.png?400 |  Point'Neighbourhood}}
-{{ {{ :en:iot-reloaded:ClusterEq3.png?300 |  Point'neighbourhood}} +
-<caption> Point's neighbourhood </caption> +
-</figure>+
  
-where: +where: 
-  * **p** – the point of interest; +  * **p** – the point of interest. 
-  * **N(p)** – neighbourhood of the point p; +  * **N(p)** – neighbourhood of the point p. 
-  * **q** – any other point; +  * **q** – any other point. 
-  * **distance(p,q)** – Euclidian distance between points q and p; +  * **distance(p,q)** – Euclidian distance between points q and p. 
-  * **eps** – epsilon – user-defined distance constant; +  * **eps** – epsilon – user-defined distance constant. 
-  * **D** – the initial set of points available for the algorithm;+  * **D** – the initial set of points available for the algorithm.
  
 The algorithm treats different points differently depending on density and neighbouring points distribution around the point – its neighbourhood: The algorithm treats different points differently depending on density and neighbouring points distribution around the point – its neighbourhood:
  
-  - Core Points:+  - **Core Points:**
     * A point is a core point if it has at least MinPts neighbours within a distance eps, where MinPts and eps are user-defined parameters, i.e. N(p) ≥ MinPts.     * A point is a core point if it has at least MinPts neighbours within a distance eps, where MinPts and eps are user-defined parameters, i.e. N(p) ≥ MinPts.
-  - Directly Density-Reachable points: +  - **Directly Density-Reachable points:** 
-    * A point is directly density-reachable from a core point if it lies within the distance eps of the core point +    * A point is directly density-reachable from a core point if it lies within the distance eps of the core point. 
-  - Border Points: +  - **Border Points:** 
-  - Noise points:+    * A border point is not a core point but within the eps distance of a core point. Border points are part of a cluster but do not have enough neighbours to be core points. 
 +  - **Noise points:** 
 +    * Points that are not core and are not reachable from any core point are considered noise or outliers. 
 + 
 +<figure DBSCANconcepts> 
 +{{ :en:iot-reloaded:DBSCAN.png?400 |  DBSCAN Concepts}} 
 +<caption> DBSCAN Concepts </caption> 
 +</figure> 
 + 
 +  * **The main steps in DBSCAN:** 
 +    * Selects a point from the data set – it might be the first in the data set or any randomly selected. 
 +    * If it is a core point, form a cluster by grouping it with all directly density-reachable points. 
 +    * Move to the next unvisited point and return to step 1. 
 +    * Border points are added to the nearest cluster, and points not reachable from any core point are marked as noise. 
 + 
 + 
 +  * **Advantages:** 
 +    * Can detect clusters of arbitrary shape. 
 +    * Naturally identifies outliers or noise. 
 +    * Unlike K-means, DBSCAN does not require specifying the number of clusters upfront. 
 + 
 +  * **Disadvantages:** 
 +    * Results depend heavily on the choice of eps and MinPts. 
 +    * It struggles with clusters of varying densities since eps is fixed. 
 + 
 +DBSCAN is excellent for discovering clusters in data with noise, especially when clusters are not circular or spherical. 
 + 
 +Some application examples (figures {{ref>DBSCANexample1}} and {{ref>DBSCANexample2}}): 
 + 
 +<figure DBSCANexample1> 
 +{{ :en:iot-reloaded:Clustering_6.png?600 |  DBSCAN Example}} 
 +<caption> DBSCAN Example: Eps = 1.0, 13 clusters and 96 noise points </caption> 
 +</figure> 
 + 
 +<figure DBSCANexample2> 
 +{{ :en:iot-reloaded:Clustering_7.png?600 |  DBSCAN Example}} 
 +<caption> DBSCAN Example: Eps = 1.5, 3 clusters and 8 noise points </caption> 
 +</figure> 
 + 
 +A typical application in signal processing (figure {{ref>DBSCANexample3}}): 
 + 
 +<figure DBSCANexample3> 
 +{{ :en:iot-reloaded:Clustering_8.png?600 |  DBSCAN Example}} 
 +<caption> DBSCAN Example: Eps = 0.2, 3 Clusters and 84 Noise Points </caption> 
 +</figure> 
 + 
 +==== Selecting eps and MinPts values ==== 
 + 
 +Usually, MinPts is selected using some prior knowledge of the data and its internal structure. If it is done, the following steps might be applied: 
 +  * Calculate the average distance between every point and its k-nearest neighbours, where k = MinPts. 
 +  * The average distances are sorted and depicted on a chart, where x – is the index of the sorted average distance, y – is the distance value. 
 +  * The optimal eps value is when y increases rapidly, as shown in the following picture (figure {{ref>Selecting_MinPts}}) on artificial sample data. 
 + 
 +<figure Selecting_MinPts> 
 +{{ :en:iot-reloaded:Clustering_9.png?400 |  Selecting MinPts}} 
 +<caption> Selecting MinPts </caption> 
 +</figure> 
 + 
 +The red horizontal line shows a possible eps value, around 0,045.  
 + 
 + 
en/iot-reloaded/dbscan.1727267528.txt.gz · Last modified: 2024/09/25 12:32 by agrisnik
CC Attribution-Share Alike 4.0 International
www.chimeric.de Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0