This is an old revision of the document!


DBSCAN

General audience classification iconGeneral audience classification iconGeneral audience classification icon

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) employs density measures to mark points in high-density regions and those in low-density regions – the noise. Because of this natural behaviour of the algorithm, it is particularly useful in signal processing and similar application domains. [1].

One of the essential concepts is the point's p neighbourhood, which is the set of points reachable within the user-defined distance eps (epsilon):

  Point's neighbourhood
Figure 1: Point's neighbourhood

, where:

  • p – the point of interest;
  • N(p) – neighbourhood of the point p;
  • q – any other point;
  • distance(p,q) – Euclidian distance between points q and p;
  • eps – epsilon – user-defined distance constant;
  • D – the initial set of points available for the algorithm;

The algorithm treats different points differently depending on density and neighbouring points distribution around the point – its neighbourhood:

  1. Core Points:
    • A point is a core point if it has at least MinPts neighbours within a distance eps, where MinPts and eps are user-defined parameters, i.e. N(p) ≥ MinPts.
  2. Directly Density-Reachable points:
    • A point is directly density-reachable from a core point if it lies within the distance eps of the core point.
  3. Border Points:
    • A border point is not a core point but within the eps distance of a core point. Border points are part of a cluster but do not have enough neighbours to be core points.
  4. Noise points:
    • Points that are not core and are not reachable from any core point are considered noise or outliers.
  DBSCAN concepts
Figure 2: DBSCAN concepts
  • The main steps in DBSCAN:
    • Selects a point from the data set – it might be the first in the data set or any randomly selected;
    • If it is a core point, form a cluster by grouping it with all directly density-reachable points.
    • Move to the next unvisited point and return to step 1.
    • Border points are added to the nearest cluster, and points that are not reachable from any core point are marked as noise.
  • Advantages:
    • Can detect clusters of arbitrary shape.
    • Naturally identifies outliers or noise.
    • Unlike K-means, DBSCAN does not require specifying the number of clusters upfront.
  • Disadvantages:
    • Results depend heavily on the choice of eps and MinPts.
    • It struggles with clusters of varying densities since eps is fixed.

DBSCAN is great for discovering clusters in data with noise, especially when clusters are not circular or spherical.


[1] Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M. (eds.). A density-based algorithm for discovering clusters in large spatial databases with noise (PDF). Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231. CiteSeerX 10.1.1.121.9220. ISBN 1-57735-004-9.
en/iot-reloaded/dbscan.1727267909.txt.gz · Last modified: 2024/09/25 12:38 by agrisnik
CC Attribution-Share Alike 4.0 International
www.chimeric.de Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0