This shows you the differences between two versions of the page.
en:iot-open:data:data_processing_models_frameworks [2018/09/04 07:58] – pczekalski | en:iot-open:data:data_processing_models_frameworks [2020/07/20 09:00] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== IoT data processing models | + | ===== ===== |
+ | <box # | ||
+ | <box # | ||
+ | ===== IoT Data Processing Models | ||
+ | <box # | ||
+ | <box # | ||
Processing frameworks and processing engines are responsible for computing over data in a data system. While there is no authoritative definition setting apart " | Processing frameworks and processing engines are responsible for computing over data in a data system. While there is no authoritative definition setting apart " | ||
Line 17: | Line 22: | ||
Stream processing systems compute over data as it enters the system. This requires a different processing model than the batch paradigm. Instead of defining operations to apply to an entire dataset, stream processors define operations that will be applied to each individual data item as it passes through the system. The datasets in stream processing are considered " | Stream processing systems compute over data as it enters the system. This requires a different processing model than the batch paradigm. Instead of defining operations to apply to an entire dataset, stream processors define operations that will be applied to each individual data item as it passes through the system. The datasets in stream processing are considered " | ||
- | * The total dataset is only defined as the amount of data that has entered the system so far. | + | * the total dataset is only defined as the amount of data that has entered the system so far; |
- | * The working dataset is perhaps more relevant and is limited to a single item at a time. | + | * the working dataset is perhaps more relevant and is limited to a single item at a time; |
- | * Processing | + | * processing |
Stream processing systems can handle a nearly unlimited amount of data, but they only process one (true stream processing) or very few (micro-batch processing) items at a time, with minimal state maintained between records. While most systems provide methods of maintaining some state, stream processing is highly optimised for more functional processing with few side effects. | Stream processing systems can handle a nearly unlimited amount of data, but they only process one (true stream processing) or very few (micro-batch processing) items at a time, with minimal state maintained between records. While most systems provide methods of maintaining some state, stream processing is highly optimised for more functional processing with few side effects. | ||
Line 27: | Line 32: | ||
This type of processing lends itself to certain kinds of workloads. Processing with near real-time requirements is well served by the streaming model. Analytics, server or application error logging, and other time-based metrics are a natural fit because reacting to changes in these areas can be critical to business functions. Stream processing is a good fit for data where you must respond to changes or spikes and where you're interested in trends over time. | This type of processing lends itself to certain kinds of workloads. Processing with near real-time requirements is well served by the streaming model. Analytics, server or application error logging, and other time-based metrics are a natural fit because reacting to changes in these areas can be critical to business functions. Stream processing is a good fit for data where you must respond to changes or spikes and where you're interested in trends over time. | ||
- | * Apache Storm | + | * Apache Storm. |
- | * Apache Samza | + | * Apache Samza. |
===Hybrid Processing Systems=== | ===Hybrid Processing Systems=== | ||
Line 34: | Line 39: | ||
Some processing frameworks can handle both batch and stream workloads. These frameworks simplify diverse processing requirements by allowing the same or related components and APIs to be used for both types of data. The way that this is achieved varies significantly between Spark and Flink, the two frameworks we will discuss. It is mainly a function of how the two processing paradigms are brought together and what assumptions are made about the relationship between fixed and unfixed datasets. While projects focused on one processing type may be a close fit for specific use-cases, the hybrid frameworks attempt to offer a general solution for data processing. They not only provide methods for processing over data, but they also have their integrations, | Some processing frameworks can handle both batch and stream workloads. These frameworks simplify diverse processing requirements by allowing the same or related components and APIs to be used for both types of data. The way that this is achieved varies significantly between Spark and Flink, the two frameworks we will discuss. It is mainly a function of how the two processing paradigms are brought together and what assumptions are made about the relationship between fixed and unfixed datasets. While projects focused on one processing type may be a close fit for specific use-cases, the hybrid frameworks attempt to offer a general solution for data processing. They not only provide methods for processing over data, but they also have their integrations, | ||
- | * Apache Spark | + | * Apache Spark. |
- | * Apache Flink | + | * Apache Flink. |
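
The difference between true stream processing (one item at a time) and micro-batch processing (a few items at a time), as described in the stream processing paragraphs above, can be illustrated with a minimal sketch in plain Python. It does not use the API of Storm, Samza, Spark, or Flink. The names `running_mean_per_item`, `micro_batches`, and `sensor_feed`, as well as the sample readings, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List


def running_mean_per_item(readings: Iterable[float]) -> Iterator[float]:
    """True stream processing: handle one item at a time.

    Only two numbers (count and total) are kept between records,
    which is the "minimal state" the text refers to.
    """
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


def micro_batches(readings: Iterable[float], size: int) -> Iterator[List[float]]:
    """Micro-batch processing: group a few items before handing them on."""
    batch: List[float] = []
    for value in readings:
        batch.append(value)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


def sensor_feed() -> Iterator[float]:
    """Hypothetical stand-in for an unbounded IoT sensor stream."""
    yield from [20.0, 22.0, 24.0, 26.0, 28.0]


# Per-item model: one result is emitted for every reading that arrives.
per_item = list(running_mean_per_item(sensor_feed()))

# Micro-batch model: one result is emitted per small group of readings.
batch_means = [sum(b) / len(b) for b in micro_batches(sensor_feed(), size=2)]

print(per_item)
print(batch_means)
```

In both functions the "total dataset" is only whatever the generator has produced so far. Neither function needs to know how many readings will eventually arrive, which is why the same code would also work on a feed that never ends.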