Differences

This shows you the differences between two versions of the page.

--- en:iot-open:data:data_lifecycle [2019/05/25 19:29] – irena.skarda
+++ en:iot-open:data:data_lifecycle [2020/07/20 09:00] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
+=====  =====
+<box #5374d5></box>
+<box #5374d5></box>
 =====IoT Data Lifecycle=====
+=====  =====
+<box #5374d5></box>
+<box #5374d5></box>
 Data processing is simply the conversion of raw data to meaningful information through a process. Data is manipulated to produce results that lead to a resolution of a problem or improvement of an existing situation. Similar to a production process, it follows a cycle where inputs (raw data) are fed to a process (computer systems, software, etc.) to produce output (information and insights). Generally, organisations employ computer systems to carry out a series of operations on the data to present, interpret, or obtain information. The process includes activities like data entry, summary, calculation, storage, etc. A useful and informative output is presented in various appropriate forms, such as diagrams, reports, graphics, etc.
@@ Line 13: / Line 19: @@
   - **Delivery**: as data is filtered, aggregated, and possibly processed either at the concentration points or at the autonomous virtual units within the IoT, the results of these processes may need to be sent further up the system, either as final responses or for storage and in-depth analysis. Wired or wireless broadband communications may be used there to transfer data to permanent data stores.
   - **Preprocessing**: the IoT data will come from different sources with varying formats and structures. Data may need to be preprocessed to handle missing data, remove redundancies and integrate data from different sources into a unified schema before being committed to storage. Preparation is the manipulation of data into a form suitable for further analysis and processing. Raw data cannot be processed and must be checked for accuracy. Preparation is about constructing a dataset from one or more data sources to be used for further exploration and processing. Analysing data that has not been carefully screened for problems can produce highly misleading results that are heavily dependent on the quality of data prepared. This preprocessing is a known procedure in data mining called data cleaning. Schema integration does not imply brute-force fitting of all the data into a fixed relational (tables) schema, but rather a more abstract definition of a consistent way to access the data without having to customise access for each source's data format(s). Probabilities at different levels in the schema may be added at this phase to the IoT data items in order to handle the uncertainty that may be present in data or to deal with the lack of trust that may exist in data sources.
-  - **Storage/update and archiving**: This phase handles the efficient storage and organisation of data, as well as the continuous update of data with new information as it becomes available. Archiving refers to the offline long-term storage of data that is not immediately needed for the system's ongoing operations. The importance of this step is that it allows quick access and retrieval of the processed information, allowing it to be passed on to the next stage directly when needed. The core of centralised storage is the deployment of storage structures that adapt to the various data types and the frequency of data capture. Relational database management systems are a popular choice that involves the organisation of data into a table schema with predefined interrelationships and metadata for efficient retrieval at later stages. NoSQL key-value stores are gaining popularity as storage technologies for their support of Big Data storage with no reliance on a relational schema or strong consistency requirements typical of relational database systems. Storage can also be decentralised for autonomous IoT systems, where data is kept at the objects that generate it and is not sent up the system. However, due to the limited capabilities of such objects, storage capacity remains limited in comparison to the centralised storage model.
+  - **Storage/update and archiving**: This phase handles the efficient storage and organisation of data, as well as the continuous update of data with new information as it becomes available. Archiving refers to the offline long-term storage of data that is not immediately needed for the system's ongoing operations. The importance of this step is that it allows quick access and retrieval of the processed information, allowing it to be passed on to the next stage directly when needed. The core of centralised storage is the deployment of storage structures that adapt to the various data types and the frequency of data capture. Relational database management systems are a popular choice that involves the organisation of data into a table schema with predefined interrelationships and metadata for efficient retrieval at later stages. NoSQL key-value stores are gaining popularity as storage technologies for their support of big data storage with no reliance on a relational schema or strong consistency requirements typical of relational database systems. Storage can also be decentralised for autonomous IoT systems, where data is kept at the objects that generate it and is not sent up the system. However, due to the limited capabilities of such objects, storage capacity remains limited in comparison to the centralised storage model.
   - **Processing/analysis**: This phase involves the ongoing retrieval and analysis operations performed and stored and archived data in order to gain insights into historical data and predict future trends, or to detect abnormalities in the data that may trigger further investigation or action. Task-specific preprocessing may be required to filter and clean data before meaningful operations can take place. When an IoT subsystem is autonomous and does not require permanent storage of its data, but rather keeps the processing and storage in the network, then in-network processing may be performed in response to real-time or localised queries.
   - **Output and interpretation**: This is the stage where processed information is now transmitted to the user. An output is presented to users in various visual formats like diagrams, infographics, printed report, audio, video, etc. The output needs to be interpreted so that it can provide meaningful information that will guide future decisions of the company.

en/iot-open/data/data_lifecycle.1558812592.txt.gz · Last modified: 2020/07/20 09:00 (external edit)