Modern Data Stack vs. Modern Edge Data Stack: What You Need to Know about Edge Data

What is a Modern Data Stack (MDS)?

A modern data stack (MDS) is an architectural framework that enables an organization to analyze massive amounts of data and derive business intelligence from it. It is changing how enterprises operate because it consolidates data from siloed databases and applications into a central data warehouse, where it can be accessed across the organization.

What are the components of a Modern Data Stack (MDS)?

  • Data sources are typically SQL databases or SaaS applications that provide structured, transactional data.

  • Data pipeline is the process of extracting data from the data sources, transforming it into the destination data structure, and loading it into the destination data store. This order is called ETL (extract, transform, load). With modern data warehouses and data lakes, source data can instead be loaded directly into the data store and transformed at the time of analysis. This order is called ELT (extract, load, transform).

  • Data warehouse is the central location where enterprise data is stored, which removes the silos created by having multiple disparate databases across the organization.

  • Analysis is the process of loading data from the data warehouse into visualization software, such as business intelligence (BI) tools, or into an analytics engine, such as an AI model.
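To make the ETL/ELT distinction concrete, here is a minimal Python sketch. Plain lists stand in for the warehouse and the raw data lake, and the field names (order_id, drink, price_cents) are hypothetical. The only difference between the two functions is when transform runs.

```python
# Hypothetical raw rows, e.g. exported from a point-of-sale database as strings.
source = [
    {"order_id": "1", "drink": " Latte ", "price_cents": "495"},
    {"order_id": "2", "drink": "MOCHA", "price_cents": "525"},
]

def transform(row):
    """Map one raw source row onto the destination schema."""
    return {
        "order_id": int(row["order_id"]),
        "drink": row["drink"].strip().lower(),
        "price_usd": int(row["price_cents"]) / 100,
    }

def etl(rows, warehouse):
    """ETL: transform each row *before* loading it into the warehouse."""
    for row in rows:
        warehouse.append(transform(row))

def elt(rows, data_lake):
    """ELT: load raw rows unchanged, then transform them when they are analyzed."""
    data_lake.extend(rows)                        # load step: no transformation
    return [transform(row) for row in data_lake]  # transform at analysis time

warehouse, data_lake = [], []
etl(source, warehouse)
analysis_view = elt(source, data_lake)
# Both paths yield the same analysis-ready rows; only the ELT store keeps the raw data.
```

Keeping the raw rows, as ELT does, means the transformation can be changed later and re-run without re-extracting from the sources.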

Data Analysis in a Modern Data Stack

Data analysis in a modern data stack is typically statistical in nature, and the relevant data is retrieved from the data warehouse through database queries. For example, Starbucks could store massive amounts of transaction data from its mobile app and in-store purchases. It could then run analyses such as identifying "the top 3 most ordered drinks for males, age 25-35, in New England" to derive business intelligence on how to market to this specific group of consumers.
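A query like the one in the Starbucks example might look roughly like the following. This is a minimal sketch using Python's built-in sqlite3 module and a small in-memory table; the table name, columns, and sample rows are invented for illustration and do not reflect any real Starbucks schema.

```python
import sqlite3

# Hypothetical, simplified schema: one row per drink ordered.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (drink TEXT, gender TEXT, age INTEGER, region TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("cold brew", "M", 25, "New England"),
        ("cold brew", "M", 29, "New England"),
        ("cold brew", "M", 33, "New England"),
        ("latte", "M", 27, "New England"),
        ("latte", "M", 30, "New England"),
        ("mocha", "M", 35, "New England"),
        ("latte", "F", 30, "New England"),       # excluded: gender
        ("mocha", "M", 40, "New England"),       # excluded: age
        ("chai", "M", 28, "Pacific Northwest"),  # excluded: region
    ],
)

def top_drinks(conn, gender, min_age, max_age, region, limit=3):
    """Return the most-ordered drinks for one customer segment as (drink, count) pairs."""
    query = """
        SELECT drink, COUNT(*) AS orders
        FROM transactions
        WHERE gender = ? AND age BETWEEN ? AND ? AND region = ?
        GROUP BY drink
        ORDER BY orders DESC, drink  -- tie-break alphabetically
        LIMIT ?
    """
    return conn.execute(query, (gender, min_age, max_age, region, limit)).fetchall()

top3 = top_drinks(conn, "M", 25, 35, "New England")
# top3 -> [('cold brew', 3), ('latte', 2), ('mocha', 1)]
```

The answer is an aggregate over many rows, which is what makes this kind of analysis statistical rather than event-based.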

What is a Modern Edge Data Stack (MEDS)?

Unlike the data in a typical modern data stack, edge data is usually not structured or transactional. It is more often unstructured, uncontextualized, unreliable, and event-based.

What are the characteristics of edge data?

  • Unstructured - raw edge data often arrives as a bare number or set of numbers. Relevant metadata must be added to make it structured.

  • Uncontextualized - edge data lacks context, such as which machine or site a reading came from, so that context needs to be added to the data.

  • Unreliable - edge data can be unreliable. For example, sensors can drift out of calibration, or a network outage can prevent data from reaching its destination.

  • Event-based - In enterprise data, you are looking for statistical relationships within a data set; in edge data, you are looking for events hidden inside each data source. Sometimes those events are rare. For example, you may be looking for the moment the temperature of a machine exceeds 85°C, and the temperature may stay below 85°C for months before that happens.
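Event detection works differently from the statistical queries of a modern data stack: instead of aggregating rows, the pipeline scans each stream for the moment a condition becomes true. Below is a minimal Python sketch for the 85°C example, assuming readings arrive as time-ordered (timestamp, temperature) pairs; the function name is hypothetical. It reports only the rising edge, so a machine that stays above 85°C for an hour produces one event rather than one per reading.

```python
from datetime import datetime, timedelta

THRESHOLD_C = 85.0  # the 85°C threshold from the example above

def find_overheat_events(readings, threshold=THRESHOLD_C):
    """Return the timestamps at which the temperature rises above `threshold`.

    `readings` is an iterable of (timestamp, temperature_c) pairs in time order.
    Only the rising edge is reported, so a sustained excursion is one event.
    """
    events = []
    was_above = False
    for timestamp, temp_c in readings:
        is_above = temp_c > threshold
        if is_above and not was_above:
            events.append(timestamp)
        was_above = is_above
    return events

# Usage: one reading per minute; the temperature crosses 85°C twice.
start = datetime(2024, 1, 1)
temps = [70.0, 80.0, 86.0, 90.0, 84.0, 87.0]
readings = [(start + timedelta(minutes=i), t) for i, t in enumerate(temps)]
events = find_overheat_events(readings)
# events -> [start + 2 minutes, start + 5 minutes]
```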

What are the components of a Modern Edge Data Stack (MEDS)?

For these reasons, edge data requires a more complex data pipeline. The Modern Edge Data Stack (MEDS) extends the MDS framework to support edge data. MEDS includes the following components:

  • Data sources - edge data sources include not only databases and SaaS applications, but also sensors, machines, files, and more.

  • Edge hardware - edge data sources may require edge computing, that is, processing the data locally, close to its sources.

  • Acquisition - in MDS, getting the data is called “extraction”, but in MEDS this step is called “acquisition”. This is because getting edge data can require software drivers to access physical data sources such as sensors and machines. Edge data can also arrive over many different protocols, such as Modbus, OPC UA, and MQTT.

  • Cleansing - this step ensures data quality and can also remove redundant data. For example, temperature readings might not be reported until they rise above 65°C.

  • Transformation - Metadata is added and different data formats are converted to a common data schema.

  • Contextualization - context data is added to help data engineers and data scientists work with the data in later stages.

  • Warehouse - the centralized location where data is stored. Depending on the data sources, a time-series data warehouse may be more suitable than a SQL data warehouse.

  • Analysis - in addition to visualization and analytics, edge data can be used to control edge equipment or trigger actions.
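The stages above can be chained as a series of Python generators. The following is a minimal sketch, not a reference implementation: the sensor ID, context table, valid range, class, and function names are all hypothetical, acquisition is simulated rather than going through a real device driver, and a Python list stands in for the time-series warehouse. The 65°C reporting threshold and the 85°C action threshold come from the examples in this article.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical configuration; every name and value here is an assumption.
SENSOR_CONTEXT = {"temp-01": {"machine": "press-3", "site": "plant-a"}}
VALID_RANGE_C = (-40.0, 150.0)  # assumed physical range of the sensor
REPORT_ABOVE_C = 65.0           # cleansing example: only report readings above 65°C
ACT_ABOVE_C = 85.0              # analysis example: trigger an action above 85°C

@dataclass
class Reading:
    """Common schema that every source is converted into."""
    sensor_id: str
    timestamp: datetime
    value: float
    unit: str = "degC"
    context: dict = field(default_factory=dict)

def acquire(raw_values):
    """Acquisition: stand-in for a device driver; timestamps each raw number."""
    for value in raw_values:
        yield datetime.now(timezone.utc), value

def cleanse(samples):
    """Cleansing: drop missing or impossible values, then readings at or below 65°C."""
    low, high = VALID_RANGE_C
    for ts, value in samples:
        if value is None or not low <= value <= high:
            continue
        if value > REPORT_ABOVE_C:
            yield ts, value

def transform(samples, sensor_id):
    """Transformation: attach metadata and convert to the common schema."""
    for ts, value in samples:
        yield Reading(sensor_id=sensor_id, timestamp=ts, value=float(value))

def contextualize(readings):
    """Contextualization: record which machine and site the sensor belongs to."""
    for reading in readings:
        reading.context = dict(SENSOR_CONTEXT.get(reading.sensor_id, {}))
        yield reading

def store_and_analyze(readings, warehouse, act):
    """Warehouse + analysis: store every reading and trigger an action when needed."""
    for reading in readings:
        warehouse.append(reading)  # a list stands in for a time-series store
        if reading.value > ACT_ABOVE_C:
            act(reading)

# Usage: 42.0 (too low), None (dropout) and 999.0 (faulty sensor) are cleansed away.
warehouse, actions = [], []
raw = [42.0, None, 70.5, 999.0, 88.2]
store_and_analyze(
    contextualize(transform(cleanse(acquire(raw)), "temp-01")),
    warehouse,
    actions.append,
)
# warehouse holds the 70.5 and 88.2 readings; actions holds only the 88.2 reading
```

Because each stage only consumes and yields items, a stage such as cleansing can be replaced or tested on its own without touching the rest of the pipeline.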

When enterprises use the MEDS framework, they get high-quality, consistent edge data that maximizes the value they can create with the Modern Data Stack.
