The Data Journey: From Sensor to Business Value

Oliver Schenk
Jan 1, 2024

Monitoring plant, assets, infrastructure, public spaces and the environment can be a complicated and technically challenging problem. You might know exactly what data you want and what problem you are trying to solve, but gathering, transmitting, transforming, storing and visualising data, as well as creating alerts based on that data, requires a deep understanding of not just the technical aspects, but also the business and operational aspects.

Data is key to decision making and business intelligence, but getting data right is hard. Data is not something that you can get "out of the box". It must be designed, collected, cleansed, transformed and interpreted correctly based on your specific business context. It must be managed and curated. But most importantly of all, it must help solve a problem.

In this guide we will look at data and its journey in an Internet of Things (IoT) context. We'll break it down into smaller building blocks so that we can clearly see what is involved in each step along the way. The main aim is not to answer all your questions, but to give you enough information to work out which questions still need to be asked.

Sensors

The sensor is what connects the real world to your systems and solutions. It translates something physical or environmental (such as voltage, temperature, pressure, vibration or light) into a representation that a digital device can measure and understand. This will usually be an electrical signal, such as a voltage or current, but it can also be a digital protocol such as Modbus.

RAK wireless modules, gateways and enclosures

Choosing the right sensor is key. Let's look at some things to consider when choosing a sensor.

Measurement range

What is the minimum and maximum value of the physical input that you want to measure? Is it a linear measurement range?

Choose a sensor that has a range as close as possible to your desired minimum and maximum values. This ensures the best possible accuracy.

Accuracy

How accurate does the measurement need to be? Does the sensor require calibration? Under what conditions does the sensor have the best accuracy?

Some solutions require highly accurate sensors, whilst others do not. Accuracy is usually stated in the sensor specification as a percentage value. Some sensors may be sensitive to operating temperature, whilst others may be sensitive to voltage fluctuations or interference.

Think about the factors that might influence accuracy and what you can tolerate as an acceptable range.
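To get a feel for what a percentage accuracy means in real units, here is a minimal sketch. It assumes the accuracy is quoted as a percentage of the sensor's full measurement span, which is one common convention but not the only one, so check your datasheet. The function name and the example sensors are hypothetical.

```python
def worst_case_error(range_min: float, range_max: float, accuracy_pct: float) -> float:
    """Worst-case absolute error, assuming accuracy is a percentage of the full span."""
    span = range_max - range_min
    return span * accuracy_pct / 100.0


# Hypothetical example: two pressure sensors with the same 0.5 % rating,
# both used on a process that never exceeds 2 bar.
narrow = worst_case_error(0.0, 2.5, 0.5)   # 0 to 2.5 bar sensor
wide = worst_case_error(0.0, 25.0, 0.5)    # 0 to 25 bar sensor

print(f"0 to 2.5 bar sensor: +/- {narrow:.4f} bar")
print(f"0 to 25 bar sensor:  +/- {wide:.4f} bar")
```

Under this assumption the wider sensor carries ten times the absolute error for the same percentage rating, which is why matching the measurement range to your needs matters.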

Durability

Does the environment into which the sensor is being placed have any extreme conditions? What material should the sensor be made of? Is it an outdoor or indoor rated sensor? What level of water and dust proofing is required? Does it have to be intrinsically safe?

The answers to these kinds of questions will depend heavily on the physical environment in which a sensor is installed. Some environments are harsh or extreme and require special consideration.

Connectivity

The connectivity options that a particular sensor can support vary widely across different manufacturers. This can be as simple as a switch that makes or breaks a circuit, or as complicated as a communications protocol that needs to be handled by the device to which the sensor is connected.

Some common connections and protocols may include:

Dry contact (switch)
Voltage signal
Current signal
I2C (Inter-Integrated Circuit)
SDI-12 (Serial Digital Interface at 1200 baud)
SPI (Serial Peripheral Interface)
UART (Universal Asynchronous Receiver-Transmitter)
Modbus RTU/TCP

As you can see it now starts to get more complicated.

For most industrial applications you will use a voltage or current based sensor with a good balance of accuracy and durability. These sensors will have a stable and regulated power supply and will most likely be connected to an industrial PLC (Programmable Logic Controller) or RTU (Remote Terminal Unit). These types of sensors are less suited to low power applications.

For low power and IoT (Internet of Things) applications, where power conservation is a priority, you will often use low voltage signals and protocols such as I2C and SPI.

Power Requirements

Aside from switches, most sensors will need a power source. This will depend heavily on the type of sensor and the connectivity options it provides. Power input requirements can range from the low-power range, which is generally between 3.3V and 5V, to the more common industrial levels of 12V or 24V.

If mains power, or 12V solar power and batteries are available, then almost any sensor can be powered. For compact IoT applications the most common power source will be a long-life battery, and optionally, a low voltage solar panel.

You'll need to calculate the expected power consumption of your solution and think about whether you need solar power and batteries, and if so, what size and capacity you need.
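As a starting point for that calculation, the following sketch estimates battery life from a simple two-state duty cycle (asleep or awake). All figures are hypothetical, and the model ignores battery self-discharge, temperature effects and solar input, so treat the result as an optimistic upper bound.

```python
def battery_life_days(capacity_mah, sleep_ma, active_ma, active_seconds, wakeups_per_day):
    """Estimate battery life in days from a two-state (sleep/active) duty cycle."""
    seconds_per_day = 24 * 60 * 60
    active_s = active_seconds * wakeups_per_day
    sleep_s = seconds_per_day - active_s
    # Charge consumed per day in mAh (current in mA multiplied by time in hours).
    daily_mah = (active_ma * active_s + sleep_ma * sleep_s) / 3600.0
    return capacity_mah / daily_mah


# Hypothetical device: 3000 mAh battery, 10 uA asleep, 50 mA for 10 s per wake-up.
for wakeups in (4, 24, 96):
    days = battery_life_days(3000, 0.01, 50, 10, wakeups)
    print(f"{wakeups:>3} wake-ups/day: about {days:.0f} days")
```

Even with a crude model like this, you can see how quickly the number of wake-ups per day starts to dominate the power budget.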

In low-power solutions, the solar connection and battery charging capability are often included on the IoT circuit board. These can use solar modules that operate at 5V and only a few watts, together with AA, C or D sized lithium batteries.

Installation

Where will the sensor be located and how will it be mounted? What cabling does it require? Does the sensor need to be protected from weather, moisture, dust or animals? What access is needed for maintenance and how often?

Installing a sensor can be a challenge, especially in extreme or hazardous environments, or when placed near other infrastructure such as power plants, railways and waterways. The cost can vary a great deal depending on the infrastructure required by the sensor.

Power infrastructure, such as batteries and solar panels, can often significantly increase the overall cost of installing a sensor due to the cost of construction and installation of poles, foundations, enclosures and cabling.

Low-power systems can be used where sensor data is only needed between once a day and a few times a day. Using this approach you can avoid costly infrastructure and thus achieve a very low installation cost.

Gateway

The gateway is a device, or set of devices, that interfaces with the sensor either directly, or through an external converter/isolator, and provides the connectivity to the digital world.

A gateway is a generalised term and represents any type of device that can interface with electrical signals, provides some level of programmability and generally includes certain communication options. In the industrial setting this may be a PLC or RTU connected to a mobile data router. It can also be an all-in-one router that supports a limited set of I/O.

In the world of IoT, this will usually consist of a circuit board with a mixture of capabilities such as battery charging, mobile communications and I/O protocols and connections.

Different angles of a Teltonika RUT956 router

There are a few things to consider here depending on the situation. The main questions to answer here relate to trade-offs between network availability, bandwidth requirements and power consumption.

Are the sensors located close to an existing network connection or do you need to install your own? How much data needs to be transmitted and how often? Do you have mains power available or will you need solar and battery infrastructure?

Let's look at the types of connections that you could use.

Fixed Connection

A fixed connection is a connection method where you directly connect to an existing network using a copper or fibre-optic cable. This allows you to use technologies such as Ethernet or serial using RS232/422/485. This works great if you have the infrastructure available, but this is often not an option.

You will most likely find this type of connection in an indoor environment, at a plant or factory, or where a company has their own backbone fibre network.

Wi-Fi

This option is less common in outdoor and remote areas, but can be useful if you already have Wi-Fi infrastructure and you have indoor sensors or sensors in a nearby area. The benefit is that you will be able to use your own private network and you can leverage fast transfer speeds.

The downsides are that Wi-Fi may not always be reliable, uses a significant amount of power and you will need an industrial router that supports Wi-Fi connectivity.

Mobile (LTE)

Mobile data is one of the most common connectivity options in populated areas and along major infrastructure such as roads and rail. Coverage generally increases over time as more towers are rolled out or upgraded.

This connection method is supported by dozens of different types of mobile data routers, and you can tailor your connection plan to what you need. Bandwidth will vary, but it will most likely be more than enough for sensor data and maybe even for video streaming.

The downside of LTE is that it uses a significant amount of power and is not optimised for low power applications. It also relies on good signal quality.

There are many active developments in the 5G space that promise lower-power applications, but these are not yet as mature. The term 5G refers to the "5th generation" of mobile network standards; it covers a broad family of technologies rather than one specific capability.

Low Power Wide Area Network

This type of connection, also known as LPWAN, consists of different types of technologies that prioritise low power consumption over bandwidth and connection latency. In other words, the connection is designed for small amounts of data and experiences longer delays, but will use a fraction of the power required by technologies such as LTE.

As a trade-off, these protocols are not suited to data streaming or media such as pictures and video.

Sensors that use LPWANs usually spend most of their time sleeping and only wake occasionally. For battery-only operation, the battery life will be highly dependent on how many times per day a device wakes up and how many of those wakeups involve transmitting data. The most optimised sensors will take a measurement and then decide as to whether data needs to be sent at all.
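One way a device might decide whether a measurement is worth sending is a deadband check combined with a periodic heartbeat. This is a minimal sketch of that idea, not a description of any particular product; the function name, parameters and thresholds are all hypothetical.

```python
def should_transmit(new_value, last_sent_value, deadband,
                    seconds_since_last_send, max_silence_s):
    """Decide whether a fresh measurement justifies the energy cost of a transmission."""
    if last_sent_value is None:
        return True  # nothing has been sent yet
    if seconds_since_last_send >= max_silence_s:
        return True  # heartbeat, so the server knows the device is still alive
    # Only transmit when the value has moved by at least the deadband.
    return abs(new_value - last_sent_value) >= deadband


# Hypothetical temperature sensor: 0.5 degree deadband, one heartbeat per day.
print(should_transmit(20.1, 20.0, 0.5, 600, 86400))    # small change, recent send
print(should_transmit(21.0, 20.0, 0.5, 600, 86400))    # change exceeds the deadband
print(should_transmit(20.1, 20.0, 0.5, 90000, 86400))  # heartbeat is overdue
```

The deadband keeps the radio off while nothing interesting is happening, and the heartbeat prevents a quiet sensor from being mistaken for a dead one.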

There are four common types of LPWAN connections:

LoRaWAN
NB-IoT (also known as CAT-NB1)
LTE-M (also known as CAT-M1)
Sigfox

You can find a more detailed side-by-side comparison in this blog post.

LoRaWAN

LoRaWAN requires a centralised gateway. The LoRaWAN compatible sensors need to be installed within about 1-3km of the gateway and will communicate with this gateway. In regional areas you might get up to 20km. The gateway will then connect to a server over a fixed or mobile data connection and pass the messages to the server. The configuration of the gateway and the server can be technically challenging.

The gateway requires a reliable power supply. The sensors, however, are very efficient, and LoRaWAN is one of the most power-optimised protocols. Each gateway can handle tens to hundreds of sensors depending on factors such as the number of channels and when sensors need to transmit.

This network is best suited to situations where many sensors are installed within a contained area such as a public space, city or industrial facility.

NB-IoT

The benefit of NB-IoT (Narrowband IoT) is that the communications infrastructure is provided for you by a network carrier. All you need is the sensor and a SIM card. This makes NB-IoT a good choice for sparsely distributed and remotely located sensors.

This protocol operates more like a mobile data connection in that you will buy an NB-IoT SIM card with an associated data plan. Each data plan will have a set number of tokens of a given data size that you can consume. You'll need to choose a plan based on how much data you wish to transmit and how often.
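To compare plans you first need a rough idea of your monthly data volume. The sketch below is simple arithmetic; the per-message overhead is a placeholder assumption, since the real figure depends on the protocol stack and the carrier, and the function name is hypothetical.

```python
def monthly_data_kb(payload_bytes, overhead_bytes, messages_per_day, days=30):
    """Rough monthly data volume in KB for a device sending fixed-size messages."""
    total_bytes = (payload_bytes + overhead_bytes) * messages_per_day * days
    return total_bytes / 1024.0


# Hypothetical device: 24-byte payload, an assumed 60 bytes of protocol overhead,
# reporting once an hour.
usage = monthly_data_kb(payload_bytes=24, overhead_bytes=60, messages_per_day=24)
print(f"About {usage:.1f} KB per month")
```

Running the numbers for a few reporting intervals makes it easier to see which plan size you actually need.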

The power consumption of NB-IoT is comparable to LoRaWAN, although it really depends on the use case. Features such as PSM (Power Saving Mode) and eDRX (extended Discontinuous Reception) can significantly increase battery life, but they rely on carrier support, and a carrier may or may not offer them.

LTE-M

LTE-M is a similar technology to NB-IoT. The main difference is that it has a much lower latency, a higher bandwidth and it supports sensors that are on the move. LTE-M also supports creating secure TCP/TLS connections due to the higher bandwidth that is available.

The main trade-off to be aware of when dealing with LTE-M is that it will use a lot more power if you are using high bandwidth devices.

Sigfox

Sigfox is a global communication network designed for Internet of Things (IoT) devices. It has similar features to NB-IoT, offers relatively long range and is simple to integrate.

It also has some of the same limitations in terms of low bandwidth and small message sizes, and it offers only very limited two-way communication.

Ingestion

In the ingestion phase data makes its way from the remote device to a centralised server or IoT platform. Depending on the type of gateway that is used, data may be transmitted using one of a few common protocols:

MQTT
HTTPS
OPC UA
UDP
CoAP
LoRaWAN backhaul

The choice of protocol will depend largely on what is supported by the gateway device in the field or the sensor that is used and the capabilities of the server or IoT platform.

When using LoRaWAN, the gateway will use a LoRa-specific protocol to send the data to a LoRa server. This can be hosted in AWS IoT, on platforms such as The Things Network or using your own LoRa stack. Generally you would make use of an existing service rather than try to build your own, unless you have very specific requirements.

For other non-LoRa LPWAN devices the most common protocols will be UDP, CoAP and MQTT depending on the capabilities and power consumption requirements. MQTT is one of the most widely used protocols, but it uses more power than UDP or CoAP.
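Whichever protocol you choose, the size of the message itself matters on constrained links. The sketch below compares the same reading encoded as JSON and as a packed binary payload using Python's standard `struct` module. The field layout (`>HhH`) and the reading values are hypothetical choices made for illustration, not a standard format.

```python
import json
import struct

reading = {"device_id": 1042, "temperature_c": 21.37, "battery_mv": 3612}

# Human-readable JSON: convenient where bandwidth and power are plentiful.
json_payload = json.dumps(reading).encode("utf-8")

# Packed binary, big-endian with no padding: unsigned 16-bit device id,
# signed 16-bit temperature in hundredths of a degree, unsigned 16-bit millivolts.
PACK_FORMAT = ">HhH"
binary_payload = struct.pack(
    PACK_FORMAT,
    reading["device_id"],
    round(reading["temperature_c"] * 100),
    reading["battery_mv"],
)


def decode(payload: bytes) -> dict:
    """Reverse the packing on the server side."""
    device_id, temp_centi, battery_mv = struct.unpack(PACK_FORMAT, payload)
    return {"device_id": device_id, "temperature_c": temp_centi / 100, "battery_mv": battery_mv}


print(len(json_payload), "bytes as JSON")
print(len(binary_payload), "bytes as packed binary")
print(decode(binary_payload))
```

The trade-off is that a binary format must be agreed between device and server in advance, whereas JSON describes itself.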

Devices that are not constrained by power limitations will most likely use MQTT as this is the default choice for platforms like AWS IoT and Azure IoT Hub.

For legacy systems or traditional SCADA connections, the OPC UA protocol is an option and can be integrated with platforms that support such connectors.

The most flexible method is using HTTPS, but it requires you to write your own message handlers and processing capability at the edge to then be able to ingest the data into a remote server. Many Cloud vendors can provide Software Development Kits (SDKs) to help with data ingestion, but it relies on your device being able to run a supported programming language.

Transformation

Once data has been ingested into a server or IoT platform, it will often need to be transformed into a value that represents a measurement in the physical world. In other words, most sensors will measure voltage or current, but this doesn't represent the real world (unless voltage is what we actually want to measure).

Mathematical Conversion

Usually what we are interested in is a physical measurement such as temperature, soil moisture, or the vibration of a motor. The simplest type of transformation translates a sensor reading into a real-world measurement by applying a mathematical formula. This is also known as scaling.
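For a sensor with a linear response, scaling is a straight line between two known points. Here is a minimal sketch, assuming a hypothetical 4-20 mA level transmitter calibrated for 0 to 5 metres; non-linear sensors would need a different formula or a lookup table.

```python
def scale_linear(raw, raw_min, raw_max, eng_min, eng_max):
    """Map a raw reading onto engineering units along a straight line."""
    if raw_max == raw_min:
        raise ValueError("raw_min and raw_max must differ")
    fraction = (raw - raw_min) / (raw_max - raw_min)
    return eng_min + fraction * (eng_max - eng_min)


# Hypothetical transmitter: 4 mA means 0 m, 20 mA means 5 m.
level_m = scale_linear(12.0, raw_min=4.0, raw_max=20.0, eng_min=0.0, eng_max=5.0)
print(f"12 mA corresponds to {level_m} m")
```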

Statistical Data

Another type of transformation is to create additional data from a particular measurement. One example of this is calculating statistical data from a measurement by tracking a minimum, maximum and average value within a given moving time window. Such operations provide additional insight into what a measurement is doing.
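A moving time window can be sketched with a double-ended queue that drops old samples as new ones arrive. This example assumes samples arrive in timestamp order and that timestamps are plain seconds; the class name is hypothetical, and most IoT platforms or time-series databases provide an equivalent built in.

```python
from collections import deque


class WindowStats:
    """Track minimum, maximum and average over a moving time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self._samples.append((timestamp, value))
        # Drop samples that are window_seconds old or older.
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def summary(self):
        values = [value for _, value in self._samples]
        return {"min": min(values), "max": max(values), "avg": sum(values) / len(values)}


stats = WindowStats(window_seconds=60)
for timestamp, value in [(0, 10.0), (30, 14.0), (60, 12.0), (90, 16.0)]:
    stats.add(timestamp, value)

print(stats.summary())
```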

Payload Processing

In some cases, the data "package" that arrives from a field device can contain measurements from more than one sensor and more than one measurement for each sensor. This data, also known as the payload, needs to be processed and each sensor measurement needs to be extracted.

Usually this involves knowing the format that a gateway will use when sending data and then looking for the relevant fields. Often the data is represented in formats such as JSON, which most IoT platforms should be able to deal with.
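As an illustration, the sketch below flattens a JSON payload that carries several sensors, each with one or more readings, into individual measurements. The payload structure and field names are invented for this example; a real gateway will define its own format.

```python
import json

# Hypothetical payload: one device, two sensors, multiple readings.
raw_payload = """
{
  "device": "site-07",
  "sensors": [
    {"id": "temp-1", "readings": [{"t": "2024-01-01T00:00:00Z", "v": 18.2},
                                   {"t": "2024-01-01T00:05:00Z", "v": 18.4}]},
    {"id": "level-1", "readings": [{"t": "2024-01-01T00:00:00Z", "v": 2.31}]}
  ]
}
"""


def flatten_payload(payload_text):
    """Turn one multi-sensor payload into a flat list of individual measurements."""
    payload = json.loads(payload_text)
    rows = []
    for sensor in payload.get("sensors", []):
        for reading in sensor.get("readings", []):
            rows.append({
                "device": payload["device"],
                "sensor": sensor["id"],
                "timestamp": reading["t"],
                "value": reading["v"],
            })
    return rows


measurements = flatten_payload(raw_payload)
for row in measurements:
    print(row)
```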

Enrichment

A type of transformation called enrichment is used to make a field measurement more useful or valuable by augmenting it with additional information obtained from other sources.

For example, you might obtain a temperature reading from a site and then add a measurement from a local weather data source. These types of data transformations don't change the original measurement, but they add additional context to the data.
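Here is a minimal sketch of that idea. The weather lookup is a plain dictionary standing in for whatever external source you use, and all names and values are hypothetical. Note that the function returns a copy, so the original measurement is left untouched.

```python
def enrich(measurement, weather_by_site):
    """Return a copy of the measurement with weather context added."""
    enriched = dict(measurement)
    weather = weather_by_site.get(measurement["site"])
    if weather is not None:
        enriched["ambient_temp_c"] = weather["temp_c"]
        enriched["rain_mm_last_hour"] = weather["rain_mm_last_hour"]
    return enriched


# Stand-in for a local weather data source.
weather_by_site = {"site-07": {"temp_c": 31.5, "rain_mm_last_hour": 0.0}}

measurement = {"site": "site-07", "sensor": "cabinet-temp", "value": 44.8}
enriched = enrich(measurement, weather_by_site)
print(enriched)
```

With the ambient temperature alongside it, a cabinet temperature of 44.8 degrees can be judged in context rather than in isolation.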

Storage

At this point the data has been received from the field and has been processed into something that can be used to help solve a business problem. The data must now be stored in some type of database so that it can be queried by the software and systems that provide the business value back to the user.

Photo by JOSHUA COLEMAN on Unsplash

A system that provides sensor data within seconds or minutes, and maybe even hours, is what can be called a real-time system or a near real-time system. We care about what happened most recently. We want to make operational decisions using this data. We want to be able to query this data quickly and without much latency. We might even want to feed it into a machine learning algorithm.

At a certain point in time, whether that be hours, days or weeks, the data will no longer be useful for making operational decisions. At this point the purpose of the data changes from operations to analytics. This is where we take learnings from the data and use it to make predictions about the future or review past performance and trends.

These two use cases are often referred to as OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing). Each has very different data storage requirements.

Data storage is a very complex topic in its own right. When choosing a suitable data store it will come down to the use case and factors such as storage size and speed, security, availability, durability, redundancy and backups. All of this also needs to be balanced against cost.

Let's focus on the data itself.

Operational Data

The data store where the ingested and transformed data is stored is usually a time-series database. This is because the data is ordered and optimised for queries based on time. In more traditional process control language this would be called a Historian.

Time-series databases are specialised for handling time-stamped or time-series data - data points indexed in time order. Here are some general attributes of time-series databases:

Time-Stamped Data: The primary feature of a time-series database is its ability to handle data that is time-stamped. Each data point is associated with a specific point in time.
Efficient Data Ingestion: They are optimised for high-velocity data, meaning they can ingest large volumes of data at a very fast rate, which is essential for real-time monitoring and tracking.
Data Compression: Time-series databases often employ data compression techniques to efficiently store data. Since time-series data can be voluminous, compression helps in reducing storage requirements.
Time-Based Queries: These databases support complex time-based queries, which are crucial for analysing trends, patterns, and anomalies over time.
Scalability: They are designed to scale horizontally, meaning they can handle increased load by adding more nodes to the system, which is important for accommodating growing data volumes.
Data Retention Policies: Time-series databases often include mechanisms for automatic data expiry or down-sampling, helping manage data retention in an efficient way by discarding or aggregating old data.
Real-Time Analytics and Processing: They often provide capabilities for real-time data processing and analytics, enabling immediate insights and actions based on the latest data.
Handling of Time-Series Specific Functions: Functions like windowing, lagging, and leading operations, as well as resampling and interpolation, are commonly supported, allowing for sophisticated time-series analysis.
Data Visualisation and Dashboards: Many time-series databases integrate with, or provide, tools for data visualisation and creating dashboards to represent time-series data graphically.
Anomaly Detection: They often support anomaly detection features, which can identify unusual patterns in data that might indicate critical incidents or system issues.

A time-series database usually sits at the core of an IoT system.
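To make one of these attributes concrete, here is a sketch of down-sampling: collapsing raw samples into one average per hour so that older data takes less space. Time-series databases typically do this for you through retention policies or continuous aggregation; the plain Python below only illustrates the idea, with timestamps assumed to be epoch seconds.

```python
from collections import defaultdict


def downsample_hourly(samples):
    """Aggregate (epoch_seconds, value) samples into one average per hour."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        hour_start = timestamp - (timestamp % 3600)  # round down to the hour
        buckets[hour_start].append(value)
    return {hour: sum(values) / len(values) for hour, values in sorted(buckets.items())}


samples = [(0, 10.0), (1800, 12.0), (3600, 20.0), (5400, 22.0), (7199, 24.0)]
hourly = downsample_hourly(samples)
print(hourly)
```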

Analytical Data

Data analytics involves the use of techniques and tools to analyse, interpret, and derive insights from data. It goes beyond the raw operational data to discover patterns, trends, correlations, and other meaningful information.

The primary purpose of data analytics is to gain a deeper understanding of business processes, identify opportunities for improvement, make informed decisions, and predict future trends. It adds a layer of intelligence to the raw operational data.

It often involves the processing of large datasets using statistical, mathematical, or computational methods. It requires data storage technologies that can handle complex queries and large data sets.

Analytics data storage is often kept separate from operational data storage to prevent analytics workloads from impacting the performance of operational systems. This is because analytics queries can be complex and may put a large load on the system.

Here are some key attributes of databases and data storage solutions used for analytics:

Columnar Storage: Unlike traditional row-based storage, analytics databases often use columnar storage, which enables faster query performance for analytics workloads as it allows for efficient reading of specific columns without accessing the entire row.
Massively Parallel Processing (MPP): These systems often employ MPP architecture, allowing simultaneous data processing across multiple nodes, which significantly speeds up data querying and analysis.
Data Warehousing: Analytics databases are frequently used for data warehousing, where data from different sources is aggregated, transformed, and stored for querying and analysis.
Scalability: They are built to handle large volumes and varieties of data. This includes scaling up to accommodate growing data volumes, both in terms of storage and computational power.
Optimisation for Read-Heavy Queries: Analytics databases are optimised for complex, read-heavy queries that are typical in data analysis, including aggregations, joins, and window functions.
Integration with Data Analysis Tools: They typically offer integration with various data analysis, reporting, and visualisation tools, allowing for seamless analysis and reporting.
Support for OLAP (Online Analytical Processing): These databases are often optimised for OLAP operations, enabling multi-dimensional analysis of complex data sets.
High Data Throughput: They provide high throughput for both reading and writing large data sets, which is essential for handling big data analytics workloads.
Data Redundancy and Reliability: To ensure data integrity and availability, analytics storage solutions often incorporate redundancy and backup mechanisms.
Advanced Analytics Features: They may include features like predictive analytics, machine learning capabilities, and advanced data modelling tools.

In the world of IoT, sensor data will usually be stored in a time-series database first in order to provide the operational business value. In later stages it can then be transferred to an analytics database.
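The difference between row-based and columnar storage can be illustrated with plain Python structures. This is only a conceptual sketch with made-up data; real analytics databases add compression, indexing and parallelism on top of the same basic idea.

```python
# Row-oriented: each record is stored together.
rows = [
    {"timestamp": 0, "site": "A", "temperature": 20.0, "humidity": 40.0},
    {"timestamp": 60, "site": "A", "temperature": 22.0, "humidity": 42.0},
    {"timestamp": 120, "site": "B", "temperature": 24.0, "humidity": 44.0},
]

# An aggregate over one field still has to visit every whole record.
avg_from_rows = sum(row["temperature"] for row in rows) / len(rows)

# Column-oriented: each field is stored together, so the same aggregate
# reads a single list and never touches the other fields.
columns = {name: [row[name] for row in rows] for name in rows[0]}
avg_from_columns = sum(columns["temperature"]) / len(columns["temperature"])

print(avg_from_rows, avg_from_columns)
```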

Visualisation and Alerts

Up until this point we've collected physical measurements and digitised them into data. We then transmitted, transformed and stored this data. The last piece of the puzzle is to bring the data back into a physical context so that it can be turned into business value.

Digital Twin

When dealing with sensor data the most important goal is to present the data in a way that makes sense within a particular business context. In other words, you want the data to visually represent a physical thing in a way that makes sense to a specific operational team.

In most cases a remote site will consist of more than one sensor. It will usually have many sensors, which together allow you to create a representation of the state of a whole system or process. This representation of state can be thought of as a model and is often called a Digital Twin. It alludes to the fact that you can know exactly what a physical system is doing by looking at its data, which is the digital representation.
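At its simplest, a digital twin can be sketched as an object that holds the latest known value of every sensor at a site. The pump station, sensor names and class below are hypothetical, and a real twin would usually carry much more (units, quality flags, relationships between assets).

```python
from dataclasses import dataclass, field


@dataclass
class PumpStationTwin:
    """Hypothetical digital twin holding the latest known state of one site."""
    site: str
    latest: dict = field(default_factory=dict)  # sensor name -> (timestamp, value)

    def update(self, sensor, timestamp, value):
        # Ignore out-of-order messages so the twin always reflects the newest reading.
        current = self.latest.get(sensor)
        if current is None or timestamp >= current[0]:
            self.latest[sensor] = (timestamp, value)

    def state(self):
        return {sensor: value for sensor, (_, value) in self.latest.items()}


twin = PumpStationTwin("site-07")
twin.update("wet_well_level_m", 100, 2.3)
twin.update("pump_running", 105, True)
twin.update("wet_well_level_m", 90, 1.9)  # older message arriving late, ignored

print(twin.state())
```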

Visualisation

To leverage digital twins and operational data and make better operational decisions, the data must be presented in a human friendly way. This usually involves the use of visualisations and dashboards.

Visualisations and dashboards are something that should be designed by the operational teams that use them. They are the primary user and are best placed to understand the business context and problems that need to be solved.

In the following picture you can see a visualisation that depicts the last known state and the last 30 minutes of activity. This gives the operational team the information they need about the state and performance of the level crossing without having to physically attend the site.

A Grafana visualisation of data from a level crossing monitoring system

Visualisations are great for presenting data on operational control panels, in control rooms and on personal computers and mobile phones when there is a known issue.

Alerts

In the last section we talked about visualisation, which is great for looking at and querying data in a visually engaging way. Now let's imagine there is an alarm or an issue with a particular sensor. The problem is that unless an operator is always watching the dashboards for each site, they won't know that there is an issue.

This is where alerts are useful. Alerts are a mechanism by which the system automatically monitors the incoming data and checks that it stays within a given range. If it detects that the data is outside the specified range, it will notify someone.

There are many ways in which systems can send alerts. This can include email, SMS, voice call or some other display or indicator. Alerts are a key part of any data system that takes care of monitoring systems and infrastructure.
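The core of a range-based alert is a simple comparison. In this minimal sketch the `notify` function defaults to printing, standing in for whichever channel (email, SMS and so on) you actually use; the sensor name and limits are hypothetical.

```python
def check_range(sensor, value, low, high):
    """Return an alert message if the value is outside [low, high], otherwise None."""
    if value < low:
        return f"ALERT {sensor}: {value} is below the minimum of {low}"
    if value > high:
        return f"ALERT {sensor}: {value} is above the maximum of {high}"
    return None


def notify(message, send=print):
    """Deliver an alert; `send` stands in for email, SMS or another channel."""
    send(message)


# Hypothetical battery voltage readings checked against an 11.5 V to 14.5 V range.
for reading in (12.4, 10.9, 14.8):
    alert = check_range("battery_v", reading, low=11.5, high=14.5)
    if alert is not None:
        notify(alert)
```

Real alerting systems usually add refinements such as delays, hysteresis and escalation so that a value hovering near a limit does not flood people with messages.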

Conclusion

Hopefully the data journey has provided some food for thought and appreciation for what is involved in taking sensor measurements all the way through to final business value.

There are many steps involved in the data journey. Take each step in isolation and work through what it achieves. Here are a few more tips:

Optimise what data you send and how often, but don't transform the data too much or too early. You can always do more processing in later stages.
Be smart in how you store data. No need to keep data in expensive storage for too long if no one uses it.
Keep your operational and analytics processes separate.
Ensure the business decides on what visualisations and alerts they need and design them for their own use.

As a final piece of advice, it is important to always start with business value and understand your problem and what it is you are trying to solve. The next step is to understand what data is needed to achieve this business value and to solve your problem. Only then should you look at the software and technology that might be used to create the data journey.

Think BDAT - Business => Data => Applications => Technology