Store Messages of Azure IoT Hub to Azure Blob Storage

In most cases, Azure IoT Hub messages should be stored for a longer period of time.

There are several ways to achieve this goal, and they differ considerably in performance and price.

Azure IoT Hub - Storage Endpoint

Azure IoT Hub processes messages by routing them to endpoints, which are usually other Azure products or services. By default, the Azure IoT Hub supports storage routing either towards Azure Blob Storage or into Azure Data Lake Storage.
Azure Blob Storage is an extremely cost-effective product in this respect, as it is plain storage space, whereas Data Lake is a separate Azure service.

In the case of Azure Blob Storage, the data is stored in the Apache Avro format by default. Apache Avro is a very common data format in data analysis and has its origin in the Apache Hadoop project.
If you do not want Avro as the format, you can alternatively save the data in JSON format.
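
To give an idea of what reading such JSON output can look like, here is a minimal sketch. It assumes that the blob contains one JSON record per line, that each record carries the payload in a "Body" field and the device ID under "SystemProperties" / "connectionDeviceId", and that a string body is base64-encoded JSON. The function name decode_routed_records is made up for this example; check the field names against a blob written by your own hub.

```python
import base64
import json


def decode_routed_records(blob_text):
    """Return (device_id, body) pairs from a newline-delimited JSON blob."""
    results = []
    for line in blob_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        body = record.get("Body")
        if isinstance(body, str):
            # Assumption: a string body is base64-encoded JSON. If decoding
            # fails, the string is kept unchanged.
            try:
                body = json.loads(base64.b64decode(body, validate=True))
            except ValueError:
                pass
        # Assumption: the device ID is stored under this system property.
        device_id = record.get("SystemProperties", {}).get("connectionDeviceId")
        results.append((device_id, body))
    return results
```

Bodies that are already JSON objects are passed through untouched, so the same function can handle both variants.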

The disadvantage of direct storage is that it causes additional costs - per endpoint. The format and structure in which the data is stored are also fixed or very limited.
I recommend this variant only for proof-of-concepts or very small, simple projects.

Azure IoT Hub - Azure Function Endpoint

If you want to keep the saving process in your own hands, some code is required: you write a small Azure Function.

This function is triggered by new IoT Hub messages and stores the data in an Azure Blob Storage container via the Azure Storage SDK.
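
The storage part of such a function could look roughly like the following sketch. It only covers the upload; the trigger wiring of the Azure Function is left out. The helper names (blob_name_for, store_message, make_container_client) and the blob path layout are my own assumptions for illustration. The container client is expected to be a ContainerClient from the azure-storage-blob package, whose upload_blob method does the actual write.

```python
import json
from datetime import datetime, timezone


def blob_name_for(device_id, received_at):
    """Build a blob path such as 'dev1/2024/01/31/120500123456.json'."""
    return f"{device_id}/{received_at:%Y/%m/%d}/{received_at:%H%M%S%f}.json"


def store_message(container_client, device_id, payload, received_at=None):
    """Upload one message as a JSON blob and return the blob name used."""
    received_at = received_at or datetime.now(timezone.utc)
    name = blob_name_for(device_id, received_at)
    # upload_blob is the ContainerClient method from azure-storage-blob (v12).
    container_client.upload_blob(name=name, data=json.dumps(payload), overwrite=True)
    return name


def make_container_client(connection_string, container_name):
    """Create the real client; imported lazily so the helpers above need no SDK."""
    from azure.storage.blob import ContainerClient
    return ContainerClient.from_connection_string(connection_string, container_name)
```

Inside the function's trigger handler you would call store_message once per incoming message. Because the path is built in its own small function, you can change the folder structure without touching the upload code.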

The disadvantage is the code that you have to write yourself. Especially for smaller scenarios with fewer messages, however, you usually end up with a cheaper setup, because the costs for direct storage are avoided or the Azure Function simply costs less.

This scenario also works with the free version of the Azure IoT Hub, and you may even get away without any costs at all, as long as the message processing by the Azure Function remains below the chargeable limit.

Azure IoT Hub - Event Grid as proxy

The Azure IoT Hub has certain limits on the number of endpoints and generates additional costs when messages are stored directly in the storage. Furthermore, direct storage is not considered "high-performance".

The alternative to direct storage by the IoT Hub is the Event Grid.

Azure Event Grid is a service in Azure that has been optimized for high performance and can also use the Azure IoT Hub as an event source.

The idea here is that the Event Grid, rather than the IoT Hub, becomes the central point for all event processing. The advantages are:

  • Higher overall performance
  • More endpoint options
  • More possible sources, including HTTP and Azure Functions
  • The free version of the IoT Hub can also be used as a source

The fact that the Event Grid allows other sources is an especially big plus.

In almost all my IoT projects I have sensor sources that unfortunately do not deliver their data through the IoT Hub but through external webhooks. This is particularly common in home automation and with sensor data coming from external service providers.

The Event Grid is the only product that offers me this flexibility: many different sources, many different destinations, and very high performance at low, scalable costs.

Unfortunately, the messages coming from the IoT Hub to the Event Grid cannot be written directly to the storage. For this I usually use an Azure Function that is called by the Event Grid (more precisely, the Azure Function has a trigger on an Event Grid event).

The advantage, however, is that I have control over the structure and format in which I store my data.
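
As a rough illustration of that control, the sketch below flattens an incoming Event Grid event into a record layout of my own choosing. It assumes the event arrives as a dictionary in the Event Grid schema (eventType, subject, eventTime, data), that IoT Hub telemetry carries the payload in data.body and the device ID in the system property "iothub-connection-device-id", and that a string body is base64-encoded JSON. The function name to_storage_record and the output field names are hypothetical.

```python
import base64
import json


def to_storage_record(event):
    """Flatten an Event Grid event (as a dict) into a custom storage record."""
    data = event.get("data") or {}
    body = data.get("body")
    if isinstance(body, str):
        # Assumption: string bodies are base64-encoded JSON. The raw string
        # is kept when that does not hold.
        try:
            body = json.loads(base64.b64decode(body, validate=True))
        except ValueError:
            pass
    system = data.get("systemProperties") or {}
    return {
        "source": event.get("eventType"),
        # Fall back to the subject for events that do not come from the IoT Hub.
        "device": system.get("iothub-connection-device-id") or event.get("subject"),
        "time": event.get("eventTime"),
        # Events without a "body" field (e.g. webhooks) store their whole data.
        "payload": body if body is not None else data,
    }
```

The same function handles IoT Hub telemetry and events from other sources such as webhooks, so all of them end up in one consistent format before they are written to the storage.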

Hot Path vs Cold Path

Storing messages from the Azure IoT Hub directly in the blob storage is a so-called cold path: the data goes straight into a cold storage, in this case the blob storage.

Routing messages directly into a streaming service that offers near-real-time processing, for example, would be a hot path.