3 Options: Scalable Real-Time IoT Data Monitoring and Reporting

TECHNOLOGY

Muhammad Ali Amir Khan

1/15/2025 · 5 min read

Real-Time IoT Data Monitoring and Reporting for a Smart City Project

Scenario: Techsus is a Smart City project that deploys IoT sensors across various locations to monitor air quality, temperature, humidity, and noise levels in real time. The goal is to store that data for analysis and send hourly summary reports to Meteorologists and Atmospheric Data Analysts.

NOTE: All of the necessary components/services are hyperlinked to their respective official articles for a deeper dive.

AWS based Solution Design

Architecture Overview

  1. Data Ingestion: IoT Sensors → IoT Core → Kinesis Data Stream

  2. Data Processing: Kinesis Data Stream → Lambda → DynamoDB

  3. Hourly Aggregation & Reporting: EventBridge → Lambda → DynamoDB → SNS → SQS → Consumer

Implementation Steps

Step 1: Setting Up IoT Data Collection

  1. Deploy IoT Sensors: Install sensors at the required city locations to capture real-time environmental data. These devices send data to AWS IoT Core via the MQTT protocol.

  2. Stream Data to Kinesis: Create an IoT Rule in AWS IoT Core with the “Amazon Kinesis Data Streams” action to forward incoming sensor data for real-time processing, as sketched below.
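
A minimal boto3 sketch of this wiring; the stream name, shard count, rule name, topic filter, and IAM role ARN are illustrative assumptions rather than values from the article:

import boto3

kinesis = boto3.client("kinesis")
iot = boto3.client("iot")

# Create the Kinesis Data Stream that will receive sensor readings (see Step 2).
kinesis.create_stream(StreamName="techsus-sensor-stream", ShardCount=2)

# IoT Rule: forward every message published under sensors/+/data to Kinesis,
# partitioned by the sensorId field of the payload.
iot.create_topic_rule(
    ruleName="ForwardSensorDataToKinesis",
    topicRulePayload={
        "sql": "SELECT * FROM 'sensors/+/data'",
        "ruleDisabled": False,
        "actions": [{
            "kinesis": {
                "streamName": "techsus-sensor-stream",
                "partitionKey": "${sensorId}",
                # Hypothetical IAM role that allows IoT Core to write to Kinesis.
                "roleArn": "arn:aws:iam::123456789012:role/iot-to-kinesis",
            }
        }],
    },
)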

Step 2: Processing IoT Data with AWS Kinesis

  1. Create a Kinesis Data Stream: Configure an Amazon Kinesis Data Stream to handle the incoming high-volume IoT data. Specify an appropriate number of shards based on the expected data throughput, which in turn depends on the number of IoT sensors.

  2. Process Data with AWS Lambda: Attach an AWS Lambda function to the Kinesis Data Stream to process the incoming data; sample JSON payload:

{
  "sensorId": "sensor_1",
  "temperature": 22.5,
  "humidity": 60,
  "noiseLevel": 35.2,       // Noise level in decibels (dB)
  "airQualityIndex": 42,    // Air Quality Index (AQI) value
  "timestamp": "2024-01-01T12:00:00Z"
}

The Lambda function:

  • Parses the sensor data.

  • Validates the data (e.g., checks for missing fields) and/or transforms it.

  • Writes the processed data to the database (a minimal sketch follows).
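
A minimal sketch of such a handler; it assumes boto3 and a hypothetical DynamoDB table named SensorReadings (matching the schema in Step 3):

import base64
import json
from decimal import Decimal

import boto3

table = boto3.resource("dynamodb").Table("SensorReadings")  # hypothetical table name
REQUIRED_FIELDS = {"sensorId", "temperature", "humidity",
                   "noiseLevel", "airQualityIndex", "timestamp"}

def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        # parse_float=Decimal because DynamoDB rejects Python floats.
        reading = json.loads(payload, parse_float=Decimal)

        # Validate: skip records with missing fields.
        if not REQUIRED_FIELDS.issubset(reading):
            continue

        # Transform to the table schema and write.
        table.put_item(Item={
            "SensorID": reading["sensorId"],
            "Timestamp": reading["timestamp"],
            "Temperature": reading["temperature"],
            "Humidity": reading["humidity"],
            "NoiseLevel": reading["noiseLevel"],
            "AirQualityIndex": reading["airQualityIndex"],
        })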

Step 3: Storing Data in a Scalable Database

  1. Choose the Database: Use Amazon DynamoDB to store real-time IoT data, since it scales well and handles high write throughput.

  2. Database Schema:

  • Partition Key: SensorID

  • Sort Key: Timestamp

  • Additional attributes: Temperature, Humidity, NoiseLevel, AirQualityIndex.

The “Primary key” section of the linked DynamoDB article explains Partition and Sort Keys in detail.
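
A minimal boto3 sketch of that table definition; the table name and on-demand billing mode are assumptions:

import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.create_table(
    TableName="SensorReadings",  # hypothetical name
    KeySchema=[
        {"AttributeName": "SensorID", "KeyType": "HASH"},    # partition key
        {"AttributeName": "Timestamp", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "SensorID", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity absorbs bursty sensor traffic
)

On-demand billing avoids capacity planning while sensor write volume is unpredictable; provisioned capacity with auto scaling is the usual alternative once the workload stabilizes.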

Step 4: Generating Hourly Summaries

  1. Set Up an Aggregation Lambda:

  • Create a second AWS Lambda function triggered by an Amazon EventBridge (CloudWatch) rule to run hourly.

  • The function queries DynamoDB for the last hour's data, calculates averages, and formats a summary report.

  2. Publish Summary via SNS: The Lambda function publishes the hourly summary to an Amazon SNS Topic; a minimal sketch follows the sample message below.

Sample SNS Message:

{
  "cityZone": "Downtown",
  "averageTemperature": 27.5,
  "averageHumidity": 60.2,
  "averageNoiseLevel": 55.0,
  "airQualityIndex": 42
}
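
A minimal sketch of the aggregation function; the table scan with a timestamp filter, the single hard-coded zone, and the SNS topic ARN environment variable are illustrative simplifications (a production job would more likely query per sensor or per zone):

import json
import os
from datetime import datetime, timedelta, timezone

import boto3

table = boto3.resource("dynamodb").Table("SensorReadings")  # hypothetical table name
sns = boto3.client("sns")
TOPIC_ARN = os.environ["SUMMARY_TOPIC_ARN"]  # hypothetical environment variable

def handler(event, context):
    since = (datetime.now(timezone.utc) - timedelta(hours=1)).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Simplification: scan the last hour across all sensors.
    items = table.scan(
        FilterExpression="#ts >= :since",
        ExpressionAttributeNames={"#ts": "Timestamp"},
        ExpressionAttributeValues={":since": since},
    )["Items"]
    if not items:
        return

    def avg(attr):
        return round(sum(float(i[attr]) for i in items) / len(items), 1)

    summary = {
        "cityZone": "Downtown",  # illustrative; real code would group per zone
        "averageTemperature": avg("Temperature"),
        "averageHumidity": avg("Humidity"),
        "averageNoiseLevel": avg("NoiseLevel"),
        "airQualityIndex": avg("AirQualityIndex"),
    }
    sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(summary))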

Step 5: Sending Notifications to SQS

  1. Subscribe SQS to SNS:

  • Create an Amazon SQS queue and subscribe to the SNS Topic.

  • Any SNS message (hourly summary) will automatically be sent to the SQS queue.

  2. Consumer for SQS Messages:

  • Develop a consumer application or AWS Lambda function to periodically poll the SQS queue.

  • The consumer processes and forwards the summary reports to the appropriate recipients (e.g., email, dashboard, or monitoring tools); a minimal polling sketch follows.
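
A minimal polling sketch with boto3; the queue URL is a placeholder, and note that unless raw message delivery is enabled on the SNS subscription, the summary arrives wrapped in an SNS envelope:

import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/hourly-summaries"  # placeholder

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        envelope = json.loads(msg["Body"])         # SNS envelope
        summary = json.loads(envelope["Message"])  # the hourly summary itself
        print("Forwarding summary:", summary)      # e.g., email, dashboard, or monitoring tool
        sqs.delete_message(QueueUrL=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]) if False else sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])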

Benefits of This Approach

  1. Scalability: Kinesis and DynamoDB handle high data volumes seamlessly.

  2. Low Latency: Real-time data ingestion and near real-time reporting.

  3. Resilience: SQS ensures reliable message delivery.

  4. Flexibility: The architecture can scale to accommodate additional use cases like anomaly detection or real-time alerts.

Open-source based Solution Design

Architecture Overview

  1. Data Ingestion: IoT Sensors → Mosquitto MQTT → Kafka

  2. Data Processing: Kafka → Apache Flink → Cassandra

  3. Hourly Aggregation & Reporting: Kafka → Spark → Kafka (Summary Topic) → RabbitMQ → Consumer Application

Implementation Steps

Step 1: Setting Up IoT Data Collection

  1. Deploy IoT Sensors: Sensors are installed across the city to capture environmental data. The devices send data to a message broker via MQTT.

  2. Use an MQTT Broker: Configure and deploy Eclipse Mosquitto as the MQTT broker; sensors publish their readings to its topics (a minimal publisher sketch follows).
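
A minimal publisher sketch using the paho-mqtt client; the broker hostname and topic layout are illustrative assumptions:

import json

import paho.mqtt.publish as publish

reading = {
    "sensorId": "sensor_1",
    "temperature": 22.5,
    "humidity": 60,
    "noiseLevel": 35.2,
    "airQualityIndex": 42,
    "timestamp": "2024-01-01T12:00:00Z",
}

publish.single(
    "sensors/sensor_1/data",             # topic the Kafka bridge subscribes to
    payload=json.dumps(reading),
    hostname="mosquitto.techsus.local",  # hypothetical broker host
    port=1883,
    qos=1,
)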

Step 2: Processing IoT Data with a Stream Processing Framework

  1. Stream Data to Apache Kafka:

  • Use Apache Kafka as the messaging system for handling the high-volume IoT data stream.

  • Sensors publish their data to Mosquitto; a Kafka Connect MQTT Source Connector bridges the broker to Kafka by pulling messages from the MQTT topics and publishing them to a Kafka topic, which acts as the intermediate buffer for messages.

  2. Process Data Using Apache Flink: Run a Flink job that consumes the Kafka topic, validates and transforms the readings, and writes them to the database described in Step 3.

Step 3: Storing Data in a Scalable Database

  1. Choose the Database: Use Apache Cassandra for its scalability and high write-throughput capabilities, well-suited for time-series IoT data.

  2. Database Schema:

  • Primary Key: (SensorID, Timestamp)

  • Columns: Temperature, Humidity, NoiseLevel, AirQuality (see the sketch below).
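
A minimal sketch of that schema with the Python cassandra-driver; the keyspace name, replication settings, contact point, and column types are illustrative assumptions:

from datetime import datetime, timezone

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()  # hypothetical contact point

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS techsus
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# Partition key: sensor_id; clustering (sort) key: ts, newest readings first.
session.execute("""
    CREATE TABLE IF NOT EXISTS techsus.sensor_readings (
        sensor_id    text,
        ts           timestamp,
        temperature  double,
        humidity     double,
        noise_level  double,
        air_quality  int,
        PRIMARY KEY (sensor_id, ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

session.execute(
    "INSERT INTO techsus.sensor_readings "
    "(sensor_id, ts, temperature, humidity, noise_level, air_quality) "
    "VALUES (%s, %s, %s, %s, %s, %s)",
    ("sensor_1", datetime(2024, 1, 1, 12, tzinfo=timezone.utc), 22.5, 60.0, 35.2, 42),
)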

Step 4: Generating Hourly Summaries

  1. Set Up an Aggregation Job: Use Apache Spark to calculate hourly averages for metrics like AvgTemperature, AvgHumidity, etc. using hourly scheduled batch jobs.

  2. Write the Summaries to Kafka: Flink or Spark jobs write hourly summaries to a dedicated Kafka topic (say, HourlySummary); a minimal Spark sketch follows.
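
A minimal PySpark sketch of the hourly batch job. It reads the stored readings back from Cassandra (consuming directly from Kafka, as in the overview, is an alternative), and it assumes the spark-cassandra-connector and spark-sql-kafka packages plus the keyspace/table from the sketch above; the broker address is a placeholder:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hourly-summary").getOrCreate()

# Raw readings stored by the Flink job (requires the spark-cassandra-connector).
readings = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="techsus", table="sensor_readings")
    .load()
)

# Keep only the last hour and compute per-sensor averages.
last_hour = readings.where(F.col("ts") >= F.expr("current_timestamp() - INTERVAL 1 HOUR"))
summary = last_hour.groupBy("sensor_id").agg(
    F.avg("temperature").alias("avgTemperature"),
    F.avg("humidity").alias("avgHumidity"),
    F.avg("noise_level").alias("avgNoiseLevel"),
    F.avg("air_quality").alias("avgAirQualityIndex"),
)

# Publish one JSON message per sensor to the HourlySummary Kafka topic.
(
    summary.select(F.to_json(F.struct(*summary.columns)).alias("value"))
    .write.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder broker address
    .option("topic", "HourlySummary")
    .save()
)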

Step 5: Sending Notifications via a Messaging System

  1. Set Up Apache Kafka Consumers: Deploy a Kafka consumer application (e.g., using Python's “confluent-kafka” library or Java's Kafka client) to consume messages from the HourlySummary topic. This consumer prepares the hourly summary and emails it to the required recipients.

  2. Send Summary Reports via RabbitMQ: Use RabbitMQ as the message broker for delivering summary reports. The Kafka consumer forwards hourly summaries to RabbitMQ, which ensures reliable message delivery.

  3. Consumer for RabbitMQ Messages: Create a consumer application (e.g., a Python script using the pika library) to process RabbitMQ messages, i.e., email the summaries to the required audience (a minimal sketch follows).
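
A minimal consumer sketch with pika and smtplib; the queue name, broker host, SMTP host, and addresses are illustrative assumptions:

import json
import smtplib
from email.message import EmailMessage

import pika

def send_summary_email(summary: dict) -> None:
    msg = EmailMessage()
    msg["Subject"] = f"Hourly environment summary - {summary.get('cityZone', 'all zones')}"
    msg["From"] = "alerts@techsus.example"   # hypothetical sender
    msg["To"] = "analysts@techsus.example"   # hypothetical recipients
    msg.set_content(json.dumps(summary, indent=2))
    with smtplib.SMTP("smtp.techsus.example") as smtp:  # hypothetical SMTP relay
        smtp.send_message(msg)

def on_message(channel, method, properties, body):
    send_summary_email(json.loads(body))
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq.techsus.local"))  # hypothetical host
channel = connection.channel()
channel.queue_declare(queue="hourly_summaries", durable=True)
channel.basic_consume(queue="hourly_summaries", on_message_callback=on_message)
channel.start_consuming()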

Benefits of This Approach

  1. Open-Source Ecosystem: No reliance on proprietary services, reducing costs and increasing flexibility.

  2. Scalability: Kafka and Cassandra handle high-throughput data efficiently.

  3. Real-Time and Batch: Flink supports real-time processing, while Spark is suitable for scheduled batch jobs.

  4. Interoperability: RabbitMQ provides reliable messaging and can integrate with various consumer tools.

Azure based Solution Design

Architecture Overview

  1. IoT Ingestion: IoT Hub → Event Hubs → Cosmos DB

  2. Data Processing: Stream Analytics → Cosmos DB

  3. Hourly Aggregation & Reporting: Timer-Triggered Azure Function → Cosmos DB → Email Notifications

Implementation Steps

1. IoT Data Collection

  • Set Up Azure IoT Hub (the entry point for IoT telemetry) with the Device Provisioning Service (DPS) to simplify and automate device registration and provisioning at scale.

  • Connect IoT Hub to Event Hubs, which acts as a buffer and routes IoT messages to downstream systems.

2. Data Processing and Storage

  • Stream Analytics: Process data from Event Hubs and write to Cosmos DB.

  • Storage: Use Cosmos DB for scalable, distributed storage.

3. Aggregate Data Hourly: Schedule a Timer-Triggered Azure Function to run every hour, compute the aggregated telemetry, and store it in Cosmos DB (a minimal sketch follows).
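
A minimal sketch of that function using the Azure Functions Python model; the hourly schedule (e.g. the CRON expression 0 0 * * * *) lives in the function's timer binding, and the Cosmos DB account settings, database, container names, and query are illustrative assumptions:

import os
from datetime import datetime, timedelta, timezone

import azure.functions as func
from azure.cosmos import CosmosClient

client = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])  # hypothetical app settings
database = client.get_database_client("techsus")
telemetry = database.get_container_client("telemetry")
aggregates = database.get_container_client("hourly_aggregates")

def main(mytimer: func.TimerRequest) -> None:
    since = (datetime.now(timezone.utc) - timedelta(hours=1)).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Average the last hour of readings (simplified: one aggregate across all sensors).
    query = (
        "SELECT AVG(c.temperature) AS avgTemperature, AVG(c.humidity) AS avgHumidity, "
        "AVG(c.noiseLevel) AS avgNoiseLevel, AVG(c.airQualityIndex) AS avgAirQualityIndex "
        "FROM c WHERE c.timestamp >= @since"
    )
    rows = list(telemetry.query_items(
        query=query,
        parameters=[{"name": "@since", "value": since}],
        enable_cross_partition_query=True,
    ))
    if rows:
        item = dict(rows[0])
        item["id"] = f"summary-{since}"  # Cosmos DB items need an id
        aggregates.upsert_item(item)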

4. Notifications: Use the Azure Cosmos DB extension for Azure Functions with a Cosmos DB trigger, so that each newly written hourly aggregate fires a notification; the function injects the aggregated telemetry into an email template and sends the generated email via the Office 365 Outlook connector.

Benefits of This Approach

  • Scalability: Azure services like IoT Hub, Event Hubs & Cosmos DB are designed for massive scale.

  • Cost Efficiency: Pay-as-you-go pricing for compute and storage.

  • Real-Time and Batch: Handles both real-time telemetry ingestion and hourly batch processing.

  • Flexible Notifications: Email notifications can be extended to include SMS or webhook alerts.

Conclusion

Each solution offers unique benefits tailored to specific technical and operational needs:

  • AWS: Ideal for seamless integration with AWS services and rapid scaling.

  • Open-Source: Suited for cost-conscious projects with flexibility requirements.

  • Azure: Best for enterprises already invested in the Azure ecosystem.