
IoT Architecture: How to Build Scalable SDKs for Global Products?

  • Writer: Srihari Maddula
  • 2 hours ago
  • 6 min read

Author: Srihari Maddula • Technical Lead, EurthTech

Reading Time: 25 mins

Topic: IoT Architecture & Scalability


Bridging the gap between academic projects and industry reality.


The Hook: Why Your MVP SDK Won't Survive Production

Building an IoT device that connects to the cloud and sends telemetry is easy. Doing it for one device is a weekend project. Doing it for a hundred devices is a manageable pilot. But what happens when you need to deploy 1,000,000 devices across 50 countries, spanning three major cloud providers, while complying with regional data sovereignty laws?

Most companies hit a wall when transitioning from prototype to global scale. Their tightly coupled firmware, where the MQTT client, business logic, and hardware drivers are baked into a single monolithic tangle of spaghetti code, collapses under the weight of real-world requirements.

Building a truly scalable global product demands an architectural paradigm shift. You must move away from "firmware" and start thinking in terms of a Modular IoT SDK. In this deep dive, we'll explore the architecture required to build robust, scalable SDKs that can handle millions of globally distributed devices without breaking a sweat.

The Fleet Mentality: Beyond the Single Device

When designing an IoT SDK, the biggest mistake architects make is focusing on the individual device. A global product is never about the single device; it's about the fleet.

Designing for a fleet of 1M+ devices changes every technical assumption you have:

  • State Management: Devices will disconnect, reconnect, lose power, and experience brownouts. Your SDK must handle state reconciliation gracefully.

  • Thundering Herds: If all 1M devices reconnect simultaneously after a cloud outage, they will DDoS your infrastructure.

  • Rollouts: You cannot update 1M devices at once. You need phased rollouts, canary deployments, and automated rollback capabilities.
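A phased rollout needs a stable way to decide which devices belong to the first 1%, 10%, or 50% of the fleet. One common approach, sketched here under the assumption that each device has a unique string ID, is to hash the ID into a bucket from 0 to 99 and compare it with the current rollout percentage. The function names are illustrative, and in many deployments this decision is made in the cloud rather than on the device.

```c
#include <stdbool.h>
#include <stdint.h>

/* FNV-1a 32-bit hash. It is deterministic, so a device always lands in
 * the same rollout bucket across reboots and firmware versions. */
static uint32_t fnv1a_32(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    while (*s != '\0') {
        h ^= (uint8_t)*s++;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Returns true if this device falls inside the first `percent` (0..100)
 * of the fleet. Raising `percent` only ever adds devices, so a canary
 * group stays inside every later, wider phase. */
bool rollout_includes_device(const char *device_id, unsigned percent)
{
    if (percent >= 100u) {
        return true;
    }
    return (fnv1a_32(device_id) % 100u) < percent;
}
```

Because the bucket depends only on the device ID, pausing a rollout at 10% and resuming later targets the same devices, which keeps canary results comparable.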

SENIOR SECRET

Jittering OTA Triggers & Reconnects: Never use fixed intervals for anything at scale. If your devices retry connections or poll for Over-the-Air (OTA) updates every 60 seconds, a network blip will cause a massive synchronization event, crushing your backend. Always apply capped exponential backoff with randomized jitter. Example: `retry_time_ms = min(max_ms, base_ms * 2^attempt) + random(0, 1000)`, where `2^attempt` means two to the power of `attempt` (in C, `^` is XOR, so use a shift instead).
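The backoff rule above can be sketched in C as follows. The base delay, the cap, and the function name are assumptions chosen for illustration, and `rand()` stands in for whatever random source the platform provides.

```c
#include <stdint.h>
#include <stdlib.h>

#define BACKOFF_BASE_MS   1000u     /* assumed base delay */
#define BACKOFF_MAX_MS    300000u   /* assumed cap: 5 minutes */
#define BACKOFF_JITTER_MS 1000u     /* jitter window from the text */

/* Returns the delay before retry number `attempt` (0-based). */
uint32_t backoff_delay_ms(uint32_t attempt)
{
    uint32_t delay = BACKOFF_MAX_MS;

    /* Shift in 64 bits and bound the shift count so a long outage
     * cannot overflow the multiplication. */
    if (attempt < 16u) {
        uint64_t d = (uint64_t)BACKOFF_BASE_MS << attempt;
        if (d < BACKOFF_MAX_MS) {
            delay = (uint32_t)d;
        }
    }

    /* rand() is a placeholder. On a real fleet, seed it from a hardware
     * RNG or the device's unique ID; otherwise every device produces the
     * same "random" jitter and the herd stays synchronized. */
    uint32_t jitter = (uint32_t)rand() % (BACKOFF_JITTER_MS + 1u);
    return delay + jitter;
}
```

With these constants, the third retry (`attempt = 3`) waits between 8 and 9 seconds, and the delay stops growing once it reaches the 5-minute cap.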

To manage fleets at this magnitude, your SDK must treat connectivity as a privilege, not a guarantee. It must queue data locally, prioritize critical alerts, and bulk-upload historical telemetry only when the connection is stable and bandwidth is cheap.
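A minimal sketch of such a store-and-forward queue is shown below. It assumes two priority classes and a tiny fixed capacity, and uses a plain integer as a stand-in for a real message buffer; all names are hypothetical. When the queue is full, the oldest low-priority entry is evicted first, and alerts are always drained before telemetry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 4   /* tiny on purpose, for illustration */

typedef enum { PRIO_TELEMETRY = 0, PRIO_ALERT = 1 } msg_prio_t;

typedef struct {
    msg_prio_t prio;
    uint32_t   seq;      /* enqueue order: FIFO within one priority */
    uint32_t   payload;  /* stand-in for a real message buffer */
} queued_msg_t;

typedef struct {
    queued_msg_t items[QUEUE_CAPACITY];
    size_t       count;
    uint32_t     next_seq;
} offline_queue_t;

/* highest=true:  index of the next message to send (highest prio, oldest).
 * highest=false: index of the eviction victim (lowest prio, oldest). */
static size_t pick(const offline_queue_t *q, bool highest)
{
    size_t best = 0;
    for (size_t i = 1; i < q->count; i++) {
        const queued_msg_t *a = &q->items[i];
        const queued_msg_t *b = &q->items[best];
        bool better = highest ? (a->prio > b->prio) : (a->prio < b->prio);
        if (better || (a->prio == b->prio && a->seq < b->seq)) {
            best = i;
        }
    }
    return best;
}

/* Returns false if the queue is full of more important messages. */
bool queue_push(offline_queue_t *q, msg_prio_t prio, uint32_t payload)
{
    if (q->count == QUEUE_CAPACITY) {
        size_t victim = pick(q, false);
        if (q->items[victim].prio > prio) {
            return false;
        }
        q->items[victim] = q->items[--q->count];   /* evict */
    }
    q->items[q->count++] = (queued_msg_t){ prio, q->next_seq++, payload };
    return true;
}

/* Removes the most urgent message; returns false if the queue is empty. */
bool queue_pop(offline_queue_t *q, queued_msg_t *out)
{
    if (q->count == 0) {
        return false;
    }
    size_t i = pick(q, true);
    *out = q->items[i];
    q->items[i] = q->items[--q->count];
    return true;
}
```

A production queue would live in non-volatile storage and hold variable-length payloads, but the eviction and drain order would follow the same idea.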



Layered SDK Architecture: Separation of Concerns

A scalable IoT SDK must be ruthlessly modular. Hardware changes, cloud providers change, and security protocols evolve. If your business logic is intertwined with your hardware abstraction, you are setting yourself up for a complete rewrite.

A robust SDK architecture is divided into four distinct layers:

1. Hardware Abstraction Layer (HAL)

The HAL isolates the rest of the SDK from the specific microcontroller (e.g., STM32, ESP32, NXP). It provides standardized interfaces for GPIO, I2C, SPI, UART, and flash memory access. When the supply chain forces you to switch MCUs, you only rewrite the HAL.
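One way to express a HAL in C is a table of function pointers that each board port fills in. The sketch below is an assumption about how such an interface might look, not a fixed standard: the struct fields, the register layout, and the mock port are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Operations every board port must provide. Each returns 0 on success. */
typedef struct {
    int      (*i2c_read)(uint8_t addr, uint8_t reg, uint8_t *buf, size_t len);
    int      (*flash_write)(uint32_t offset, const uint8_t *data, size_t len);
    uint32_t (*millis)(void);
} hal_ops_t;

static const hal_ops_t *g_hal;

/* Called once at boot by the STM32, ESP32, or NXP port. */
void hal_register(const hal_ops_t *ops)
{
    g_hal = ops;
}

/* Portable driver code: reads a 16-bit big-endian value from register 0
 * of a sensor. It never touches a vendor header directly. */
int sensor_read_raw(uint8_t addr, int16_t *out)
{
    uint8_t buf[2];

    if (g_hal == NULL || g_hal->i2c_read == NULL) {
        return -1;
    }
    int rc = g_hal->i2c_read(addr, 0x00, buf, sizeof buf);
    if (rc != 0) {
        return rc;
    }
    *out = (int16_t)(uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
    return 0;
}

/* Host-side mock port: lets the driver run in unit tests on a PC. */
static int mock_i2c_read(uint8_t addr, uint8_t reg, uint8_t *buf, size_t len)
{
    (void)addr;
    (void)reg;
    if (len < 2) {
        return -1;
    }
    buf[0] = 0x01;   /* 0x0190 == 400 */
    buf[1] = 0x90;
    return 0;
}

static const hal_ops_t mock_hal = { .i2c_read = mock_i2c_read };
```

A side benefit of this layout is that the same driver code runs against `mock_hal` in continuous integration, long before the target hardware is available.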

2. Connectivity & Transport Layer

This layer abstracts the modem (Cellular, Wi-Fi, LoRaWAN) and the transport protocol (MQTT, CoAP, HTTP). The application layer shouldn't know if it's sending data over a Quectel LTE modem or a Broadcom Wi-Fi chip.

SENIOR SECRET

Agnostic Connectivity Wrappers: Build an abstract `Network_Send()` interface. When cellular drops but Wi-Fi is available, the transport layer should seamlessly swap physical interfaces, temporarily queue data, and resume transmission without the application layer losing state.
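A minimal sketch of such a wrapper follows. Only the name `Network_Send()` comes from the text; the `net_iface_t` struct, `Network_RegisterInterface()`, and the stub drivers are assumptions made for illustration. Interfaces are tried in registration order, and a failure on all of them tells the caller to queue the data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    bool (*is_up)(void);
    int  (*send)(const uint8_t *data, size_t len);   /* 0 on success */
} net_iface_t;

#define MAX_IFACES 4
static const net_iface_t *g_ifaces[MAX_IFACES];
static size_t g_iface_count;

/* Register interfaces in priority order (first registered is preferred). */
int Network_RegisterInterface(const net_iface_t *iface)
{
    if (g_iface_count == MAX_IFACES) {
        return -1;
    }
    g_ifaces[g_iface_count++] = iface;
    return 0;
}

/* Sends over the first interface that is up and accepts the data.
 * Returns 0 on success, -1 if every interface failed (caller queues). */
int Network_Send(const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < g_iface_count; i++) {
        const net_iface_t *n = g_ifaces[i];
        if (n->is_up() && n->send(data, len) == 0) {
            return 0;
        }
    }
    return -1;
}

/* Host-side stubs standing in for real modem and Wi-Fi drivers. */
static bool cell_up = true, wifi_up = true;
static int  cell_sent, wifi_sent;

static bool cell_is_up(void) { return cell_up; }
static bool wifi_is_up(void) { return wifi_up; }

static int cell_send(const uint8_t *d, size_t n)
{
    (void)d; (void)n;
    cell_sent++;
    return 0;
}

static int wifi_send(const uint8_t *d, size_t n)
{
    (void)d; (void)n;
    wifi_sent++;
    return 0;
}

static const net_iface_t cell_iface = { "cellular", cell_is_up, cell_send };
static const net_iface_t wifi_iface = { "wifi", wifi_is_up, wifi_send };
```

The application only ever calls `Network_Send()`, so adding a LoRaWAN or Ethernet path later means registering one more `net_iface_t` rather than touching business logic.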

3. Security & Cryptography Layer

This layer handles TLS/mTLS, cryptographic offloading to a Secure Element (SE) or Hardware Security Module (HSM), and certificate lifecycle management.

4. Application & Business Logic Layer

This is where your actual product lives. It handles sensor data aggregation, edge computing, state machines, and payload formatting (e.g., Protobuf, CBOR).

Global Scaling Challenges: Roaming, Clouds, and Sovereignty

Going global introduces brutal complexities that local deployments never face.

Multi-Cloud Support

Enterprise customers may demand that their data route to their own cloud ecosystem (for example, AWS IoT Core or Azure IoT Hub; Google Cloud IoT Core was retired in August 2023, so Google Cloud customers now typically run a partner or self-managed MQTT broker). Your SDK must support dynamic endpoint provisioning. Hardcoding a cloud-specific SDK into your firmware limits your market reach.

Carrier Roaming (eSIM/iSIM)

A device manufactured in China, shipped to the US, and deployed in rural Brazil will face massive roaming challenges. Traditional SIM cards lock you into unfavorable roaming agreements. Modern SDKs must integrate with eSIM (eUICC) or iSIM architectures to dynamically download local carrier profiles, reducing latency and data costs.

Data Sovereignty & GDPR

You cannot pipe all telemetry to a single `us-east-1` server anymore. European devices must route to EU servers. Your SDK’s provisioning mechanism must be smart enough to recognize its geographical location (via IP geolocation or cellular tower IDs) and negotiate a connection with the appropriate regional endpoint.
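For cellular devices, the mobile country code (MCC) reported by the modem is one location signal. The sketch below maps a few MCCs to regional endpoints; the hostnames are placeholders, the table is deliberately incomplete, and a lookup like this is only a routing hint, not a legal compliance mechanism. Unknown locations fall back to a hypothetical bootstrap service so that the routing decision can be made server-side.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t    mcc;        /* ITU-T E.212 mobile country code */
    const char *endpoint;   /* placeholder hostname */
} region_route_t;

static const region_route_t k_routes[] = {
    { 262, "mqtt.eu.example.com" },   /* Germany */
    { 208, "mqtt.eu.example.com" },   /* France */
    { 310, "mqtt.us.example.com" },   /* United States */
    { 724, "mqtt.sa.example.com" },   /* Brazil */
};

/* Assumed provisioning service that assigns a region when the device
 * cannot decide on its own. */
#define BOOTSTRAP_ENDPOINT "provision.example.com"

const char *endpoint_for_mcc(uint16_t mcc)
{
    for (size_t i = 0; i < sizeof k_routes / sizeof k_routes[0]; i++) {
        if (k_routes[i].mcc == mcc) {
            return k_routes[i].endpoint;
        }
    }
    return BOOTSTRAP_ENDPOINT;
}
```

Falling back to a bootstrap endpoint instead of a default region avoids silently sending a European device's telemetry to a US server when the lookup table is out of date.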

Security-by-Design: Identity at Scale

Security cannot be an afterthought bolt-on; it is the foundation of the SDK. Pre-shared keys (PSKs) are unacceptable for global fleets.

Mutual TLS (mTLS) & Secure Elements

Every device must possess a unique, cryptographically secure identity. mTLS ensures that not only does the device verify the server, but the server verifies the device. To prevent private key extraction, the SDK must interface with a hardware Secure Element (like the Microchip ATECC608 or NXP SE050). The private key is generated inside the chip and can never be read out; during the TLS handshake, the SDK passes the digest to be signed to the SE and receives only the signature back.

Automated Certificate Rotation

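Certificates expire. If a device has a 5-year lifespan but its certificate expires in 2 years, it will be locked out of the cloud, and effectively bricked, if it cannot renew its identity in time. Your SDK must implement automated certificate rotation: generate a new certificate signing request (CSR), have it signed, and switch to the new certificate well before the old one expires.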
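The decision of when to start a renewal can be sketched as a pure function of the certificate's validity window and the current time. The policy below (renew once less than one third of the lifetime remains) and all names are assumptions for illustration; the CSR generation and signing steps themselves are not shown.

```c
#include <stdint.h>

/* All times are seconds since the Unix epoch (UTC). */
typedef struct {
    int64_t not_before;
    int64_t not_after;
} cert_validity_t;

typedef enum {
    CERT_OK,
    CERT_RENEW_NOW,
    CERT_EXPIRED,
    CERT_CLOCK_INVALID
} cert_action_t;

cert_action_t cert_check(const cert_validity_t *c, int64_t now)
{
    /* A clock earlier than not_before usually means the RTC lost power
     * and reset; do not trust any expiry decision until time is synced. */
    if (now < c->not_before) {
        return CERT_CLOCK_INVALID;
    }
    if (now >= c->not_after) {
        return CERT_EXPIRED;
    }

    int64_t lifetime  = c->not_after - c->not_before;
    int64_t remaining = c->not_after - now;

    /* Assumed policy: start renewing in the last third of the lifetime,
     * which leaves months of retries for a 2-year certificate. */
    return (remaining * 3 < lifetime) ? CERT_RENEW_NOW : CERT_OK;
}
```

Starting early matters because a device may be offline for weeks; a renewal that begins only a day before expiry has little margin for retries.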

SENIOR SECRET

Automated Certificate Check & Fallback: Always maintain two trust stores. If an OTA update pushes a revoked root CA or an invalid certificate, the device must detect the TLS handshake failure, roll back to the fallback identity, and phone home to a quarantine endpoint for remediation.
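The fallback logic can be sketched as a small state machine. This is one possible shape, with assumed names, an assumed failure threshold, and placeholder hostnames. Only certificate verification failures count toward the switch, because a plain network timeout says nothing about the trust store.

```c
#include <stdbool.h>

typedef enum { HS_OK, HS_NET_ERROR, HS_CERT_ERROR } handshake_result_t;
typedef enum { IDENTITY_PRIMARY, IDENTITY_FALLBACK } identity_slot_t;

typedef struct {
    identity_slot_t active;          /* which trust store is in use */
    unsigned        cert_failures;   /* consecutive verification failures */
    bool            quarantined;
} identity_state_t;

#define MAX_CERT_FAILURES 3u         /* assumed threshold */

/* Call after every TLS handshake attempt. */
void identity_on_handshake(identity_state_t *s, handshake_result_t r)
{
    if (r == HS_OK) {
        s->cert_failures = 0;
        return;
    }
    if (r != HS_CERT_ERROR) {
        return;                      /* network trouble: just retry later */
    }
    if (++s->cert_failures < MAX_CERT_FAILURES) {
        return;
    }
    if (s->active == IDENTITY_PRIMARY) {
        s->active = IDENTITY_FALLBACK;
        s->quarantined = true;       /* report for remediation */
    }
    s->cert_failures = 0;
}

/* Placeholder hostnames: a quarantined device talks only to the
 * remediation service until it receives a repaired trust store. */
const char *identity_endpoint(const identity_state_t *s)
{
    return s->quarantined ? "quarantine.example.com" : "mqtt.example.com";
}
```

In a real SDK the state would be persisted across reboots, so that a device does not flip back to the broken primary store after a power cycle.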

Production-Ready C Snippet: Automated mbedTLS Certificate Check (STM32)

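Here is an example of how to configure mbedTLS for strict certificate validation. With `MBEDTLS_SSL_VERIFY_REQUIRED`, the handshake fails unless the server presents a valid chain to the trusted root, so the MQTT connection is never opened over an unverified link. Note that mbedTLS checks certificate expiration only when it is built with `MBEDTLS_HAVE_TIME_DATE` and the platform supplies a valid wall-clock time (on an STM32, typically from the RTC).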

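#include <string.h>
#include "mbedtls/ssl.h"
#include "mbedtls/entropy.h"
#include "mbedtls/ctr_drbg.h"
#include "mbedtls/x509_crt.h"
#include "mbedtls/pk.h"
#include "mbedtls/error.h"

// Contexts (global so they outlive the TLS session)
mbedtls_ssl_context ssl;
mbedtls_ssl_config conf;
mbedtls_x509_crt cacert;
mbedtls_x509_crt clicert;
mbedtls_pk_context pkey;
mbedtls_entropy_context entropy;
mbedtls_ctr_drbg_context ctr_drbg;

/**
 * @brief Configures TLS for mutual authentication (mbedTLS 3.x API).
 *
 * All three inputs are NUL-terminated PEM strings.
 * Returns 0 on success, or a non-zero mbedTLS error code.
 */
int secure_transport_init(const char *root_ca, const char *client_cert, const char *client_key) {
    int ret;

    mbedtls_ssl_init(&ssl);
    mbedtls_ssl_config_init(&conf);
    mbedtls_x509_crt_init(&cacert);
    mbedtls_x509_crt_init(&clicert);
    mbedtls_pk_init(&pkey);
    mbedtls_entropy_init(&entropy);
    mbedtls_ctr_drbg_init(&ctr_drbg);

    // 0. Seed the RNG: mbedTLS 3.x needs it for key parsing and the handshake.
    //    On an STM32 the entropy source is typically the hardware RNG.
    ret = mbedtls_ctr_drbg_seed(&ctr_drbg, mbedtls_entropy_func, &entropy, NULL, 0);
    if (ret != 0) return ret;

    // 1. Load Root CA (for PEM input, the length must include the NUL terminator)
    ret = mbedtls_x509_crt_parse(&cacert, (const unsigned char *)root_ca, strlen(root_ca) + 1);
    if (ret != 0) return ret; // a positive value means some certificates failed to parse

    // 2. Load Device Certificate
    ret = mbedtls_x509_crt_parse(&clicert, (const unsigned char *)client_cert, strlen(client_cert) + 1);
    if (ret != 0) return ret;

    // 3. Load Device Private Key (unencrypted, so no password)
    ret = mbedtls_pk_parse_key(&pkey, (const unsigned char *)client_key, strlen(client_key) + 1,
                               NULL, 0, mbedtls_ctr_drbg_random, &ctr_drbg);
    if (ret != 0) return ret;

    // 4. Configure SSL Profile (Strict Validation)
    ret = mbedtls_ssl_config_defaults(&conf, MBEDTLS_SSL_IS_CLIENT,
                                      MBEDTLS_SSL_TRANSPORT_STREAM, MBEDTLS_SSL_PRESET_DEFAULT);
    if (ret != 0) return ret;
    mbedtls_ssl_conf_authmode(&conf, MBEDTLS_SSL_VERIFY_REQUIRED); // Force validation
    mbedtls_ssl_conf_ca_chain(&conf, &cacert, NULL);
    mbedtls_ssl_conf_rng(&conf, mbedtls_ctr_drbg_random, &ctr_drbg);
    ret = mbedtls_ssl_conf_own_cert(&conf, &clicert, &pkey);
    if (ret != 0) return ret;

    // 5. Setup context; returns 0 when ready for the handshake
    return mbedtls_ssl_setup(&ssl, &conf);
}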

Fleet Management & Observability

You cannot fix what you cannot see. When a device goes offline in a remote desert, you need to know why without sending a technician.

Remote Logging and Telemetry

`printf()` is dead. Scalable SDKs use structured, leveled logging (Trace, Debug, Info, Warn, Error, Fatal). Logs must be buffered in non-volatile memory and uploaded only when critical or requested by the cloud.

SENIOR SECRET

Ephemeral Logging via Delta-Patching logic: At scale, logging everything costs millions in cellular data. Implement "Log Level Shadowing." By default, devices only log `FATAL` errors. If a specific device acts up, the cloud issues a twin-update command to dynamically shift its log level to `TRACE` for 10 minutes, streams the debug data, and auto-reverts to `FATAL`.
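The temporary log-level override described above can be sketched as follows. The struct and function names are hypothetical, and the time source is assumed to be a free-running 32-bit millisecond tick. The cloud command handler would call `log_set_override()`; every log call checks `log_enabled()`, which also performs the automatic revert.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    LOG_TRACE, LOG_DEBUG, LOG_INFO, LOG_WARN, LOG_ERROR, LOG_FATAL
} log_level_t;

typedef struct {
    log_level_t base;                /* persistent default, e.g. LOG_FATAL */
    log_level_t override_level;      /* temporary level set by the cloud */
    uint32_t    override_until_ms;
    bool        override_active;
} log_ctl_t;

/* Invoked when the cloud asks for verbose logs for a limited time. */
void log_set_override(log_ctl_t *c, log_level_t lvl,
                      uint32_t now_ms, uint32_t duration_ms)
{
    c->override_level    = lvl;
    c->override_until_ms = now_ms + duration_ms;
    c->override_active   = true;
}

/* Returns true if a message at `lvl` should be emitted right now. */
bool log_enabled(log_ctl_t *c, log_level_t lvl, uint32_t now_ms)
{
    /* The signed difference stays correct across 32-bit tick wraparound,
     * as long as the override lasts less than about 24 days. */
    if (c->override_active &&
        (int32_t)(now_ms - c->override_until_ms) >= 0) {
        c->override_active = false;  /* auto-revert to the base level */
    }
    log_level_t threshold = c->override_active ? c->override_level : c->base;
    return lvl >= threshold;
}
```

Because the revert happens on the device, a unit that loses connectivity during a debug session still drops back to its quiet default without waiting for a second cloud command.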

Crash-Dump Analysis

When a HardFault occurs on an ARM Cortex-M processor, the device typically reboots, erasing the context. A mature SDK intercepts the HardFault, writes the CPU registers, call stack, and heap state to a reserved sector of external flash memory, and reboots. Upon reconnecting, it uploads this binary crash-dump to the cloud, allowing developers to run `gdb` against the dump and pinpoint the exact line of code that caused the crash.
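The record that gets persisted can be sketched in portable C. On exception entry, a Cortex-M core pushes eight registers (r0 to r3, r12, lr, pc, xPSR) onto the active stack, so the fault handler can copy them from the stacked frame. The record layout, magic value, and checksum below are assumptions for illustration; the assembly shim that obtains the stack pointer and the flash write itself are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRASH_MAGIC 0xC0DEDEADu      /* arbitrary marker (an assumption) */

/* Same order as the frame the Cortex-M core stacks on exception entry. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

typedef struct {
    uint32_t    magic;
    exc_frame_t frame;
    uint32_t    checksum;            /* covers every field before it */
} crash_record_t;

/* Simple polynomial checksum; a real SDK might use CRC-32 instead. */
static uint32_t crash_checksum(const crash_record_t *r)
{
    const uint8_t *p = (const uint8_t *)r;
    uint32_t sum = 0;
    for (size_t i = 0; i < offsetof(crash_record_t, checksum); i++) {
        sum = sum * 31u + p[i];
    }
    return sum;
}

/* Called from the fault handler with the stack pointer that was active
 * when the fault occurred (MSP or PSP). */
void crash_record_fill(crash_record_t *r, const uint32_t *stacked_sp)
{
    r->magic = CRASH_MAGIC;
    memcpy(&r->frame, stacked_sp, sizeof r->frame);
    r->checksum = crash_checksum(r);
}

/* Called at boot: true only if the reserved sector holds a complete,
 * uncorrupted record worth uploading. */
bool crash_record_valid(const crash_record_t *r)
{
    return r->magic == CRASH_MAGIC && r->checksum == crash_checksum(r);
}
```

The `pc` field is the address of the faulting instruction (or the one just after it, depending on the fault type), which is what lets a developer map the dump back to a source line with the matching ELF file.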

Summary

Building a scalable IoT SDK is an exercise in defensive engineering. By shifting your mindset from single-device firmware to a modular, decoupled SDK architecture, you future-proof your product.

Embrace the fleet mentality: utilize jittered reconnects, implement strict hardware abstraction, enforce security via mTLS and Secure Elements, and build robust observability pipelines to catch failures before they become catastrophes.

The initial engineering overhead is high, but when your product scales past 1,000,000 active devices, a well-architected SDK is the only thing standing between a seamless global operation and an unmitigated disaster.

© 2026 EurthTech. Built for the next generation of engineers.

 
 
 
