by Nick Gambino, Software Engineer
System migrations between database services are often tricky to implement smoothly and come with complex technical challenges. As organizations scale, or enter a new phase of their business lifecycle, they introduce new requirements. This situation is not uncommon, and we'd like to share with the developer community some of the insights we gained while tackling these challenges.
One of our clients, an energy storage provider, tasked us with building the software infrastructure to administer, analyze, and monitor their battery storage units (BSUs). Since these applications served both internal and external stakeholders, our job was to create beautiful user experiences that delivered responsive, actionable data to the end user. With thousands of these units spread across the country, this was an interesting IoT challenge for us to tackle.
As companies scale, their system architecture needs to scale with them, and sometimes this creates bottlenecks with certain underlying pieces of infrastructure. In this particular case, it was the database service that housed the BSU device data. We noticed that the existing solution had capacity limitations that were beginning to constrain the business’s growth.
More importantly, we realized that our data pipeline fed directly into this database, which also handled its own scheduled backups. To support further scaling of our services, we needed to rethink how this pipeline worked.
The data pipeline is critical for any data-driven application to function effectively. Particularly in a distributed system, we found that we often needed to query specific segments and sub-segments of data to dynamically render data visualization animations and provide real-time analytics. The problem was that we could not scale the database dynamically: to accommodate additional BSUs coming online, we often had to schedule downtime in order to upgrade the database with more storage capacity. These downtimes, in turn, introduced the risk of data loss.
Ultimately, we decided that we needed to migrate to a different data storage service without losing any of this critical data. After discussing with our team, we identified the key requirements for the new database system.
In the end, we decided on InfluxDB. One of its benefits is that it is a purpose-built time-series database with an SQL-like query language, and it supports continuous queries that make it easy to aggregate data points. InfluxDB also offers a choice between a self-hosted solution and a managed service, which meant we could stay flexible as our client's needs changed over time.
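To make the appeal of continuous queries concrete, here is a rough sketch of how a BSU reading could be shaped as an InfluxDB point and downsampled hourly. The measurement, tag, and field names (`bsu_metrics`, `state_of_charge`, and so on) are illustrative assumptions, not our actual schema, and the commented-out client calls show where a live `influxdb` Python client would be used.

```python
def build_point(bsu_id, region, state_of_charge, timestamp):
    """Shape one BSU reading as an InfluxDB point (JSON write protocol).

    Tags are indexed dimensions used for filtering; fields hold the raw
    measured values. All names here are hypothetical.
    """
    return {
        "measurement": "bsu_metrics",
        "tags": {"bsu_id": bsu_id, "region": region},
        "fields": {"state_of_charge": float(state_of_charge)},
        "time": timestamp,
    }

# A continuous query aggregates points as they arrive, so dashboards can
# read pre-computed hourly means instead of scanning raw data every time.
CQ = (
    'CREATE CONTINUOUS QUERY "soc_hourly" ON "bsu_db" BEGIN '
    'SELECT mean("state_of_charge") INTO "soc_1h" FROM "bsu_metrics" '
    'GROUP BY time(1h), * END'
)

if __name__ == "__main__":
    # Against a live server this would be roughly:
    #   from influxdb import InfluxDBClient
    #   client = InfluxDBClient("localhost", 8086, database="bsu_db")
    #   client.write_points([build_point(...)])
    #   client.query(CQ)
    print(build_point("bsu-0042", "us-west", 87.5, "2021-01-01T00:00:00Z"))
```

The tags-versus-fields split is the design choice that matters here: queries that slice by unit or region stay fast because tags are indexed, while the numeric readings themselves live in fields.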
For some context, here is what the system infrastructure looked like before we began this migration:
We started out with a single ETL data pipeline: a “producer” microservice read BSU data from our API endpoint and dropped it, in the form of XML payloads, onto a RabbitMQ queue. A “consumer” microservice then picked messages off this queue, transformed the XML into JSON, and populated the database. Rather than having a single server talk directly to the database, this queue-based design ensured that processing requests from a growing number of IoT devices would not create a bottleneck on the initial web server. Once the data was populated in the database, scheduled snapshot backups were shipped to AWS S3.
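To make the consumer's transformation step concrete, here is a minimal, self-contained sketch of turning one XML payload into JSON, assuming a hypothetical payload shape (the real messages carried more fields). In production, a function like this would sit inside the RabbitMQ consumer's message callback rather than run standalone.

```python
import json
import xml.etree.ElementTree as ET


def transform(xml_payload: str) -> str:
    """Transform one BSU XML payload into a JSON document for the database.

    The <reading> element and its children are a hypothetical shape used
    for illustration only.
    """
    root = ET.fromstring(xml_payload)
    record = {
        "bsu_id": root.attrib["id"],
        "timestamp": root.findtext("timestamp"),
        # Cast numeric fields so the database stores them as floats,
        # not strings.
        "state_of_charge": float(root.findtext("stateOfCharge")),
        "temperature_c": float(root.findtext("temperature")),
    }
    return json.dumps(record)


if __name__ == "__main__":
    payload = (
        '<reading id="bsu-0042">'
        "<timestamp>2021-01-01T00:00:00Z</timestamp>"
        "<stateOfCharge>87.5</stateOfCharge>"
        "<temperature>21.3</temperature>"
        "</reading>"
    )
    print(transform(payload))
```

Keeping the transform a pure function of the payload is what lets consumers scale horizontally: any number of them can pull from the same queue without coordinating with one another.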
Up Next: Our Key Takeaway and where we ended up