by Nick Gambino, Software Engineer
System migrations between database services are often tricky to implement smoothly and come with complex technical challenges. As organizations scale, or enter a new phase of their business lifecycle, they introduce new requirements. This situation is not uncommon, and we'd like to share with the developer community some of the insights we gained while tackling these challenges.
One of our clients, an energy storage provider, tasked us with building the software infrastructure to administer, analyze, and monitor their battery storage units (BSUs). Since these applications served both internal and external stakeholders, our job was to create beautiful user experiences that delivered responsive, actionable data to the end user. With thousands of these units spread across the country, this was an interesting IoT challenge for us to tackle.
As companies scale, their system architecture needs to scale with them, and sometimes this creates bottlenecks with certain underlying pieces of infrastructure. In this particular case, it was the database service that housed the BSU device data. We noticed that the existing solution had capacity limitations that were beginning to constrain the business’s growth.
More importantly, we realized that our data pipeline fed directly into this database, which also handled its own scheduled backups. To support further scaling of our services, we needed to rethink how this pipeline worked.
The data pipeline is critical for any data-driven application to function effectively. Particularly in a distributed system, we found that we often needed to query specific segments and sub-segments of data to dynamically render data visualization animations and provide real-time analytics. The problem was that we could not scale the database dynamically: to accommodate additional BSUs coming online, we often had to schedule downtime in order to upgrade the database with more storage capacity. These downtimes, in turn, introduced the risk of data loss.
Ultimately, we decided that we needed to migrate to a different data storage service without losing any of this critical data. After discussing with our team, we identified the key requirements for the new database system.
In the end, we decided on InfluxDB. One of its benefits is that it is a purpose-built time-series database with an SQL-like query language, and it supports continuous queries that make it easy to aggregate data points. InfluxDB also offers a choice between a self-hosted solution and a managed service, which meant we could stay flexible as our client's needs changed over time.
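To make the appeal of continuous queries concrete, here is a rough sketch of how a BSU reading could be shaped as an InfluxDB point and downsampled hourly. The measurement, tag, and field names (`bsu_metrics`, `state_of_charge`, and so on) are illustrative assumptions, not our actual schema, and the commented-out client calls show where a live `influxdb` Python client would be used.

```python
def build_point(bsu_id, region, state_of_charge, timestamp):
    """Shape one BSU reading as an InfluxDB point (JSON write protocol).

    Tags are indexed dimensions used for filtering; fields hold the raw
    measured values. All names here are hypothetical.
    """
    return {
        "measurement": "bsu_metrics",
        "tags": {"bsu_id": bsu_id, "region": region},
        "fields": {"state_of_charge": float(state_of_charge)},
        "time": timestamp,
    }

# A continuous query aggregates points as they arrive, so dashboards can
# read pre-computed hourly means instead of scanning raw data every time.
CQ = (
    'CREATE CONTINUOUS QUERY "soc_hourly" ON "bsu_db" BEGIN '
    'SELECT mean("state_of_charge") INTO "soc_1h" FROM "bsu_metrics" '
    'GROUP BY time(1h), * END'
)

if __name__ == "__main__":
    # Against a live server this would be roughly:
    #   from influxdb import InfluxDBClient
    #   client = InfluxDBClient("localhost", 8086, database="bsu_db")
    #   client.write_points([build_point(...)])
    #   client.query(CQ)
    print(build_point("bsu-0042", "us-west", 87.5, "2021-01-01T00:00:00Z"))
```

The tags-versus-fields split is the design choice that matters here: queries that slice by unit or region stay fast because tags are indexed, while the numeric readings themselves live in fields.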
For some context, here is what the system infrastructure looked like before we began this migration:
We started out with a single ETL data pipeline: a “producer” microservice read BSU data from our API endpoint and dropped it, in the form of XML payloads, onto a RabbitMQ queue. A “consumer” microservice then picked messages off this queue, transformed the XML into JSON, and populated the database. Rather than having a single server talk directly to the database, this queue-based design ensured that processing requests from a growing number of IoT devices would not create a bottleneck on the initial web server. Once the data was populated in the database, scheduled snapshot backups were shipped to AWS S3.
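To make the consumer's transformation step concrete, here is a minimal, self-contained sketch of turning one XML payload into JSON, assuming a hypothetical payload shape (the real messages carried more fields). In production, a function like this would sit inside the RabbitMQ consumer's message callback rather than run standalone.

```python
import json
import xml.etree.ElementTree as ET


def transform(xml_payload: str) -> str:
    """Transform one BSU XML payload into a JSON document for the database.

    The <reading> element and its children are a hypothetical shape used
    for illustration only.
    """
    root = ET.fromstring(xml_payload)
    record = {
        "bsu_id": root.attrib["id"],
        "timestamp": root.findtext("timestamp"),
        # Cast numeric fields so the database stores them as floats,
        # not strings.
        "state_of_charge": float(root.findtext("stateOfCharge")),
        "temperature_c": float(root.findtext("temperature")),
    }
    return json.dumps(record)


if __name__ == "__main__":
    payload = (
        '<reading id="bsu-0042">'
        "<timestamp>2021-01-01T00:00:00Z</timestamp>"
        "<stateOfCharge>87.5</stateOfCharge>"
        "<temperature>21.3</temperature>"
        "</reading>"
    )
    print(transform(payload))
```

Keeping the transform a pure function of the payload is what lets consumers scale horizontally: any number of them can pull from the same queue without coordinating with one another.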
Up Next: Our Key Takeaway and where we ended up