SignalVine is a two-way text messaging platform that makes it easy for students and instructors to communicate about appointments and deadlines and to engage the campus community. SignalVine's messaging is more than a communication channel: it is a data-driven approach to messaging. Based on data from a client's CRM or SIS system, SignalVine can target the right students with the right personalized text messages at scale.
With the growing need to deliver the best quality to its clients, the company wanted to investigate new ways to ingest large amounts of data in a performant and monitorable way.
As a messaging service, SignalVine needs to process a lot of data before serving it back to users, a job handled by a home-grown data processing pipeline. Our responsibility was to research the matter and find a top-notch replacement for the pipeline component that could keep up with the constantly growing user base. A few future-proofing upgrades also had to be considered, particularly around error handling, where the existing solution fell short.
We started off with an analysis of the already existing proof of concept implemented using Kafka Streams. The client had chosen this particular framework because of its out-of-the-box support for stateful stream processing, which seemed like a good fit for this kind of project. Although at first it appeared to work pretty well, we decided to look for alternatives anyway.
The reason behind this was mostly that Kafka Streams is not always the best fit for functional programming and Scala in general. So we started looking into the two most prominent streaming libraries in the community: Akka Streams and fs2. The former is part of Akka, the well-known actor-model framework, and the latter is a library built on top of the Typelevel stack. Both looked like they would suit our needs better, but after giving it some thorough thought we decided to go with fs2. Our reason for this was mostly that it:
Most of the data came from either Kafka or RabbitMQ, which could be handled easily thanks to existing libraries. Eventually, it was decided that everything would be moved to Kafka, which simplified our work; we therefore used the fs2-kafka library for this task.
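For illustration, a minimal fs2-kafka consumer wiring might look like the sketch below. The broker address, group id, and topic name are placeholders, and the exact API surface may differ slightly between fs2-kafka versions; this is a configuration sketch, not the project's actual code.

```scala
import cats.effect.{IO, IOApp}
import fs2.kafka._

object Ingest extends IOApp.Simple {
  // Illustrative settings; "localhost:9092", the group id and the topic
  // name are placeholders, not SignalVine's real configuration.
  val settings: ConsumerSettings[IO, String, String] =
    ConsumerSettings[IO, String, String]
      .withBootstrapServers("localhost:9092")
      .withGroupId("pipeline")
      .withAutoOffsetReset(AutoOffsetReset.Earliest)

  val run: IO[Unit] =
    KafkaConsumer
      .stream(settings)
      .subscribeTo("events")
      .records
      .evalMap(r => IO.println(r.record.value)) // stand-in for real processing
      .compile
      .drain
}
```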
The existing solution was based on a set of SQL queries and predefined stored procedures responsible for executing different data aggregations. The main reason this wouldn't perform as expected is that each record consumed from a Kafka topic was first stored in a separate SQL database, which later triggered quite an expensive set of database operations, usually working on a rather big subset of the whole database.
As mentioned above, before we joined the project the client had already settled on a pretty straightforward approach: stateful stream processing. In plain data streaming there is usually no intermediate state, meaning that you simply take some data, process it, and then store it somewhere or forward it for further processing. In stateful streaming there is an additional step that updates an auxiliary state, which is essentially a cache of the values required to calculate the output data. This cache needs to be reliable, persistent, and fast, which led us to incorporate Redis. Needless to say, the final result of the processing must also be cached, since it is updated each time based on incoming records.

On top of that, to ensure data consistency, all the records must be processed exactly once and in a specific order. With Kafka, this can be achieved by properly setting the partition keys and later grouping data based on those keys, which guarantees the ordering. To avoid reprocessing records, we introduced metadata containing the most recently processed offset for each topic/partition, which was used to filter out records that had already been processed.
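The offset-based deduplication can be sketched as a small, dependency-free function; `Record` and the `(topic, partition) -> offset` metadata map are illustrative stand-ins, not the real schema.

```scala
// Minimal model of a consumed record: only the fields needed for dedup.
final case class Record(topic: String, partition: Int, offset: Long)

// Keep only records strictly newer than the most recently processed offset
// for their topic/partition; records from unseen partitions pass through.
def dropProcessed(
    batch: List[Record],
    lastOffsets: Map[(String, Int), Long]
): List[Record] =
  batch.filter { r =>
    lastOffsets.get((r.topic, r.partition)).forall(r.offset > _)
  }
```

In the real pipeline this metadata lives alongside the cached state, so a replayed batch after a restart is filtered down to only the records that were not yet folded into the results.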
Here’s a summary of how our solution worked under the hood:
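In simplified form, the steps above (group by partition key, keep offset order, drop already-processed records, fold into the cached state) can be modeled with plain in-memory collections; `Msg`, the running totals, and the per-key offset map are illustrative stand-ins for the real Kafka/Redis-backed pipeline.

```scala
// One message, keyed so that records for the same entity share a partition.
final case class Msg(partitionKey: String, offset: Long, amount: Int)

object PipelineModel {
  // Runs one batch: group by partition key (same key => same partition =>
  // ordered), drop already-processed offsets, fold the new amounts into the
  // cached running totals, and advance the offset metadata.
  def run(
      incoming: List[Msg],
      lastOffsets: Map[String, Long], // most recently processed offset per key
      cache: Map[String, Int]         // Redis-like auxiliary state: running totals
  ): (Map[String, Int], Map[String, Long]) =
    incoming.groupBy(_.partitionKey).foldLeft((cache, lastOffsets)) {
      case ((state, offsets), (key, msgs)) =>
        val fresh = msgs
          .sortBy(_.offset)
          .filter(m => offsets.get(key).forall(m.offset > _))
        val total   = state.getOrElse(key, 0) + fresh.map(_.amount).sum
        val nextOff = fresh.lastOption.map(_.offset)
        (state.updated(key, total), nextOff.fold(offsets)(offsets.updated(key, _)))
    }
}
```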
The results of the delivered POC were exactly as expected. The service performed very well, and we achieved nearly real-time processing in most cases.
Our solution achieved the following key benefits:
Scalac has worked with over 80 companies around the world.