Online Learning Messaging

Personalized text messages at scale

Challenge

Investigate new ways to ingest large amounts of data in a performant and monitorable manner.

Read more

Solution

Stateful stream processing

Read more

Result

Nearly real-time processing in most of the cases.

Read more

Challenge

SignalVine is a two-way text messaging platform that makes it easy for students and instructors to communicate concerning appointments, deadlines, and engage the campus community. Signal Vines messaging is not only a communicator. It’s a data-driven approach to messaging. Based on data from clients’ CRM or SIS system, SignalVine can target the right students with the right personalized text messages at scale. 

 

With the growing need for delivering the best quality to their clients, the company wanted to investigate new ways to ingest large amounts of data in a performant and monitorable manner. 

 

As SignalVine is a messaging service, it needs to process a lot of data even before serving it back to users, for which there was a home-grown data processing pipeline. Our responsibility was to research the matter and find a top-notch solution for the pipeline component in order to keep up with the constantly growing user database. There were also a few future-proofing upgrades to be considered in the area of error handling (in comparison to the existing solution). 

Solution

We started off with an analysis of the already existing proof of concept implemented using Kafka Streams. The client had chosen this particular framework because of its out-of-box support for stateful stream processing, which seemed like a good remedy for this kind of project. Although it at first appeared to be working pretty well, we decided to still look for alternatives. 

 

The reason behind this was mostly that Kafka Streams might not always be the best fit when it comes to functional programming and Scala in general. So we started looking into the two most prominent streaming libraries in the community: Akka-Streams and fs2. The former is a part of Akka, the well-known actor model framework and the latter is a library built atop a typelevel stack. It looked like these would both better suit our needs, but after giving it some thorough thought we decided to go with the fs2 solution. Our reason for this was mostly that it :

  • is purely functional
  • is fully flexible
  • seamlessly integrates with a typelevel stack, most importantly with the cats-effect module

 

Most of the data came from either Kafka or RabbitMQ, which could be easily handled because of the already existing libraries. Eventually, it was decided that everything would be moved to Kafka, which simplified our work, hence we used the fs2-kafka library for this task. 

SQL based data pipeline

The existing solution was based on a set of SQL queries and predefined stored procedures responsible for the execution of different data aggregations. The main reason why this wouldn’t perform as expected is that each record consumed from a Kafka topic was at first stored in a separate SQL database and later resulted in quite an expensive set of database operations, usually working on a rather big subset of the whole database. 

Stateful streaming 

As mentioned above, before we joined the project, the client had already come up with a pretty straightforward solution – stateful stream processing. When working with data streaming there is usually no intermediate state – meaning that you simply take some data, process it and later store it somewhere or forward it for further processing. In the case of stateful streaming, there’s an additional step which involves an update of the auxiliary state, which is basically a cache for values that are required for the calculation of the output data. This cache needs to be reliable, persistent and fast – which led us to incorporating Redis. Needless to say – the final result of the processing must also be cached since each time it’s updated based on incoming records. On top of that, to ensure data consistency all the records must be processed at exactly the same time and in a specific order. In the case of Kafka, this can be achieved by properly setting the partition keys and later grouping data based on this key, which guarantees the ordering.  To avoid reprocessing of the records, we introduced a metadata containing the most recently processed offset for the given partition/topic which was used to filter out the already processed ones.  

 

Here’s a summary of how our solution worked under the hood:

  1. We grouped data coming from multiple Kafka topics into a single stream
  2. While processing consecutive batches, each batch got grouped by a common id
  3. Each group was then processed in parallel by:
    1. Loading the previous aggregation state from the cache
    2. Filtering out the already processed records based on the metadata
    3. Calculating updates to the state based on the specific records and caches.
    4. Caching and publishing the latest result’s state to the Kafka topic  
  4. Committing the entire batch after it is fully processed 

Result

Results of the delivered POC were exactly as expected. The service performed very well, and we achieved nearly real-time processing in most of the cases. 

 

Our solution achieved the following key benefits:

  • Simplified data processing,
  • Significantly improved performance, as there is now no need to work on a large subset of the data every time
  • Produced consistency of data, thanks to a more properly managed offset processing. 
  • Made the code less error-prone and improved error detection

Let’s talk about your project

Drop us a line

Learn more

Scalac worked with over 80 companies around the world.
Find out more about our consulting and development solutions.
See how our team contributed to customers’ success.

E-learning Platform Scalability Solution

Scalability has become one of the most important issues in Sumdog, as with every fast-developing company, due to the growing number of customers using the software, the evolution of existing features, as well as increasing requirements when it comes to adding new features.

E-learning Platform Optimization

The client expanded the development team in an established technology with a strong team of developers, alleviating the challenges and costs of recruiting at scale in a competitive market in San Francisco.

Fintech Payroll Cloud Solution

How the Scalac team helped Bexio people be more competitive, by developing top-class payroll solution, positively influencing the work culture and the overall performance of the Bexio team.