data engineering

Financial Intelligence At Airbnb With Scala – A Case Study

Any business that is growing will, at some point, face scalability issues. These are not only business challenges but also technical ones. More clients, more processes, and therefore more data mean that online products have to be adapted to handle the greater volume.

From Scalac’s experience, introducing such crucial technical changes in data engineering requires smart choices and analysis of processes. We should keep in mind that any new options should provide us with scalability, quality, and performance. 

For those three reasons, many well-known companies opt for Scala. The language is particularly well suited to complex workflows. A great example is the solution implemented by Airbnb.

In this article, we’ll show you how they improved money tracking and financial accounting in a scalable way. You’ll learn why they decided to change their programming language to Scala and what benefits it brought them.

Payment process overview and development

The payment process at Airbnb involves a massive volume of transactions between guests and hosts. In 2017, they managed transactions in over 190 countries and 70 currencies. 

Over the years, they have also developed their transaction processes, introducing new functionalities, currencies, and payment methods.

Also, the scope of their finance department’s responsibilities has widened. They have had to follow international accounting principles, ensure compliance with taxes and licenses, and provide detailed reporting.

Their year-on-year transaction growth combined with new products and features resulted in a considerable increase in data volume. You can only imagine the data engineering team working out numerous solutions. 

Processing data at some point became overly tricky. Providing necessary reporting to the company’s stakeholders each day was quite a challenge. 

First steps before Scala programming

For a few years, financial accounting at Airbnb was based on a MySQL data pipeline. With a parameterized MySQL ETL, they could provide reports overnight.

However, preparing full reports required two methods to track main and additional data. 

The most important data were reservations and payment records. These changed in real time, and tracking them was crucial from a financial perspective. For this part of the reporting, they used MySQL data triggers on the main tables in the reports.

For the additional data, they built a set of intermediate helper tables. These were usable in different, more precise reports to present the elements of the main tables. By doing so, they were able to track, e.g. expected revenue, future host payouts, guest receivables, and other essential aspects of financial reporting.

Problems with scaling reports up

Even though this approach ensured data accuracy and flexibility in terms of changing business logic, it wasn’t perfect. The growing complexity of payment processes was limiting the scalability of the reports.

Why wasn’t MySQL the right data engineering solution at a certain point?

  • MySQL is better for lightweight data transformation than for complex business data flows. That is why it worked well at the beginning.
  • The solution was based on reservation logic. However, the growth of the business and new payment options, e.g. for photographers and translators, made the payment process more complicated. As a result, they had to create a product flow as well, because payment for each product was unique. Any modification made to the previously built reservation logic then limited the addition of new logic for products.
  • Reporting was based on two sources of data that were created separately. At some point, this made validation impossible and caused discrepancies between the main tables and their elements.
  • When the transaction volume increased considerably, developing and testing SQL scripts became too complex, time-consuming, and error-prone.
  • The large data volume also meant longer processing. The nightly run took over 24 hours.

A new approach – Scala Programming at Airbnb


Based on previous experience and the obstacles identified, Airbnb defined its main priorities:

  • to separate the financial logic from the product logic, so that changes to products and accounting can be made when necessary,
  • to scale the system horizontally simply by adding more machines to process data.

The result of the new approach was an event-based financial report programmed in Scala and powered by Apache Spark.

The data engineering team chose Scala because of:

  • the ability to use the newest features of Spark,
  • language advantages such as types, closures, immutability, and lazy evaluation.
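
To make those language advantages concrete, here is a minimal sketch in plain Scala. All names (`Event`, `Booking`, `Payout`, `LanguageFeatures`) are hypothetical and chosen only for illustration. They are not taken from Airbnb’s codebase.

```scala
// Hypothetical names throughout; this is an illustration, not Airbnb's code.

// Types: a sealed trait lets the compiler check that every event kind is handled.
sealed trait Event
final case class Booking(reservationId: Long, amount: BigDecimal) extends Event
final case class Payout(reservationId: Long, amount: BigDecimal) extends Event

object LanguageFeatures {
  // Immutability: case classes and List cannot be modified in place.
  val events: List[Event] = List(
    Booking(1L, BigDecimal(100)),
    Payout(1L, BigDecimal(85)),
    Booking(2L, BigDecimal(40))
  )

  // Closure: the returned function captures `threshold` from the enclosing scope.
  def atLeast(threshold: BigDecimal): Event => Boolean = {
    case Booking(_, amount) => amount >= threshold
    case Payout(_, amount)  => amount >= threshold
  }

  // Lazy evaluation: the sum is computed on first access and then cached.
  lazy val totalBooked: BigDecimal =
    events.collect { case Booking(_, amount) => amount }.sum
}
```

Calling `LanguageFeatures.events.filter(LanguageFeatures.atLeast(BigDecimal(50)))` would keep only the events of at least 50, without changing the original list.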

What is Scala used for at Airbnb?

At Airbnb, Scala features were applied to create a brand-new financial reporting workflow.

This concept enabled them to design and write handlers for different types of products and their further processing. This, in turn, helped with the calculation of the accounting impact of products at different stages of their life cycles. 

To clarify the workflow, let us show the differences between the types of events and their roles in the process. 

Platform events that mark the stages of a product’s life cycle.

Whenever someone reserves a room, a new booking event is emitted. The next day, the system reports the event as “services rendered”. On the day of service delivery, they recognize revenue. All those stages impact financial accounting, which is why they are so important.

Payment events that represent the movement of money.

The simple explanation for tracking such events is that the money that goes in must equal the money that goes out. However, creating the logic here wasn’t limited to simple payments by guests, because customers have different payment methods available.

Let’s say that someone buys a gift card. Here, the money may not actually move and be reported as “cash in” until someone uses the gift card. Those events were described as “stored value”.

Another example is using a coupon. In this case, there is no real money movement. However, creating an event enables tracking how much money from the discount should be taken from the marketing budget. Then the balance may be kept.
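
The payment cases above can be sketched as a small set of event types. This is a simplified model under our own assumptions: the type names, the split into purchase and redemption events, and the three summary functions are hypothetical and only mirror the description in this article.

```scala
// Hypothetical model of payment events; names and rules are illustrative assumptions.
sealed trait PaymentEvent { def amount: BigDecimal }
// Real money received from a guest.
final case class GuestPayment(amount: BigDecimal) extends PaymentEvent
// Gift card purchase: tracked as "stored value" until the card is used.
final case class GiftCardPurchased(amount: BigDecimal) extends PaymentEvent
// Gift card use: the stored value now pays for a product.
final case class GiftCardRedeemed(amount: BigDecimal) extends PaymentEvent
// Coupon: no money moves, but the discount is charged to the marketing budget.
final case class CouponApplied(amount: BigDecimal) extends PaymentEvent

object PaymentSummary {
  // Amount reported as "cash in": guest payments plus redeemed gift cards.
  def cashIn(events: Seq[PaymentEvent]): BigDecimal =
    events.collect {
      case GuestPayment(a)     => a
      case GiftCardRedeemed(a) => a
    }.sum

  // Stored value still outstanding: purchased minus redeemed.
  def storedValue(events: Seq[PaymentEvent]): BigDecimal =
    events.foldLeft(BigDecimal(0)) {
      case (acc, GiftCardPurchased(a)) => acc + a
      case (acc, GiftCardRedeemed(a))  => acc - a
      case (acc, _)                    => acc
    }

  // Discounts that should be taken from the marketing budget.
  def marketingCost(events: Seq[PaymentEvent]): BigDecimal =
    events.collect { case CouponApplied(a) => a }.sum
}
```

Because `PaymentEvent` is sealed, adding a new payment method means adding a new case class, and the compiler can then warn about any exhaustive pattern match over `PaymentEvent` that does not yet cover it.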

Accounting events generated by event handlers if payment and platform events are triggered.

These were needed to link the two other types of events. Each platform event may generate several accounting impacts, but assigning them to the ID of a given product type makes it possible to track what really happened.

Based on accounting events, the next step is generating a sub-ledger. Each entry includes the amount, currency, credit or debit impact, and related account. To be sure that everything is calculated correctly, they also use double-entry accounting, so no money can appear without a source.
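
As a rough illustration of this step, the sketch below turns platform events into sub-ledger entries and checks the double-entry rule. The event types, the handler, and the account names (`guest_receivable`, `unearned_revenue`, `revenue`) are assumptions made for this example, not Airbnb’s actual chart of accounts or handler logic.

```scala
// Hypothetical types; the account names and the single handler are assumptions.
sealed trait Side
case object Debit extends Side
case object Credit extends Side

// One sub-ledger line: amount, currency, debit or credit impact, and related account.
final case class LedgerEntry(productId: String, account: String, side: Side,
                             amount: BigDecimal, currency: String)

sealed trait PlatformEvent { def productId: String }
final case class Booked(productId: String, amount: BigDecimal, currency: String)
    extends PlatformEvent
final case class ServiceRendered(productId: String, amount: BigDecimal, currency: String)
    extends PlatformEvent

object ReservationHandler {
  // Each platform event yields one debit and one credit of equal amount,
  // both tagged with the product ID so they can be traced back later.
  def handle(event: PlatformEvent): List[LedgerEntry] = event match {
    case Booked(id, amt, cur) =>
      List(LedgerEntry(id, "guest_receivable", Debit, amt, cur),
           LedgerEntry(id, "unearned_revenue", Credit, amt, cur))
    case ServiceRendered(id, amt, cur) =>
      List(LedgerEntry(id, "unearned_revenue", Debit, amt, cur),
           LedgerEntry(id, "revenue", Credit, amt, cur))
  }
}

object DoubleEntry {
  // Double-entry check: per currency, total debits must equal total credits.
  def isBalanced(entries: Seq[LedgerEntry]): Boolean =
    entries.groupBy(_.currency).values.forall { es =>
      val debits  = es.filter(_.side == Debit).map(_.amount).sum
      val credits = es.filter(_.side == Credit).map(_.amount).sum
      debits == credits
    }
}
```

Keeping one handler per product type is one way to keep product logic apart from the accounting rules: a new product would get its own handler without touching `ReservationHandler`.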

With all those events and handlers, it became quite easy to generate a query for the financial results for a given account.
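
A query of that kind could look roughly like the sketch below, written against plain Scala collections rather than Spark so that it stays self-contained. The `SubLedgerRow` type and the sign convention (debits positive, credits negative) are assumptions made for brevity.

```scala
// Hypothetical sub-ledger row. Assumption: debits are stored as positive
// amounts and credits as negative amounts.
final case class SubLedgerRow(account: String, currency: String, signedAmount: BigDecimal)

object AccountQuery {
  // Net balance of one account, grouped by currency.
  def balance(rows: Seq[SubLedgerRow], account: String): Map[String, BigDecimal] =
    rows
      .filter(_.account == account)
      .groupBy(_.currency)
      .map { case (currency, rs) => currency -> rs.map(_.signedAmount).sum }
}
```

The shape of the computation is a filter followed by a grouped sum, which is the kind of operation that a distributed engine can spread across many machines.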

Benefits of Scala for Big Data Engineering 

The choice of Scala turned out to be a beneficial solution for Airbnb. Besides building a new, clear financial tracking system, the advantages of this implementation were visible in many more areas.

A few worth mentioning were:

  • the ability to scale both products and runtime, with a run taking 4-5 hours after implementation,
  • easier tracing of the data that impacted the main financials in reports,
  • easier coding,
  • the option to build a testing framework that helps to maintain the quality of delivered data.

Scala’s full potential shows when it is combined with other powerful libraries and tools. Airbnb has benefited from these as well.

Which other solutions complement Scala, then? And for which use cases does Airbnb apply them?

  • Apache Spark – an open-source, unified analytics engine for large-scale data processing. How does Airbnb take advantage of it?
      • For example, they use it for search ranking. Apache Spark helps them with complex data handling and unit tests. They also appreciate the fact that they can reuse code between Java and Scala.
      • Another example is smart pricing: a feature that suggests what price hosts should set. To build it, they trained a machine learning model with Apache Spark.
  • Microservices – also called microservice architecture. This architectural style structures an application as a collection of services, which enables rapid, frequent, and reliable delivery of large, complex apps. Such an architecture also makes it easier to evolve the technology stack.


How does this relate to the open-source system Kubernetes? Kubernetes automates the deployment, scaling, and management of containerized applications, which makes it easier to break an application down into separate, loosely coupled microservices.

This is especially needed when services grow exponentially, as at Airbnb. At some point, they had a huge number of services and environments. This required standardized configuration as well as automation and orchestration. By May 2019, they had over 150 critical services, and 50% of all services were running on Kubernetes.

Key takeaways 

The popularity of Scala and related technologies is a natural consequence of more data being available in the business world. The given financial intelligence case shows how they enable easier data management for financial purposes. 

However, the range of applications is far wider, spanning retail, automotive, transportation, entertainment, healthcare, and many more. Big brands like Airbnb, Apple, LinkedIn, and Zalando have already discovered the value of Scala.

New case studies also confirm the power of this programming language. What else is Scala used for? In fact, you can use it for anything from a simple web app to machine learning. This functional language makes it possible to simplify data complexity and to process data quickly and efficiently. All of this, together with its scalability, invites tech pioneers to join.

Isn’t it time for you as well? 

Sources

https://medium.com/airbnb-engineering/tracking-the-money-scaling-financial-reporting-at-airbnb-6d742b80f040 

https://www.scala.com/en/industries/

https://alvinalexander.com/photos/30-scala-job-openings-apple/

https://www.infoq.com/articles/linkedin-scala-jruby-voldemort/

Authors

Daria Karasek

Marketing Hero at Scalac. I strongly believe in creating opportunities rather than waiting for them to come. As befits Scalac team member I'm a hard worker, I always try to do the right thing and have a lot of fun! I'm an awesome friend and content writer, in that order. When I'm out of the office, I love to cook delicious Italian food and play board games with my friends. #boardgamegeek
