Typelevel ecosystem: a high-level overview
In this post, we’ll look at what the Typelevel ecosystem looks like in 2018, and how its various libraries interact with each other. In particular, we’ll focus on how we can compose some of these libraries to build a complete application, in a purely functional fashion.
This will not be a tutorial for Cats (there will actually be hardly any code here) – there’s plenty of learning material for that (linked at the end of this post) – but a high-level overview of the ecosystem and the way its pieces interact with each other.
Before we look at the libraries, however, we need to know what Typelevel and Cats are.
typelevel
typelevel.scala is a community built around independent open-source software. It focuses on pure, typeful functional programming in Scala, as well as approachability of the libraries built by its community, and the inclusivity of the environment.
Most of Typelevel’s online and offline activity involves maintaining open-source functional programming libraries, giving talks on related subjects and organizing events for the community.
The flagship “product” of Typelevel is Cats.
cats
Cats is a Scala library that provides core building blocks (abstractions) for libraries in the ecosystem – various type classes, data types and some syntactic enrichments. These abstractions allow libraries from the ecosystem to interact with each other, as long as their data types and type classes conform to the interface defined by Cats.
If you want to get a general idea of how that works in an application, I talk about using some of these building blocks in my talk, Fantastic Monads and where to find them.
Now that we know more about Cats, let’s add one more core building block to the bigger picture:
cats-effect
Another Scala library from the typelevel umbrella is cats-effect, which extends Cats with some additional type classes and data types. It’s a relatively new project focusing on wrapping effectful code in a referentially transparent context (like an IO monad), which makes side effects easier to reason about. It also provides some asynchronous primitives like Fiber, and an implementation of IO.
cats-effect will be an important building block in an application on a typelevel stack, as it’s extensively used in some of the libraries we’ll look at later in this post. Using the type classes provided by the library, we can completely detach ourselves from the effect type we’ll use to handle I/O operations – we can make the decision in a single place (like object Main), and make it apply to the whole application at once.
These type classes also enable us to make effectful computations (like getting the current time, or reading a file) referentially transparent, which wouldn’t be possible if they were computed in e.g. a Future.
Let’s talk about referential transparency with cats-effect by using an example of computing the current time.
A side-effecting function for getting the current time could be System.currentTimeMillis(). If we were to get the time at two different points on the timeline, we might want to do:
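A minimal sketch of what such a measurement might look like, using only the standard library. The object name TimeDiff and the elapsed function are hypothetical, introduced just for illustration:

```scala
object TimeDiff {
  // Helper that reads the clock: a side effect hidden behind a plain signature
  def time(): Long = System.currentTimeMillis()

  // Reads the clock at two different points on the timeline and returns the difference
  def elapsed(work: () => Unit): Long = {
    val start = time()
    work()
    val end = time()
    end - start
  }
}
```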
We’re using a helper function, time(), to get the time – but its signature doesn’t say that it performs any side effects. So let’s wrap it in a Future, as most people would do.
This would work, but Future breaks referential transparency – because if we were to do:
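A sketch of the problematic case, assuming the global ExecutionContext and a Future-returning time() helper (the names FutureTime, diffShared and diffInlined are hypothetical). A Future starts running when it is created and caches its result, so reusing the value x is not the same as repeating the expression time():

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureTime {
  def time(): Future[Long] = Future(System.currentTimeMillis())

  // `x` is evaluated once and memoized, so both reads see the same timestamp
  def diffShared(): Future[Long] = {
    val x = time()
    for {
      a <- x
      _ <- Future(Thread.sleep(50))
      b <- x
    } yield b - a
  }

  // Replacing `x` with its definition changes the result,
  // which is exactly what referential transparency forbids
  def diffInlined(): Future[Long] =
    for {
      a <- time()
      _ <- Future(Thread.sleep(50))
      b <- time()
    } yield b - a
}
```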
The printed value would always be 0. We shouldn’t have to worry about such cases, so let’s replace the use of Future with Task.
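A minimal sketch of the same experiment with a lazy effect type. It uses cats-effect’s IO as a stand-in for Task, which is an assumption made to stay within the libraries covered in this post; the name IoTime is hypothetical:

```scala
import cats.effect.IO

object IoTime {
  // IO only describes the clock read; nothing happens until the value is run
  val time: IO[Long] = IO(System.currentTimeMillis())

  val diff: IO[Long] = {
    val x = time
    for {
      a <- x                      // reads the clock
      _ <- IO(Thread.sleep(50))
      b <- x                      // reads the clock again, no memoization
    } yield b - a
  }
}
```

Here x can be replaced by time everywhere without changing the meaning of the program.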
Now the printed time difference will be however many milliseconds it took for x to be calculated. Imagine this snippet was placed inside a Tagless Final algebra:
That way we could use the Sync[F] instance to delay the effectful computation getting the current time (in a tuple with the actual result of calculateSomething(args)).
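A sketch of what such an algebra might look like. The trait Calculator, its instance constructor, and the body of calculateSomething are assumptions made for illustration; only Sync[F].delay and the idea of pairing the timestamp with the result come from the text above:

```scala
import cats.effect.Sync
import cats.implicits._

// Hypothetical algebra: the effect type F is left abstract
trait Calculator[F[_]] {
  def calculate(args: List[Int]): F[(Long, Int)]
}

object Calculator {
  // Placeholder for the pure computation mentioned in the text
  def calculateSomething(args: List[Int]): Int = args.sum

  def instance[F[_]: Sync]: Calculator[F] = new Calculator[F] {
    def calculate(args: List[Int]): F[(Long, Int)] =
      Sync[F]
        .delay(System.currentTimeMillis())           // clock read is suspended in F
        .map(now => (now, calculateSomething(args))) // paired with the pure result
  }
}
```

The choice of F (IO, Task or another type with a Sync instance) is deferred to whoever calls Calculator.instance.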
A web application
A typical modern web application usually has to perform some of these tasks:
- handle HTTP traffic
- request/response streaming
- serialize/deserialize JSON content
- validate incoming data
- connect to different applications (e.g. in a microservice architecture) via HTTP(S)
- save/read data to disk/a data store
Let’s imagine we need to do all of these. Here’s a rough approximation of what that would look like:
If that looks like an oversimplification, it’s because it is one – we left out the streaming part (an exercise for the reader). We also didn’t show the responses for any calls, but you can assume there would be a response for every request.
The diagram isn’t specific about any tools we’re going to use to handle the aspects of the application, so let’s talk about a few libraries that are going to help us out:
cats itself
You didn’t think we were going to skip this one, did you? As mentioned before, cats-core contains more than just basic type classes for the libraries – it also provides data structures. One of them is cats.data.Validated, which has been extensively covered by numerous talks and blogposts alike.
The only thing you need to know for now is that Validated[E, A] is like an Either[E, A] (its two cases are Invalid[E] and Valid[A]) that’s built with error accumulation in mind – if E has a Semigroup type class instance, you can compose multiple Validated[E, A] values into one.
There’s more to it than just error composition, but for now it’ll suffice.
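To make the error accumulation concrete, here is a small sketch. The User type and both validation functions are hypothetical; the error type is NonEmptyList[String], which has a Semigroup instance:

```scala
import cats.data.{NonEmptyList, Validated, ValidatedNel}
import cats.implicits._

object ValidatedDemo {
  final case class User(name: String, age: Int)

  def validateName(name: String): ValidatedNel[String, String] =
    if (name.nonEmpty) name.validNel else "name is empty".invalidNel

  def validateAge(age: Int): ValidatedNel[String, Int] =
    if (age >= 0) age.validNel else "age is negative".invalidNel

  // mapN builds a User only if both sides are Valid;
  // otherwise the errors from both sides are combined
  def validateUser(name: String, age: Int): ValidatedNel[String, User] =
    (validateName(name), validateAge(age)).mapN(User.apply)
}
```

Unlike Either, which stops at the first Left, both failures are reported when both inputs are wrong.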
circe
circe is a JSON library built with Cats. It provides utilities for working with JSON values, including parsing Strings to a JSON value, automatic conversions between case classes/sealed trait hierarchies and JSON, and more. It provides type classes for encoding and decoding JSON values, as well as instances of commonly used type classes from Cats.
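A short sketch of decoding and encoding with circe’s automatic derivation, assuming the circe-generic and circe-parser modules are on the classpath. The PageViewed case class and its two String fields are hypothetical, loosely following the path and date fields that appear later in this post:

```scala
import io.circe.generic.auto._
import io.circe.parser.decode
import io.circe.syntax._

object CirceDemo {
  // Hypothetical request payload
  final case class PageViewed(path: String, date: String)

  // Parsing and decoding in one step; failures are returned as a Left
  val parsed: Either[io.circe.Error, PageViewed] =
    decode[PageViewed]("""{"path":"/index.html","date":"2018-05-01T12:00:00Z"}""")

  // Encoding a case class back to a compact JSON string
  val rendered: String =
    PageViewed("/index.html", "2018-05-01T12:00:00Z").asJson.noSpaces
}
```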
fs2
fs2 is a streaming IO library. In English, that means it provides you with an abstraction of a stream that, when run, will produce (or consume) values wrapped in an effect type of your choice – be it IO, Task, or any other effectful monad (that has a Sync instance, from cats-effect). Also included: utilities for writing/reading files in a streaming fashion.
http4s
http4s provides an HTTP server (as well as a client) built on top of fs2 and cats-effect. It provides a routing DSL, support for streaming HTTP requests/responses, type classes to handle request/response entity encoding/decoding, and there’s a submodule for circe that instantiates these type classes for types that have appropriate circe instances (which basically means that using circe with http4s is as simple as adding an import).
doobie
The last library we’ll be using is doobie – a purely functional JDBC wrapper. It’s written with cats-effect as well (surprise), and it allows you to make the execution of SQL queries referentially transparent with ease.
Given the descriptions of these libraries, let’s see an updated approximation of our application’s structure:
As you can see, we placed Circe near the places where we’d handle JSON. http4s will handle our HTTP requests, as well as client calls we’re going to make to external services. We’ll validate incoming data using cats.data.Validated, and the persistence of data will be handled by Doobie.
As http4s is built on fs2, we’ll use that library in a less direct way.
Case study: counting page views
As every sensible explanation of a topic should, we’ll need an example for our application. Let’s imagine we’re working on a service that will count page views.
We will look at this example and see how we could implement JSON (de)serialization, HTTP routing, validation and persistence for it. We’ll skip client HTTP calls, as they work much the same way in http4s as in other libraries – you can check for yourself in the http4s guide.
The API specification that we got says we should implement a counting endpoint:
Now, an endpoint using the http4s DSL could look like this:
For each POST request to /views, we’ll try to get a PageViewed object from its body. Then we’ll forward the object to the checkAndSave function, together with the optional ip we got from the HTTP request.
In case the checkAndSave function doesn’t inform us about any errors, we’ll assume everything went well and return the result’s Valid value in JSON with a 201 Created status code. Otherwise, the errors will be handled (let’s assume validatedToJson will just provide a 422 Unprocessable Entity response).
Let’s look at what checkAndSave should do.
Validation
There are a few things we need to do with the request before saving the pageview to our database.
First of all, we could make sure that the tracked page actually exists (note: assuming it’s publicly accessible) – e.g. by asking our database if we’ve already tracked views for it, and making a request to the page otherwise.
As seen in the example, the path to the document that we’re going to count the views for contains query string parameters – the spec says we should normalize the path by removing all of them – we wouldn’t want to count views for distinct tracking IDs separately.
Note: in the real world, you might actually want to preserve some of the query string parameters (depending on how a website is configured) – that’s where some websites pass the identifier for the content to be displayed.
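The normalization step could be sketched as below. This is a simplification that drops everything from the first question mark onwards; the name PathNormalizer is hypothetical:

```scala
object PathNormalizer {
  // Keeps the part of the path that precedes the query string
  def normalize(path: String): String = path.takeWhile(_ != '?')
}
```

With this, /documentation/index.html?utm_source=abc and /documentation/index.html are counted as the same page.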
You can also see an authorization header. It’s a JWT (JSON Web Token) – if you decode it (for example, by pasting it on the website linked), you can see that its payload contains the sub field with the value scala-lang.org – that’s the hostname of the website requesting tracking a pageview. We’ve written about JWT before, if you’re not familiar with that technology.
For now, all you need to know is that, given a JWT like above, we can confirm whether it was issued by us, and use the payload it contains – even though the payload is only encoded with base64, which makes it trivial to read for anyone who sees it.
Here, we’ll use the JWT to identify which website’s article got a view.
If you want to handle JWTs (and password hashing, plus different cryptographic things) in a pure FP fashion, check out the tsec library (it even has a http4s module).
We know what page was displayed, we know the hostname of the website it was shown on, but we’re also asked not to count views from the same IP twice. We can try to get the IP the request was made from by extracting it from the X-Forwarded-For request header, or, if not available, by getting the request’s remoteAddr.
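A library-independent sketch of that fallback logic. Headers are modelled as a plain Map and remoteAddr as an Option[String], both of which are simplifying assumptions (real header names are case-insensitive, and http4s has its own types for these); the name ClientIp is hypothetical:

```scala
object ClientIp {
  // X-Forwarded-For may hold a comma-separated chain of addresses;
  // the first entry is the original client
  def resolve(headers: Map[String, String], remoteAddr: Option[String]): Option[String] =
    headers
      .get("X-Forwarded-For")
      .flatMap(_.split(',').headOption.map(_.trim).filter(_.nonEmpty))
      .orElse(remoteAddr) // fall back to the address of the connection itself
}
```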
Note: in reality, an attempt to ensure uniqueness by checking the IP will incur massive data loss – chances are, the IP you’re going to get is shared by a whole building, or even a whole district! Tracking views would be more accurate if you generated a cookie (or a localStorage field) once per user, and identified users by the value of that cookie. Then you’d need to worry about having a cookie policy, so we won’t be doing that in this example, to ease the pain.
To summarize: the verification and transformations we need to make:
- validate the JWT
- ensure the date field in the body is parsed correctly
- check if the view didn’t happen in the future ;)
- strip query parameters from the path field
- get the client IP – we won’t track requests without one
- ensure the page exists on the hostname in the JWT
- make sure we haven’t saved a view for the given (ip, path, hostname) parameters
…and if all assumptions are correct, we can save the view. Yay!
Note that we skipped the validation of the JWT in the http4s example – if you’re curious about how that could be handled in http4s, you can look at the aforementioned library tsec. Here, we’ll just assume the controller provided us with the hostname extracted from the JWT.
The actual signature of the checkAndSave function would then look more like this:
Note that we aren’t using Future or IO or Task explicitly in these examples – it’s just F. To find out more about how that works, you can get familiar with the Tagless Final pattern by reading our blogpost or watching one of Luka Jacobowitz’s talks on the topic.
Our error type could be defined as:
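One possible shape for the error type and the signature, offered as a sketch only. PVError.noIp and PageNotFound are the only names taken from the text; the remaining cases, the PageViewed fields, the PageViewService trait and the parameter list are all assumptions:

```scala
import cats.data.ValidatedNel

// Hypothetical error ADT
sealed trait PVError extends Product with Serializable
object PVError {
  case object NoIp extends PVError
  case object PageNotFound extends PVError
  case object ViewInTheFuture extends PVError // assumed
  case object AlreadyCounted extends PVError  // assumed

  // Constructors widened to PVError, which helps type inference at call sites
  val noIp: PVError = NoIp
  val pageNotFound: PVError = PageNotFound
}

// Hypothetical request body
final case class PageViewed(path: String, date: String)

// Hypothetical algebra: either all validation errors or Unit, wrapped in F
trait PageViewService[F[_]] {
  def checkAndSave(
      view: PageViewed,
      clientIp: Option[String],
      hostname: String
  ): F[ValidatedNel[PVError, Unit]]
}
```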
Given the signature and these error definitions, we can implement the checkAndSave method.
If this looks like cryptic writings of a possessed madman, don’t worry. We’ll look at each piece now:
- ValidatedNel[PVError, String] is either a Valid(s: String) or an Invalid(e: NonEmptyList[PVError])
- we first validate the IP by checking if it’s there (clientIp.toValidNel(PVError.noIp)).
- in case of success, make a call to the checkIp function (which will return an F[ValidatedNel[PVError, Unit]]). Afterwards, we would have F[ValidatedNel[PVError, ValidatedNel[PVError, String]]], so the most reasonable thing would be to call .map(_.flatten) – but Validated doesn’t have flatten, so we call andThen(identity), which will essentially flatten the nested value to a single ValidatedNel[PVError, String].
- in case of failure, we’ll end up with our errors being wrapped in the F context (that’s what traverse would do here).
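The steps above can be sketched in isolation as follows. The local PVError type and the shape of checkIp (here returning the IP itself on success, so that the flattening step yields a ValidatedNel[PVError, String]) are assumptions made for this example:

```scala
import cats.Applicative
import cats.data.ValidatedNel
import cats.implicits._

object IpValidation {
  sealed trait PVError
  case object NoIp extends PVError

  def validateIp[F[_]: Applicative](
      clientIp: Option[String],
      checkIp: String => F[ValidatedNel[PVError, String]] // hypothetical effectful check
  ): F[ValidatedNel[PVError, String]] =
    clientIp
      .toValidNel[PVError](NoIp) // a missing IP becomes Invalid(NonEmptyList(NoIp))
      .traverse(checkIp)         // runs checkIp only on the Valid side, moving F outside
      .map(_.andThen(identity))  // flattens the nested ValidatedNel
}
```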
And that’s it!
…for validating the IP. Let’s look at the way we check the path.
- First, we normalize the path in the first statement of the method (dropping all the query parameters).
- Then, we create a pathValidationF by making a call to pageExists (which would, under the hood, check the DB and potentially the website for the page’s existence).
- The result of that function is F[Boolean], so we’ll call Validated.condNel and pass that boolean – which is all happening inside the function passed to map. In case of false we’ll get a NonEmptyList(PageNotFound), in case of true we’ll get the normalized path back.
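A sketch of the path check described above. The local PVError type, the normalize helper and the signature of pageExists are assumptions; Validated.condNel is the Cats function named in the text:

```scala
import cats.Functor
import cats.data.{Validated, ValidatedNel}
import cats.implicits._

object PathValidation {
  sealed trait PVError
  case object PageNotFound extends PVError

  // Simplified normalization: drop the query string
  def normalize(path: String): String = path.takeWhile(_ != '?')

  def validatePath[F[_]: Functor](
      rawPath: String,
      pageExists: String => F[Boolean] // hypothetical effectful lookup
  ): F[ValidatedNel[PVError, String]] = {
    val path = normalize(rawPath)
    // true yields Valid(path), false yields Invalid(NonEmptyList(PageNotFound))
    pageExists(path).map(exists => Validated.condNel[PVError, String](exists, path, PageNotFound))
  }
}
```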
These two validations had to be made first, as they require effectful checks (denoted by the F[_] type), and we can’t easily combine effectful and “pure” checks together. Having checked the effectful ones separately, we can proceed with further validations.
Note: we skipped uuidF here. It doesn’t actually validate anything, but its implementation could be read as “at some point, generate a random UUID in the context of F”.
We pack the effectful checks in a tuple and call traverseN(f) on it – you can think of it as “run f when these four are done, and return the result in the same effect”.
The function we pass takes the values from inside our (...)ValidationFs, and combines them with other validations to build another tuple.
This time, the tuple will consist of elements that are ValidatedNel[PVError, A], each having its own A. We’ll call mapN, a function similar to traverseN – but what this one will do is ensure all the validations in the tuple are Valid, and call the provided function (in this case, PageView.apply – returning a PageView) with them.
If any of the validations doesn’t pass, mapN will collect all the errors together. So the result of mapN(PageView) is of type ValidatedNel[PVError, PageView].
Note: the only non-effectful validation that we make here is checking whether the pageview happened before the current point on the timeline, as we’ve hardcoded the rest to .valid – but this will not always be the case, as you might want to check e.g. whether the length of a string matches the configured limits.
One might argue that it’s actually effectful because it depends on currentTime, but the way we’re using it to validate the passed timestamp doesn’t involve any side effects per se.
Having either the errors or a PageView that we can save, we can call traverse on that ValidatedNel, passing a persisting method as a parameter. That way, the whole chain of (..., ...).mapN(PageView).traverse(...) will give us either an Invalid with all the validation errors, or a Valid – wrapped in F[_] in both cases.
Let’s look at the part where we combine effects and validations again:
Because there’s no function flatMapN, we still need to flatten the result of the traverseN call we made on our effectful validations – otherwise we would have F[F[ValidatedNel[PVError, Unit]]]. So that’s what we do! – hence the .flatten at the end, and the last call in the function.
To sum up, the function will handle its arguments in a way that’ll make it return a value inside the context of F – which will either be a list of validation errors (guaranteed not to be empty), or a Unit.
How does this use the elements of the ecosystem?
First of all, we extensively used Validated – a data type from Cats – to combine the potential errors that we could get from checking the input (using mapN).
We also used traverse and traverseN from the syntax for the Traverse type class in Cats to “flip” our wrapped types. In a simplified example, List(1,2,3).traverse(x => Future(x)) would give us a Future[List[Int]]. traverseN does a similar “flipping”, but on N inputs.
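The simplified traverse example can be run as is, assuming the global ExecutionContext (the name TraverseDemo is hypothetical). A plain map is shown next to it for contrast:

```scala
import cats.implicits._
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TraverseDemo {
  // map leaves the effect inside the collection
  val nested: List[Future[Int]] = List(1, 2, 3).map(x => Future(x))

  // traverse "flips" the layers: one Future holding the whole list
  val flipped: Future[List[Int]] = List(1, 2, 3).traverse(x => Future(x))
}
```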
Last but not least, we used the Sync type class from cats-effect – for an expression like Sync[F].delay { UUID.randomUUID() }, if we specified F = Task, that expression would be equivalent to Task { UUID.randomUUID() }. If we said F = IO, it would be IO { UUID.randomUUID() }, etc.
Now that we’re at Sync and cats-effect, let’s talk about persistence.
Persistence
In the validation code above, we only used one method related to persistence – repository.persist.
Assuming our data store is an SQL database – let’s say, PostgreSQL – the simplest (perhaps not optimal, though) definition of the table for pageviews would be as follows:
Table: page_views
text is the default string type here – but in the real world we would rather use a length-limited varchar type instead.
Given the table definition, our persist function could be implemented in the following way, using Doobie:
Quite verbose, but (given a proper Doobie transactor) this function will give us a database insert suspended in F – similarly to Sync[F].delay { actuallyInsert() }.
For persistence, the only things we interacted with were Doobie and cats-effect – because the transactor will use a Sync[F] instance underneath.
Summary
We looked at a specification of an HTTP endpoint and implemented parts of it using building blocks from the Cats, cats-effect, circe, http4s and Doobie libraries, learning how they can be composed to build a working HTTP service.
Of course a blogpost can’t dive into any of the mentioned libraries deeply enough to explain everything we just saw in detail, but I hope it’s enough to get you to click the links, read and experiment yourself :)
However, what would we do without
Shameless self-advertising
I’m going to lead a full-day workshop focused around building a similar application with the building blocks mentioned in this post at this year’s ScalaWave conference edition. We’ll cover all the steps to build a few fully functional endpoints like the above, including streaming HTTP requests/responses and more complex database logic.
We’ll also spend a good portion of time discussing commonly used patterns that’ll help us write purely functional software using the typelevel/cats ecosystem. It’ll be fun!
Links
Here are all the links from the blogpost, plus some more learning resources:
Project websites
Learning resources
- Cats documentation
- Scala Exercises: cats
- http4s client example
- Herding cats
- Fantastic Monads and where to find them
- Luka Jacobowitz’s talk on Tagless Final
- Our blogpost mentioning JWT
- Our blogpost on Tagless Final