A pretty hot topic lately is machine learning – the inter-sectional discipline closely related to computational statistics that lets computers learn without being explicitly programmed.

It has been found to be of significant use in the field of data analytics – from estimating loan and insurance risk to trying to autonomously steer a car in real-life conditions.

In the following post, I would like to introduce to the reader MLlib – a machine learning library that is part of the Spark Framework.
Read more

For some time now Spark has been offering a Pipeline API (available in MLlib module) which facilitates building sequences of transformers and estimators in order to process the data and build a model. Moreover, Spark MLlib module ships with a plethora of custom transformers that make the process of data transformation easy and painless. But what happens if there is no transformer that supports a particular use case? Read more

During our last internal Backend Guild meeting we discussed the topic of Apache Spark. This post is to fill the details we missed and to organize the knowledge so it might be useful for people willing to start with Spark. Read more