Introduction to Machine Learning with Spark and MLlib (DataFrame API)

A pretty hot topic lately is machine learning, the interdisciplinary field closely related to computational statistics that lets computers learn without being explicitly programmed. It has proven to be of significant use in the field of data analytics, from estimating loan and insurance […]

For some time now, Spark has offered a Pipeline API (available in the MLlib module) that makes it easy to build sequences of transformers and estimators to process data and build a model. Moreover, the Spark MLlib module ships with a plethora of built-in transformers that make data transformation easy and painless. But what happens if no existing transformer supports a particular use case?

During our last internal Backend Guild meeting, we discussed Apache Spark. This post fills in the details we missed and organizes what we know so that it can be useful for anyone who wants to get started with Spark.