When I joined the Scalac DevOps crew 441 days ago, the entire infrastructure (did we actually have one?) was a mess. My first assignment was to take stock and make notes on what kind of demons were hiding in the closets. I was thunderstruck.
We had two Jenkins instances running simultaneously, some servers at OVH, some nodes in Microsoft Azure, and the rest on a desktop PC at HQ (also known as the Red Devil). Almost every service was launched in a Screen session. Besides that, one of the biggest surprises was that production and dev environments were mixed on the same servers without any segregation. One on all, all on one. How did it last so long?
Although we did not have many production applications, every procedure resembled open-heart surgery without an anesthesiologist.
In addition, some of the source code was stored on GitHub and some on an internally hosted GitLab. Keeping both would have meant maintaining two sources of code and keeping both of them up and running, and that was not an option.
After many team meetings, we made a firm decision to reorganize everything. Codename: let’s kill ’em all.
We wanted everything consolidated so that managing, modifying, and maintaining it would be far easier than with the setup we started with.
For code repositories we chose GitHub, partly because many external tools integrate with it like a charm, which makes automation easier.
As we were struggling with different application setups, configs, and versions scattered across servers, we chose Docker as the remedy. With it, we could move applications between hosts, confident that they would work no matter which host they ran on.
Even though our infrastructure was far from the biggest, it still took a lot of time to keep it safe and sound and called for many manual actions. Honestly, it was like one blind man leading another across a pedestrian crossing. We had no monitoring solution and no alerting tool that would let us pinpoint where it hurt the most, and it hurt badly, many times. Grafana and Prometheus became our guides.
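To illustrate what an alerting tool adds on top of raw metrics, here is a minimal Python sketch of the idea behind Prometheus alerting rules: a condition has to hold continuously for a configured duration before an alert moves from pending to firing. This is a toy model, not Prometheus's implementation; the `AlertRule` and `AlertEvaluator` names, the disk-usage threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical rule: fire when a metric stays above `threshold` for `for_seconds`."""
    name: str
    threshold: float
    for_seconds: int


class AlertEvaluator:
    def __init__(self, rule: AlertRule) -> None:
        self.rule = rule
        self.active_since: Optional[int] = None  # when the condition first became true

    def evaluate(self, timestamp: int, value: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for a single sample."""
        if value <= self.rule.threshold:
            # Condition cleared: reset, so a later breach starts a new pending period
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = timestamp
        if timestamp - self.active_since >= self.rule.for_seconds:
            return "firing"
        return "pending"


# Hypothetical disk-usage samples (percent), collected every 60 seconds
rule = AlertRule(name="HighDiskUsage", threshold=90.0, for_seconds=120)
evaluator = AlertEvaluator(rule)
samples = [(0, 85.0), (60, 92.0), (120, 93.5), (180, 95.0), (240, 70.0)]
states = [evaluator.evaluate(t, v) for t, v in samples]
print(states)  # ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

The pending period is what keeps a single noisy spike from paging anyone: only a problem that persists gets escalated.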
One of the main ingredients of well-working orchestration is having the entire infrastructure defined as code (IaC). From the available provisioning tools we chose Terraform, as it was the most mature with regard to our needs and widely supported by the open-source community. With it, we were able to capture the whole infrastructure in code.
This part of the process took a huge amount of time, but it was undoubtedly worth it. I will describe it in more detail a bit later.
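The core idea behind infrastructure as code fits in a few lines: declare the desired state, compare it with the current state, and derive a plan of actions. The Python below is a simplified model of the kind of create/update/delete plan that `terraform plan` reports, not how Terraform works internally; the resource addresses and attributes are hypothetical, and real Terraform can also decide that a change requires destroying and recreating a resource.

```python
from typing import Dict, List, Tuple

# A resource is modelled as a flat dict of attribute name -> value
Resource = Dict[str, str]


def plan(desired: Dict[str, Resource], current: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Compare desired and current state and return (action, address) pairs."""
    actions: List[Tuple[str, str]] = []
    for address in sorted(desired.keys() | current.keys()):
        if address not in current:
            actions.append(("create", address))  # declared but does not exist yet
        elif address not in desired:
            actions.append(("delete", address))  # exists but is no longer declared
        elif desired[address] != current[address]:
            actions.append(("update", address))  # exists but drifted from the code
    return actions


# Hypothetical state: what the code declares vs. what is actually running
desired = {
    "aws_instance.web": {"instance_type": "t3.small", "ami": "ami-example"},
    "aws_s3_bucket.assets": {"acl": "private"},
}
current = {
    "aws_instance.web": {"instance_type": "t3.micro", "ami": "ami-example"},
    "aws_instance.legacy": {"instance_type": "t2.medium"},
}
for action, address in plan(desired, current):
    print(f"{action:>6}  {address}")
```

Because the code is the source of truth, anything running but undeclared (the hypothetical `legacy` instance) shows up in the plan instead of silently lingering.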
Ansible: without this configuration management tool, we would still be applying updates, changes, and deployments on the servers manually. As with Terraform, when choosing the right tool we were guided by the following facts:
As with Terraform, I will elaborate on it soon.
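A key property of configuration management tools like Ansible is idempotency: running the same task twice leaves the server in the same state and reports no change the second time. The sketch below mimics, in plain Python, the spirit of Ansible's `lineinfile` module; the `ensure_line` helper and the `sshd_config` example are hypothetical, and the real module offers many more options (regexp matching, insertion position, backups).

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` is present in the file at `path` (hypothetical helper).

    Returns True if the file had to be changed and False if it was already
    compliant, so a second run with the same arguments is a no-op.
    """
    lines = path.read_text().splitlines() if path.exists() else []
    if line in lines:
        return False
    lines.append(line)
    path.write_text("\n".join(lines) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "sshd_config"  # stand-in for a real server config file
    config.write_text("Port 22\n")
    first_run = ensure_line(config, "PermitRootLogin no")   # changes the file
    second_run = ensure_line(config, "PermitRootLogin no")  # nothing left to do
    final_content = config.read_text()

print(first_run, second_run)  # True False
```

Idempotency is what makes it safe to re-run a whole playbook against every server instead of tracking by hand which ones were already updated.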
Packer: this tool is a time-saver because it lets us create identical machine images with all the required packages preinstalled. Thanks to that, we could use them in Terraform right away instead of installing everything from scratch.
Last but not least: Jenkins. Continuous Integration and Continuous Delivery are extremely valuable and central to deployment orchestration. We based our CI/CD on Jenkins because we already had experience with it, it suited our needs (plugins, modules, integrations), and we did not want to waste time looking for something new.
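Much of the value of a CI/CD pipeline comes from gating: later stages, such as deployment, run only if earlier ones, such as tests, succeed. Here is a minimal fail-fast pipeline runner in Python that illustrates this; it is not Jenkins code, and the stage names and the `run_pipeline` helper are hypothetical.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a step that returns True on success
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[str, List[str]]:
    """Run stages in order and stop at the first failure (fail-fast)."""
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return f"FAILURE at {name}", completed
        completed.append(name)
    return "SUCCESS", completed


deployments: List[str] = []


def deploy() -> bool:
    deployments.append("production")  # a real stage would ship the build; here we only record it
    return True


# Hypothetical stages: the test stage fails, so deploy must never run
status, completed = run_pipeline([
    ("Build", lambda: True),
    ("Test", lambda: False),
    ("Deploy", deploy),
])
print(status, completed, deployments)  # FAILURE at Test ['Build'] []
```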
With the toolchain chosen, we had to make a big call: where to host it all?
In the last quarter of 2018, AWS dominated the cloud market, which is why we chose this provider. The largest number of services and resources, along with a fast pace of changes and updates, gave us confidence that we would not have to revisit our decision a year later because AWS had suddenly stopped evolving.
Using AWS services (Infrastructure as a Service) such as Spot Instances, Auto Scaling Groups, Launch Templates, RDS, S3, and many others, we started the migration, ensuring high availability and a self-healing mechanism in case one or more of our basic building blocks were unexpectedly taken by the evil spirits crawling in the shadows.
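The self-healing behavior of an Auto Scaling Group boils down to a reconciliation loop: replace instances that fail health checks and keep the group at its desired capacity. The Python below is a toy model of that loop, not the AWS API; the `ToyAutoScalingGroup` class and the instance IDs it generates are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


class ToyAutoScalingGroup:
    """Toy model of an Auto Scaling Group's self-healing loop (not the AWS API)."""

    def __init__(self, desired_capacity: int) -> None:
        self.desired_capacity = desired_capacity
        self.instances: List[Instance] = []
        self._next_id = 1

    def _launch(self) -> Instance:
        # In AWS this would start a new instance, e.g. from a Launch Template
        instance = Instance(instance_id=f"i-{self._next_id:04d}")
        self._next_id += 1
        self.instances.append(instance)
        return instance

    def reconcile(self) -> List[str]:
        """Replace unhealthy instances and converge on the desired capacity."""
        events: List[str] = []
        for instance in [i for i in self.instances if not i.healthy]:
            self.instances.remove(instance)
            events.append(f"terminated {instance.instance_id}")
        while len(self.instances) < self.desired_capacity:
            events.append(f"launched {self._launch().instance_id}")
        while len(self.instances) > self.desired_capacity:
            events.append(f"terminated {self.instances.pop().instance_id}")
        return events


asg = ToyAutoScalingGroup(desired_capacity=3)
print(asg.reconcile())  # ['launched i-0001', 'launched i-0002', 'launched i-0003']
asg.instances[1].healthy = False  # i-0002 is taken by the evil spirits
print(asg.reconcile())  # ['terminated i-0002', 'launched i-0004']
```

Running the loop repeatedly is harmless: once the group is healthy and at capacity, `reconcile()` returns no events.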