From Monolith to Microservices and Back Again
There's a saying that software engineers know the benefits of everything and the drawbacks of nothing. When the microservices hype took off circa 2015, everyone was talking about the benefits:
- Resilience
- Scaling
- Ease of deployment
- Clear ownership boundaries and team alignment
At Managed by Q, we went from monolith to microservices and then back to a happy medium. We realized that while some of the benefits of microservices are real, there are also very real downsides and risks.
From monolith to microservices
In 2017 I joined the engineering team at Managed by Q, a company that ran a services marketplace for office managers. The team size was about 20 engineers at the time, and our monolith was a Django application deployed on ECS.
We were doing a lot of greenfield development, so we mostly built new services to house the new development work. Here is an incomplete list of some of the services we created over the course of about two years:
- Invoicing - Managed the customer-facing invoicing lifecycle.
- Charging - Managed the Stripe charging/payments lifecycle.
- Pricing - Managed pricing of services on the marketplace.
- Matchmaker - Connected office managers to vendors on the marketplace.
- Messaging - Managed the chat feature on our dashboard.
- Notifications - Managed push notifications, in-product notifications, and email.
- Reviews - Managed customer reviews of vendors on the marketplace.
- Netsuite Sync - Synced data to Netsuite.
- Salesforce Sync - Synced data to Salesforce.
- Stripe Sync - A transformation layer between Stripe and our systems.
- RDS Monitor - Made sure our Postgres databases were being correctly backed up.
- Datadog Monitor - Monitored that our Datadog agents were working.
- Github Notifier - Notified us in Slack when we got tagged for review on a PR.
- Gadgets - A suite of miscellaneous tools.
We felt pretty good watching the number of services grow. After all, we were catching up to modern development practices! Plus, doing work in these smaller services was often really nice compared to working in the monolith.
Reaping the benefits
Our microservices came with obvious advantages.
Clear ownership boundaries
Microservices give you ownership boundaries for free. There's little room for ambiguity as to whose job it is to fix something. If your team owns the service, it's your job to look into the problem. Clear ownership also incentivizes teams to take care of their services, and not let problems fester.
Smaller test suites
Our monolith's test suite took between 5 and 10 minutes to run. Our microservices' tests took a couple of seconds.
Faster deploys
With lighter test suites came faster deploys. Our monolith had a suite of Selenium tests that made sure critical dashboard features were working. Since many services weren't customer-facing, we could skip this type of testing entirely. This cut deploy times in half for many services.
CI/CD issues had a smaller impact
Sometimes a service's CI/CD pipeline would break, preventing new code from being deployed. Or a bug would get shipped, and we'd have to freeze deploys to that service until it got fixed. If this happened in the monolith, everyone's PRs had to wait while the problem was being fixed. If you were working in a microservice, however, other teams' mistakes wouldn't block your deploys. One of the most satisfying things about working in a microservice was maintaining a high development velocity while the monolith's deploy pipeline was blocked.
Simpler, lower-risk dependency updates
Updating things in general is much less scary when the change is isolated to a microservice. As a result, many of our small services were much more up-to-date than our monolith. For example, all of our microservices were on Python 3, while our monolith remained on Python 2 for some time. Many teams naturally shied away from updating the monolith, as that could break parts of the system they weren't familiar with. Additionally, it wasn't clear that it was anyone's responsibility in particular to keep the monolith up-to-date.
Noticing the downsides
As the number of services grew, some of us had a creeping suspicion that things were more difficult than they used to be.
More infrastructure to maintain
Each new service meant adding some infrastructure. There's the obvious stuff, like an ECS service, Postgres instance, and RabbitMQ instance. Then there's also CI/CD configuration, and extra configuration in third-party services such as Rollbar/Sentry. Dependencies also have to be updated in more places. The infrastructure team spent weeks on projects during which tedious work had to be repeated for every service.
Neglected services
The smallest services almost never received any attention. Once they were set up they could basically be left alone, and as a result became way out of date.
When a service has gone untouched for 6 months, people are hesitant to start messing with it again. 90% of the commits to these services were dependency or infrastructure updates. We had essentially just created an additional maintenance burden for ourselves.
Slower feature development
If you get the service boundary wrong, it can significantly slow down feature development. This is a huge risk when it comes to microservices. Building a feature that spans multiple services is usually more work. Doing a refactor that spans services is a nightmare. When service boundaries are just right, most projects only need to impact one service. However, startups often move in unexpected new directions. Two parts of the product that are completely separate now could become tightly coupled a year from now. It's often hard to anticipate exactly where the service boundaries need to be.
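One rough way to check whether boundaries are holding up is to look at how often a single change touches more than one service. The snippet below is only a sketch of that idea, not something we ran at the time. It assumes each change is described as a list of paths whose first segment is the service name (for example "invoicing/models.py"); that layout and the function names are hypothetical.

```python
def services_touched(changed_paths):
    """Return the set of service names among the changed paths.

    Assumption: the first path segment is the service name. Paths with
    no directory component (e.g. "README.md") are ignored.
    """
    return {path.split("/", 1)[0] for path in changed_paths if "/" in path}


def cross_service_ratio(changes):
    """Return the fraction of changes that touch more than one service."""
    if not changes:
        return 0.0
    crossing = sum(1 for paths in changes if len(services_touched(paths)) > 1)
    return crossing / len(changes)


# Hypothetical history: four changes, one of which spans two services.
history = [
    ["invoicing/models.py", "invoicing/views.py"],
    ["invoicing/models.py", "charging/client.py"],
    ["pricing/rules.py"],
    ["README.md"],
]
ratio = cross_service_ratio(history)
```

A ratio that keeps climbing suggests that two services have become coupled and may be candidates for merging.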
Libraries
To give all our services the same level of tooling, we needed to pull libraries out of our monolith. Updating these libraries can be a pain. You need to push a new release, and then if the update is important, actually go and update the library in 15 different places.
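To make the "update it in 15 places" chore more concrete, here is a minimal sketch of a script that lists which services still pin an old version of a shared library. It assumes every service lives in its own directory under a common root and pins dependencies with "==" in a requirements.txt file; the directory layout and the library name "q-common" are hypothetical, not what we actually used.

```python
import re
from pathlib import Path

# Matches an exact pin such as "q-common==1.4.2" (trailing comments ignored).
PIN_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s;#]+)")


def parse_pins(requirements_text):
    """Return {package_name: version} for exact '==' pins in a requirements file."""
    pins = {}
    for line in requirements_text.splitlines():
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def find_outdated(services_root, library, latest_version):
    """Map service name -> pinned version for services not on latest_version.

    Services that do not pin the library at all are skipped.
    """
    outdated = {}
    for req_file in sorted(Path(services_root).glob("*/requirements.txt")):
        pinned = parse_pins(req_file.read_text()).get(library.lower())
        if pinned is not None and pinned != latest_version:
            outdated[req_file.parent.name] = pinned
    return outdated
```

Calling find_outdated("services", "q-common", "1.4.2") would return a dictionary of the services that still need the bump, which at least turns the hunt into a checklist.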
Too many different technologies
One of the touted benefits of microservices is "technology heterogeneity". This ended up being a problem. Our monolith was built with Django, but some of our microservices used Flask. The monolith used Celery for asynchronous tasks, and some microservices used RQ. One microservice used DynamoDB, while the rest of our services used Postgres.
At one point we spent a significant amount of time refactoring all our microservices to use the same technologies. We had libraries that depended on a specific tech stack, and our microservices couldn't use them. Additionally, technology heterogeneity made it difficult for developers to start working in a new service.
Local development issues
As we added more services, it became necessary to run several of them locally during development. Having dockerized applications helped with this, but inevitably there were more problems in getting things working locally than there used to be.
Finding a balance: reasonably-sized services
Two years after the initial push for microservices, teams were merging services back together. Some services were merged into the monolith. Others were combined into larger units. All in all, we removed 9 microservices in the span of a year.
Not all of our microservices were failures. Our invoicing service, for example, ended up absorbing our charging service and a few others, and was quite a stable service boundary. Four services were merged into our "Gadgets" service, which ended up being a useful place to put tooling that didn't belong with our core domain models.
Reasonably-sized services have a significant slice of responsibility such that most feature development can be done within a single service. They are large enough that you don't have too much excess infrastructure. They allow teams to have clear ownership boundaries and prevent teams from stepping on each other's toes during development.
If your company is considering moving to a microservices architecture, be careful of where you draw the service boundaries, and consider using reasonably-sized services instead.