In the last decade we have experienced a steep rise in online traffic across services. Small teams often operate at a scale that was once the prerogative of large enterprises. This, in turn, has led to innovation in paradigms and tooling for distributed systems. Service-oriented architecture has evolved into microservices architecture, which brings micro-level concerns into macro system design and architecture.
A lot has already been written about the benefits microservices bring to a large system. However, I feel a lot more can be said about the foundational architectural choices. This blog post dwells on those aspects, discussing various choices and recommendations, with sample code for the ideas that work.
Microservices is an architectural paradigm wherein a monolithic (silo) application is decomposed into small micro-applications that are packaged and deployed independently.
Put another way, it is an architectural style that structures an application as a collection of loosely coupled services, each a well-bounded component.
Microservices architecture is all about breaking a large (monolithic/silo) application into manageable, fully decoupled pieces, with the aim of gaining greater agility, more dynamic scaling, and targeted forms of resilience.
Understand Monolithic Architecture
Monolithic architecture refers to a group of software applications and components that are assembled and tightly packaged into a single deployable unit.
Difference between SOA and microservices
In SOA, all the services are controlled by one software application; they work together and coordinate across the system.
Example: A manager controls a team
In microservices, multiple services are independent and not controlled by any single coordinator. They work separately, requiring minimal coordination.
Example: An application developed by individual contributors (independent programmers)
Web API vs Microservices
A Web API, in this context, essentially means RESTful web services.
REST is an architectural style introduced in 2000, primarily designed to work well with HTTP. Its core principle is to define named resources that can be manipulated using a small number of methods. The resources and methods are known as the nouns and verbs of APIs. With the HTTP protocol, resources are mapped to URLs, and methods like POST, GET, PUT, PATCH, and DELETE are used.
Microservices were introduced in 2011 as an architectural style that structures an application as a collection of loosely-coupled services.
Sync vs Async
Microservices architecture discussion typically focuses on the size and collaborating nature of services. Collaboration can be both synchronous and asynchronous in nature. I believe this is a foundational architecture decision that needs to be well thought through in the context of the system being built. Both approaches have advantages and disadvantages that make them a good fit for certain types of constraints.
Synchronous microservices are a collection of small services working together to stitch a business flow. They are easier to develop, monitor and configure. Although HTTP isn't the only choice for synchronous services, it is a popular one. Each service can leverage the existing benefits of the HTTP protocol in terms of ETags, caching, service discovery/resolution, content specification, and tooling support.
At the same time, it is worth noting that synchronous services take away a few benefits of microservices. Each service consumes its dependencies directly and is responsible for isolating failures. This typically results in redundant checks to prevent cascading failures across the system. The biggest disadvantage, however, is that as the system evolves it becomes a complex mesh of cross-service calls. Over time, this can make flows non-linear and the system tightly coupled. A contract change in one service results in cascading changes across services unless versioning has been baked in early on. There is also a need for service discovery and load balancing of each service independently to maintain resilience, which introduces additional components and makes the architecture complex.
An example of intra-service interaction for ordering a product in a synchronous microservices architecture.
Synchronous services can be contrasted against asynchronous ones. In this paradigm, a messaging pipeline acts as the backbone of the system, serving as a transit for a common payload. At the entry point, a payload is dropped onto the messaging pipeline and is subsequently picked up by individual services to be processed and put back on the pipe. Each service takes its turn to consume the payload and optionally enrich it for downstream services. This begs the question: what happens to the originating request at the entry point? Does it wait for the process to complete asynchronously? There are several ways it can be handled; we will talk about them in the section below that addresses the design of the entry point for a microservices architecture.
Asynchronous services are, in essence, workers for a queue. Running multiple workers provides resilience to an extent without the need for a load balancer. Service discovery is not required except for the central queue. Failures are easy to manage without cascading upstream or resulting in a complete denial of service. However, the biggest benefit is that services can be stitched together to create a processing flow either at runtime or at deployment time. The ability to do this at runtime can be very powerful in simplifying flows in complex systems.
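To make the worker-on-a-queue idea concrete, here is a minimal sketch in Java. It uses an in-memory BlockingQueue as a stand-in for a real message broker, and the worker names and payload format are illustrative assumptions, not from any specific system.

```java
import java.util.concurrent.*;

// A minimal sketch of asynchronous workers. A BlockingQueue stands in
// for a real message broker; multiple workers consume from the same pipe.
public class QueueWorkers {
    static BlockingQueue<String> pipeline = new LinkedBlockingQueue<>();
    static BlockingQueue<String> results = new LinkedBlockingQueue<>();

    // Each worker consumes a payload, enriches it, and puts the
    // result back for downstream services.
    static void startWorker(String name) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String payload = pipeline.take();
                    results.put(payload + " -> processed by " + name);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) throws Exception {
        // Running multiple workers gives resilience without a load balancer.
        startWorker("worker-1");
        startWorker("worker-2");
        pipeline.put("order-42");
        System.out.println(results.take()); // e.g. "order-42 -> processed by worker-1"
    }
}
```

Adding a worker is just starting another consumer on the same queue, which is exactly the load-balancer-free resilience described above.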
The downside of asynchronous style is that the message bus is the most crucial component of the system. It can also be a single point of failure unless guarded very well. Running a message bus reliably can be complex and requires operational maturity. Since the request-response cycle for a user interface may not be synchronous, it demands some cognizance of asynchronous nature in experience design for the end user.
An example of intra-service interaction for ordering a product in an asynchronous microservices architecture.
Traditional system design reinforces synchronicity in user experience because it is built on strongly consistent systems. Designing a scalable microservice will often demand a system that functions asynchronously and is eventually consistent. Often this paradigm introduces delays that are visible to the user. An example is when a user creates a record but it does not yet appear in the list of records because the system is still processing it in the background. Systems need to tweak experience and messaging to communicate this asynchronous nature. Here is a simple example.
A successful payment message in a strongly consistent system
Your payment has been successfully completed.
Contrast that with a message that makes sense in a system where payments are processed in the background. A similar message is shown on Amazon web services while making a payment, indicating that execution is asynchronous in nature.
Your payment is under processing. You should receive a confirmation soon.
System experience design can pacify some of the anxiety users feel when they don't see their changes reflected instantaneously. However, there are times when it is absolutely necessary to maintain the synchronicity of the system, for instance when an external system calls your integration API and was not designed to handle asynchronous services.
There are several options that can be adopted here. Once such a facade is developed, it can also be used to maintain a synchronous flow with the web/UI.
Sync wrapper as an entry point
Depending on the technology stack, there is a wide array of choices that can help you represent an asynchronous system as a synchronous one. I'll take a simple Java example here and show how an incoming request can be held open while an asynchronous flow is triggered in the background.
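Here is a minimal sketch of such a synchronous facade using plain CompletableFuture. The class and method names, and the two-second timeout, are illustrative assumptions; a real service would hand the payload to the message pipeline instead of a thread pool.

```java
import java.util.concurrent.*;

// Sketch: a synchronous facade over an asynchronous flow. The entry
// point triggers background processing and blocks (with a timeout)
// until a result arrives.
public class SyncFacade {
    static ExecutorService background = Executors.newCachedThreadPool();

    // Kick off the asynchronous flow and return a future for its result.
    // In a real system, the payload would travel the message pipeline here.
    static CompletableFuture<String> triggerAsyncFlow(String request) {
        return CompletableFuture.supplyAsync(() -> "completed:" + request, background);
    }

    // The synchronous entry point holds the caller until completion,
    // or evicts the request after a timeout.
    static String handle(String request) throws Exception {
        try {
            return triggerAsyncFlow(request).get(2, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            return "still-processing:" + request;
        }
    }
}
```

The timeout branch is what prevents waiting callers from piling up when the background flow runs long, which the next paragraphs discuss.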
Alternatively, futures can be used to asynchronously call services in the background but blocking for a response until all the calls complete.
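A sketch of that fan-out pattern with futures, assuming stubbed service calls (the service names are placeholders):

```java
import java.util.concurrent.CompletableFuture;

// Sketch: call several services in parallel with futures, blocking
// only until all the calls complete.
public class FanOut {
    // Stub for a remote service call; a real one would go over HTTP.
    static CompletableFuture<String> callService(String name) {
        return CompletableFuture.supplyAsync(() -> name + "-ok");
    }

    static String aggregate() {
        CompletableFuture<String> inventory = callService("inventory");
        CompletableFuture<String> payment = callService("payment");
        // allOf waits for every call; join() then retrieves each result.
        CompletableFuture.allOf(inventory, payment).join();
        return inventory.join() + "," + payment.join();
    }
}
```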
The biggest drawback is that unless sensible timeouts are used, a long-running asynchronous flow can continue to hum in the background while the entry point is quickly overwhelmed by requests waiting for responses. It is advisable to keep asynchronous processing time low and establish a timeout at the entry point so that long-running requests are evicted if they start to hog all the resources on the server.
Command Query Responsibility Segregation
A more robust way to scale a microservice for synchronous operations is to separate Read operations (performed synchronously) from Write operations (performed asynchronously).
A synchronous layer is optimized for Read operations. Futures/Reactive APIs are a perfect fit for this kind of layer since this can often be IO bound and a single incoming request can easily be fanned-out into multiple requests across services. Data model and DB design can cater to the needs of Read operations.
An asynchronous layer is centered around create, update, and delete requests which are processed via a message bus asynchronously. This allows for creating data model and code that focuses on optimal write behavior. It also allows scaling these two operations independently.
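The read/write split can be sketched as follows. An in-memory queue and map stand in for the message bus and the read-optimized store; names are illustrative.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of Command Query Responsibility Segregation: writes are
// queued and applied asynchronously; reads hit a read-optimized view.
public class CqrsSketch {
    static BlockingQueue<String[]> writeBus = new LinkedBlockingQueue<>();
    static Map<String, String> readModel = new ConcurrentHashMap<>();

    // Write side: commands are only enqueued, never applied inline.
    static void submitWrite(String id, String value) {
        writeBus.add(new String[] {id, value});
    }

    // A background consumer drains the bus into the read model.
    static void applyPendingWrites() {
        String[] cmd;
        while ((cmd = writeBus.poll()) != null) {
            readModel.put(cmd[0], cmd[1]);
        }
    }

    // Read side: synchronous and cheap.
    static String read(String id) {
        return readModel.get(id);
    }
}
```

Note that a read issued before the write has been applied returns nothing; that window is exactly the eventual consistency discussed earlier.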
In a typical content publishing system, content generation and content consumption are lopsided: reads are much higher than writes. A design like this can really be a boon, since it isolates the two parts of the system and reduces interference between them.
Orchestrators vs Push it forward
When implementing a collaborating system with multiple services, one of the crucial decisions that needs to be made is orchestration. Two popular approaches are centralized orchestrators and decentralized "push it forward" approaches, where the state is encapsulated as part of the message payload.
Central orchestrator
This is a service in itself, and all the workflow logic/stitching lies here. It is typically the first to receive the incoming message payload and is responsible for routing. Each stage responds back to the common orchestrator, and the orchestrator, in turn, makes a decision about the next stage/service for the payload. The binding for the response queue is static for each service, as the result of a service is always sent to the central orchestrator.
The obvious advantage is that all workflow logic resides in a single service and can be configured. This allows for greater flexibility. Payload related tracking and housekeeping can be easily and centrally done. As a producer for each service in the system, it can throttle messages when needed.
The biggest disadvantage is that it is a single point of failure and needs to be well thought out in terms of redundancy. As the system load increases, the orchestrator is the first one to feel the heat: the total message volume it deals with is roughly (average service executions per message) × (number of incoming messages).
Push it forward
A more decentralized way to deal with asynchronous message flow is to piggyback the state on the initial payload. This allows each stage to record its result and additional metrics, and to make a decision about the next service based on the state captured in the payload.
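The piggybacked state can be sketched like this; the payload is a plain map holding the remaining route and accumulated results (field names are illustrative assumptions):

```java
import java.util.*;

// Sketch: workflow state carried on the payload itself. Each service
// pops the next stage off the route and records its result.
public class PayloadRouting {
    static Map<String, Object> newPayload(List<String> route) {
        Map<String, Object> p = new HashMap<>();
        p.put("route", new ArrayDeque<>(route));      // remaining stages
        p.put("results", new ArrayList<String>());    // per-stage results
        return p;
    }

    // A service consumes the payload, enriches it, and returns the
    // next stage to publish to (or null at the end of the life cycle).
    @SuppressWarnings("unchecked")
    static String process(Map<String, Object> p, String serviceName) {
        ((List<String>) p.get("results")).add(serviceName + ":done");
        Deque<String> route = (Deque<String>) p.get("route");
        return route.poll();
    }
}
```

No central service holds the workflow: each stage reads the route from the payload, which is what distributes the orchestration workload.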
The advantage is that the services distribute the orchestration workload among themselves, so the system is far more resilient to increased message load. Depending on how flow construction has been abstracted for a business flow, deciding the next stage can work well in this scenario.
The downsides are that each service needs to repeatedly set up and tear down publishers for the next stage, or run multiple publishers for the next stages concurrently. Also, if not paid attention to, workflow knowledge can quickly spread out across services. Finally, when a service goes down, execution stops there for every flow that passes through that service.
Based on my experience, I would recommend that for small to medium message loads, central orchestrators are far easier to deal with and provide great opportunities for housekeeping across all workflows. However, if you are operating at scale, you might consider designing a workflow abstraction that uses a lightweight entry orchestrator which embeds workflow state in the payload. All stages then use that information to route the message until it reaches the end of its life cycle.
Modular and independent services are the building blocks of microservices architecture. However, they can also become its Achilles heel. Precautions must be taken while designing contracts between services, whether synchronous or asynchronous. There are several aspects to keep in mind while designing intercommunication between services.
Payloads can be transmitted over various protocols and formats, both text and binary in nature. While selecting them, care should be taken to build resilience around payload evolution, as it is inevitable. If you are using any sort of schema validation of payloads, or binary formats that require a formal definition of the payload upfront, it is important to keep the following in mind:
An optional field can be made mandatory later. However, if you start with a mandatory field and later want to make it optional, you will need default values for backward compatibility. So, as far as possible, keep fields optional. Any logic around their presence should be handled as part of business logic, not the payload definition.
Don't constrain field sizes and precision prematurely. Leave some headroom so that when the time comes to expand, you aren't caught off guard. For instance, treating unique identifiers as an int may not be a great idea: in a growing system you will quickly run out of digits to generate new identifiers, and changing the type later might require a cascading change to all the services.
Binary formats outperform textual formats, but they also exhibit greater inflexibility as the contract evolves. Evaluate a format not only for speed but also for the flexibility it brings to the table. The payload is going to be the backbone of your system, so it is important to consider all these aspects.
When consuming an upstream service in a microservices architecture, a good practice is to bind only to the minimal fields that you need. Also, in a statically typed language like Java, rather than deserializing data into concrete objects, use loose dictionaries and extract only the fields that you need. This ensures that your service will survive any modification to a field it does not consume and will also pass it along successfully downstream.
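A sketch of the loose-dictionary approach. The payload here is a plain Map, as a JSON library such as Jackson would produce when reading into a Map rather than a concrete class; the field names are illustrative.

```java
import java.util.*;

// Sketch: bind loosely to a payload and extract only the fields this
// service consumes; everything else is carried forward untouched.
public class LooseBinding {
    // Extract just what we need; unknown fields are simply ignored,
    // so upstream contract changes don't break us.
    static String extractOrderId(Map<String, Object> payload) {
        return (String) payload.get("orderId");
    }

    // Enrich and pass forward: fields we never looked at survive intact
    // for downstream services.
    static Map<String, Object> enrich(Map<String, Object> payload) {
        Map<String, Object> out = new HashMap<>(payload);
        out.put("validated", true);
        return out;
    }
}
```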
Of these two approaches, binding loosely to a dictionary is more resilient to fields being added, modified or deleted: no error is thrown unless you try to extract those fields explicitly. Also, if you pass the payload forward to downstream services, a contract modification in an upstream service does not require a local change, as fields are carried forward automatically. Keeping this fundamental detail in mind can help you decouple your services.
Versioning and backward compatibility
Baking in the ability to version services right from the start is a prudent choice. If the system you build is a truly successful microservices architecture, each service will continue to grow independently, and the ability to deploy multiple versions will eventually be needed. Any deletion or update of payload fields requires consideration of downstream services that may be expecting certain fields. Backward compatibility must be maintained with a phase-out/deprecation approach.
It is important to note that versioning should be baked into the design, but actually versioning a service should be pushed out as far as possible. Deploying multiple versions concurrently requires considerable maturity in both the synchronous (discovering services by version) and asynchronous (distinct routing for different versions) worlds. It is therefore best avoided for as long as possible.
Request tracing, cycle times and circuit breakers
With most of the key rules out of the way, there are a few small learnings I'd like to share. They have been pivotal in bringing traceability and debugging capabilities across services, given all the asynchronicity that goes on.
The entry point is typically the perfect place to introduce a unique id that stays embedded within the payload for the rest of its journey. This needs to be a non-business id, an id (ideally a UUID) that serves as the identity of a request. Every service includes this id in its logs (it also helps to have a consistent log format across services) for every request received. This brings traceability. If you choose to aggregate your logs in something like Elasticsearch or CloudWatch, request ids help you trace a request across services.
When communication occurs between services over non-HTTP protocols, it can be a part of the payload. When using HTTP, it is common to use HTTP headers for the request id.
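A minimal sketch of stamping and propagating such an id; the field name and log format are illustrative assumptions.

```java
import java.util.*;

// Sketch: stamp a non-business request id at the entry point and
// carry it in the payload (or an HTTP header) across services.
public class RequestTracing {
    static final String REQUEST_ID = "requestId";

    // Only the entry point generates the id; downstream services reuse it.
    static Map<String, Object> stampAtEntryPoint(Map<String, Object> payload) {
        payload.putIfAbsent(REQUEST_ID, UUID.randomUUID().toString());
        return payload;
    }

    // Every service logs with the same id, giving cross-service traceability.
    static String logLine(String service, Map<String, Object> payload, String msg) {
        return String.format("[%s] service=%s %s", payload.get(REQUEST_ID), service, msg);
    }
}
```

Grepping the aggregated logs for one id then reconstructs the full journey of a single request.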
Recording cycle times for a message within a service can be really useful. It highlights the services that consume most of the processing time for a message and are screaming for optimization. It is important to ensure that the recordkeeping itself does not introduce significant overhead for services.
It is easier to do this if we use a single central orchestrator model, as a single orchestrator can update payload with a start and end time for each service. In a “push it forward” model, each service needs to agree on the semantics of recording cycle times.
Unchecked queue pileups are not uncommon in an asynchronous system. Even in a synchronous system, a component can quickly be overwhelmed when downstream services become unresponsive. Circuit breakers help regulate the load by rejecting requests once one of the components in the pipeline goes down.
A simple mechanism to implement a circuit breaker is to have the central orchestrator poll each service periodically. If one of the services stops responding, a switch is toggled at the entry point, resulting in rejection of any new requests until the system recovers as a whole.
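That toggle can be as simple as an atomic flag flipped by the health poller; the class and method names here are illustrative.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: a switch flipped by a periodic health poll; the entry point
// rejects new requests while any pipeline component is down.
public class EntryPointBreaker {
    private final AtomicBoolean open = new AtomicBoolean(false);

    // Called by the orchestrator's periodic health poll.
    void reportHealth(boolean allServicesHealthy) {
        open.set(!allServicesHealthy);
    }

    // The entry point consults the breaker before accepting work.
    String accept(String request) {
        if (open.get()) {
            return "rejected:" + request; // fail fast until the system recovers
        }
        return "accepted:" + request;
    }
}
```

Production-grade breakers add half-open probing and failure-rate thresholds, but the fail-fast principle is the same.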
This self-regulating behavior can prevent the system from spiraling down into a massive crash that warrants a much longer recovery time.
Martin Fowler's blog entry gives a great introduction to circuit breakers.
Not all that glitters is gold
Microservices architecture brings tremendous benefits to a large system that needs to operate at scale. It helps compartmentalize bottlenecks and failures and allows for the independent scaling of services. However, it comes with a trade-off: it requires much higher operational maturity when it comes to running a system. You don't want to deploy and run a large set of services manually, or debug failures by hopping across servers. This maturity requires investment in practices and systems that will not add value to your business immediately unless you are operating at scale. So it is important to understand the trade-offs and evaluate them against the benefits before adopting this architecture.
However, if you do choose to adopt it, the guidelines listed here should help you scale the system and maintain general hygiene across services and as a system.