Domain Orchestrator - Designing our orchestrator

This post is part of the Domain Orchestrator series.

The goal

As explained in the previous post, we felt a little constrained by our smart consumer model. With that in mind, we had a mission to decouple ourselves and create a single front door into what we now refer to as our “Risk Assessment” domain.

Guiding principles

Domain Driven Design

I’ll be honest: my first instinct was to skip Domain Driven Design. This felt like a technical proxy, not a complex business domain, and I’ve seen the ‘hammer and nail’ problem add needless complexity to projects before.

That skepticism faded as we began discussing the system’s future. It became clear that to support existing and upcoming features cleanly, we needed a well-structured domain.

So, I quickly changed my mind. Before touching a single line of code, we held multiple sessions to model our domain and define a ubiquitous language. This not only aligned the team on the design but gave us an invaluable resource for onboarding new members.

We also gained another unexpected benefit - our staff engineer used the documentation we created (plus AI) to generate boilerplate code for aggregates, entities, and value objects. While not perfect, it saved time and highlighted the value of solid domain documentation.
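To make the DDD building blocks concrete, here is a minimal value object sketch of the kind that boilerplate generation produced. The `RiskScore` name, its 0-100 range, and the factory method are all illustrative assumptions, not our actual code:

```typescript
// Hypothetical value object sketch - the name and invariant are illustrative.
class RiskScore {
  private constructor(readonly value: number) {}

  // Factory enforces the invariant, so an invalid score can never exist.
  static of(value: number): RiskScore {
    if (value < 0 || value > 100) {
      throw new RangeError(`RiskScore must be between 0 and 100, got ${value}`);
    }
    return new RiskScore(value);
  }

  // Value objects compare by value, not by identity.
  equals(other: RiskScore): boolean {
    return this.value === other.value;
  }
}
```

The private constructor plus factory is the part worth generating consistently: every value object in the domain gets the same "can't construct an invalid instance" guarantee.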

🏗️ Ports & Adapters

“Oh la la, more buzzwords” you might say. And you’d be half-right.

We traditionally use Clean or Onion architecture. Conceptually, Ports & Adapters (Hexagonal Architecture) aims for the same end goal: inverting dependencies. The difference is emphasis - Clean Architecture is layer-oriented, Ports & Adapters is boundary-oriented.

And boundaries were exactly what we needed to fix.

Our “smart consumer” issue wasn’t that their logic leaked into our domain - it was that our core orchestration logic ended up living inside their codebase.

Think of it this way: we were selling a “flat-pack” furniture kit 📦. We handed out all the individual parts (Fraud, Credit, etc.) plus a complex instruction manual (the orchestration rules). The consumer had to assemble the finished product themselves just to get an assessment.

This was the core boundary misalignment. We had pushed our internal orchestration concerns outward.

Adopting Ports & Adapters was the decision to stop selling flat-packs and start building a factory 🏭. We created a single “front door” (our Inbound Port) where the consumer places an order, and inside our application we now handle all the assembly and orchestration.

The consumer becomes an Inbound Adapter. They don’t need to know how the furniture is built - they just plug into our Port and receive a result. This mental model was the architectural enforcement needed to bring our logic back inside our domain.
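The port-and-adapter boundary can be sketched in a few lines. All the type shapes and names here are invented for illustration; the real contract is richer:

```typescript
// Hypothetical request/result shapes - the real contract differs.
type AssessmentRequest = { orderId: string };
type AssessmentResult = { orderId: string; outcome: "APPROVED" | "DECLINED" };

// The inbound port: the single "front door" that consumers plug into.
interface AssessRisk {
  assess(request: AssessmentRequest): AssessmentResult;
}

// The application service inside the hexagon owns the orchestration.
// Inbound adapters (consumers) never see how the checks are assembled.
class RiskAssessmentService implements AssessRisk {
  assess(request: AssessmentRequest): AssessmentResult {
    // ...run and sequence the individual checks (Fraud, Credit, etc.) here...
    return { orderId: request.orderId, outcome: "APPROVED" };
  }
}
```

The point of the sketch is the direction of the dependency: the consumer depends on the `AssessRisk` interface, never on the orchestration behind it - the factory, not the flat-pack.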

I won’t go too deeply into my understanding of Ports & Adapters here, but I highly recommend reading 📖 “Hexagonal Architecture Explained” by Alistair Cockburn and Juan Manuel Garrido de Paz.

👁️ Traceability

One thing we felt our previous setup lacked was consistent traceability across services. If we needed to understand the details of a specific assessment, it required stitching together information from multiple places. The consumer held the overall outcome, but detailed insights had to be retrieved from each service individually.

Given the friction this caused for support and debugging, it was important that our new domain captured the entire journey of an assessment. That’s why our RiskAssessment aggregate was designed to contain a complete, immutable record of every executed check and its result - providing a self-contained audit trail for every decision.
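A stripped-down sketch of that audit-trail idea, with invented field names (the real aggregate holds far more detail per check):

```typescript
// Illustrative aggregate that records every executed check and its result.
type CheckResult = { check: string; passed: boolean; executedAt: Date };

class RiskAssessment {
  private readonly results: CheckResult[] = [];

  constructor(readonly orderId: string) {}

  // Every check appends to the record; nothing is ever overwritten.
  recordCheck(check: string, passed: boolean): void {
    this.results.push({ check, passed, executedAt: new Date() });
  }

  // Expose a read-only copy so the audit trail can't be mutated from outside.
  auditTrail(): readonly CheckResult[] {
    return [...this.results];
  }
}
```

Because the aggregate only appends and never rewrites, loading a single `RiskAssessment` answers the support question "what exactly ran, and what did each check say?" without touching any other service.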

💰 The Trade-off: Cost vs. Speed

A big question was: with full control over orchestration, should we run all risk checks in parallel to minimise latency?

In a vacuum, of course you would. But real systems have constraints. Our risk pipeline uses a variety of APIs (internal and external), some of which incur costs. If a free, low-effort internal check can immediately determine that an order should not proceed, it doesn’t make sense to trigger higher-cost checks prematurely.

For this reason, we chose to keep orchestration sequential - at least initially. We run cheaper checks first and fail fast when appropriate, accepting some additional latency in return for reduced operational cost.
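The cheapest-first, fail-fast loop looks roughly like this. Check names and the cost model are assumptions for the sake of the example:

```typescript
// Sketch of cheapest-first, fail-fast orchestration.
type Check = { name: string; cost: number; run: () => boolean };

function runSequentially(checks: Check[]): { passed: boolean; executed: string[] } {
  const executed: string[] = [];
  // Run cheaper checks first, so a free internal check can stop the
  // pipeline before any paid external API is ever called.
  for (const check of [...checks].sort((a, b) => a.cost - b.cost)) {
    executed.push(check.name);
    if (!check.run()) {
      return { passed: false, executed }; // fail fast: skip remaining checks
    }
  }
  return { passed: true, executed };
}
```

The trade-off is visible in the return value: on a failure, `executed` shows how much of the pipeline (and cost) was avoided, at the price of the latency a parallel fan-out would have saved.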

⏱️ Handling time

A Risk Assessment is a historical fact. If we later review a decision made last year, we want to know exactly why the system arrived at that outcome.

However, customers, configurations, and rules evolve over time. Relying on live configuration to explain a historical result can lead to inaccurate conclusions.

To avoid this, we applied a Snapshot Pattern. When we run an assessment, we snapshot the relevant rules and store them inside the aggregate along with the decision. When we later reload the aggregate or inspect the persisted state, we see the precise configuration that was used at that moment in time.
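In code, the snapshot is simply a copy of the live rules taken at execution time and stored alongside the decision. The `RuleSet` shape and the `AssessmentRecord` name are illustrative:

```typescript
// Sketch of the snapshot pattern: copy the rules in force at execution
// time into the persisted record, so history survives config changes.
type RuleSet = { maxOrderValue: number; version: number };

class AssessmentRecord {
  readonly rulesSnapshot: RuleSet;

  constructor(
    readonly orderId: string,
    liveRules: RuleSet,
    readonly outcome: string,
  ) {
    // Copy, don't reference: later edits to live config can't rewrite history.
    this.rulesSnapshot = { ...liveRules };
  }
}
```

The one subtlety worth flagging: the copy must be a true copy (deep, for nested config), otherwise a later configuration change silently rewrites every historical record that references it.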

Stateful response

It was tempting to have our new orchestrator return a simple boolean: accepted or declined. But our domain isn’t that binary. Some checks are not “hard stops” but conditional: maybe, but only if additional steps are taken.

So we designed the contract around a richer, stateful response model. The orchestrator can return APPROVED, DECLINED, or meaningful intermediate states like ACTION_REQUIRED. This lets the domain communicate clear next steps such as “ask the user to confirm their email” or “perform an additional authorisation step.”
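A discriminated union captures this contract well. The specific states come from the post; the payload fields attached to each state are assumptions:

```typescript
// Sketch of a stateful response model; payload fields are illustrative.
type AssessmentOutcome =
  | { status: "APPROVED" }
  | { status: "DECLINED"; reason: string }
  | { status: "ACTION_REQUIRED"; action: string };

function nextStep(outcome: AssessmentOutcome): string {
  // Exhaustive switch: the compiler flags any state we forget to handle.
  switch (outcome.status) {
    case "APPROVED":
      return "Proceed with the order";
    case "DECLINED":
      return `Stop: ${outcome.reason}`;
    case "ACTION_REQUIRED":
      return `Next step: ${outcome.action}`;
  }
}
```

Compared with a boolean, each state carries only the data that makes sense for it, and adding a new intermediate state later is a compile-time-checked change rather than a silent contract break.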

🚀 Ready for launch?

We had our principles. We had our domain model. We had the blueprint. On paper, everything looked great.

But there was still a challenge: replacing the engine of a car while it’s moving. We couldn’t simply switch off the existing system and cut over to the new one. The flow is both sensitive and high-value - transitioning incorrectly wasn’t an option.

In the next post, I’ll walk through how we migrated live traffic with minimal downtime and the safeguards we put in place to make it safe.