Detect application architecture drift early in the SDLC

Eldan Ben Haim

Chief Architecture Officer

Published July 4 2022 · 4 min. read

Specifications Describe What We Want To Develop

The development of cloud-native applications involves multiple individuals working towards the same goal – continuously releasing new features. To make sure work is coordinated among the different parties, the expected outcome of their work is defined in what we’ll refer to here as a “specification”. We’re not inventing anything new: these are the all-familiar product spec, functional spec and product backlog.

Many of these specifications are narrow in scope: work based on them occurs mostly within a well-defined time frame. Consider a feature request, for example. The feature is specified, then developed and, hopefully, tested. For many features, in many organizations, this is the last time the feature request will ever be considered. The end result, depending on the QA process, is more or less compliant with the specification and will largely remain that way over time.

However, there are other types of specifications. An architecture, in particular, is a set of specifications that apply continuously throughout the implementation of a system. Any change to the application should implicitly adhere to the set of specifications that comprise its architecture. This means that, unlike with a feature request, the degree to which an application’s implementation complies with its architecture is constantly changing.

Cloud Native Application Architecture Drift and Why It Exists

Specifications with a broad scope, such as architecture, are where we encounter drift. Drift is the ever-accumulating delta between a specification and the deliverable(s) created based on it. Drift is mostly encountered around broad-scope specifications simply because, over time, more and more development work is subject to the specification, and each unit of work is another chance to break it.
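To make the accumulation argument concrete, here is a rough sketch under a deliberately simplified assumption: every change has the same small, independent chance of breaking the specification. Both the model and the function name are illustrative assumptions, not measurements of any real project.

```python
def probability_of_drift(p_violation: float, changes: int) -> float:
    """Chance that at least one of `changes` independent changes breaks the spec.

    Assumes every change violates the specification with the same
    probability `p_violation`, independently of all other changes.
    """
    if not 0.0 <= p_violation <= 1.0:
        raise ValueError("p_violation must be between 0 and 1")
    if changes < 0:
        raise ValueError("changes must be non-negative")
    # P(at least one violation) = 1 - P(no violation in any change)
    return 1.0 - (1.0 - p_violation) ** changes
```

Under this assumed model, a 1% chance per change grows to roughly a 63% chance of at least one violation after 100 changes, which is why a specification that stays in force for the life of the system is so much more exposed than a one-off feature spec.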

Some specifications may be expressed in a machine-readable notation. For example, in TDD it’s common to have code that describes (and evaluates) the expected outcome of an implemented feature. Other specifications are traditionally expressed in human language, diagrams, or other forms not directly comprehensible by machines. And of course there are hybrid approaches, where the specification or some parts of it are machine-readable to some extent. Obviously, the more of a specification a machine can interpret, the easier it is to reduce drift from that specification, or perhaps eliminate it altogether.
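As a minimal sketch of a machine-readable specification in the TDD sense, the tests below state the expected outcome of a hypothetical feature and evaluate it with Python’s standard unittest module. The discount feature, its rule, and all names are assumptions made up for illustration.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical feature under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class DiscountSpecification(unittest.TestCase):
    """Each test is one machine-checkable statement of the specification."""

    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_rejects_invalid_percentage(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)


def run_specification() -> bool:
    """Run the specification and report whether the implementation complies."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiscountSpecification)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Because the specification is executable, it can be re-evaluated on every change, so any deviation from it surfaces immediately instead of accumulating.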

For example, let’s consider cloud infrastructure drift. The “specification” here is the definition of the infrastructure used by an application; since the infrastructure keeps changing over time, its definition is a broad-scope specification. For manually managed cloud infrastructure, the deployed infrastructure will accumulate deltas from the manual specification (often expressed as a set of tickets or documents) over time. Even with infrastructure-as-code solutions, where the infrastructure specification is machine-readable (and, in fact, actionable), drift may occur over time when click-ops practices are present. Luckily, in this case, since the specification is machine-readable, it should be fairly straightforward to evaluate the actual infrastructure against the specification and identify drift.
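The comparison described above can be sketched in a few lines. This is a simplified illustration, not how any particular infrastructure-as-code tool works: it assumes both the declared and the deployed infrastructure are available as plain dictionaries keyed by resource name, and the resource names and attributes in the usage example are hypothetical.

```python
from typing import Any

Resource = dict[str, Any]


def detect_drift(declared: dict[str, Resource],
                 deployed: dict[str, Resource]) -> list[str]:
    """Compare declared resources with deployed ones and list every delta."""
    findings = []
    for name in sorted(declared.keys() | deployed.keys()):
        if name not in deployed:
            findings.append(f"missing: {name} is declared but not deployed")
        elif name not in declared:
            findings.append(f"unmanaged: {name} is deployed but not declared")
        else:
            want, have = declared[name], deployed[name]
            # Report each attribute whose deployed value differs from the spec.
            for key in sorted(want.keys() | have.keys()):
                if want.get(key) != have.get(key):
                    findings.append(
                        f"changed: {name}.{key} is {have.get(key)!r}, "
                        f"expected {want.get(key)!r}"
                    )
    return findings


# Hypothetical example: encryption was switched off by hand and a bucket
# was created outside the infrastructure-as-code definition.
declared = {
    "orders-db": {"type": "database", "encrypted": True},
    "web": {"type": "vm", "size": "small"},
}
deployed = {
    "orders-db": {"type": "database", "encrypted": False},
    "web": {"type": "vm", "size": "small"},
    "debug-bucket": {"type": "bucket"},
}
drift = detect_drift(declared, deployed)
```

Here `drift` contains two findings: the unmanaged `debug-bucket` and the changed `orders-db.encrypted` attribute. Both are the kind of delta that click-ops tends to introduce.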

Architecture drift is significantly more difficult to tackle. Contemporary cloud-native applications are typically developed using agile, CI/CD-based methodologies. Changes are fast-paced and high in volume. This means that there are many opportunities to deviate from broad specifications such as application architecture, security architecture, etc. And, given that architecture is typically expressed in a non-machine-readable form, it is easy to accumulate drift from vetted architecture decisions. Moreover, implementation decisions that break architectural decisions tend to be difficult to fix if not caught early enough in the SDLC.

The security architecture for an application may, for example, assert that sensitive information should always be kept in a specific data store that is encrypted, or that any internet-facing API that exposes sensitive information must be authenticated. However, with the advent of cloud-native applications, developers are often empowered to deploy new data stores and use them throughout the application. Someone, sometime, is going to make the mistake of storing sensitive information in a new data store that they deploy as part of a new feature, or of introducing an API that exposes sensitive data without authentication.

Architecture drift can negatively impact application security in two ways. First, by not adhering to architecture specifications, we lose assurance that changes comply with the security standards and practices set for the project. Second, and more subtly, once an implementation has drifted away from the architecture, any analysis based on the architecture, such as threat modeling or security design review, is bound to yield wrong outcomes.

Dealing With Cloud-Native Application Architecture Drift

It would be great if we could use formal specification languages to describe complete system architectures. Alas, history teaches us that general-purpose formal specification languages are not really a viable solution. Instead, we can turn to more practical approaches that leverage automation where possible to focus and optimize the process of identifying and reducing architectural drift:

- We can’t have an architecture review on each and every change introduced to the application. However, by using contextual static code and text analysis, Apiiro can identify those changes that have an impact on architecture (and trigger review workflows for those changes); this helps customers create an efficient architectural review process.
- While capturing the complete application architecture in a formal specification is difficult, Apiiro can implement guardrails that ensure that specific architectural invariants are preserved. Invariants such as “all APIs that expose sensitive data must be authenticated and authorized” can be captured in a machine-readable manner and then be evaluated automatically against material changes made to the system.
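To illustrate what a machine-readable invariant might look like, here is a minimal sketch. It is not Apiiro’s implementation or API: the `ApiChange` and `Invariant` types, their fields, and the `evaluate` function are hypothetical, and the sketch assumes some earlier analysis step has already summarized each changed endpoint.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ApiChange:
    """Hypothetical summary of an API endpoint added or modified by a change."""
    path: str
    exposes_sensitive_data: bool
    authenticated: bool
    authorized: bool


@dataclass(frozen=True)
class Invariant:
    """An architectural rule expressed as a predicate over a change."""
    description: str
    holds: Callable[[ApiChange], bool]


INVARIANTS = (
    Invariant(
        "all APIs that expose sensitive data must be authenticated and authorized",
        lambda api: not api.exposes_sensitive_data
        or (api.authenticated and api.authorized),
    ),
)


def evaluate(changes: Sequence[ApiChange],
             invariants: Sequence[Invariant] = INVARIANTS) -> list[str]:
    """Return one finding per (change, invariant) pair that is violated."""
    return [
        f"{api.path}: violates '{inv.description}'"
        for api in changes
        for inv in invariants
        if not inv.holds(api)
    ]


# Hypothetical endpoints touched by a single change set.
findings = evaluate([
    ApiChange("/health", exposes_sensitive_data=False,
              authenticated=False, authorized=False),
    ApiChange("/patients", exposes_sensitive_data=True,
              authenticated=True, authorized=False),
    ApiChange("/orders", exposes_sensitive_data=True,
              authenticated=True, authorized=True),
])
```

Only `/patients` is reported: it exposes sensitive data and is authenticated but not authorized. Keeping each invariant as a small, named predicate means new rules can be added without touching the evaluation logic.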

By connecting Apiiro to your Source Control Manager and CI/CD pipeline, you can map the application attack surface and identify changes that either break invariants or require a review of security architects. This approach can significantly reduce architectural drift and by doing so reduce risks before deploying to the cloud.
