A Primer on Platform Engineering


August 21, 2023 | Platform Engineering

The technology industry, perhaps more than any other, is prone to cycles of explosive growth and crippling retraction. This pattern has been going on since the semiconductor was created. In recent cycles, as we’ve exited periods of retraction, we’ve seen the emergence of new practices and technologies that have fueled more than just another bubble: they’ve fueled fundamental changes in how we design, deliver and manage software. This is the same software that powers nearly every business and government in the world. What is emerging as the next accelerant for our industry is Platform Engineering.

In this blog series we will cover the history that led up to this shift, why it is important, how we came to be involved, and what we are doing in this space.

How we got here, a TL;DR on software delivery & operations history

For much of the history of software, the roles of development and operations were one and the same. We didn’t have steady-state services, just jobs. Those jobs were fed into a machine and a result was output. This is a gross oversimplification, but this isn’t a history blog.

The first significant change came once computers began hosting data that other computers or users could access, and this concept exploded with the arrival of the client-server model. Usage patterns across computing as a whole rapidly began to shift.

This presented a problem. Computers, and by extension software, weren’t originally designed for this paradigm, and as a result some fairly large changes happened. New languages, new abstraction layers, new tools and new job definitions arose. Together with the invention of the web browser, this gave rise to the massive, almost unheard-of growth of the .com era, and later the .com bust.

After the .com bust in 2001 there was a period when tech teams within companies were outsourced or broadly deprioritized. Getting budget approved was hard, hiring was hard and venture capital wasn’t readily available. This was the closest thing to a dark age that tech has ever really had.

During this period a number of now household-name companies like Google, Netflix and Amazon were setting the stage for an unprecedented period of growth. Developers and systems administrators were also heads down, coming up with modern practices like Agile and DevOps to replace the broken waterfall and ITIL processes that had crippled our ability to execute effectively.

This was an important inflection point. We had changes in technology, process, organization and the macro fiscal environment happening at the same time. It didn’t happen overnight in most places, but by the early 2010s we had not only turned a corner as an industry but hit another period of superlinear growth.

A problem began to emerge during this period. As in prior cycles, the foundational design of many systems simply wasn’t built for this type of growth or load. Everything became harder to design, deploy and manage. Docker and microservices exacerbated the long-tail problems of overstretched teams. Thankfully Kubernetes emerged to solve many of these problems.

All developers and operators were now perfectly efficient, doves flew by and church bells rang in the distance as the happily married couple went off to their honeymoon …

<record scratch>

Except the underlying problems were, and still are, persistent. Kubernetes, Docker, cloud, Agile and DevOps certainly improved the situation; that much is undeniable. But software sprawl? Debugging failures? Finding resources? Triage across teams? Those problems are in many ways worse, because the abstractions and processes we’ve created have made our systems harder to access, understand and collaborate on.

This is where we think Platform Engineering comes in.

Why now is the time for Platform Engineering

At its most basic level, Platform Engineering can be viewed as an evolution of DevOps for the Cloud Native and AI era, with the goal of increasing the efficiency of overall technical operations. Some people would argue that developers alone should be the focus. We think developers are the core audience, but many other teams can and should benefit from this initiative, and they should be considered key users if you choose to manage things in a product-centric fashion.

So what problems specifically does Platform Engineering aim to solve? Or, to frame the discussion a bit differently, what value does it provide to an organization? In our estimation, from talking with customers and industry peers, the single biggest point of value for most teams is increased efficiency, coming primarily from smoother collaboration: less toil finding resources, and less time spent creating yet another deployment methodology or chasing down the right versions or APIs for a specific app or stack.

In an ideal world this is addressed by standardizing the tools, versions, workflows and source of truth for metadata about systems and applications used across the entire organization. Do you live in an ideal world? Almost none of us do. Here in reality we need to break this up into digestible tasks and projects.
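As a rough illustration of what "standardizing versions" and "chunking the work" could look like in practice, here is a minimal Python sketch. Everything in it is hypothetical: the `STANDARD_TOOLCHAIN` table, the `Service` record and the tool versions are invented for the example and are not taken from any particular product. It compares each service against an organization-wide standard and orders the results so the smallest fixes come first.

```python
from dataclasses import dataclass, field

# Hypothetical organization-wide standard: tool name -> approved version.
STANDARD_TOOLCHAIN = {"python": "3.12", "terraform": "1.8", "helm": "3.15"}


@dataclass
class Service:
    """Hypothetical record describing one service and the tools it uses."""
    name: str
    owner: str
    toolchain: dict[str, str] = field(default_factory=dict)


def find_drift(service, standard=STANDARD_TOOLCHAIN):
    """Return {tool: (actual, expected)} for each tool that departs from the standard.

    Tools that the standard does not cover are ignored.
    """
    return {
        tool: (version, standard[tool])
        for tool, version in service.toolchain.items()
        if tool in standard and version != standard[tool]
    }


def smallest_first(services):
    """List (service name, drift) pairs, fewest drifting tools first.

    Services with no drift are left out, so the result reads as a backlog
    of digestible tasks.
    """
    drifting = [(s.name, find_drift(s)) for s in services]
    return sorted(
        ((name, drift) for name, drift in drifting if drift),
        key=lambda item: len(item[1]),
    )
```

Even a check this small only works if there is a single agreed place where the standard and the per-service metadata live, which is the real point of the exercise.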

A few areas to explore within your team to find the right starting point:

Are developers bottlenecked by toolchain gaps or other toil-heavy tasks that aren't their core job?
Does everyone in the organization know how to find and use the right tools for each service or application?
Can you reliably reproduce environments and deployments?
Do teams feel like they can quickly get the right people involved to triage, debug and review given problems or designs?

In our findings and experience, the most underestimated time sink as organizations scale is cross-team collaboration. Countless hours, meetings and Slack threads are spent getting the right information and people aligned for specific feature releases, application reviews and troubleshooting events.

Some of this is cultural, some of this is tooling, and some of this is prioritization.

The parts that are addressable within the context of a Platform Engineering initiative center on reducing toil, eliminating metadata decay and ensuring that development teams can be their most efficient selves. Put simply, if you make it easy for teams to gain access to accurate information quickly and leverage the same patterns, things will go more smoothly. This will not solve the cultural challenges, but it will reduce one of the constant points of friction between groups: the lack of shared understanding and inconsistent access. If team A, for example, does not have access to the same tool as team B, or the same understanding of the data within that tool, it should come as no surprise that those teams will need to spend time aligning and reconciling during an event. This is time lost.
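To make "metadata decay" a little more concrete, here is a minimal Python sketch of a staleness check over a service catalog. The `CatalogEntry` fields, the 90-day threshold and the example URL are all assumptions made for illustration; a real catalog would have its own schema and its own definition of stale.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are invented for this sketch."""
    service: str
    owner: str            # team currently responsible, empty if unknown
    runbook_url: str      # where responders should look first, empty if missing
    last_verified: date   # last time a person confirmed the entry is accurate


def decay_report(entries, today, max_age=timedelta(days=90)):
    """Return {service: [problems]} for entries that look decayed.

    An entry is flagged if it has no owner, no runbook, or has not been
    verified within max_age (an assumed threshold, not a recommendation).
    """
    report = {}
    for entry in entries:
        problems = []
        if not entry.owner:
            problems.append("no owner")
        if not entry.runbook_url:
            problems.append("no runbook")
        if today - entry.last_verified > max_age:
            problems.append("not verified recently")
        if problems:
            report[entry.service] = problems
    return report


# Example usage with made-up data.
entries = [
    CatalogEntry("billing", "team-a", "https://example.com/runbooks/billing", date(2023, 8, 1)),
    CatalogEntry("search", "", "", date(2023, 1, 1)),
]
report = decay_report(entries, today=date(2023, 8, 21))
```

A report like this gives team A and team B the same view of which entries can be trusted before an incident, rather than discovering the gaps during one.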

In many companies these time sinks are not adequately understood or tracked, but once you start digging in, everyone comes to a consensus on where the fire is. In our experience, identifying this pain is where a Platform Engineering initiative should start; it is not one size fits all. It often doesn't require a dedicated team to start, and it may not even require new tools. Rather, the key stakeholders should first align on short-term objectives and outcomes, and then kick off a small project that delivers short-term value. Especially with the constrained budgets most teams are facing, starting small is probably the right answer.

Some teams may discover the need to invest in an internal developer platform (IDP). Should you need one, we think we are building the easiest-to-deploy and easiest-to-use SaaS IDP on the planet. If you need help getting started or want to explore our Beta offering, please reach out!

Otherwise, stay tuned: we are just getting started and have some exciting releases and write-ups coming.

Paul Lundin

Founder & CEO of Arctir