Developer Toil Is Preventable

Platform engineering defines what you should do, not what you can do.


Just before the Thanksgiving holiday I was looking to add a new feature to Flightdeck. The change was simple: I would enable additional user federation capabilities on our platform. This would require that my code interface with a downstream API provider, and in this case, that provider is Keycloak.

I expected the integration to take about an hour, but I soon found that it would take much longer.

We use the Keycloak specification to generate client bindings for our backend, in this case a Kubernetes controller written in Go. We have found that tools like oapi-codegen are invaluable for all kinds of API integration needs. As long as the provider publishes a reliable specification, we can interface with it in the way that we choose.

When I attempted to build a new client from the Keycloak specification I ran up against something curious:

oapi-codegen -package keycloak -generate client .spec.yaml > client.go
oapi-codegen -package keycloak -generate types .spec.yaml > types.go
go build .
# github.com/arctir/go-keycloak
./types.go:27:2: AuthTime redeclared
        ./types.go:26:2: other declaration of AuthTime
./types.go:498:2: AuthTime redeclared
        ./types.go:497:2: other declaration of AuthTime
make: *** [Makefile:5: build] Error 1

My hope for an easy integration evaporated in an instant. Was this a bug in oapi-codegen? Something else? Either way, I really did not want to go there.

Looking at the generated code I found the following:

type AccessToken struct {
    ...
    AuthTime    *int32    `json:"authTime,omitempty"`
    AuthTime    *int64    `json:"auth_time,omitempty"`
    ...
}

The same field has been defined twice...with different types...and with similar, yet different, JSON keys. Unfortunately, I knew that this meant the specification was, at best, not great, and at worst, just plain wrong.

Indeed, upon inspection this was confirmed:

AccessToken:
  type: object
  properties:
    ...
    auth_time:
      type: integer
      format: int64
    ...
    authTime:
      type: integer
      format: int32
    ...

Now, while this is technically correct from an OpenAPI perspective (there is no standard for case style), I believe that many would argue that this constitutes poor practice. The "auth time" field is now ambiguous: Which one is correct? Is one of them deprecated? Will this change in the future with unexpected outcomes? All of these are understandable concerns, and have real potential to impact software reliability.
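To see why two distinct property names can end up as a single Go field, here is a minimal sketch of identifier normalization. The helper `toGoFieldName` is hypothetical and is not oapi-codegen's actual implementation; it only assumes that a generator splits on `_` and `-` and capitalizes each part.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toGoFieldName is a hypothetical helper that approximates how a code
// generator might turn a JSON property name into an exported Go identifier.
// It is an illustration only, not oapi-codegen's real logic.
func toGoFieldName(property string) string {
	var b strings.Builder
	upperNext := true // the first rune is always capitalized
	for _, r := range property {
		if r == '_' || r == '-' {
			// Drop the separator and capitalize whatever follows it.
			upperNext = true
			continue
		}
		if upperNext {
			b.WriteRune(unicode.ToUpper(r))
			upperNext = false
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Both spellings from the specification map to the same identifier.
	for _, p := range []string{"auth_time", "authTime"} {
		fmt.Printf("%s -> %s\n", p, toGoFieldName(p))
	}
}
```

Under these assumptions both property names resolve to `AuthTime`, which matches the redeclaration error reported by the compiler.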

This specification has fields in three different cases: snake_case, camelCase, and kebab-case. Had this specification standardized on a single case, perhaps this duplication would have been caught in testing, and, perhaps too, downstream developers could avoid this type of toil.

Whether this is a bug or intentional is unclear. And, frankly, it doesn't matter much either way. We will file an issue, and work to get it resolved.

Toil Is Preventable

For me, however, this is yet another example of the totally preventable types of toil that a developer encounters on a daily basis. These continual "paper cuts" result in amazing amounts of lost productivity, and sometimes even a poor customer experience.

This is one reason why we are so bullish about Platform Engineering. For us, Platform Engineering is all about identifying and codifying best practices into your software development lifecycle.

Platform Engineering practices really shine in scenarios like the one described above: where the code is technically viable but, in practice, leaves a lot to be desired.

Platform Engineering seeks to define what you should do, not what you can do.

As patterns emerge within an organization (or, as in this case, a community), we can develop standards, frameworks, and tests to ensure that these types of speed bumps don't derail another developer.

Over time these patterns become levers for mechanical advantage. They provide the foundation for us to reliably, and continually, innovate on top of sound fundamentals.

Moving Forward

But how do you maintain this within an organization? How can you be sure that your colleagues (or even others within an open source community) don't become sidetracked by these same sorts of nagging problems?

The immediate (and somewhat obvious) answer is better unit testing. With fewer than 40 lines of Go, you could easily spot and flag this defect:

package main

import (
	"fmt"
	"os"

	mapset "github.com/deckarep/golang-set/v2"
	"github.com/getkin/kin-openapi/openapi3"
	"github.com/grokify/mogo/text/stringcase"
)

func main() {
	// NewLoader returns a loader with its defaults initialized.
	loader := openapi3.NewLoader()
	doc, err := loader.LoadFromFile(".spec.yaml")
	if err != nil {
		panic(fmt.Sprintf("cannot load document: %v", err))
	}
	shouldError := false

	for _, schema := range doc.Components.Schemas {
		refPath := schema.RefPath()
		// Compare names by their snake_case form to catch case-only variants.
		knownProps := mapset.NewSet[string]()
		for propertyName := range schema.Value.Properties {
			snakedCase := stringcase.ToSnakeCase(propertyName)
			if !knownProps.Contains(snakedCase) {
				knownProps.Add(snakedCase)
			} else {
				fmt.Printf("%s:%s has an attribute case duplicate\n", refPath, propertyName)
				// Record the failure so the process exits non-zero below.
				shouldError = true
			}
		}
	}
	if shouldError {
		os.Exit(1)
	}
}
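The core of that check does not depend on any particular OpenAPI library. As a rough sketch, the same idea can be written with only the standard library. The names `normalizeKey` and `findCaseDuplicates` are hypothetical, the property list in `main` is illustrative, and the normalization is deliberately cruder than a real snake_case conversion: it lowercases the name and drops `_` and `-`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalizeKey folds a property name into a case- and separator-insensitive
// key. This is an assumption made for the sketch, not a standard algorithm.
func normalizeKey(name string) string {
	name = strings.ReplaceAll(name, "_", "")
	name = strings.ReplaceAll(name, "-", "")
	return strings.ToLower(name)
}

// findCaseDuplicates groups property names that share a normalized key and
// returns only the groups with more than one member, in sorted order.
func findCaseDuplicates(properties []string) [][]string {
	groups := make(map[string][]string)
	for _, p := range properties {
		key := normalizeKey(p)
		groups[key] = append(groups[key], p)
	}

	// Collect the keys that have collisions and sort them so that the
	// result does not depend on Go's randomized map iteration order.
	keys := make([]string, 0, len(groups))
	for k, names := range groups {
		if len(names) > 1 {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	dups := make([][]string, 0, len(keys))
	for _, k := range keys {
		names := groups[k]
		sort.Strings(names)
		dups = append(dups, names)
	}
	return dups
}

func main() {
	// Illustrative input: the two spellings from the specification plus a
	// made-up property that has no duplicate.
	props := []string{"auth_time", "authTime", "other_field"}
	for _, group := range findCaseDuplicates(props) {
		fmt.Printf("case duplicates: %s\n", strings.Join(group, ", "))
	}
}
```

In a real pipeline the property names would come from the parsed specification, and the program would exit non-zero when any group is reported. Sorting the output keeps the report stable between runs, which makes failures easier to compare in CI logs.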

Or, you could pull in a far more comprehensive OpenAPI linting tool, such as the one from our friends at Speakeasy. It found this particular defect as the first hit, along with 20 other defects that I have encountered and that we haven't even discussed in this post.

Testing Alone Is Not Enough

While these tests can easily spot well-known defects, they are only meaningful for those who make use of them as part of their test suite. As a Platform Engineer, how do I ensure that these best practices are baked into every OpenAPI specification within my organization? How can I bootstrap developers with all of the tribal knowledge acquired by those who went before them?

For us, this is where the Backstage Scaffolder comes in. It is an integral part of Flightdeck, and provides an organization with the capability to, among other actions, templatize what "good" looks like. (Again, we want to capture what should be done, not what can be done.)

Organizations can easily define a new repository template, complete with all of the tests, compliance requirements, deployment pipelines, and syntactic sugar constructs baked in. Instead of a developer reinventing the wheel every time they look to innovate, why not provide the foundation for what they need to get started right away?

These templates not only provide a great point from which to launch a new greenfield project, but also provide the basis from which to evolve a code base over time. Because they are versioned in source control, renovating an existing project so that it adheres to new organizational standards becomes trivial.

With regard to the API defect identified above, it is not only conceivable, but probably even advisable, that an organization should have pre-defined templates for what an API should look like. After all, an API is the gateway for your users. Consistency, security, and reliability are paramount.

APIs, when done right, turn your product into a platform.

So, instead of only adding an OpenAPI linter to your own CI/CD pipeline, why not pay it forward for the other developers in your organization? Make it an organizational standard by way of a software template.

There are broad swaths of opportunity when it comes to defining these templates at the organizational level, but also industry-wide. We are excited about the impact that this can have on developer productivity writ large.

Let's all help developers be more efficient.