<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Phil Calçado</title>
    <description></description>
    <link>http://philcalcado.com/</link>
    <atom:link href="http://philcalcado.com/feed.xml" rel="self" type="application/rss+xml" />
    <pubDate>Thu, 20 Mar 2025 15:21:45 +0000</pubDate>
    <lastBuildDate>Thu, 20 Mar 2025 15:21:45 +0000</lastBuildDate>
    <generator>Jekyll v4.3.4</generator>
    
      <item>
        <title>Building AI Products—Part II: Task-Oriented vs. Component-Oriented Pipelines</title>
        <description>&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/top.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A while back, I kicked off a three-part series on how we built an AI-powered Chief of Staff for engineering leaders—a product that scaled to 10,000 users and later became the foundation of &lt;a href=&quot;https://outropy.ai/&quot;&gt;Outropy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://philcalcado.com/2024/12/14/building-ai-products-part-i.html&quot;&gt;The first article covered the back-end architecture powering our AI agents&lt;/a&gt;. In this second installment, let’s explore inference pipelines—the beating heart of any AI system, agentic or not—through the lens of hard-won engineering lessons.&lt;/p&gt;

&lt;p&gt;Next time, I’ll discuss how we built agents that leverage these pipelines internally and coordinate with each other to power our application.&lt;/p&gt;

&lt;h2 id=&quot;are-inference-pipelines-just-rag&quot;&gt;Are Inference Pipelines Just RAG?&lt;/h2&gt;

&lt;p&gt;If you’ve been reading about AI agents lately, you’ve probably noticed the overwhelming hype. Discussions often treat “agents” as if they’re magical, glossing over how they actually work. The reality is that agents—and any AI-powered system—perform tasks using inference pipelines.&lt;/p&gt;

&lt;p&gt;At its core, an inference pipeline takes raw input, such as a user query or data, and transforms it into meaningful output based on user- and developer-defined instructions and constraints. It’s the engine behind AI systems, chaining multiple steps together to produce the final output.&lt;/p&gt;

&lt;p&gt;Technically, most inference pipelines fall under &lt;a href=&quot;https://en.wikipedia.org/wiki/Retrieval-augmented_generation&quot;&gt;Retrieval-Augmented Generation (RAG)&lt;/a&gt; since they involve retrieving and incorporating relevant context before generating a response. But in practice, the term “RAG” has come to mean something much narrower—typically document-based Q&amp;amp;A chatbots that pull text snippets from a database and feed them into an LLM.&lt;/p&gt;

&lt;p&gt;Inference pipelines go far beyond that. They power multi-step reasoning, task orchestration, entity extraction, workflow automation, and decision-making systems—capabilities that extend well beyond retrieving and summarizing documents.&lt;/p&gt;

&lt;p&gt;Because inference pipelines cover a much broader range of AI workflows, I won’t be using the term RAG throughout this article. The goal is to highlight the full scope of what these pipelines can do without reinforcing the misconception that they’re limited to document retrieval.&lt;/p&gt;

&lt;h2 id=&quot;a-recap-of-the-app&quot;&gt;A Recap of the App&lt;/h2&gt;

&lt;p&gt;I’ve always described our assistant as &lt;em&gt;The VSCode for Everything Else&lt;/em&gt; because my vision was to bring the power of an IDE—like VSCode or JetBrains’ tools—to the non-coding parts of your job.&lt;/p&gt;

&lt;p&gt;With a good IDE, I can open a Git repository I’ve never seen before and, within a few keystrokes, find where something is used, what it depends on, what might break if I change it, who wrote it, and when. That level of instant understanding makes coding faster and more efficient.&lt;/p&gt;

&lt;p&gt;I wanted that same kind of awareness for everything &lt;em&gt;outside&lt;/em&gt; of code. Whether I was preparing for a meeting, reading an RFC, or making a project decision, I wanted an assistant that could surface relevant context just as easily.&lt;/p&gt;

&lt;p&gt;Our assistant was a collection of AI agents that pulled information from tools like Jira, Slack, GitHub, and Google Workspace. It learned about you, your priorities, and your projects to deliver timely, relevant insights. When it detected something you’d likely want to see, it proactively surfaced it. You could also summon relevant context on demand with &lt;em&gt;Cmd+Shift+O&lt;/em&gt;, based on whatever was on your screen.&lt;/p&gt;

&lt;p&gt;Here’s a slide straight from our original pitch deck:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/pitchdeck.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As you’d expect, integrating with countless APIs—each with its own auth, quotas, and rate limits—was a massive effort. If I were doing it today, I’d probably use a service that unifies these APIs. But surprisingly, integration wasn’t the hardest part. The real challenge was making the data useful.&lt;/p&gt;

&lt;h2 id=&quot;one-small-feature-for-users-one-giant-pipeline-for-engineering&quot;&gt;One Small Feature for Users, One Giant Pipeline for Engineering&lt;/h2&gt;

&lt;p&gt;The first feature, &lt;em&gt;Personal Daily Briefing&lt;/em&gt;, launched as a daily summary that appeared when you first became active on Slack. It surfaced the three most important and time-sensitive topics for you. It looked like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/pdb-full.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Our top priority was getting the product into users’ hands as quickly as possible. As part of our go-to-market strategy, we partnered with Slack communities for software engineering leaders. These groups provided a great test bed since their members—engineering managers, directors, and VPs—were not only our ideal customers but also deeply engaged in discussions about work.&lt;/p&gt;

&lt;p&gt;We built the first version of this feature just two months after ChatGPT’s release, back when the entire industry was still figuring out Generative AI. That period taught us a lot about both the possibilities and limitations of these models. Some challenges were unique to that first wave of commercial models, but most still exist today. I want to share some of the most interesting ones.&lt;/p&gt;

&lt;p&gt;In our original approach, content from Slack (and later, other tools) was stored in a Postgres database by worker processes, as described in Part I. From there, it was a two-step process:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Find Relevant Messages
    &lt;ul&gt;
      &lt;li&gt;Identify the Slack channels each user belongs to.&lt;/li&gt;
      &lt;li&gt;Fetch all messages from those channels in the past 24 hours.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Summarize Messages
    &lt;ul&gt;
      &lt;li&gt;Create a list of channel:messages pairs.&lt;/li&gt;
      &lt;li&gt;Send the list to ChatGPT and ask it to identify and summarize the three most important stories.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We bundled these two steps into one piece of code and made one single call to the LLM:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/pipe0.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This basic approach was enough to generate that &lt;em&gt;wow&lt;/em&gt; factor we all experienced when first using AI tools in 2022–2023. But, in a classic AI story, our naïve implementation didn’t survive real-world complications.&lt;/p&gt;

&lt;p&gt;The first problem was &lt;strong&gt;context window size&lt;/strong&gt;. Back in 2023, models had small context windows—around 4,000 tokens. Even today, with much larger windows, performance still degrades when too much irrelevant information is included in a prompt. More noise leads to worse results, no matter the model size.&lt;/p&gt;

&lt;p&gt;Then there was the issue of &lt;strong&gt;writing style&lt;/strong&gt;. Users reacted badly to briefings that referred to them in the third person or framed their own actions as if they were external events. We had to personalize the briefings—at least enough that they felt like they were written &lt;em&gt;for&lt;/em&gt; the user.&lt;/p&gt;

&lt;p&gt;That led us to the next challenge: &lt;strong&gt;relevance&lt;/strong&gt;. Different users care about different things, and those interests evolve. Just because a project is on fire doesn’t mean everyone cares—especially if highlighting it means pushing out something more relevant. We needed a way to rank stories based on each user’s interests.&lt;/p&gt;

&lt;p&gt;But by far, the biggest problem was &lt;strong&gt;duplicate summaries&lt;/strong&gt;. Slack discussions often happen across multiple channels, meaning our system needed to recognize and merge duplicate topics instead of treating them as separate events.&lt;/p&gt;

&lt;p&gt;To solve this, we started by analyzing the Slack channels each user belonged to and tracking the topics discussed. We boosted the relevance of topics a user had interacted with—whether through messages or reactions—to create a ranked list of subjects they cared about. This prioritized list was stored in Postgres and updated using an exponential decay algorithm to keep interests fresh.&lt;/p&gt;
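
&lt;p&gt;As a rough illustration of the decay idea, here is a minimal sketch. It is not our production code: the one-week half-life, the &lt;code&gt;TopicInterest&lt;/code&gt; shape, and the function names are all assumptions made for the example.&lt;/p&gt;

```python
import math
from dataclasses import dataclass

# Assumption for this sketch: an interest loses half its weight after a week.
HALF_LIFE_DAYS = 7.0


@dataclass(frozen=True)
class TopicInterest:
    topic: str
    score: float
    updated_at_days: float  # time of the last update, expressed in days


def decayed_score(interest: TopicInterest, now_days: float) -> float:
    """Return the stored score after exponential decay since the last update."""
    elapsed = max(0.0, now_days - interest.updated_at_days)
    return interest.score * math.pow(0.5, elapsed / HALF_LIFE_DAYS)


def record_interaction(interest: TopicInterest, now_days: float, boost: float = 1.0) -> TopicInterest:
    """Decay the score up to now, then add a boost for the new message or reaction."""
    return TopicInterest(interest.topic, decayed_score(interest, now_days) + boost, now_days)


def rank_topics(interests: list[TopicInterest], now_days: float) -> list[str]:
    """Order topics by decayed score, highest first."""
    ordered = sorted(interests, key=lambda i: decayed_score(i, now_days), reverse=True)
    return [i.topic for i in ordered]
```

&lt;p&gt;Because the decay is applied when a score is read or updated, nothing has to rewrite every row on a schedule; a topic the user stops touching simply sinks in the ranking.&lt;/p&gt;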

&lt;p&gt;Now that we had a way to prioritize content, we had to make the summaries more structured. The improved flow worked like this:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Summarize Discussions in Each Channel
    &lt;ul&gt;
      &lt;li&gt;Identify the Slack channels each user belongs to.&lt;/li&gt;
      &lt;li&gt;For each channel, summarize all discussions from the last 24 hours.&lt;/li&gt;
      &lt;li&gt;Identify which topics each discussion belongs to.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Consolidate Summaries Across Channels
    &lt;ul&gt;
      &lt;li&gt;Send the list of all discussed topics to ChatGPT.&lt;/li&gt;
      &lt;li&gt;Ask it to consolidate and deduplicate similar topics.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Rank Summaries for the User
    &lt;ul&gt;
      &lt;li&gt;Fetch the list of topics the user currently cares about.&lt;/li&gt;
      &lt;li&gt;Send this list with consolidated summaries to ChatGPT.&lt;/li&gt;
      &lt;li&gt;Ask it to choose the three most relevant summaries based on user preferences.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Generate a Personalized Summary
    &lt;ul&gt;
      &lt;li&gt;Take the three selected summaries and user-specific information.&lt;/li&gt;
      &lt;li&gt;Ask ChatGPT to generate a briefing tailored to the user’s perspective.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our system started looking more like an actual pipeline:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/pipe1.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;These new steps added complexity. In Part I, we tackled infrastructure challenges, but chaining LLM calls created a new kind of failure. Models don’t just retrieve data—they generate it. This means every response is a decision point, and even a small inaccuracy—whether from a flawed assumption or outright hallucination—can be treated as fact by the next step, compounding errors and making the final output unreliable.&lt;/p&gt;

&lt;p&gt;Worse, LLMs are highly sensitive to input variations. Even a minor model upgrade or a slight shift in data formatting could cause misinterpretations that snowballed into serious distortions.&lt;/p&gt;

&lt;p&gt;We saw plenty of weird, sometimes hilarious failures. One engineer casually mentioned in Slack that they “might be out with the flu tomorrow.” The importance detection stage flagged it correctly. But by the time the contextualization stage processed it, the system had somehow linked it to COVID-19 protocols. The final daily briefing then advised their manager to enforce social distancing measures—despite the team being 100% remote.&lt;/p&gt;

&lt;p&gt;By the time an issue surfaced in the final output, tracing it back to the original mistake meant digging through layers of model interactions, intermediate outputs, and cached results. To prevent this, we quickly added a guardrails stage to catch nonsense before it reached the user.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/pipe2.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This worked well for a while, preventing us from generating the kind of nonsensical responses that erode trust in AI tools. However, our design had a major flaw: once an error was detected, the only options were to rerun the pipeline or escalate to human intervention.&lt;/p&gt;

&lt;p&gt;As we added more users and expanded beyond Slack to tools like Figma, GitHub, and Jira, the pipeline became increasingly complex. Now, it wasn’t just about deduplicating summaries across Slack channels—we also had to recognize when the same topic was being discussed across different platforms. This required &lt;strong&gt;entity extraction&lt;/strong&gt; to identify projects, teams, and key entities, enabling us to connect discussions across multiple systems.&lt;/p&gt;

&lt;p&gt;These enhancements exacerbated the cascading error problem. It reached a point where most briefings were rejected by our guardrails and required manual review. To fix this, we added more substages to the pipeline, introducing auto-correction and validation at multiple points. Of course, this only increased the pipeline’s complexity further, and by the end, it looked something like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/pip3.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Detailing each step is beyond the scope of this article. In short, we independently arrived at techniques similar to what are now known as &lt;a href=&quot;https://arxiv.org/abs/2401.15884?source=post_page-----5e40467099f8&quot;&gt;Corrective Retrieval-Augmented Generation (CRAG)&lt;/a&gt; and &lt;a href=&quot;https://arxiv.org/abs/2402.03367&quot;&gt;RAG-Fusion&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;our-monolithic-inheritance&quot;&gt;Our Monolithic Inheritance&lt;/h2&gt;

&lt;p&gt;Before diving into how our approach to inference pipelines evolved, I want to take a step back and look at why AI engineering ended up the way it is.&lt;/p&gt;

&lt;p&gt;Much of what we know about building Generative AI applications comes from earlier data science practices. Many of yesterday’s data scientists have rebranded as today’s AI engineers, and modern AI systems still rely on many of the same tools and methodologies as traditional ML. Along with that, we’ve inherited the same well-documented problems that have plagued data pipelines for decades—monolithic, single-use artifacts that &lt;a href=&quot;https://martinfowler.com/articles/data-monolith-to-mesh.html&quot;&gt;experts like Zhamak Dehghani and Danilo Sato have extensively discussed&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/dm.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Their proposed solution is Data Mesh, a distributed data architecture rooted in classic software engineering principles like Domain-Driven Design. Instead of large, single-purpose pipelines that shuttle data between lakes, databases, and applications, it promotes smaller, self-contained data products that are discoverable, reusable, and composable, enabling more scalable and maintainable data workflows.&lt;/p&gt;

&lt;p&gt;Unfortunately, despite the hype around Data Mesh, most data teams haven’t adopted it. They still build monolithic, one-off pipelines, riddled with tech debt and brittle integrations, making data hard to find and needlessly duplicated. This is the world where today’s AI engineers learned to build pipelines—back when they still called themselves data scientists.&lt;/p&gt;

&lt;p&gt;As frustrating as this has always been in data science and analytics, the pain was manageable because these teams typically worked on supporting systems—spam classification, content recommendations, or analytics—where failures had fallback strategies and didn’t directly affect core product functionality.&lt;/p&gt;

&lt;p&gt;But now, AI is in the critical path of our applications. Technical debt in data pipelines doesn’t just slow things down—it directly impacts product quality, user experience, and business value. &lt;strong&gt;If AI is the product, it has to be built with the same rigor as any other mission-critical system.&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&quot;introducing-click-to-context&quot;&gt;Introducing Click-to-Context&lt;/h2&gt;

&lt;p&gt;Back to our product. After the success of the daily briefing, we rushed to launch our second feature: click-to-context. We added a button in Slack that let users right-click any message to get an explainer tailored to its context. This way, they could quickly catch up on conversations after returning from a meeting to a wall of unread messages.&lt;/p&gt;

&lt;p&gt;Here’s a video of what it looked like:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/ctc.gif&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;People loved the new feature, and we were adding hundreds of new users every week. One of the coolest moments was discovering an emergent use case in our telemetry: teams working in multiple languages found click-to-context invaluable. Rather than translating messages individually, they could instantly understand the overall conversation.&lt;/p&gt;

&lt;p&gt;The logic was very similar to the daily briefing pipeline. The key difference was that instead of pulling messages from multiple channels over a set time frame, it only retrieved messages from the same channel. The selection logic was simple: fetch every message sent within one hour of the right-clicked message.&lt;/p&gt;
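
&lt;p&gt;The selection rule is simple enough to sketch in a few lines. The &lt;code&gt;Message&lt;/code&gt; type and &lt;code&gt;messages_near&lt;/code&gt; function are hypothetical, and this version assumes “within one hour” means one hour before or after the clicked message.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Message:
    channel_id: str
    sent_at: datetime
    text: str


def messages_near(
    clicked: Message,
    history: list[Message],
    window: timedelta = timedelta(hours=1),
) -> list[Message]:
    """Return same-channel messages sent within `window` of the clicked one, oldest first."""
    nearby = [
        m
        for m in history
        if m.channel_id == clicked.channel_id
        and window >= abs(m.sent_at - clicked.sent_at)
    ]
    return sorted(nearby, key=lambda m: m.sent_at)
```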

&lt;p&gt;Behind the scenes, we had to make trade-offs to launch quickly, prioritizing speed over long-term maintainability.&lt;/p&gt;

&lt;h2 id=&quot;from-monolith-to-copypasta&quot;&gt;From Monolith to Copypasta&lt;/h2&gt;

&lt;p&gt;When we started working on Click-to-Context, our application code called our pipeline like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/single-purpose-pipeline1.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To build the new feature, we reused the core logic from the latter half of the daily briefing pipeline and simply added a few components to handle different inputs:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/pipe-shared.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I added shaded regions in the diagram above to highlight an important detail: there was no real encapsulation. The pipelines differed only in their entry points; past that, they ran through the same shared components.&lt;/p&gt;

&lt;p&gt;This is where things got complicated. While the daily briefing and click-to-context pipelines had a lot in common, they weren’t exactly the same. The prompts, instructions, and few-shot examples we built for the daily briefing were designed to handle multiple discussions at once, while click-to-context needed to focus on a single discussion and filter out unrelated messages.&lt;/p&gt;

&lt;p&gt;To ship the feature quickly, we added context-aware branching inside our components. Each function checked which feature it was serving (via a Context object, similar to Go’s approach, containing metadata like the invoking user, feature type, etc.), then branched into the appropriate logic. This was a nasty hack. Before long, we found ourselves in an increasingly tangled mess of if-else statements, a classic case of control coupling.&lt;/p&gt;
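
&lt;p&gt;To make the problem concrete, here is a sketch of what that kind of branching looks like. It illustrates the anti-pattern, not something to copy; the &lt;code&gt;Feature&lt;/code&gt; enum, the &lt;code&gt;Context&lt;/code&gt; fields, and the prompt wording are invented for this example.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Feature(Enum):
    DAILY_BRIEFING = "daily_briefing"
    CLICK_TO_CONTEXT = "click_to_context"


@dataclass(frozen=True)
class Context:
    user_id: str
    feature: Feature


def build_summary_prompt(ctx: Context, messages: list[str]) -> str:
    """A shared component whose behavior depends on which feature called it."""
    # Control coupling: the caller's identity steers the internals of the callee.
    if ctx.feature is Feature.DAILY_BRIEFING:
        instructions = "Summarize every discussion in these messages."
    elif ctx.feature is Feature.CLICK_TO_CONTEXT:
        instructions = "Summarize only the discussion around the selected message."
    else:
        raise ValueError(f"unsupported feature: {ctx.feature}")
    return instructions + "\n\n" + "\n".join(messages)
```

&lt;p&gt;Every new feature adds another branch to every component it touches, which is why this approach stops scaling almost immediately.&lt;/p&gt;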

&lt;p&gt;There was no time for a proper fix, so we made a trade-off: copy-pasting code for each new pipeline to avoid untangling the logic immediately. Eventually, we ended up with this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/single-purpose-pipeline2.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We knew we were taking on tech debt to move fast, but it piled up much faster than expected. Maintaining it was painful. Slightly different versions of the same hundred-line methods were scattered everywhere. Even small changes, like switching from OpenAI’s APIs to Azure’s, turned into multi-week exercises in find-and-replace.&lt;/p&gt;

&lt;p&gt;The good news was that we were much more comfortable with the concepts, tools, and techniques of building AI by this point. We were ready to start making our architecture more sustainable. But this raised the real question: &lt;strong&gt;what does good architecture even look like for a Generative AI system?&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&quot;component-oriented-design&quot;&gt;Component-Oriented Design&lt;/h2&gt;

&lt;p&gt;Our first attempt at bringing order to our pipelines was to clean up the copy-and-paste madness and consolidate duplicated code into reusable components with well-defined interfaces. One major issue became obvious: our components were mixing concerns.&lt;/p&gt;

&lt;p&gt;A component that fetched Slack messages also calculated social proximity scores based on interaction frequency, reaction counts, and direct message volume. We split these into a pure data-fetching service and a separate ranking algorithm. This separation made our components more reusable, reduced unexpected dependencies, and lowered the risk of cascading failures when making changes.&lt;/p&gt;
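
&lt;p&gt;A minimal sketch of that split might look like the following. The type names and the weights in &lt;code&gt;proximity_score&lt;/code&gt; are illustrative assumptions, not the values we actually used.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class InteractionStats:
    person: str
    interactions: int
    reactions: int
    direct_messages: int


class StatsFetcher(Protocol):
    """Pure data fetching: knows about the source, nothing about ranking."""

    def fetch(self, user_id: str) -> list[InteractionStats]: ...


def proximity_score(stats: InteractionStats) -> float:
    """Ranking only. The weights here are made up for the example."""
    return 1.0 * stats.interactions + 0.5 * stats.reactions + 2.0 * stats.direct_messages


def rank_by_proximity(fetcher: StatsFetcher, user_id: str) -> list[str]:
    """Compose the two concerns; either side can be replaced or tested alone."""
    stats = fetcher.fetch(user_id)
    return [s.person for s in sorted(stats, key=proximity_score, reverse=True)]
```

&lt;p&gt;With the fetcher behind an interface, the ranking logic can be exercised with an in-memory fake instead of a live Slack workspace.&lt;/p&gt;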

&lt;p&gt;With this shift, we started treating our pipelines as assemblies of modular components rather than static workflows.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/pipe-assembly.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This approach improved code organization, testability, and reusability. Now, instead of rewriting logic for each new pipeline, we could assemble workflows from existing, well-scoped building blocks.&lt;/p&gt;

&lt;p&gt;But the core problem remained—we had solved mixed responsibilities at the component level, yet our pipelines themselves were still tangled with multiple concerns. Each pipeline had to orchestrate:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Data retrieval from different sources (GitHub, Google Calendar, Slack)&lt;/li&gt;
  &lt;li&gt;Error handling and API inconsistencies&lt;/li&gt;
  &lt;li&gt;Processing logic (summarization, ranking, filtering)&lt;/li&gt;
  &lt;li&gt;Context-aware adaptations (personalization, deduplication, formatting)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem wasn’t just messy internals—it was structural. Pipelines weren’t truly independent; they still carried hardcoded assumptions about data shape, execution order, and failure modes. The complexity had just moved up a level.&lt;/p&gt;

&lt;p&gt;This is exactly how frameworks like LangChain and LlamaIndex operate today. We never used them, but we independently arrived at similar patterns. Their main benefit is providing pre-built conveniences that speed up implementation, allowing engineers to assemble pipelines by chaining components together. But they do little to solve the real challenges of inference pipeline design. This bottom-up approach ultimately produces brittle, single-purpose pipelines that don’t scale beyond their initial use case.&lt;/p&gt;

&lt;h2 id=&quot;introducing-companion&quot;&gt;Introducing Companion&lt;/h2&gt;

&lt;p&gt;The component-oriented approach above remained our main approach through our beta launch in September 2023 and as we grew to 10,000 users. Our focus was adding new features, not iterating on old ones, so once a pipeline was in place, we rarely revisited it.&lt;/p&gt;

&lt;p&gt;After a year of vaporware announcements, Salesforce finally released Slack AI in February 2024. While its features were basic compared to ours, its sheer market presence made one thing clear: we couldn’t remain a Slack-only tool.&lt;/p&gt;

&lt;p&gt;That’s when we launched &lt;em&gt;Companion&lt;/em&gt;, our always-present Google Chrome extension. It offered many of the same features as our Slack integrations: Click-to-Context now worked everywhere, so you could select text on any web page and click the Outropy button to bring up context. It also added new capabilities, such as surfacing context based on whatever was displayed on your screen, helping with meeting preparation, and our most popular feature of all time, calendar refactoring.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/companion.gif&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;On the engineering side, Companion wasn’t just a UI change—it forced us to rethink how our inference pipelines worked. Moving from request/response interactions in Slack to a persistent, agentic system meant refactoring our pipelines into reusable, object-like agents. Once we had that foundation, we could start designing the next generation of inference pipelines.&lt;/p&gt;

&lt;h2 id=&quot;task-oriented-design&quot;&gt;Task-Oriented Design&lt;/h2&gt;

&lt;p&gt;Software engineering is about breaking big, unsolvable problems into smaller, solvable ones, where each piece contributes to solving the larger issue. Instead of tackling complex, tangled problems with equally tangled pipelines, we needed a different approach: decomposing inference into smaller, meaningful units.&lt;/p&gt;

&lt;p&gt;In his seminal book &lt;a href=&quot;https://amzn.to/3DJHmt8&quot;&gt;Working Effectively with Legacy Code&lt;/a&gt;, Michael Feathers introduces the concept of a seam:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A place where you can vary behavior in a software system without editing in that place. For instance, a call to a polymorphic function on an object is a seam because you can subclass the class of the object and have it behave differently.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A seam is ultimately a leverage point in an architecture. To scale, we needed to identify and exploit the seams in ours.&lt;/p&gt;

&lt;p&gt;Our breakthrough came when we stopped thinking bottom-up and instead mapped our pipelines from the top down. Looking at an example pipeline we discussed earlier:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Summarize Discussions in Each Channel
    &lt;ul&gt;
      &lt;li&gt;Identify the Slack channels each user belongs to.&lt;/li&gt;
      &lt;li&gt;For each channel, summarize all discussions from the last 24 hours.&lt;/li&gt;
      &lt;li&gt;Identify which topics each discussion belongs to.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Consolidate Summaries Across Channels
    &lt;ul&gt;
      &lt;li&gt;Send the list of all discussed topics to ChatGPT.&lt;/li&gt;
      &lt;li&gt;Ask it to consolidate and deduplicate similar topics.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Rank Summaries for the User
    &lt;ul&gt;
      &lt;li&gt;Fetch the list of topics the user currently cares about.&lt;/li&gt;
      &lt;li&gt;Send this list with consolidated summaries to ChatGPT.&lt;/li&gt;
      &lt;li&gt;Ask it to choose the three most relevant summaries based on user preferences.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Generate a Personalized Summary
    &lt;ul&gt;
      &lt;li&gt;Take the three selected summaries and user-specific information.&lt;/li&gt;
      &lt;li&gt;Ask ChatGPT to generate a briefing tailored to the user’s perspective.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We were immediately drawn to these big verbs—Summarize, Consolidate, Rank, Generate. These weren’t just individual steps in a single workflow—they were standalone, reusable tasks. Everything else was just implementation detail.&lt;/p&gt;

&lt;p&gt;Instead of treating pipelines as rigid, single-purpose workflows, we recognized an opportunity: break these tasks into separate, self-contained pipelines that could be composed and reused across different workflows.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/task-pipes.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Unlike component-oriented pipelines, where stages within a pipeline directly depend on each other, these task-oriented pipelines are self-contained. Each one takes a specific input, produces a specific output, and makes no assumptions about where the input comes from or how the output will be used.&lt;/p&gt;

&lt;p&gt;This was key: by decoupling these tasks, we not only improved maintainability but also unlocked reuse across multiple AI workflows. A pipeline that summarized discussions from Slack could just as easily summarize discussions from GitHub code reviews or comments on Google Docs—without modification.&lt;/p&gt;
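
&lt;p&gt;Here is a small sketch of what “self-contained tasks” can mean in code. Everything in it is hypothetical: the &lt;code&gt;LLM&lt;/code&gt; callable stands in for a model client, and the exact-text deduplication and keyword scoring are deliberately naive placeholders for steps that would really be model calls.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model client: prompt in, completion out.
LLM = Callable[[str], str]


@dataclass(frozen=True)
class Discussion:
    source: str  # "slack", "github", "gdocs": the task does not care which
    messages: tuple[str, ...]


@dataclass(frozen=True)
class Summary:
    source: str
    text: str


def summarize(discussion: Discussion, llm: LLM) -> Summary:
    """Task: turn one discussion into one summary, whatever its origin."""
    prompt = "Summarize this discussion:\n" + "\n".join(discussion.messages)
    return Summary(discussion.source, llm(prompt))


def consolidate(summaries: list[Summary]) -> list[Summary]:
    """Task: drop duplicates. Exact-text matching stands in for a model call."""
    seen: set[str] = set()
    unique: list[Summary] = []
    for summary in summaries:
        key = summary.text.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(summary)
    return unique


def rank(summaries: list[Summary], interests: dict[str, float], limit: int = 3) -> list[Summary]:
    """Task: keep the summaries that best match the user's weighted topics."""

    def score(summary: Summary) -> float:
        text = summary.text.lower()
        return sum(weight for topic, weight in interests.items() if topic in text)

    return sorted(summaries, key=score, reverse=True)[:limit]
```

&lt;p&gt;None of these functions knows who calls it or what happens to its output, so a caller can compose them freely, for example &lt;code&gt;rank(consolidate([summarize(d, llm) for d in discussions]), interests)&lt;/code&gt;.&lt;/p&gt;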

&lt;p&gt;In fact, this diagram oversimplifies things. If you recall our system architecture from Part I, inference pipelines weren’t standalone—they were part of agents.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/arch-agents-worker.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This means that instead of pipelines calling each other directly, agents orchestrated the process, chaining together small inference pipelines dynamically to accomplish a goal.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/agent-pipes.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;With this shift, task-oriented pipelines became the foundation of our AI system. By focusing on what each stage of inference needed to accomplish rather than how it was implemented, we built a system that was modular, composable, and adaptable to new workflows.&lt;/p&gt;

&lt;p&gt;We used the task-oriented approach for the entire life of our product, and it worked so well that tasks became the main unit of abstraction developers use when building systems on the &lt;a href=&quot;https://outropy.ai/&quot;&gt;Outropy platform&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;building-pipelines&quot;&gt;Building Pipelines&lt;/h2&gt;

&lt;p&gt;Breaking inference down into task-oriented pipelines gave us a modular, composable system, but translating these concepts into working code was another challenge entirely.&lt;/p&gt;

&lt;p&gt;Don’t let the neat diagrams fool you. The first version of this pipeline was a beast—15,000 lines of Python crammed into a single file, with code snippets bouncing between Jupyter Notebooks, where we initially built and tested everything.&lt;/p&gt;

&lt;p&gt;But as we refactored our code into something more structured, we realized that AI systems inherit all the usual maintenance headaches of traditional software—slowing development, making systems harder to debug, and introducing risks around security, latency, and resilience. On top of that, they add their own unique pains.&lt;/p&gt;

&lt;p&gt;Beyond their sensitivity to input format, AI pipelines require constant retries when results aren’t acceptable. Even minor input changes can cascade into unpredictable behavior. Keeping things working as expected meant continually adjusting pipelines, even after they were deployed.&lt;/p&gt;

&lt;p&gt;That led us to one of the most important lessons we learned: &lt;strong&gt;AI product development is mostly trial and error&lt;/strong&gt;. If your architecture slows down daily iterations, it doesn’t just make engineering harder—it directly impacts product quality and user experience.&lt;/p&gt;

&lt;p&gt;So while the changes we made—both here and in Part I—helped us build more maintainable, production-ready pipelines, we still needed a way to experiment and iterate quickly, even while the system was in production.&lt;/p&gt;

&lt;p&gt;After evaluating several solutions, we chose Temporal to implement durable workflows. At its core, Temporal is built around a simple but powerful idea: separate the parts of your system that handle business logic from those that perform side effects. This distinction is straightforward, especially for engineers familiar with Functional Programming principles.&lt;/p&gt;

&lt;p&gt;In Temporal, workflows handle business logic and must be deterministic, meaning they always produce the same result when replayed or retried. Workflows can’t perform side effects directly: they can’t call OpenAI’s API or write to a database. Instead, they invoke activities, which are isolated at both the code and runtime levels.&lt;/p&gt;

&lt;p&gt;This separation lets Temporal manage retries, timeouts, and failures automatically. If an activity fails—say, due to network issues—Temporal handles the retry logic, so we didn’t have to build it ourselves.&lt;/p&gt;
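
&lt;p&gt;The split between orchestration and side effects can be illustrated without Temporal itself. What follows is a plain-Python sketch, not Temporal’s SDK: &lt;code&gt;run_activity&lt;/code&gt;, &lt;code&gt;make_flaky_summarizer&lt;/code&gt;, and &lt;code&gt;summarize_workflow&lt;/code&gt; are hypothetical names, the LLM call is faked, and the retry loop is written by hand where Temporal would manage it for you.&lt;/p&gt;

```python
from typing import Callable


class TransientError(Exception):
    """Stands in for a flaky network call (hypothetical)."""


def run_activity(fn: Callable[[str], str], arg: str, max_attempts: int = 3) -> str:
    """Run a side-effecting step, retrying on transient failure.

    In Temporal the runtime owns this loop; here it is spelled out by hand.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(arg)
        except TransientError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def make_flaky_summarizer(failures: int) -> Callable[[str], str]:
    """Build a fake LLM call that fails `failures` times before succeeding."""
    state = {"remaining": failures}

    def summarize(text: str) -> str:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientError("simulated network issue")
        # Fake "summary": keep only the first sentence.
        return text.split(".")[0] + "."

    return summarize


def summarize_workflow(text: str, summarize: Callable[[str], str]) -> str:
    """Orchestration only: no I/O here, just decisions about which steps run."""
    cleaned = " ".join(text.split())  # deterministic, side-effect free
    return run_activity(summarize, cleaned)
```

&lt;p&gt;Calling &lt;code&gt;summarize_workflow(text, make_flaky_summarizer(2))&lt;/code&gt; succeeds on the third attempt, and the workflow function never needs to know that two attempts failed.&lt;/p&gt;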

&lt;p&gt;Think of Temporal as a finer-grained Service Mesh, but for workflows inside your application rather than independent services.&lt;/p&gt;

&lt;p&gt;At first, we modeled each pipeline as a single Temporal workflow.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/single-wf.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Even as our design evolved, we stuck to this model: a single workflow for the whole pipeline, with each stage and substage implemented as plain Python objects added through a basic dependency injection system we hacked together.&lt;/p&gt;

&lt;p&gt;As we started adopting the task-oriented approach discussed above, it no longer made sense to have a single workflow for the entire pipeline. The natural next step was to give each task pipeline its own Temporal workflow.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/multiple-wf.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This improved modularity but introduced a new problem: Temporal’s benefits were now isolated to individual pipelines. Communication between pipelines still relied on external coordination, losing Temporal’s built-in reliability when chaining task workflows together.&lt;/p&gt;

&lt;p&gt;Since agents were already responsible for coordinating pipelines, the best solution was to make the agent itself a Temporal workflow. The agent would then call each task pipeline as a subworkflow, allowing Temporal to manage everything as a single transaction.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-ii/agent-wf.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This design also let the agent workflow act as a supervisor, tracking the state of the entire inference pipeline and handling errors that weren’t automatically resolved by Temporal. The result was a fully managed execution flow that not only recovered from failures but allowed us to iterate rapidly on individual tasks without disrupting the whole system—crucial for maintaining the quality of our AI experiences.&lt;/p&gt;
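
&lt;p&gt;As a rough illustration of the supervisor role, here is a plain-Python sketch in which the agent runs each task pipeline in turn and records what completed and what failed. It does not use Temporal; &lt;code&gt;AgentRun&lt;/code&gt;, &lt;code&gt;run_agent&lt;/code&gt;, and the three example tasks are hypothetical, and a real agent workflow would call each task as a subworkflow instead of a local function.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentRun:
    """Supervisor state for one agent invocation (hypothetical shape)."""
    completed: dict[str, str] = field(default_factory=dict)
    failed: dict[str, str] = field(default_factory=dict)


def run_agent(tasks: dict[str, Callable[[str], str]], payload: str) -> AgentRun:
    """Run each task pipeline in order, feeding each output to the next task.

    A task that raises is recorded and skipped, so one bad task
    does not take down the whole run.
    """
    run = AgentRun()
    current = payload
    for name, task in tasks.items():
        try:
            current = task(current)
            run.completed[name] = current
        except Exception as exc:  # the supervisor decides how to handle failures
            run.failed[name] = str(exc)
    return run


# Example task pipelines (all hypothetical stand-ins).
def extract(text: str) -> str:
    return text.strip().lower()


def enrich(text: str) -> str:
    raise ValueError("enrichment source unavailable")


def summarize(text: str) -> str:
    return text[:12]
```

&lt;p&gt;Because each task sits behind the same call shape, one task can be swapped or reworked without touching the supervisor loop.&lt;/p&gt;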

&lt;p&gt;This architecture proved robust enough to support our evolving product needs, including the shift from Slack-only interactions to the more proactive, context-aware Companion experience. In the final article of this series, we’ll break down how we built our agents, got them to coordinate without stepping on each other’s toes, and kept them from spiraling into unpredictable behavior.&lt;/p&gt;
</description>
        <pubDate>Fri, 14 Mar 2025 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2025/03/14/building-ai-products-part-ii.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2025/03/14/building-ai-products-part-ii.html</guid>
        
        <category>AI</category>
        
        <category>Software Engineering</category>
        
        <category>Agents</category>
        
        
      </item>
    
      <item>
        <title>AI is not the engineer we wanted, but it might be the one we need</title>
        <description>&lt;p&gt;I love making fun of thought leaders, investors, and other folks who have no idea how software is built and operated—yet keep proclaiming that the software developer is dead. Every week, there’s at least one of these sprees on LinkedIn or in newsletters, where they go on and on about how some announcement is a game-changer that just wiped out an entire industry—from the Devin fiasco to breathless declarations that the singularity has arrived after watching a hackathon project demo.&lt;/p&gt;

&lt;p&gt;As any seasoned software builder can tell you, and as we see every day when extraordinary claims fail to deliver extraordinary proof, AI still needs substantial, groundbreaking advancements to truly replace a software engineer. That said, it would be silly to ignore the fact that &lt;strong&gt;today’s AI technology is already powerful enough to disrupt huge parts of software development&lt;/strong&gt;, namely B2B SaaS, internal tools, and other frontends-to-a-database systems.&lt;/p&gt;

&lt;p&gt;The easy narrative is that this disruption comes from the growing power and increasing capabilities of AI systems. I agree that AI is the tipping point here, but less because of its raw power and more because it’s the final missing piece in a picture we’ve been painting for two decades. Let me explain.&lt;/p&gt;

&lt;h2 id=&quot;crud-before-the-cloud&quot;&gt;CRUD Before the Cloud&lt;/h2&gt;

&lt;p&gt;Until the late ’80s, domain-specific software applications were rare outside corporate, government, and research settings. If a business was computerized—which was uncommon to begin with—it typically relied on generic spreadsheets and simple database automations.&lt;/p&gt;

&lt;p&gt;This changed in the ’90s. Walk into a computer store or flip through a magazine, and you’d find dozens of shrink-wrapped software packages catering to every niche imaginable—from video rental stores to law firms to astrology.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2025-b2bsaasai/cds.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;What’s interesting about this software is how similar it all looked. At their core, these applications were just forms and wizards for manipulating and querying a database. The user interfaces followed the same patterns—no remote clients, just local systems with barcode scanners and printers.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2025-b2bsaasai/pos3.gif&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This Cambrian explosion of software was fueled by modern databases and third-generation programming languages like Visual Basic and Delphi. These tools were perfect for the era, simplifying CRUD operations on a local database and allowing developers to focus on business logic instead of low-level details. Small teams could build entire businesses by reusing the same architecture across different niches—after all, a video rental app isn’t that different from a bed-and-breakfast management system once you strip away the domain-specific details.&lt;/p&gt;

&lt;h2 id=&quot;from-boxes-to-browsers&quot;&gt;From Boxes to Browsers&lt;/h2&gt;

&lt;p&gt;As the ’90s ended, Salesforce and others began promoting the “no software” movement—a clever marketing stunt aimed at disrupting traditional CRM systems. In doing so, they helped unleash Software-as-a-Service (SaaS), which brought massive advantages for both users and software producers. SaaS quickly became the dominant model for building and distributing software, making anything else a rare exception.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2025-b2bsaasai/salesforce.jpg&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;But web development was an entirely new discipline, requiring fresh ways of thinking and writing code. Companies that had thrived in the desktop era tried to transplant their models to the web, with many failed attempts—ranging from Sun’s JavaServer Faces and Portlets to Microsoft’s Web Forms.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2025-b2bsaasai/vs.jpg&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Ultimately, these desktop-like development experiences were relegated to corporate IT departments, where technology was seen as a cost center rather than a core business function. Every successful tech company—even those using Microsoft’s .NET—had to abandon this thinking and develop a new understanding of how to build scalable, user-facing web applications.&lt;/p&gt;

&lt;p&gt;By the late 2000s and early 2010s, building web products required solid engineering skills. Developers had to understand everything from securely authenticating users to deploying without downtime—and most of us were learning on the job.&lt;/p&gt;

&lt;p&gt;The rise of productivity-focused frameworks like Ruby on Rails reshaped web development culture, pushing developers toward maximizing code reuse. In the early Web 2.0 era, relying on third-party solutions for user authentication, payment processing, or inventory management was seen as heresy. But Rails changed the game.&lt;/p&gt;

&lt;p&gt;The Rails culture has always been about experimentation and challenging the status quo—all in the name of productivity. From the start, Rails developers eagerly delegated as much as possible to libraries and external services. And as Rails became the backbone of every cool startup in the early 2010s, this mindset didn’t just gain acceptance—it became the hallmark of a high-caliber developer.&lt;/p&gt;

&lt;p&gt;In true Startupland fashion, the growing reliance on third-party components quickly became an opportunity for new products and services. Companies like Stripe, Auth0, Twilio, SendGrid, and Algolia began offering these as APIs developers could integrate effortlessly. Meanwhile, developer-focused infrastructure providers like Render, Timescale, and Vercel made infrastructure worries obsolete—even at scale, handling millions of users and terabytes of data.&lt;/p&gt;

&lt;h2 id=&quot;there-and-back-again&quot;&gt;There and Back Again&lt;/h2&gt;

&lt;p&gt;And this brings us full circle to what application development was like in the ’90s. From the late 2010s to today, building B2B SaaS applications has largely become an exercise in gluing together open-source and SaaS components with custom business logic to orchestrate them. Application architecture has been heavily commoditized, and today’s developers are mostly selecting the right components off the shelf and wiring them together to meet business needs.&lt;/p&gt;

&lt;p&gt;Look inside any software team—from Google to a five-person startup—and you’ll see that this is exactly what most developers do all day. The hardest parts of application development are delegated to open-source, SaaS, or internal platform components, while developers focus on business logic and user experience.&lt;/p&gt;

&lt;p&gt;This pattern raises an obvious question: if application developers are mostly translating business rules into code without much engineering heavy lifting, why can’t the business expert—the one who already knows the rules—do this work directly? How do we eliminate the translator?&lt;/p&gt;

&lt;p&gt;Many tried to do exactly this in the ’90s, using everything from natural language programming to drag-and-drop tools, and it’s striking how much these past attempts resemble the “AI Agent development tools” we see today.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2025-b2bsaasai/biztalk.jpg&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Despite companies pouring incredible amounts of money into tools that promised to take the application developer out of the equation, this dream has never fully materialized for two key reasons:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Learning curve trade-offs: While these tools remove technical jargon, they introduce their own systems that must be learned and mastered. Worse yet, unlike programming languages, knowledge of one proprietary tool rarely transfers to another.&lt;/li&gt;
  &lt;li&gt;Power vs. simplicity balance: Simplification inevitably comes at a cost—these tools sacrifice power and flexibility. They excel at orchestrating existing components, but implementing anything beyond basic logic (even moderately complex conditional chains) becomes cumbersome. Users are limited to recombining prebuilt components, with little ability to create truly custom solutions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core issue was that these tools weren’t truly translating between business and code—they created an entirely different development platform that users had to learn and depend on, without offering anywhere near the power and flexibility of just writing code.&lt;/p&gt;

&lt;h2 id=&quot;ai-over-convention-over-configuration&quot;&gt;AI Over Convention Over Configuration&lt;/h2&gt;

&lt;p&gt;Generative AI has a real opportunity to reshape software development—not by replacing engineers outright, but by bridging the gap between business logic and implementation.&lt;/p&gt;

&lt;p&gt;Thought leaders on LinkedIn love to proclaim that AI will replace all engineers within the year. If we’re talking specifically about B2B SaaS applications and internal tools, I can see a version of this happening. Not because today’s AI models are competent software engineers, but because we have built a world where you no longer need to be one to create these kinds of applications.&lt;/p&gt;

&lt;p&gt;With the right set of reusable components, even a year-old AI model can functionally perform the job of a mid-level SaaS developer. The challenge is not raw capability, but finding the right way to direct these AI systems. Chatbots and AI agents that communicate in natural language feel inefficient and unnatural for structured development. AI also frequently misuses APIs, calls nonexistent functions, or confuses component versions, making it unreliable for production work.&lt;/p&gt;

&lt;p&gt;This opportunity feels a lot like what Heroku did for cloud computing—but even bigger. Instead of simplifying infrastructure through a system of conventions, AI enables application development itself to be more accessible. There are still many unanswered questions, but if I were running a developer-focused platform like Render or DigitalOcean, this would be my top priority.&lt;/p&gt;

&lt;p&gt;Over the past 15 years, we have built the technology, ecosystem, and culture that allow application engineers to delegate most complex and undifferentiated work to pre-existing components. The infrastructure is already in place. Generative AI is arriving at exactly the right moment to take advantage of it.&lt;/p&gt;
</description>
        <pubDate>Thu, 27 Feb 2025 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2025/02/27/ai.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2025/02/27/ai.html</guid>
        
        <category>AI</category>
        
        <category>Software Engineering</category>
        
        <category>Agents</category>
        
        <category>SaaS</category>
        
        
      </item>
    
      <item>
        <title>Building AI Products—Part I: Back-end Architecture</title>
        <description>&lt;p&gt;In 2023, we launched an AI-powered Chief of Staff for engineering leaders—an assistant that unified information across team tools and tracked critical project developments. Within a year, we attracted 10,000 users, &lt;a href=&quot;https://outropy.ai/blog/2024-04-19-outropy_vs_slack_ai/&quot;&gt;outperforming even deep-pocketed incumbents such as Salesforce and Slack AI&lt;/a&gt;. Here is an early demo:&lt;/p&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/8mr5eZNXDlo?si=-IIK5uO5cTN9FFhi&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;By May 2024, we realized something interesting: while our AI assistant was gaining traction, there was overwhelming demand for the technology we built to power it. Engineering leaders using the platform were reaching out non-stop to ask not about the tool but how we made our agents work so reliably at scale and be, you know, &lt;em&gt;actually useful&lt;/em&gt;. This led us to pivot to &lt;a href=&quot;https://outropy.ai/&quot;&gt;Outropy&lt;/a&gt;, a developer platform that enables software engineers to build AI products.&lt;/p&gt;

&lt;p&gt;Building with Generative AI at breakneck pace while the industry was finding its footing taught us invaluable lessons—lessons that now form the core of the Outropy platform. While LinkedIn overflows with thought leaders declaring every new research paper a “game changer,” few explain what the game actually is. This series aims to change that.&lt;/p&gt;

&lt;p&gt;This three-part series will cover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;How we built the AI agents powering the assistant&lt;/li&gt;
  &lt;li&gt;How we constructed and operate our inference pipelines&lt;/li&gt;
  &lt;li&gt;The AI-specific tools and techniques that made it all work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This order is intentional. So much content out there fixates on choosing the best reranker or chasing the latest shiny technology, and few discuss how to build useful AI software. This is a report from the trenches, not the ivory tower.&lt;/p&gt;

&lt;h3 id=&quot;structuring-an-ai-application&quot;&gt;Structuring an AI Application&lt;/h3&gt;

&lt;p&gt;Working with AI presents exciting opportunities and unique frustrations for a team like ours, with decades of experience building applications and infrastructure.&lt;/p&gt;

&lt;p&gt;AI’s stochastic (probabilistic) nature fundamentally differs from traditional deterministic software development—but that’s only part of the story. With years of experience handling distributed systems and &lt;a href=&quot;https://nighthacks.com/jag/res/Fallacies.html&quot;&gt;their inherent uncertainties&lt;/a&gt;, we’re no strangers to unreliable components.&lt;/p&gt;

&lt;p&gt;The biggest open questions lie in structuring GenAI systems for long-term evolution and operation, moving beyond the quick-and-dirty prompt chaining that suffices for flashy demos.&lt;/p&gt;

&lt;p&gt;In my experience, there are two major types of components in a GenAI system:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Inference Pipelines:&lt;/strong&gt; A deterministic sequence of operations that transforms inputs through one or more AI models to produce a specific output. Think of RAG pipelines generating answers from documents—each step follows a fixed path despite the AI’s probabilistic nature.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Agents:&lt;/strong&gt; Autonomous software entities that maintain state while orchestrating AI models and tools to accomplish complex tasks. These agents can reason about their progress and adjust their approach across multiple steps, making them suitable for longer-running operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our journey began with &lt;a href=&quot;https://www.youtube.com/watch?v=ePFEpU5crN0&amp;amp;ab_channel=PhilCal%C3%A7ado&quot;&gt;a simple Slack bot&lt;/a&gt;. This focused approach let us explore GenAI’s possibilities and iterate quickly without getting bogged down in architectural decisions. During this period, we only used distinct inference pipelines and tied their results together manually.&lt;/p&gt;

&lt;p&gt;This approach served us well until we expanded our integrations and features. As the application grew, our inference pipelines became increasingly complex and brittle, struggling to reconcile data from different sources and formats while maintaining coherent semantics.&lt;/p&gt;

&lt;p&gt;This complexity drove us to adopt a &lt;em&gt;multi-agentic system&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&quot;what-are-agents-really&quot;&gt;What are agents, really?&lt;/h3&gt;

&lt;p&gt;The industry has poured billions into AI agents, yet most discussions focus narrowly on &lt;a href=&quot;https://en.wikipedia.org/wiki/Robotic_process_automation&quot;&gt;RPA&lt;/a&gt;-style, no-code and low-code automation tools. Yes, frameworks like CrewAI, AutoGen, Microsoft Copilot Studio, and Salesforce’s Agentforce serve an important purpose—they give business users the same power that shell scripts give Linux admins. But just like you wouldn’t build a production system in Bash, these frameworks are just scratching the surface of what agents can be.&lt;/p&gt;

&lt;p&gt;The broader concept of agents has a rich history in academia and AI research, offering much more interesting possibilities for product development. Still, as a tiny startup on a tight deadline, rather than get lost in theoretical debates, we distilled practical traits that guided our implementation:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Semi-autonomous:&lt;/strong&gt; Functions independently with minimal supervision, making local decisions within defined boundaries.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Specialized:&lt;/strong&gt; Masters specific tasks or domains rather than attempting general-purpose intelligence.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Reactive:&lt;/strong&gt; Responds intelligently to requests and environmental changes, maintaining situational awareness.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Memory-driven:&lt;/strong&gt; Maintains and leverages both immediate context and historical information to inform decisions.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Decision-making:&lt;/strong&gt; Analyzes situations, evaluates options, and executes actions aligned with objectives.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tool-using:&lt;/strong&gt; Effectively employs various tools, systems, and APIs to accomplish tasks.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Goal-oriented:&lt;/strong&gt; Adapts behavior and strategies to achieve defined objectives while maintaining focus.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While these intelligent components are powerful, we quickly learned that not everything needs to be an agent. Could we have built our Slackbot and productivity tool connectors using agents? Sure, but the traditional design patterns worked perfectly well, and our limited resources were better spent elsewhere. The same logic applied to standard business operations—user management, billing, permissions, and other commodity functions worked better with conventional architectures.&lt;/p&gt;

&lt;p&gt;This meant that we had the following &lt;a href=&quot;https://learning.oreilly.com/library/view/software-architecture-patterns/9781491971437/ch01.html#idm46407728082304&quot;&gt;layered architecture&lt;/a&gt; inside our application:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/arch-mono.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;agents-are-not-microservices&quot;&gt;Agents are not Microservices&lt;/h3&gt;

&lt;p&gt;I’ve spent the last decade deep in microservices—from pioneering work at ThoughtWorks to helping underdogs like SoundCloud, DigitalOcean, SeatGeek, and Meetup punch above their weight. So naturally, that’s where we started with our agent architecture.&lt;/p&gt;

&lt;p&gt;Initially, we implemented agents as &lt;a href=&quot;https://martinfowler.com/eaaCatalog/serviceLayer.html&quot;&gt;a service layer&lt;/a&gt; with traditional request/response cycles:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/arch-agents-service-layer.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;One of the biggest appeals of this architecture was that, even if we expected our application to be a monolith for a long time, it created an easier path to extracting services as needed and benefiting from &lt;em&gt;horizontal scalability&lt;/em&gt; when the time came.&lt;/p&gt;

&lt;p&gt;Unfortunately, the more we went down the path, the more obvious it became that stateless microservices and AI agents just don’t play nice together. Microservices are all about splitting a particular feature into small units of work that need minimal context to perform the task at hand. The same traits that make agents powerful create a significant impedance mismatch with these expectations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Stateful Operation&lt;/strong&gt;: Agents must maintain rich context across interactions, including conversation history and planning states. This fundamentally conflicts with microservices’ stateless nature and complicates scaling and failover.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Non-deterministic Behavior&lt;/strong&gt;: Unlike traditional services, agents are basically state machines with unbounded states. They behave completely differently depending on context and various probabilistic responses. This breaks core assumptions about caching, testing, and debugging.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Data-Intensive with Poor Locality&lt;/strong&gt;: Agents process massive amounts of data through language models and embeddings, with poor data locality. This contradicts microservices’ efficiency principle.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Unreliable External Dependencies&lt;/strong&gt;: Heavy reliance on external APIs such as LLMs, embedding services, and tool endpoints creates complex dependency chains with unpredictable latency, reliability, and costs.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Implementation Complexity&lt;/strong&gt;: The combination of prompt engineering, planning algorithms, and tool integrations creates debugging challenges that compound with distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not only did this impedance mismatch cause a lot of pain while writing and maintaining the code, but agentic systems are so far away from the ubiquitous &lt;a href=&quot;https://12factor.net/&quot;&gt;12-factor&lt;/a&gt; model that attempting to leverage existing microservice tooling became an exercise in fitting square pegs into round holes.&lt;/p&gt;

&lt;h3 id=&quot;agents-are-more-like-objects&quot;&gt;Agents are more like objects&lt;/h3&gt;

&lt;p&gt;If microservices weren’t the right fit, another classic software engineering paradigm offered a more natural abstraction for agents: object-oriented programming.&lt;/p&gt;

&lt;p&gt;Agents naturally align with OOP principles: they maintain encapsulated state (their memory), expose methods (their tools and decision-making capabilities via inference pipelines), and communicate through message passing. &lt;a href=&quot;https://userpage.fu-berlin.de/~ram/pub/pub_jf47ht81Ht/doc_kay_oop_en&quot;&gt;This mirrors Alan Kay’s original vision&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/uml.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We’ve been in the industry long enough to remember the nightmares of distributed objects and the fever dreams of CORBA and J2EE. Yet, objects offered us a pragmatic way to quickly iterate on our product and defer the scalability question until we actually needed to solve it.&lt;/p&gt;

&lt;p&gt;We evolved our agents from &lt;a href=&quot;https://martinfowler.com/bliki/EvansClassification.html&quot;&gt;stateless Services to Entities&lt;/a&gt;, giving them distinct identities and lifecycles. This meant each user or organization maintained their own persistent agent instances, managed through &lt;a href=&quot;https://philcalcado.com/2010/12/23/how_to_write_a_repository.html&quot;&gt;Repositories&lt;/a&gt; in our database.&lt;/p&gt;

&lt;p&gt;This drastically simplified our function signatures by eliminating the need to pass extensive context as arguments on every agent call. It also let us leverage battle-tested tools like SQLAlchemy and Pydantic to build our agents, while enabling unit tests with stubs/mocks instead of complicated integration tests.&lt;/p&gt;
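
&lt;p&gt;To make the Entity idea concrete, here is a minimal sketch under stated assumptions. &lt;code&gt;PrioritiesAgent&lt;/code&gt; and &lt;code&gt;InMemoryAgentRepository&lt;/code&gt; are hypothetical names, and the repository is a dictionary standing in for the database-backed one described above.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class PrioritiesAgent:
    """An agent as an Entity: an identity plus encapsulated memory."""
    user_id: str
    watched: list[str] = field(default_factory=list)

    def watch(self, item: str) -> None:
        if item not in self.watched:
            self.watched.append(item)

    def briefing(self) -> str:
        # No context arguments: everything needed lives in the agent's state.
        return ", ".join(self.watched) or "nothing to report"


class InMemoryAgentRepository:
    """Dictionary-backed stand-in for a database-backed Repository."""

    def __init__(self) -> None:
        self._agents: dict[str, PrioritiesAgent] = {}

    def get_or_create(self, user_id: str) -> PrioritiesAgent:
        # Each user gets one persistent agent instance.
        return self._agents.setdefault(user_id, PrioritiesAgent(user_id))
```

&lt;p&gt;A caller fetches the agent by identity and invokes &lt;code&gt;briefing()&lt;/code&gt; with no arguments, and a unit test can substitute the in-memory repository for the real one.&lt;/p&gt;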

&lt;h3 id=&quot;implementing-agentic-memory&quot;&gt;Implementing Agentic Memory&lt;/h3&gt;

&lt;p&gt;Agents’ memories can range from something as simple as a single value to something as complicated as a record of historical information going back to the beginning of time. In our assistant, we have both types and more.&lt;/p&gt;

&lt;p&gt;Simple, narrowly focused agents such as “Today’s Priorities” had to remember nothing more than a list of high-priority items they were monitoring and might eventually act on, for example by sending a notification if they weren’t happy with the progress. Others, like our “Org Chart Keeper,” had to keep track of all interactions between everyone in the organization and use that to infer reporting lines and the teams people belonged to.&lt;/p&gt;

&lt;p&gt;The agents with simpler persistence needs would usually just store their data in a dedicated table using &lt;a href=&quot;https://docs.sqlalchemy.org/en/20/orm/&quot;&gt;SQLAlchemy’s ORM&lt;/a&gt;. This obviously wasn’t an option for the more complicated memory needs, so we had to apply a different model.&lt;/p&gt;

&lt;p&gt;After some experimentation, we adopted &lt;a href=&quot;https://martinfowler.com/bliki/CQRS.html&quot;&gt;CQRS&lt;/a&gt; with &lt;a href=&quot;https://martinfowler.com/eaaDev/EventSourcing.html&quot;&gt;Event Sourcing&lt;/a&gt;. In essence, every state change—whether creating a meeting or updating team members—was represented as a &lt;em&gt;Command&lt;/em&gt;, a discrete event recorded chronologically—much like a database &lt;a href=&quot;https://en.wikipedia.org/wiki/Transaction_log&quot;&gt;transaction log&lt;/a&gt;. The current state of any object could then be reconstructed by replaying all its associated events in sequence.&lt;/p&gt;

&lt;p&gt;While this approach has clear benefits, replaying events solely to respond to a query is slow and cumbersome, especially when most queries focus on the current state rather than historical data. To address this, CQRS suggests maintaining a continuously updated, query-optimized representation of the data, similar to materialized views in a relational database. This ensured quick reads without sacrificing the advantages of event sourcing. We started off storing events and query models in Postgres, planning to move them to DynamoDB when we started having issues.&lt;/p&gt;
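
&lt;p&gt;The relationship between the event log and the read model can be sketched in a few lines. This is an illustrative, in-memory version only: &lt;code&gt;MeetingStore&lt;/code&gt;, &lt;code&gt;apply&lt;/code&gt;, &lt;code&gt;replay&lt;/code&gt;, and the event names are hypothetical, and nothing here touches Postgres.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str        # e.g. "MeetingCreated" (hypothetical event names)
    meeting_id: str
    title: str = ""


def apply(state: dict[str, str], event: Event) -> None:
    """Fold one event into a mapping of meeting id to current title."""
    if event.kind in ("MeetingCreated", "MeetingRenamed"):
        state[event.meeting_id] = event.title
    elif event.kind == "MeetingCancelled":
        state.pop(event.meeting_id, None)


class MeetingStore:
    def __init__(self) -> None:
        self.log: list[Event] = []            # append-only event log
        self.read_model: dict[str, str] = {}  # query-optimized current state

    def append(self, event: Event) -> None:
        self.log.append(event)
        apply(self.read_model, event)  # keep the read model in step with the log


def replay(log: list[Event]) -> dict[str, str]:
    """Rebuild current state from scratch by replaying every event in order."""
    state: dict[str, str] = {}
    for event in log:
        apply(state, event)
    return state
```

&lt;p&gt;Queries read &lt;code&gt;read_model&lt;/code&gt; directly, while &lt;code&gt;replay&lt;/code&gt; remains available to rebuild or audit state from the log.&lt;/p&gt;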

&lt;p&gt;One big challenge in this model is that only an agent knows what matters to it. For example, if a user cancels a scheduled meeting, which agents should care about this event? The scheduling agent for sure, but if this meeting was about a specific project you might also want the project management agent to know about it, as it might impact the roadmap.&lt;/p&gt;

&lt;p&gt;Rather than building an all-knowing router to dispatch events to the right agents—risking the creation of a &lt;a href=&quot;https://en.wikipedia.org/wiki/God_object&quot;&gt;God object&lt;/a&gt;—we took inspiration from &lt;a href=&quot;https://developers.soundcloud.com/blog/building-products-at-soundcloud-part-1-dealing-with-the-monolith&quot;&gt;my experience at SoundCloud&lt;/a&gt;. There, we developed a semantic event bus enabling interested parties to publish and observe events for relevant entities:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Soon enough, we realized that there was a big problem with this model; as our microservices needed to react to user activity. The push-notifications system, for example, needed to know whenever a track had received a new comment so that it could inform the artist about it.  […] over several iterations we developed a model called Semantic Events, where changes in the domain objects result in a message being dispatched to a broker and consumed by whichever microservice finds the message interesting.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/semantic-events.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Following this model, all state-change events were posted to an event bus that agents could subscribe to. Each agent filtered out irrelevant events independently, removing the need for external systems to know what they cared about. Since we were working within a single monolith at the time, we implemented a straightforward &lt;a href=&quot;https://en.wikipedia.org/wiki/Observer_pattern&quot;&gt;Observer pattern&lt;/a&gt; using &lt;a href=&quot;https://docs.sqlalchemy.org/en/20/orm/events.html&quot;&gt;SQLAlchemy’s native event system&lt;/a&gt;, with plans to eventually migrate to &lt;a href=&quot;https://aws.amazon.com/blogs/database/dynamodb-streams-use-cases-and-design-patterns/&quot;&gt;DynamoDB Streams&lt;/a&gt;.&lt;/p&gt;
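
&lt;p&gt;The sketch below shows the shape of that arrangement in plain Python. It deliberately does not use SQLAlchemy’s event API, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EventBus&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SchedulingAgent&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ProjectAgent&lt;/code&gt; classes, along with the event dictionaries, are assumptions made for illustration:&lt;/p&gt;

```python
class EventBus:
    """In-process bus: every subscriber sees every event."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        # The bus does no routing; each subscriber decides what is relevant.
        for handler in self._subscribers:
            handler(event)


class SchedulingAgent:
    def __init__(self):
        self.seen = []

    def on_event(self, event):
        if event["entity"] != "meeting":
            return  # not relevant to this agent
        self.seen.append(event["type"])


class ProjectAgent:
    def __init__(self):
        self.seen = []

    def on_event(self, event):
        # Cares about anything tied to a project, whatever the entity.
        if event.get("project") is None:
            return
        self.seen.append(event["type"])


bus = EventBus()
scheduler, projects = SchedulingAgent(), ProjectAgent()
bus.subscribe(scheduler.on_event)
bus.subscribe(projects.on_event)

bus.publish({"entity": "meeting", "type": "cancelled", "project": "lavender"})
bus.publish({"entity": "meeting", "type": "scheduled", "project": None})
bus.publish({"entity": "document", "type": "commented", "project": "lavender"})
```

&lt;p&gt;The publisher never learns who is listening: the cancelled project meeting reaches both agents, and each one discards the events it has no use for.&lt;/p&gt;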

&lt;p&gt;Inside our monolith, the architecture looked like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/memory1.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Managing both the ORM approach for simpler objects and CQRS for more complex needs grew increasingly cumbersome. Small refactorings or shared logic across all agents became harder than necessary. Ultimately, we decided the simplicity of ORM wasn’t worth the complexity of handling two separate persistence models. We converted all agents to the CQRS style but retained ORM for non-agentic components.&lt;/p&gt;

&lt;h3 id=&quot;handling-events-in-natural-language&quot;&gt;Handling Events in Natural Language&lt;/h3&gt;

&lt;p&gt;CQRS and its supporting tools excel with well-defined data structures. At SoundCloud, events like &lt;em&gt;UploadTrack&lt;/em&gt; or &lt;em&gt;CreateTrackComment&lt;/em&gt; were straightforward and unambiguous. AI systems, however, present a very different challenge.&lt;/p&gt;

&lt;p&gt;Most AI systems deal with the uncertainty of natural language. This makes the process of consolidating the Commands into a “materialized view” hard. For example, what events correspond to someone posting a Slack message like &lt;em&gt;“I am feeling sick and can’t come to the office tomorrow, can we reschedule the project meeting?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We started with the naive approach most agentic systems use: running every message through an inference pipeline to extract context, make decisions, and take actions via tool calling. This approach faced two problems: first, reliably doing all this work in a single pipeline is hard even with frontier models—more on this in part II. Second, we ran into the God object problem discussed earlier—our logic was spread across many agents, and no single pipeline could handle everything.&lt;/p&gt;

&lt;p&gt;One option involved sending each piece of content—Slack messages, GitHub reviews, Google Doc comments, emails, calendar event descriptions…—to every agent for processing. While this was straightforward to implement via our event bus, each agent would need to run its inference pipeline for every piece of content. This would cause all sorts of performance and cost issues due to frequent calls to LLMs and other models, especially considering that the vast majority of content wouldn’t be relevant to a particular agent.&lt;/p&gt;

&lt;p&gt;We wrestled with this problem for a while, exploring some initially promising but ultimately unsuccessful attempts at &lt;a href=&quot;https://www.mathworks.com/discovery/feature-extraction.html&quot;&gt;Feature Extraction&lt;/a&gt; using simpler ML models instead of LLMs. That said, I believe this approach can work well in constrained domains—indeed, we use it in Outropy to route requests within the platform.&lt;/p&gt;

&lt;p&gt;Our solution built on &lt;a href=&quot;https://arxiv.org/pdf/2312.06648&quot;&gt;Tong Chen’s Proposition-Based Retrieval research&lt;/a&gt;. We already &lt;a href=&quot;https://www.linkedin.com/posts/pcalcado_ai-vs-human-readability-the-unnecessary-activity-7275345861222535168-3fIB?utm_source=share&amp;amp;utm_medium=member_desktop&quot;&gt;used this approach to ingest structured content like CSV files&lt;/a&gt;, where instead of directly embedding it into a vector database, we first use an LLM to generate natural language factoids about the content. While these factoids add no new information, their natural language format makes vector similarity search much more effective than the original spreadsheet-like structure.&lt;/p&gt;

&lt;p&gt;Our solution was to use an LLM to generate propositions for every message, structured according to a format inspired by &lt;a href=&quot;https://github.com/amrisi/amr-guidelines/blob/master/amr.md&quot;&gt;Abstract Meaning Representation&lt;/a&gt;, a technique from natural language processing.&lt;/p&gt;

&lt;p&gt;This way, if user &lt;em&gt;Bob&lt;/em&gt; sends a message like &lt;em&gt;“I am feeling sick and can’t come to the office tomorrow, can we reschedule the project meeting?”&lt;/em&gt; on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#project-lavender&lt;/code&gt; channel we would get structured propositions such as:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/amr.jpg&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Naturally, we had to carefully batch messages and discussions to minimize costs and latency. This necessity became a major driver behind developing Outropy’s automated pipeline optimization using Reinforcement Learning.&lt;/p&gt;

&lt;h3 id=&quot;scaling-to-10000-users&quot;&gt;Scaling to 10,000 Users&lt;/h3&gt;

&lt;p&gt;As mentioned a few times, throughout this whole process it was very important to us to minimize the amount of time and energy invested in technical topics unrelated to learning about our users and how to use AI to build products.&lt;/p&gt;

&lt;p&gt;We kept our assistant as a single component, with a single code base and a single container image that we deployed using AWS Elastic Container Service. Our agents were simple Python classes using SQLAlchemy and Pydantic, and we relied on FastAPI and asyncio’s excellent features to handle the load. Keeping things simple allowed us to make massive progress on the product side, to the point where we went from 8 to 2,000 users in about two months.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/arch-mono.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;That’s when things started breaking down. Our personal daily briefings—our flagship feature—went from taking minutes to hours per user. We’d trained our assistant to learn each user’s login time and generate reports an hour before, ensuring fresh updates. But as we scaled, we had to abandon this personalization and batch process everything at midnight, hoping reports would be ready when users logged in.&lt;/p&gt;

&lt;p&gt;As an early startup, growth had to continue, so we needed a quick solution. We implemented organization-based sharding with a simple configuration file: smaller organizations shared a container pool, while those with thousands of users got dedicated resources. This isolation allowed us to keep scaling while maintaining performance across our user base.&lt;/p&gt;
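
&lt;p&gt;A configuration-driven lookup of this kind can be very small. The sketch below is an assumption about its general shape, not our actual file: the organization ids, the pool names, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pool_for&lt;/code&gt; helper are all hypothetical.&lt;/p&gt;

```python
# Hypothetical shard configuration: organizations that are not listed
# under "dedicated" fall back to the shared pool.
SHARD_CONFIG = {
    "shared_pool": "pool-shared",
    "dedicated": {
        "org-bigcorp": "pool-bigcorp",
        "org-megacorp": "pool-megacorp",
    },
}


def pool_for(org_id, config=SHARD_CONFIG):
    """Return the container pool that should process work for org_id."""
    return config["dedicated"].get(org_id, config["shared_pool"])
```

&lt;p&gt;Moving an organization to dedicated resources then becomes a one-line configuration change rather than a code change.&lt;/p&gt;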

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/arch-shards.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This simple change gave us breathing room by preventing larger accounts from blocking smaller ones. We also added priority processing, deprioritizing inactive users and those we learned were away from work.&lt;/p&gt;

&lt;p&gt;While sharding gave us parallelism, we quickly hit the fundamental scaling challenges of GenAI systems. Traditional microservices can scale horizontally because their external API calls are mostly for data operations. But in AI systems, these slow and unpredictable third-party API calls are your critical path. They make the core decisions, and this means everything is blocked until you get a response.&lt;/p&gt;

&lt;p&gt;Python’s async features proved invaluable here. We restructured our agent-model interactions using &lt;a href=&quot;https://refactoring.guru/design-patterns/chain-of-responsibility&quot;&gt;Chain of Responsibility&lt;/a&gt;, which let us properly separate CPU-bound and IO-bound work. Combined with some classic systems tuning—increasing container memory and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ulimit&lt;/code&gt; for more open sockets—we saw our request backlog start to plummet.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://platform.openai.com/docs/guides/rate-limits&quot;&gt;OpenAI rate limits&lt;/a&gt; became our next bottleneck. We responded with a token budgeting system that &lt;a href=&quot;https://dagster.io/glossary/data-backpressure&quot;&gt;applied backpressure&lt;/a&gt; while hardening our LLM calls with exponential backoffs, caching, and fallbacks. Moving the heaviest processing to off-peak hours gave us extra breathing room.&lt;/p&gt;
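
&lt;p&gt;As an illustration of the backoff part only (token budgeting, caching, and fallbacks are left out), here is a minimal asyncio sketch. The function names are hypothetical, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RuntimeError&lt;/code&gt; is a stand-in for whatever rate-limit error your client library raises:&lt;/p&gt;

```python
import asyncio


def backoff_delays(base=1.0, factor=2.0, retries=4, cap=30.0):
    """Delays (in seconds) to wait before each retry, capped at `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


async def call_with_backoff(call, delays, sleep=asyncio.sleep):
    """Try `call` once, then once more after each delay until it succeeds."""
    last_error = None
    for delay in [0.0] + list(delays):
        if delay:
            await sleep(delay)
        try:
            return await call()
        except RuntimeError as error:  # stand-in for a rate-limit error
            last_error = error
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

&lt;p&gt;Passing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sleep&lt;/code&gt; as a parameter keeps the retry logic testable without real waiting. A production version would typically add random jitter to the delays so that many workers do not retry in lockstep.&lt;/p&gt;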

&lt;p&gt;Our final architectural optimization was moving from OpenAI’s APIs to Azure’s GPT deployments. The key advantage was &lt;a href=&quot;https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits&quot;&gt;Azure’s per-deployment quotas&lt;/a&gt;, unlike OpenAI’s organization-wide limits. This let us scale by load-balancing across multiple deployments. To manage the shared quota, we extracted our GPT calling code into a dedicated service rather than adding distributed locks.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/arch-gpt-proxy.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;the-zero-one-infinity-rule&quot;&gt;The Zero-one-infinity rule&lt;/h3&gt;

&lt;p&gt;One of my favorite adages in computer science is &lt;a href=&quot;https://en.wikipedia.org/wiki/Zero_one_infinity_rule&quot;&gt;“There are only three numbers: zero, one, and infinity.”&lt;/a&gt; In software engineering, this manifests as having either zero modules, a monolith, or an arbitrary and always-growing number. As such, extracting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GPTProxy&lt;/code&gt; as our first remote service paved the way for similar changes.&lt;/p&gt;

&lt;p&gt;The most obvious opportunity to simplify our monolith and squeeze more performance from the system was extracting the logic that pulled data from our users’ connected productivity tools. The extraction was straightforward, except for one challenge: our event bus needed to work across services. We kept using SQLAlchemy’s event system, but replaced our simple observer loop with a proper &lt;a href=&quot;https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern&quot;&gt;pub/sub&lt;/a&gt; implementation using Postgres as a queue.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/arch-int-worker.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This change dramatically simplified things—we should have done it from the start. It isolated a whole class of errors to a single service, making debugging easier, and let developers run only the components they were working on.&lt;/p&gt;

&lt;p&gt;Encouraged by this success, we took the next logical step: extracting our agents and inference pipelines into their own component.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/building-ai-products-i/arch-agents-worker.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is where my familiar service extraction playbook stopped working. I’ll cover the details of our inference pipelines in the next article, but first, let’s talk about how we distributed our agents.&lt;/p&gt;

&lt;h3 id=&quot;agents-as-distributed-objects&quot;&gt;Agents as Distributed Objects&lt;/h3&gt;

&lt;p&gt;As successful as we were with modeling agents as objects, we’d always been wary of distributing them. My ex-colleague &lt;a href=&quot;https://martinfowler.com/bliki/FirstLaw.html&quot;&gt;Martin Fowler’s First Law of Distributed Objects&lt;/a&gt; puts it best: &lt;strong&gt;don’t&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Still, I think that &lt;a href=&quot;https://martinfowler.com/articles/distributed-objects-microservices.html&quot;&gt;Martin’s “exception” for microservices&lt;/a&gt; applies just as well for agents:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[My objection is that] although you can encapsulate many things behind object boundaries, you can’t encapsulate the remote/in-process distinction. An in-process function call is fast and always succeeds […] Remote calls, however, are orders of magnitude slower, and there’s always a chance that the call will fail due to a failure in the remote process or the connection.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The problem with the distributed objects craze of the 90s was its promise that fine-grained operations—like iterating through a list of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user&lt;/code&gt; objects and setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_enabled&lt;/code&gt; to false—could work transparently across processes or servers. Microservices and agents avoid this trap by exposing coarse-grained APIs specifically designed for remote calls and error scenarios.&lt;/p&gt;

&lt;p&gt;We kept modeling our agents as objects even as we distributed them, just using &lt;a href=&quot;https://en.wikipedia.org/wiki/Data_transfer_object&quot;&gt;Data Transfer Objects&lt;/a&gt; for their APIs instead of domain model objects. This worked well since not everything needs to be an object. Inference pipelines, for instance, are a poor candidate for object orientation and benefit from different abstractions.&lt;/p&gt;

&lt;p&gt;At this stage, our system consisted of multiple instances of a few docker images on ECS. Each container exposed FastAPI HTTP endpoints, with some continuously polling our event bus.&lt;/p&gt;

&lt;p&gt;This model broke down when we added backpressure and &lt;a href=&quot;https://learn.microsoft.com/en-us/dotnet/architecture/cloud-native/application-resiliency-patterns&quot;&gt;resilience patterns&lt;/a&gt; to our agents. We faced new challenges: what happens when the third of five LLM calls fails during an agent’s decision process? Should we retry everything? Save partial results and retry just the failed call? When do we give up and error out?&lt;/p&gt;

&lt;p&gt;Rather than build a custom orchestrator from scratch, we started exploring existing solutions to this problem.&lt;/p&gt;

&lt;p&gt;We first looked at ETL tools like Apache Airflow. While great for data engineering, Airflow’s focus on stateless, scheduled tasks wasn’t a good fit for our agents’ stateful, event-driven operations.&lt;/p&gt;

&lt;p&gt;Being in the AWS ecosystem, we looked at Lambda and other serverless options. But while serverless has evolved significantly, it’s still optimized for stateless, short-lived tasks—the opposite of what our agents need.&lt;/p&gt;

&lt;p&gt;I’d heard great things about Temporal from my previous teams at DigitalOcean. It’s built for long-running, stateful workflows, offering the durability and resilience we needed out of the box. The multi-language support was a bonus, as we didn’t want to be locked into Python for every component.&lt;/p&gt;

&lt;p&gt;After a quick experiment, we were sold. We migrated our agents to run all their computations through Temporal workflows.&lt;/p&gt;

&lt;p&gt;Temporal’s core abstractions mapped perfectly to our object-oriented agents. It splits work between side-effect-free workflows and flexible activities. We implemented our agents’ main logic as Workflows, while tool and API interactions—like AI model calls—became Activities. This structure let Temporal’s runtime handle retries, durability, and scalability automatically.&lt;/p&gt;

&lt;p&gt;The framework wasn’t perfect though. Temporal’s Python SDK felt like a second-class citizen—even using standard libraries like Pydantic was a challenge, as the framework favors data classes. We had to build quite a few converters and exception wrappers, but ultimately got everything working smoothly.&lt;/p&gt;

&lt;p&gt;Temporal Cloud was so affordable we never considered self-hosting. It just works—no complaints. For local development and builds, we use their Docker image, which is equally reliable. We were so impressed that Temporal became core to both our inference pipelines and Outropy’s evolution into a developer platform!&lt;/p&gt;

&lt;p&gt;Stay tuned for a deeper dive into Temporal and inference pipelines in the next installment of this series!&lt;/p&gt;
</description>
        <pubDate>Sat, 14 Dec 2024 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2024/12/14/building-ai-products-part-i.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2024/12/14/building-ai-products-part-i.html</guid>
        
        <category>AI</category>
        
        <category>Software Engineering</category>
        
        <category>Agents</category>
        
        
      </item>
    
      <item>
        <title>Copilotcalypse</title>
        <description>&lt;p&gt;I just saw &lt;a href=&quot;https://www.theverge.com/2024/5/8/24151847/microsoft-copilot-rewrite-prompt-feature-microsoft-365&quot;&gt;this article on The Verge&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2024-copilotcalypse/verge.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Let me start by saying that I am not a fan of chatbots as the primary user interface for enterprise systems. Although LLM-backed tools are significantly better at detecting intent and making data more useful, as a survivor of the fever dream that was &lt;em&gt;Chatops&lt;/em&gt; in the mid-2010s, I don’t see how they solve any of the fundamental issues we faced with the previous iteration. Problems with permissions, concurrency, versioning, reuse, verbosity, and error handling are all still present, and we now have to deal with fresh new challenges brought by AI systems, such as hallucinations.&lt;/p&gt;

&lt;p&gt;Even if these problems were to be solved, I believe that we tend to overestimate our desire to talk to machines. Conversational interfaces can be great when you are exploring something unfamiliar, don’t have a device handy, or when you want to ask questions. However, they are inefficient and annoying when you are on a laptop or phone and know exactly what you want, from turning off the lights to creating a pivot table.&lt;/p&gt;

&lt;p&gt;That’s why &lt;a href=&quot;https://outropy.ai&quot;&gt;Outropy&lt;/a&gt; doesn’t have a chatbot. Instead of asking users to talk to our system, Outropy offers a user experience inspired by tools such as VSCode and IntelliJ. These integrated environments make coders highly productive by providing timely and relevant insight and allowing users to “double-click” for contextual and actionable information about anything they see on the screen.&lt;/p&gt;

&lt;p&gt;To our dismay, Microsoft invested millions of dollars in marketing to convince the world that &lt;em&gt;copilot&lt;/em&gt; is just a synonym for &lt;em&gt;chatbot&lt;/em&gt;. That’s why we stopped referring to Outropy as &lt;em&gt;the Copilot for Engineering Leaders&lt;/em&gt; and started calling it &lt;em&gt;The VSCode for everything else&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But, back to Microsoft, it’s no secret that the adoption of Copilot for both Windows and Office &lt;a href=&quot;https://www.businessinsider.com/microsoft-ai-assistant-copilot-early-adopters-disappointed-report-2024-2&quot;&gt;has faced issues&lt;/a&gt;. I suspect that they are facing the same issue as many AI startups: a lot of users sign up to try the product upon launch, investors get excited about massive revenue projections, but users start to churn as soon as the first or second bill arrives. It’s easy to get IT managers who need to provide some kind of “AI strategy” to their bosses to sign up for a trial, but once they’ve played around with it for a little while it’s hard for them to justify what the extra $30/user/month buys them.&lt;/p&gt;

&lt;p&gt;Anyone paying close attention to the enterprise AI market could see this &lt;em&gt;copilotcalypse&lt;/em&gt; coming, but I had hoped that it would be a forcing function pushing Microsoft and other deep-pocketed companies who are haphazardly plastering ✨ buttons all over their systems to invest in some UX research to move away from chatbots.&lt;/p&gt;

&lt;p&gt;Instead, they are doubling down and adding new chatbots to help you extract value from their existing chatbots. Chatbots will keep being added until engagement metrics improve.&lt;/p&gt;

&lt;p&gt;This is disappointing, but not surprising. I have worked in tech for long enough to know that inside a large organization such as Microsoft, Google, or Meta there are so many competing goals that the safest way to keep your head on your neck is always to keep doing what the higher-ups like, even if it’s clearly not working.&lt;/p&gt;

&lt;p&gt;And this is the exact kind of opening that startups exploit to take on incumbents.&lt;/p&gt;
</description>
        <pubDate>Fri, 08 Mar 2024 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2024/03/08/copilotcalypse.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2024/03/08/copilotcalypse.html</guid>
        
        <category>Copilot</category>
        
        <category>Outropy</category>
        
        
      </item>
    
      <item>
        <title>Attention is All A Manager Needs</title>
        <description>

&lt;p&gt;As I dive deeper into the current renaissance of Artificial Intelligence for a project I’m working on (more on that at the end), I find myself revisiting research on how to process and make sense of information at scale.&lt;/p&gt;

&lt;p&gt;Discussions about managing large-scale information naturally remind me of the challenges engineering managers and directors face. They must navigate both information overload and scarcity at the same time—a paradox that becomes especially evident when coaching new managers or those transitioning from line management to senior leadership. In this article, I’ll explore these challenges and share a few practical tools that have helped me on my own journey.&lt;/p&gt;

&lt;h2 id=&quot;the-challenges-with-attention&quot;&gt;The challenges with attention&lt;/h2&gt;

&lt;p&gt;Managing is a constant balancing act of deciding where to direct your attention—knowing when to zoom in on details and when to step back for the bigger picture.&lt;/p&gt;

&lt;p&gt;In modern organizations, this is made worse by the proliferation of multiple inboxes managers must keep up with. Every tool used by teams daily—Jira, Slack, Workday, Expensify, and countless others—has its own notifications and alerts. As a manager, you’re constantly sifting through them, deciphering which issues demand immediate attention and which can wait.&lt;/p&gt;

&lt;p&gt;Information overload is a major pain point, consuming valuable time and draining your energy. At the same time, the opposite problem arises—critical information often fails to reach you when you need it. Team members may not proactively share key updates, leaving you in the dark about important tasks or projects unless you explicitly request a status report.&lt;/p&gt;

&lt;p&gt;These challenges manifest differently depending on your level of management. Let’s take a closer look at how they play out at different stages.&lt;/p&gt;

&lt;h2 id=&quot;engineering-managers&quot;&gt;Engineering managers&lt;/h2&gt;

&lt;p&gt;When coaching an engineer or individual contributor through the transition to management, the first challenge is usually shifting their attention from individual tasks to the team and project as a whole.&lt;/p&gt;

&lt;p&gt;This shift requires learning how to delegate and trust the team, but the hardest part is understanding &lt;em&gt;what&lt;/em&gt; needs attention and &lt;em&gt;where&lt;/em&gt; to look for it. Widening the lens can feel overwhelming, and new managers are often terrified they won’t be able to provide a competent answer when their director or an important stakeholder asks about something.&lt;/p&gt;

&lt;p&gt;How people navigate this depends a lot on their personality. More extroverted managers tend to rely on conversations with their team to gather information. While this might seem like a good approach, it can quickly turn into a stream of interruptions for status reports and a team bogged down in meetings. In remote work, the problem is compounded by the lack of quick, spontaneous conversations.&lt;/p&gt;

&lt;p&gt;On the other hand, more introverted managers often dread asking for status updates. Instead, they spend hours combing through Slack messages, Google Docs comments, and GitHub PRs rather than bothering someone for an update. This approach preserves the team’s flow but leaves the manager working with incomplete or outdated information pieced together from artifacts. It also results in misalignment—without regular sync points, team members naturally drift apart, forming micro-teams rather than functioning as a cohesive unit.&lt;/p&gt;

&lt;p&gt;And even when they &lt;em&gt;do&lt;/em&gt; have a good grasp of what’s happening, another major challenge for first-time managers is avoiding reactivity.&lt;/p&gt;

&lt;p&gt;As a manager, you’re exposed to a constant stream of activity that could impact your projects and people. New managers often fall into the trap of handling issues as they arise, dealing with each one individually. While resolving problems &lt;em&gt;feels&lt;/em&gt; productive, reacting to every fire means you’ll quickly become a troubleshooter rather than a leader. You’ll spend so much time firefighting that there’s little room left for your actual job, which is putting systems in place to prevent these issues in the first place. And even when you do have time, hours of constant context-switching will leave you mentally drained.&lt;/p&gt;

&lt;h2 id=&quot;directors-and-above&quot;&gt;Directors and above&lt;/h2&gt;

&lt;p&gt;After plenty of trial and error (and hopefully some good coaching), most new managers eventually find a system for managing information within their own team. They start feeling less overwhelmed and build processes that work well enough.&lt;/p&gt;

&lt;p&gt;Unfortunately, this is usually the moment they’re given a larger scope. Maybe they’re assigned a second or third team or promoted to senior manager or director. And that’s when they realize that keeping track of their own team was the easy part. As their responsibilities grow, so does the area their attention needs to cover.&lt;/p&gt;

&lt;p&gt;The role of a senior manager varies wildly depending on the company’s size and structure—much more than that of an engineering manager. One constant, however, is the expectation that someone in this role will impact and support more than just their immediate teams and stakeholders. A senior manager is also expected to influence the broader organization. A lot of this falls under the concept of &lt;a href=&quot;https://amzn.to/3Y2z59n&quot;&gt;what Patrick Lencioni calls “first team”&lt;/a&gt;: the idea that the leaders in your group—you, your peers, and your manager—form a team in its own right, one that prioritizes the company’s overall success rather than just advocating for individual groups.&lt;/p&gt;

&lt;p&gt;This expectation exists whether or not the company is mature enough to implement &lt;em&gt;first team&lt;/em&gt; in practice. To meet it, senior managers must stay up to speed not only on their own teams and projects but also on what’s happening across peer teams and the organization as a whole.&lt;/p&gt;

&lt;p&gt;In practice, this means your attention can’t just be directed &lt;em&gt;downward&lt;/em&gt; at your teams—you also need to look &lt;em&gt;sideways&lt;/em&gt; across the organization.&lt;/p&gt;

&lt;p&gt;With this widened aperture, keeping track of what’s happening by reading Slack threads, PRs, Google Docs, and Jira tickets becomes humanly impossible. Worse, now that you’re in a higher-ranking role, your requests for updates carry more weight. A simple Slack message asking for an update can turn into a 30-minute walkthrough of a rehearsed 40-slide deck. Before long, you become selective about when and from whom you request updates—partly to avoid unnecessary deep dives, but also because you don’t want people dropping everything just because &lt;em&gt;“the boss wants a status report pronto.”&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;what-has-worked-for-me&quot;&gt;What has worked for me&lt;/h2&gt;

&lt;p&gt;Like many crafts, learning to manage attention and information effectively comes mostly through experience. While every manager’s journey is unique, some practical strategies consistently help across different cases.&lt;/p&gt;

&lt;h3 id=&quot;build-recurring-information-checkpoints&quot;&gt;Build recurring information checkpoints&lt;/h3&gt;

&lt;p&gt;You may loathe the &lt;em&gt;eight-hours-per-iteration-spent-in-planning&lt;/em&gt; ordeal suggested by early versions of Scrum, and I won’t blame you for that. But regular planning sessions—for both the short and medium term—are invaluable. They give teams a chance to step back from the daily fog of war and think strategically about the next few weeks or months.&lt;/p&gt;

&lt;p&gt;Similarly, a daily stand-up provides a structured forum where everyone can broadcast what they’re working on and self-organize around blockers.&lt;/p&gt;

&lt;p&gt;It doesn’t matter how often these rituals happen or how long they take. What matters is that they create a predictable cadence for &lt;em&gt;actionable&lt;/em&gt; information sharing. First, they allow teams to plan around these sessions instead of being interrupted by a manager’s ad-hoc requests for status updates. More importantly, they set the expectation that information should be shared regularly and give team members implicit permission to ask for updates—something that might seem trivial but is crucial, especially for less extroverted folks who may need the encouragement of a structured setting.&lt;/p&gt;

&lt;h3 id=&quot;create-a-ux-for-information-sharingeven-if-you-need-to-fake-it&quot;&gt;Create a UX for information sharing—even if you need to fake it&lt;/h3&gt;
&lt;p&gt;The only thing worse than not having the information and context you need in Jira, a document, or some other system is having that information be outdated. And both problems are rampant in organizations of all sizes.&lt;/p&gt;

&lt;p&gt;You can try to fix this by enforcing diligence—making it clear that keeping information up to date is a core expectation that impacts performance. Or you can take a softer approach, reminding people that the best way to avoid interruptions is to ensure key details are always documented.&lt;/p&gt;

&lt;p&gt;But whether you use a carrot or a stick, these tactics only go so far. The real problem isn’t forgetfulness or lack of effort—it’s the user experience. And I don’t mean the UX of tools like Jira (which, let’s be honest, are universally awful but tolerable). I’m talking about the entire process of keeping information up to date across multiple systems.&lt;/p&gt;

&lt;p&gt;Typically, a team doesn’t use just one system but several, all of which need to stay in sync. People waste an absurd amount of time manually copying, pasting, and linking items between Jira, Trello, GitHub PRs, and various spreadsheets.&lt;/p&gt;

&lt;p&gt;To reduce this friction, I maintain a single spreadsheet listing all projects in our portfolio, each with an individual accountable for it. Every row includes status and planning information, the relevant goal (usually an OKR), and a link to the project’s charter. The charter itself can live anywhere—some people use a Google Doc, others an Epic in Jira—as long as it follows a standard template.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2023-attention/portfolio.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Every two weeks, we hold a project portfolio meeting with all project owners—something that deserves its own article. I expect each project owner to keep their row in the spreadsheet up to date, as this serves as the single source of truth.&lt;/p&gt;

&lt;p&gt;In return, I take on the administrative burden of ensuring this spreadsheet feeds into whatever other systems need the data—whether it’s OKR tracking tools, company roadmaps, or other reporting structures. The project owners don’t need to worry about those; they just need to keep the spreadsheet accurate.&lt;/p&gt;

&lt;p&gt;As you might imagine, this can be a lot of work. Over time, I’ve tried different approaches to avoid spending all my waking hours copying and pasting data—everything from scripts that sync data across systems to, when I’m &lt;em&gt;really&lt;/em&gt; lucky, hiring a Chief of Staff to manage the process.&lt;/p&gt;

&lt;p&gt;Regardless of the method, the key takeaway is this: the best way to improve information quality is to reduce friction in the process.&lt;/p&gt;

&lt;h3 id=&quot;have-recurring-one-on-ones-not-only-with-reports-but-also-peers&quot;&gt;Have recurring one-on-ones not only with reports but also peers&lt;/h3&gt;

&lt;p&gt;I joke that every book on engineering management is just 200 pages teaching you how to run one-on-one meetings. Joke or not, holding regular one-on-ones with your direct reports is already a well-established practice in software engineering. What’s less common, though, is having recurring one-on-ones with your peers and your manager’s peers.&lt;/p&gt;

&lt;p&gt;Just like with direct reports, the structure and cadence of these meetings vary depending on the person and evolve over time. In my experience, executives tend to prefer meetings with a clear agenda, while peers from departments you don’t interact with as often may prefer informal conversations.&lt;/p&gt;

&lt;p&gt;Regardless of format, holding these conversations regularly is crucial for building relationships. You don’t want the only time you interact with a peer to be when something has gone wrong. These meetings also provide a space to discuss events that aren’t yet time-sensitive, explore early ideas that may or may not materialize, and create opportunities for serendipity—where a passing comment unexpectedly reveals something relevant to your team.&lt;/p&gt;

&lt;p&gt;Calendaring software is terrible. Unless you have an executive assistant, managing recurring one-on-ones at different cadences quickly becomes a nightmare. You lose track of who you should be meeting with and how often, forget to remove meetings that are no longer necessary, and struggle to keep everything organized.&lt;/p&gt;

&lt;p&gt;To tackle this, I use a simple spreadsheet—a poor man’s CRM. It’s nothing fancy:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2023-attention/crm.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Besides the shared agenda for each meeting, I keep a private document for every person I meet with. Whenever something &lt;em&gt;important but not urgent&lt;/em&gt; comes up related to that person or their team, I take a screenshot, save a link, or jot down a quick note in reverse chronological order. This helps me keep track of things I need to bring up without feeling the pressure to act on them immediately.&lt;/p&gt;

&lt;h3 id=&quot;avoid-name-dropping&quot;&gt;Avoid name-dropping&lt;/h3&gt;
&lt;p&gt;When someone is new to a role or a peer group, it’s natural to feel awkward asking questions. A common defense mechanism is shifting the &lt;em&gt;blame&lt;/em&gt;—saying you need information or a task completed because &lt;em&gt;“I have to report to my manager on this soon.”&lt;/em&gt; It’s an easy way to justify the request, but it causes all sorts of problems.&lt;/p&gt;

&lt;p&gt;The biggest issue is that it invites your direct reports and peers to question your role as a leader. Are you adding value, or are you just relaying information up the chain? They wouldn’t be wrong to ask. And if you don’t have a good answer, it might be time to rethink where you’re spending your time as a manager.&lt;/p&gt;

&lt;p&gt;Even if your team understands your value, &lt;em&gt;student syndrome&lt;/em&gt; kicks in. People will delay giving you the information until right before your meeting with &lt;em&gt;the big boss&lt;/em&gt;. That leaves you scrambling to process everything at the last minute, without time to think strategically. Worse, it makes you dependent on these rushed updates instead of having a continuous understanding of what’s happening across your teams.&lt;/p&gt;

&lt;p&gt;As tempting as it is, never justify an information request by saying your boss needs it. It’s perfectly reasonable for people to ask why you need certain details. Instead of name-dropping, share the context. This shifts the conversation from a transactional status update to a strategic discussion—one where your reports or peers can help you figure out the best way to achieve the intended outcome, rather than just ticking a box.&lt;/p&gt;

&lt;p&gt;More importantly, it reinforces that you’re not just a messenger—you’re a leader who understands and owns the bigger picture.&lt;/p&gt;

&lt;h2 id=&quot;plug-where-my-attention-is-at&quot;&gt;Plug: Where my attention is at&lt;/h2&gt;
&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update (2025):&lt;/strong&gt; When this article was published in 2023, we were building an AI-powered Chief of Staff for engineering leaders. Within a year, it attracted 10,000 users, surpassing even well-funded incumbents.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;By 2024, demand shifted—not for the assistant itself, but for the technology behind it. Engineering leaders kept asking how we made our AI agents work so reliably at scale. That realization led us to pivot to a developer platform for building AI products.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Leaders spend at least two hours every day just catching up—whether it’s first thing in the morning or after hours of back-to-back meetings. We can’t talk about increasing efficiency in tech while ignoring this massive drain on time and focus.&lt;/p&gt;

&lt;p&gt;My co-founder and I have debated this problem for years. With the latest breakthroughs in Artificial Intelligence and Large Language Models, we saw an opportunity to solve it—building something that, until recently, wasn’t even possible.&lt;/p&gt;

&lt;p&gt;Our platform integrates with the tools your team already uses, learning not just from the artifacts they create but from how they interact and collaborate. This lets us build an &lt;em&gt;Interest Graph&lt;/em&gt; that delivers relevant, timely updates based on the topics and projects that matter to you.&lt;/p&gt;

&lt;p&gt;And because we understand how your tools work—and how you use them—we can automate much of the manual work that leaders spend hours on each day.&lt;/p&gt;

&lt;p&gt;It’s like GitHub Copilot, but for managers.&lt;/p&gt;

&lt;p&gt;Our first release helps both new and experienced managers cut through the noise and focus on what’s most important &lt;em&gt;to them.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We’re launching publicly this Fall and are currently onboarding partners for our private beta. If you’d like to see what we’re building, join the waitlist at &lt;a href=&quot;https://outropy.ai&quot;&gt;Outropy.ai&lt;/a&gt;. For more about the company and our vision, you can reach me at phil &amp;lt;at&amp;gt; outropy.ai.&lt;/p&gt;
</description>
        <pubDate>Fri, 21 Jul 2023 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2023/07/21/attention_is_all_a_manager_needs.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2023/07/21/attention_is_all_a_manager_needs.html</guid>
        
        <category>Engineering Management</category>
        
        <category>Outropy</category>
        
        
      </item>
    
      <item>
        <title>Five takeaways from looking for a new senior role in tech</title>
        <description>&lt;p&gt;A few months ago, I left SeatGeek without much of a plan of what to do next. My green card was finally issued in 2021, which means that I didn’t have to scramble to find a new job in forty days. For the first time in the fifteen years I have lived abroad, I could finally take my time without fear of getting on the bad side of immigration authorities. As someone who has been on a work visa for the last fifteen years of my life, this was wild.&lt;/p&gt;

&lt;p&gt;At first, I tried the whole &lt;em&gt;funemployment&lt;/em&gt; thing, basically when you are not actively looking for a job. I &lt;a href=&quot;https://twitter.com/pcalcado/status/1445061127562477569&quot;&gt;posted a tweet about leaving&lt;/a&gt; but did nothing much around job seeking aside from answering a few messages here and there.&lt;/p&gt;

&lt;p&gt;I have recently signed with a new place. Before I talk about the new challenges ahead, I want to share five things I learned during this process. While bits and pieces are applicable for any tech role, this article explicitly focuses on &lt;em&gt;senior leadership&lt;/em&gt; roles, which were what I was looking for. I define these roles as executive roles for small companies (I would say fewer than 50 engineers) or Vice President of Engineering and above for mid-sized (say 50-500 engineers), or Director and above for larger organizations (500+).&lt;/p&gt;

&lt;h2 id=&quot;1-it-will-likely-take-longer-than-you-expect&quot;&gt;1. It will likely take longer than you expect&lt;/h2&gt;
&lt;p&gt;More senior roles are usually not &lt;em&gt;evergreen&lt;/em&gt;. In recruiting, we use the term evergreen role when talking about positions that are always open, featured on a company’s career page indefinitely. Every company has budget restrictions on how many people they can add to payroll, but the reality of a hot job market means that most of them can always add another back-end/front-end/mobile engineer to their team.&lt;/p&gt;

&lt;p&gt;And even if they are not evergreen per se, you will also find a lot of first-level engineering manager roles open at any given time. This happens because companies will need a new manager for every few &lt;em&gt;Individual Contributors (ICs)&lt;/em&gt; they hire. Given that companies are constantly hiring ICs, they also need to add new managers regularly.&lt;/p&gt;

&lt;p&gt;However, this relationship doesn’t hold as you go higher on the seniority ladder. Senior roles usually open up when someone needs replacing, if a reorg creates some leadership vacuum, when the company has reached a new growth stage, or when it starts a new strategic initiative and needs a leader.&lt;/p&gt;

&lt;p&gt;As you might imagine, companies only go through these events every so often in their lifetime. It might be that you are fortunate, and by the exact time you are looking for something, a great role comes up, but it is unlikely.&lt;/p&gt;

&lt;p&gt;Worse, people might be looking for a leader way ahead of time, which can be very frustrating. For example, I talked to a mid-sized company CEO about a role under them. In our first call, they explained that their product is being disrupted by competition and needs to change drastically or become obsolete. They thought of me as the perfect fit to lead this new initiative, and I was very excited about it. After a few exploratory chats over Zoom, I wanted to talk about the interview process. Then I realized that there was no actual role—at least &lt;em&gt;not yet&lt;/em&gt;. The executive laid out their plan to first fire this one person, then get this other person to fill in for them, then get this other person to change teams… and many more steps that would have created the perfect role for me. When I asked how long they thought it would take, their estimate was one month. Putting aside the Game of Thrones vibe, it’s been three months since, and they haven’t even fired the first person from the list.&lt;/p&gt;

&lt;p&gt;In hindsight, a better strategy for me would have been to start having these conversations at least three months before I left my previous job. I already had a feeling my journey there was not going to be that much longer, and when this feeling first kicked in I should have started looking around, even if casually.&lt;/p&gt;

&lt;h2 id=&quot;2-independent-headhunters-and-recruiters-are-a-valuable-resource&quot;&gt;2. Independent headhunters and recruiters are a valuable resource&lt;/h2&gt;
&lt;p&gt;To add another variable to your job search equation, not only do companies only open senior roles when there is a specific need, but they also are usually shy to make them public, especially on job boards. In my experience, small or medium companies only put these openings up if they have been looking for a while or some compliance framework requires that.&lt;/p&gt;

&lt;p&gt;Companies do that for various reasons. Sometimes, the imminent departure of a leader might not be public information yet—sometimes even to the person leaving! The company might not want the outside world to know of a new strategic initiative or pivot, even for net-new roles. One of the folks I talked to is moving their business from B2B to B2C, and they don’t want to telegraph the move by having a “Vice President of Engineering, Retail” role open.&lt;/p&gt;

&lt;p&gt;So how do you know about open roles in the market? The first step is to reach out to people in your network and let them know that you are looking. This will usually yield a few interesting leads, but the most efficient way is to use headhunters.&lt;/p&gt;

&lt;p&gt;When I started in this industry, headhunter meant something specific: a recruiter for senior and/or hard-to-find positions. These days we use the term to refer to any independent recruiter that gets paid handsomely when they fill a position. Even when I am not looking for a new job, I try to at least skim over every recruiter email I get. As you undoubtedly have experienced first-hand, &lt;a href=&quot;https://twitter.com/pcalcado/status/1471603925073764361&quot;&gt;the vast majority of unsolicited messages from recruiters is irrelevant, and badly automated spam&lt;/a&gt;. Still, now and then, a recruiter seems to have invested five seconds trying to research you and really thinks the position would be a good fit. These you want to build a relationship with, even if you are not looking for a job yet. I always reply, thanking them for the message and saying that I am unavailable, but I will let them know if anything changes. I also apply a Gmail label to these conversations to quickly find these good eggs when the time comes.&lt;/p&gt;

&lt;p&gt;You have probably had some of those reach out to you before. Go to your email and search for “your impressive background,” “opportunity,” and “well-funded startup.” I am sure you have a few of those in your inbox from over the years. Your LinkedIn inbox might also be filled with these messages that you have likely completely ignored in the past.&lt;/p&gt;

&lt;p&gt;Good headhunters can be an invaluable resource in your job hunt. Not only do they have access to the still-confidential openings we talked about, but also they work in networks. Recruiters share the jobs they are working on with their network and split the commission if someone helps them fill the position. This means that you will get a lot of the same roles from different recruiters, but also that even if that one headhunter you are talking to doesn’t have openings for you, they will likely know of other openings coming through their network.&lt;/p&gt;

&lt;p&gt;When it’s time for a new job, I send a note to folks in that Gmail label saying that I am open to new opportunities. Usually, they will try to book an introductory call. Recruiters love phone calls and don’t like doing things over email or text. This means that it is very easy to get overwhelmed by the number of recruiters trying to call you, and we will explore time management a little further down the text.&lt;/p&gt;

&lt;p&gt;Introductory calls are usually 30 minutes over the phone or video. Do not let them book you for longer; it is more than enough time. They usually will spend a few minutes telling you about who they are and the recruiting agency they work for, if any. Besides the fluff about how they are different from others and only take on the best openings (they all say that…), pay attention to the type of clients they work with. Are those the right size, industry, etc., you want to explore?&lt;/p&gt;

&lt;p&gt;They then ask you for your story. I recommend that you think about this before talking to any recruiter. Create a text document with a description of your professional history, previous jobs, and more significant accomplishments—at this stage, what is much more important than how. Do not forget to add something about why you left each job, especially if you were there for fewer than four years. Then edit repeatedly until it only includes information relevant to the role you want and has a straightforward, linear narrative.&lt;/p&gt;

&lt;p&gt;There are a few reasons why I do this. First, I like to force myself to tell my history concisely. It helps ensure that I don’t forget important details or find a rabbit hole that will eat up minutes on an introduction to no benefit.&lt;/p&gt;

&lt;p&gt;Then there is the fact that you are playing a game of telephone between recruiters and people from the hiring company. Do not be surprised or frustrated if every new person you talk to about a role asks you to introduce yourself from scratch, even if the recruiter had arguably briefed them. A “canonical” written version that you use repeatedly can help keep your story consistent across various interviews and interviewers.&lt;/p&gt;

&lt;p&gt;After the first introduction call, the recruiter will likely email you some positions they think would be a good fit for you. Usually, this is a mixed bag. Not only does the recruiter not yet know you that well, but they will also likely add both roles that you are not qualified for, to show off, and some that are a terrible fit but that they have been trying to fill for ages and might as well send to everyone.&lt;/p&gt;

&lt;p&gt;And this is something to keep in mind working with recruiters: they work for the hiring company, not for you.&lt;/p&gt;

&lt;p&gt;One recruiter I was working with guided me through the process with a small startup. Over four weeks, I had talked to most people at that company and was waiting for one last call with some engineering leader who, or so I was told, had been on vacation during that time. The invitation for the call never came, and all I had from the company was radio silence for a week. I reached out to the recruiter, and they told me that everything was ok. The company was just going through a big launch that week and was a little busy. The following Monday, I got this message:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Hey Phil, just a quick heads up that we had a candidate accelerated through a process with The Company and has accepted an offer. 
The match for them was very strong and they decided to act quickly, so there was nothing they needed to compare against in their minds. 
I do appreciate your time on this one and hope we can work together again soon. Did you get a chance to check out That other company? www.that-other-company.com&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After some LinkedIn stalking, I found out that the person hired had already worked with some of the executive team before. I completely understand the move but was very pissed about the wasted week.&lt;/p&gt;

&lt;p&gt;This kind of thing happens, and you need to understand that this is a transactional relationship. Still, it is in the recruiter’s best interest to have great relationships with senior candidates, so they will avoid doing anything that will piss you off.&lt;/p&gt;

&lt;h2 id=&quot;3-use-your-project-management-skills-to-keep-your-sanity&quot;&gt;3. Use your project management skills to keep your sanity&lt;/h2&gt;
&lt;p&gt;Finding a job in a hot market is one of the most challenging projects you will ever manage. You don’t have control over most aspects of the process, and even the influence you have needs to be managed carefully to avoid coming across as a demanding asshole. But the most complicated part is that you are looking for one single job amongst many different options, and that scarcity creates a textbook Game Theory problem.&lt;/p&gt;

&lt;p&gt;These days, I try to be very structured around this effort, which—you guessed it—means I have a spreadsheet for it.&lt;/p&gt;

&lt;p&gt;Below is a screenshot of the spreadsheet I’ve used most recently:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/job-hunt/spreadsheet.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I don’t want to make the file available because what matters is how one uses it, not the template.&lt;/p&gt;

&lt;p&gt;I add every opening sent by a headhunter to the spreadsheet, even those I don’t find interesting.&lt;/p&gt;

&lt;p&gt;The most critical data to keep tabs on are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;How excited am I about this role? How much Priority do I want to give it?&lt;/li&gt;
  &lt;li&gt;How much do I feel the hiring company (not the headhunter) is excited about me?&lt;/li&gt;
  &lt;li&gt;When was the last update on this process, from either them or me?&lt;/li&gt;
  &lt;li&gt;Who is supposed to take the next step? Is the ball in my court or theirs?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Time allowing, surely I will act on any items blocked on me, but things aren’t that simple.&lt;/p&gt;

&lt;p&gt;You need to make sure you have the headspace to prepare and research your tier 1 opportunities. You also need to pay attention to the various other things going on in your life, especially if you still have a full-time job. And, most important, you need to avoid burning out because this is a very stressful process.&lt;/p&gt;

&lt;p&gt;Every time I interact with the headhunter or hiring organization, I update the spreadsheet. I use conditional formatting to make the “last update” cell green/yellow/red based on how long ago the last contact was.&lt;/p&gt;

&lt;p&gt;I also use sorting and conditional formatting on the spreadsheet to help me quickly identify the status of the roles that both parties are excited about, which tend to be my high-priority ones.&lt;/p&gt;

&lt;p&gt;The first thing I do every morning is check the high-priority roles, make sure that I don’t drop the ball in getting back to them, and check in if they are taking too long to get back to me.&lt;/p&gt;

&lt;p&gt;After taking whatever actions the high-priority ones need, I go over the others in priority order and reassess them. Should they go higher or lower in priority? Did any new information come in that changed how I feel about them?&lt;/p&gt;

&lt;p&gt;As a self-imposed SLA, I try never to take longer than 24 hours to reply to tier 1 opportunities, no longer than three days for tier 2, and a week for the rest. This spreadsheet’s value comes from being an easy, visual process to manage my SLAs.&lt;/p&gt;

&lt;p&gt;Speaking of time management, something that has helped me immensely is to use &lt;a href=&quot;https://calendly.com/&quot;&gt;Calendly&lt;/a&gt;. Calendly and similar tools allow you to send a link that will enable people to book meetings in your calendar, drastically reducing the back-and-forth of finding a good time for everyone. You will see that many headhunters use it, but you should have your own account and make sure that it is in sync with your personal and professional calendars.&lt;/p&gt;

&lt;h2 id=&quot;4-be-strategic-around-your-interviews-and-chats&quot;&gt;4. Be strategic around your interviews and chats&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;/2016/03/15/on_asking_job_candidates_to_code.html&quot;&gt;I am very intentional with how I design recruiting processes for folks I hire&lt;/a&gt;, and I try to apply these same general principles to the process when I am on the other side of the table.&lt;/p&gt;

&lt;p&gt;My guiding philosophy in both scenarios is that &lt;em&gt;it is impossible to know if a candidate is a good fit for a job&lt;/em&gt;. So, with this in mind, instead of trying to &lt;em&gt;validate&lt;/em&gt; if it would be a good match, I start from the assumption that it would be and then try to &lt;em&gt;falsify&lt;/em&gt; the hypothesis as early as possible.&lt;/p&gt;

&lt;p&gt;When looking for a job, I first list what I am looking for and what I don’t want in my next position. Usually, this covers the kind of role and titles, the organization’s size, whether it is profitable, pre-revenue, or growth-oriented, how many rounds of funding it has raised or how close to an exit it might be, etc. &lt;strong&gt;The current job market for tech is so hot that even if you cannot choose where you will work, you can definitely choose where you will not.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I usually do not share this list with headhunters or hiring companies. I don’t want them to take the list literally and end up missing out on an opportunity that could be actually pretty good, even if not perfect. Also, if they &lt;em&gt;really&lt;/em&gt; want me to apply (maybe because the headhunter really needs to show their clients that they are sourcing good candidates!), they will find ways to present whatever role they are working on as a perfect match.&lt;/p&gt;

&lt;p&gt;Following this process, when you decide to move ahead with a position someone sent over, you assume this would be a good fit. Your task now is to use every interaction to falsify this assumption, searching for evidence that the role does not fulfill what you have listed as your requirements. Take some time beforehand to think of questions that can help you in this discovery. Keep in mind that it is rarely a good idea to ask directly about subjective topics. People are in &lt;em&gt;sell mode&lt;/em&gt; when talking to you. While it is OK to ask how many engineers a company has, or if they intend on getting new funding soon, questions like &lt;em&gt;“what do you think of your engineering culture?”&lt;/em&gt; aren’t going to surface helpful information.&lt;/p&gt;

&lt;p&gt;I strongly recommend that you keep your questions laser-focused on the list of requirements you wrote, but I do tend to have a few more general questions I ask every person I talk to. My favorite is &lt;em&gt;“What is your current bottleneck? What is the one thing that prevents you from moving as fast as you think you should move?”&lt;/em&gt; Then, depending on the answer, I have a follow-up: &lt;em&gt;“If this constraint would magically disappear tomorrow, what do you think would become the next one?”&lt;/em&gt; This line of questioning is from the &lt;em&gt;Theory of Constraints&lt;/em&gt; and gives you a good idea of how folks work and think. For example, it is common for the answer to be &lt;em&gt;“We don’t have enough engineers”&lt;/em&gt;. This is almost always an indicator that the leadership team isn’t as experienced as they might present themselves. Nobody &lt;em&gt;ever&lt;/em&gt; wants to hire engineers; there is something they want, and they believe that hiring engineers is the only way to get there—and that is seldom the case.&lt;/p&gt;

&lt;p&gt;Something else to falsify as early as possible is where the position lies in the organization. Titles can be very misleading: one company might have a director managing three people while others of similar size have a manager of thirty. Still, make sure that your new title won’t sound like a demotion or stagnation on your résumé—this might come back to bite you the next time you are looking for a job. In my experience, the best way to find good evidence of whether the position they have is close to what you want is to find out whom you would report to and who would report to you. Understandably, this might be a little fuzzy in small companies, but make sure that their seniority doesn’t feel misaligned with your expectations. Also, please make sure you spend a considerable amount of time with your boss-to-be during the process.&lt;/p&gt;

&lt;h2 id=&quot;5-do-not-waste-your-time-but-part-as-friends&quot;&gt;5. Do not waste your time, but part as friends&lt;/h2&gt;
&lt;p&gt;This should be a guiding principle when applying for any job, but it is even more important for senior leadership roles. They require massive time investment from busy people such as you and the hiring organization leaders, so being honest and upfront can save everyone enormous time, money, and energy.&lt;/p&gt;

&lt;p&gt;Following the process from the previous section, once I realize that a position does not meet the requirements I had listed, I tend to email the headhunter and the hiring organization the next day. I still give it until the following day so that I have some extra time to think about it and avoid a potential knee-jerk reaction to a single lousy interview or something like that, but once I make up my mind, I will email them within 24 hours, tops.&lt;/p&gt;

&lt;p&gt;There is always the question of how much feedback you want to give the various people you might have talked to during this process. You absolutely should volunteer the primary reason driving your decision (e.g. &lt;em&gt;“I am currently interested in more senior roles/smaller organizations/moving out of the finance industry”&lt;/em&gt;), but keep details and secondary reasons to yourself. And, unless the process was an absolute clusterfuck and you want the hiring company to know, I would only send feedback on the process to the recruiter, not people from the hiring company. Remember: you want to keep a good relationship with the headhunter, and getting between them and their client introduces massive risk for no benefit to you.&lt;/p&gt;

&lt;p&gt;And also, keep in mind that just because the company doesn’t have a role for you now doesn’t mean that it won’t ever have it in the future. The organization will grow and expand its needs and possibilities. There will be reorgs and departures that will create all sorts of opportunities. So be kind with your words and make yourself available for a regular catch-up and networking.&lt;/p&gt;

&lt;p&gt;In fact, in the recent past, I have developed &lt;em&gt;advisor&lt;/em&gt; relationships with organizations that were not a good fit. These relationships deserve their own article, but it is something to consider bringing up as you part ways.&lt;/p&gt;
</description>
        <pubDate>Mon, 20 Dec 2021 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2021/12/20/job_hunt.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2021/12/20/job_hunt.html</guid>
        
        <category>Management</category>
        
        <category>Leadership</category>
        
        <category>Recruiting</category>
        
        
      </item>
    
      <item>
        <title>How I like to use OKRs</title>
        <description>&lt;p&gt;Recently I sent a memo to my teams at SeatGeek setting the scene around changes that I want to see in our OKR and planning processes. I’ve asked a few people from my professional network for feedback on this email, and it seems like this is something many other organizations struggle with. I am publishing a lightly-edited version of the memo below. Hopefully, it will be useful to some people facing similar challenges.&lt;/p&gt;

&lt;hr /&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;From: Phil Calçado
Date: Thu, Jan 9, 2020, 12:30 PM
Subject: OKRs in 2020
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Hi team,&lt;/p&gt;

&lt;p&gt;We are already a few weeks into 2020, but many teams are still working on their OKRs. This creates an opportunity for us to iterate on how we use this tool at SeatGeek. As a first step, I want to challenge the way we think about OKRs. For now, I don’t want to perform any drastic changes to the current process. What I want is to give all of you some context to help understand why I might nudge you one way or another now and in the future as we work on our goals and strategy.&lt;/p&gt;

&lt;p&gt;There is a lot on OKRs in our wiki and all over the Internet. In this text, I want to focus on real-world usage and the challenges of this powerful tool. That is why I am skipping introductions and assuming that you have some familiarity with what OKRs are and have probably used them at SeatGeek or a previous employer.&lt;/p&gt;

&lt;h3 id=&quot;one-problem-i-have-seen-with-our-okrs&quot;&gt;One problem I have seen with our OKRs&lt;/h3&gt;
&lt;p&gt;After being through a few OKR cycles at SeatGeek, I am convinced that we tend to fall into a very common trap: we use OKRs the same way more traditional organizations use a &lt;a href=&quot;https://en.wikipedia.org/wiki/Work_breakdown_structure&quot;&gt;Work Breakdown Structure (WBS)&lt;/a&gt;. To illustrate what I mean by this, I will use an oversimplified example.&lt;/p&gt;

&lt;p&gt;Let’s suppose that I am setting OKRs for my personal life. I decide that one of the most important &lt;em&gt;Objectives&lt;/em&gt; I have is &lt;strong&gt;“To be healthy.”&lt;/strong&gt; There are a few different ways to express this &lt;em&gt;Objective&lt;/em&gt; in an OKR-based process. If we follow the typical SeatGeek style, we probably will build something like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2020-01-31-okr/example-okr1.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Looking at the above, it might sound like a reasonable plan for someone to be healthy. The problem here is that even if we do all these things we set up to do, we can still be very unhealthy. For example, maybe you reduced your weekday alcohol consumption, but now you drink a lot more sugary drinks over the week, or perhaps you are cooking your own meals, but all you cook is mac and cheese.&lt;/p&gt;

&lt;p&gt;Instead, the way I have seen OKRs working well is when you use the &lt;em&gt;Key Results&lt;/em&gt; as the test of whether you have achieved the &lt;em&gt;Objective&lt;/em&gt;. Applying this mindset, let’s think about some of the things that are generally accepted as indicators that someone is “healthy”:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/2020-01-31-okr/example-okr2.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is obviously an oversimplification to illustrate my point—I am not a doctor, and you should not follow anything I say about health—but I’m sure you got the idea.&lt;/p&gt;

&lt;p&gt;To me, &lt;strong&gt;the most significant benefit of this format is that it focuses on the outcome instead of output&lt;/strong&gt;. The number of projects, features, RFCs, or bugfixes we build and deploy is irrelevant. The only thing that matters is what material impact these had on the business and the experience of our users, partners, and employees.&lt;/p&gt;

&lt;p&gt;It also helps us define what we mean when we use terms such as “healthy.” &lt;em&gt;Objectives&lt;/em&gt; will almost always be annoyingly hand-wavy, and the fact that they are open to interpretation tends to create some friction between teams. In this model, we are trying to define what we mean by “healthy” precisely. Different parties will argue a lot about what should be in it, but once the definition is agreed upon, it becomes a clear contract we all live by.&lt;/p&gt;

&lt;p&gt;Another significant advantage of this style is that it gives teams a lot of freedom in how they will achieve that. What you have agreed on doing, i.e. your OKR, is just the &lt;em&gt;what&lt;/em&gt;. Whoever is accountable for the OKR should be empowered to explore options for &lt;em&gt;how&lt;/em&gt; they will get there. At the beginning of a quarter, teams will begin new projects and initiatives focusing on achieving their OKRs. They will use small and continuous releases to push their work to the users early and often. Still, they might observe that all this work doesn’t really have a material impact on the &lt;em&gt;Key Results&lt;/em&gt; the way they thought it would. In a healthy OKR culture, teams in this situation should immediately regroup and pivot, exploring what other projects they should try to achieve these desired results.&lt;/p&gt;

&lt;h3 id=&quot;okrs-roadmaps-and-project-portfolios&quot;&gt;OKRs, Roadmaps, and Project Portfolios&lt;/h3&gt;
&lt;p&gt;One interesting challenge in applying this model is that it often requires familiarity with and access to essential metrics, often called KPIs (Key Performance Indicators). It is perfectly fine, and even expected, for a &lt;em&gt;Key Result&lt;/em&gt; to be that the team starts collecting data on some KPI we would like to use for future OKRs.&lt;/p&gt;

&lt;p&gt;It is not impossible, though, that an &lt;em&gt;Objective&lt;/em&gt; has one or a few &lt;em&gt;Key Results&lt;/em&gt; as the delivery of some project or artifact, but this should be seen as a &lt;em&gt;bad smell&lt;/em&gt;. It is an indication that we are probably missing some metric that can better reflect the desired outcomes.&lt;/p&gt;

&lt;p&gt;Ideally, a team will look at their OKRs and start planning what efforts or projects they should start/keep/stop to achieve the desired results within the timeframe. This is generally called &lt;em&gt;portfolio management&lt;/em&gt;, and it is something that I will be working on closely and regularly with you all.&lt;/p&gt;

&lt;p&gt;If you want to learn more about this topic, there are a few good books on OKRs and similar goal-setting processes. My favorites are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://amzn.to/31icUPc&quot;&gt;Measure What Matters&lt;/a&gt;, the classic book by John Doerr&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://amzn.to/31gP1HG&quot;&gt;High Output Management&lt;/a&gt;, Andy Grove’s seminal work on management and strategy&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://amzn.to/2Sbbou8&quot;&gt;The Advantage&lt;/a&gt;, which I see as Patrick Lencioni compiling most of his work on management in a single book&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://amzn.to/38TEncu&quot;&gt;OKRs, From Mission to Metrics: How Objectives and Key Results Can Help Your Company Achieve Great Things&lt;/a&gt;, an interesting collection of articles on real-world challenges in using OKRs by &lt;a href=&quot;https://amzn.to/3b1meeI&quot;&gt;Francisco H. de Mello&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, please reach out to me with any comments, feedback, or questions.&lt;/p&gt;

&lt;p&gt;Cheers&lt;/p&gt;
</description>
        <pubDate>Fri, 31 Jan 2020 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2020/01/31/how_i_like_to_use_okrs.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2020/01/31/how_i_like_to_use_okrs.html</guid>
        
        <category>Engineering Management</category>
        
        <category>SeatGeek</category>
        
        <category>OKRs</category>
        
        <category>Strategy</category>
        
        <category>Project Management</category>
        
        
      </item>
    
      <item>
        <title>Guiding Principles for Developer Tools</title>
        <description>&lt;p&gt;Just like almost anything else in software engineering, we don’t have a precise definition for the term microservice. This lack of formality doesn’t make the term worthless, however. There are a few useful characteristics we can infer whenever someone says they have an architecture that follows this paradigm. One such characteristic of a “microservices-based architecture” is that it has a lot of small, independent pieces of software, the so-called &lt;em&gt;services&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;With so many small services to build and manage, I find it useful to think about this as &lt;a href=&quot;https://www.infoq.com/news/2017/05/economics-microservices/&quot;&gt;the economics of microservices&lt;/a&gt;. Basically, &lt;strong&gt;the organization needs to make it “cheaper” to build and operate products following the microservices way than adding “just one more feature” to the monolith.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Applying this mindset, something organizations quickly realize is that they need to invest in some areas usually neglected in more traditional, monolithic architectures. Building on prior art by Martin Fowler, &lt;a href=&quot;/2017/06/11/calcados_microservices_prerequisites.html&quot;&gt;I wrote a detailed article on this&lt;/a&gt;. Here is a handy list of areas that require some extra investment before adopting microservices:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Rapid provisioning of compute resources&lt;/li&gt;
  &lt;li&gt;Basic monitoring&lt;/li&gt;
  &lt;li&gt;Rapid deployment&lt;/li&gt;
  &lt;li&gt;Easy to provision storage&lt;/li&gt;
  &lt;li&gt;Easy access to the edge&lt;/li&gt;
  &lt;li&gt;Authentication/Authorization&lt;/li&gt;
  &lt;li&gt;Standardized RPC&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These days, cloud providers and open source projects offer great tools to minimize the need for custom solutions for most of the items listed above. Nevertheless, it is still the case that an organization needs to build &lt;em&gt;some&lt;/em&gt; tooling. Usually, we need some glue code to fill in gaps between off-the-shelf tools, to enforce conventions, or to offer a more productive workflow to engineers.&lt;/p&gt;

&lt;p&gt;I have spent the last few years building such tools, both for internal users and as products with paying customers. Sometimes, this work was done by a product engineering team, sometimes by an infrastructure team. To simplify our vocabulary, I will call this type of work &lt;em&gt;platform&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Over time, my experience in platform work has led me to compile a list of principles that I like to follow when building developer tooling, which I document in this article.&lt;/p&gt;

&lt;h2 id=&quot;know-your-audiences&quot;&gt;Know your audience(s)&lt;/h2&gt;
&lt;p&gt;It is very tempting for teams building dev tools to try and build the tools that &lt;em&gt;they&lt;/em&gt; would love to have. Well, just like with any other type of product development, &lt;strong&gt;teams building dev tools need to take a step back and understand that they are not the user.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If someone is part of a platform team working on developer tooling, this person likely has interest, skill, and experience on how to work with infrastructure. You should not expect the same from people who are going to use the tools this team creates.&lt;/p&gt;

&lt;p&gt;One way to understand whom you are building your tooling for is to run quarterly surveys, in which engineers &lt;em&gt;self-assess&lt;/em&gt; their proficiency levels in various technologies used by the organization (e.g., AWS Lambda, microservices, Node.js, Go, MySQL, etc.). Getting people to respond to surveys like this is always challenging, but making a survey anonymous and a self-assessment tends to increase engagement levels.&lt;/p&gt;

&lt;p&gt;The survey should be straightforward; here is a screenshot of the survey my team used at Meetup:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/survey.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Data from a survey like this is subjective and should be combined with other quantitative and qualitative sources of feedback. Still, this is a great way to draw a &lt;em&gt;map&lt;/em&gt; that helps you visualize the gaps in skill and experience you might have. The team should use the results to help build and prioritize their backlog and roadmap.&lt;/p&gt;

&lt;p&gt;At Meetup, for example, results coming from the survey above showed that most engineers had some level of experience with Serverless technologies such as DynamoDB and AWS Lambda. Surprisingly, only a few people reported knowing about fundamental topics such as IAM, VPC, CloudFormation, etc. Based on this split, the platform team decided to first build features that make it easier to use the latter and postpone working on Serverless-specific topics.&lt;/p&gt;

&lt;p&gt;Prioritizing engineers not well-versed in infrastructure doesn’t mean that we could ignore infrastructure-savvy folks. There are many different ways to make sure you don’t alienate them from the work, but the most important step the platform team needs to take is to make it clear to infrastructure experts that &lt;strong&gt;if you know enough about infrastructure to have strong opinions about its internals, you probably aren’t part of the main audience for the tool&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, even if they aren’t the target audience, the different cohorts must coexist. &lt;strong&gt;It is on the platform team to make an effort to allow the experts to integrate their tools and workflow with the tooling they create&lt;/strong&gt;. This goal is not &lt;em&gt;always&lt;/em&gt; possible, but you’d be surprised with how much common ground can be reached with a little bit of goodwill from both sides.&lt;/p&gt;

&lt;p&gt;At Meetup, before we added any major feature to our developer tool, we would create a write-up, often started as an &lt;a href=&quot;/2018/11/19/a_structured_rfc_process.html&quot;&gt;RFC&lt;/a&gt;, describing it in detail. It would, for example, describe that a new feature that creates AWS Accounts for users would automatically add such and such roles, with such and such permissions, and follow a specific naming convention. This spec allowed folks like our data science team, who had already invested a lot in their own Terraform-based automation, to make sure that their infrastructure was compatible with the rest of the organization.&lt;/p&gt;

&lt;h2 id=&quot;collect-and-monitor-usage-metrics&quot;&gt;Collect and monitor usage metrics&lt;/h2&gt;
&lt;p&gt;After spending most of my career in product engineering, something that shocked me when I started working on infrastructure products was how little information about our users’ habits and usage of the platform the team had. We relied a lot on asking for user feedback, sometimes through user panels, other times by inviting a few users to a &lt;a href=&quot;https://en.wikipedia.org/wiki/Usability_lab&quot;&gt;usability lab&lt;/a&gt;. We never observed what people were doing in their normal day-to-day lives; all we knew was what they told us or what we saw in a lab.&lt;/p&gt;

&lt;p&gt;As an attempt to change that, &lt;strong&gt;whenever I am building a developer tool, I make sure that we send usage metrics to some analytics database&lt;/strong&gt;. The platform team can then analyze this data and find interesting insights about how people use their tools in the real world, performing real tasks. By doing this, you may, for example, notice that your users still rely too much on the AWS web interface for something that your command-line tool already provides, that some commands are always run in sequence and could be collapsed in one, or that a given feature users needed to run many times a day is too slow and probably frustrating your users.&lt;/p&gt;

&lt;p&gt;Every time a tool is used, it sends to the analytics database at least the full command line invoked by the user, everything that was written to STDOUT and STDERR, and how long the operation took. You might also want to send any relevant environment variables, the current user, and the host on which the tool is being executed. &lt;strong&gt;Think of this as Google Analytics for your command-line tools&lt;/strong&gt;.&lt;/p&gt;
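
&lt;p&gt;To make this more concrete, here is a minimal sketch in Python of what capturing such a record could look like. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_and_record&lt;/code&gt; helper and the field names are assumptions made for illustration, not the code we ran at Meetup.&lt;/p&gt;

```python
import getpass
import socket
import subprocess
import sys
import time


def current_user():
    """Best-effort lookup of the user name; never fails the tool."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def run_and_record(argv):
    """Run a command and return (exit_code, record) describing the run.

    The record layout is hypothetical; adapt it to your analytics store.
    """
    started = time.monotonic()
    result = subprocess.run(argv, capture_output=True, text=True)
    record = {
        "command_line": list(argv),
        "stdout": result.stdout,
        "stderr": result.stderr,
        "exit_code": result.returncode,
        "duration_seconds": time.monotonic() - started,
        "user": current_user(),
        "host": socket.gethostname(),
    }
    return result.returncode, record
```

&lt;p&gt;Shipping the record to the server is deliberately left out of this sketch, since how you do that depends on the analytics database you pick.&lt;/p&gt;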

&lt;p&gt;At Meetup, running our tools with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-v&lt;/code&gt; flag showed to users what information was sent to the server (it’s in the last line of the output):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/cloud-tools-analytics.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;One interesting challenge is that internal tools are unlikely to generate enough usage data to be statistically significant. They often fall into what is sometimes called “&lt;em&gt;small data&lt;/em&gt;,” which roughly means that the dataset produced is small enough to be understood by humans but not large enough to apply those neat statistical methods that modern product management loves.&lt;/p&gt;

&lt;p&gt;That is why, while it is often interesting to analyze the &lt;em&gt;usage&lt;/em&gt; metrics of your tooling, it is probably more important to analyze the &lt;em&gt;impact&lt;/em&gt; they have. At Meetup, we measured this by tagging every AWS resource touched by our tooling with some metadata that allowed us to see that this particular resource had been created or updated by our product. We could then quickly visualize how much of our infrastructure was managed by our tooling versus using alternative ways. This information was fundamental when defining our projects and priorities for the platform team.&lt;/p&gt;

&lt;p&gt;A few practical considerations when implementing metrics for your tooling:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Make sure you add a flag that allows users to bypass sending analytics data to the server. At Meetup this is achieved by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--incognito&lt;/code&gt; flag that every command honors&lt;/li&gt;
  &lt;li&gt;Your tools should never take passwords or other sensitive information as parameters or output them, but if you absolutely have to do so, please make sure that you do not collect this information in plain text in your logs&lt;/li&gt;
  &lt;li&gt;Failure in sending analytics shouldn’t prevent the tool from working. If it can’t send the data, you might want the tool to save the logs on local disk to be sent later. Whatever you do, though, do not throw an error at the user just because logs can’t be sent to the server&lt;/li&gt;
&lt;/ul&gt;
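
&lt;p&gt;The three considerations above could be sketched in Python as follows. This is only an illustration under assumptions: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;report&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;redact&lt;/code&gt; helpers, the list of sensitive flag names, and the spool file name are all hypothetical, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;send&lt;/code&gt; stands in for whatever client talks to your analytics server.&lt;/p&gt;

```python
import json
import re
from pathlib import Path

# Hypothetical pattern: mask values passed to flags that look sensitive.
SENSITIVE = re.compile(r"(--(?:password|token|secret)[= ])(\S+)")


def redact(text):
    """Mask the value that follows a sensitive-looking flag."""
    return SENSITIVE.sub(r"\1[REDACTED]", text)


def report(record, send, spool_dir, incognito=False):
    """Try to deliver an analytics record without ever failing the tool.

    Returns "skipped", "sent", "spooled", or "dropped".
    """
    if incognito:
        # The user opted out, so nothing leaves the machine.
        return "skipped"
    payload = json.dumps({key: redact(str(value)) for key, value in record.items()})
    try:
        send(payload)
        return "sent"
    except Exception:
        # Never surface analytics failures to the user; keep the record
        # on local disk so that a later run can retry.
        try:
            spool = Path(spool_dir)
            spool.mkdir(parents=True, exist_ok=True)
            with open(spool / "pending.jsonl", "a", encoding="utf-8") as handle:
                handle.write(payload + "\n")
            return "spooled"
        except OSError:
            return "dropped"
```

&lt;p&gt;Note that pattern-based redaction like this is a last line of defense and will miss secrets it does not know about; not accepting them on the command line in the first place is still the better option.&lt;/p&gt;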

&lt;h2 id=&quot;avoid-creating-new-abstractions-simplify-existing-ones&quot;&gt;Avoid creating new abstractions, simplify existing ones&lt;/h2&gt;
&lt;p&gt;When I headed product engineering at DigitalOcean, we were always concerned about how we could offer sophisticated products to our users without requiring them to read a 200-page manual to find out if they needed its features at all.&lt;/p&gt;

&lt;p&gt;One option to deal with this challenge was to wrap infrastructure-heavy concepts as higher-level abstractions. For example, instead of selling VMs and object storage as separate primitives, we could package them all together as a single product, something like what Google AppEngine did back in the day.&lt;/p&gt;

&lt;p&gt;This idea had its appeal, but something that even Google suffered with back in the day (and AWS and others are experiencing as they evangelize Serverless computing), is that every time you do something like this you are not actually &lt;em&gt;removing&lt;/em&gt; complexity, you are just replacing existing concepts with a whole new set of abstractions. Even if the older abstractions were complicated, there are probably thousands of StackOverflow questions, tutorials, books, etc. that document and explain them. Irrespective of how much simpler they might be, &lt;strong&gt;if you create new abstractions, it is on you to educate your userbase on how to use them&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead, we decided to work with existing concepts as much as possible, and simplify how users interacted with them. As an example, the first version of &lt;a href=&quot;https://blog.digitalocean.com/load-balancers-simplifying-high-availability/&quot;&gt;our load balancer product&lt;/a&gt; was nothing more than a few VMs running HAProxy and managed by Terraform, nothing that users couldn’t already do on their own. Instead of exposing the complexities of these tools, though, we tried to create a clean user interface that didn’t try to be smart; it just removed any details that weren’t important for the majority of our users:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/do-lb.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;At Meetup, we have standardized on CloudFormation as our configuration management tool. Unfortunately, CloudFormation’s out-of-the-box user experience is awful. As an example, let’s say that you have a CloudFormation template named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;standard_user_and_permissions.yaml&lt;/code&gt; in your local directory. Here is what we needed to do to run this template using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws&lt;/code&gt; command-line tool:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/cloud-formation-mess.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The only parameter in this very long command line that is unique to the task is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--template-body&lt;/code&gt;. Everything else is just metadata that one needs to add according to Meetup’s conventions and standards for AWS.&lt;/p&gt;

&lt;p&gt;Considering how often engineers performed this task during their day-to-day work, our platform team decided that this was worth automating. We added a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create-stack&lt;/code&gt; option to our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cloud-tools&lt;/code&gt; utility, and the new command looked like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/cloud-formation-cloud-tools.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When building the feature above, our main goal was to avoid requiring engineers to remember and type each one of the arcane yet super important parameters. We figured out that we could infer everything we needed from things like the AWS configuration file, environment variables, and directory structure conventions, which simplified the user experience drastically.&lt;/p&gt;
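
&lt;p&gt;As a rough sketch of this kind of inference, the Python function below builds an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws cloudformation create-stack&lt;/code&gt; command line from nothing but the template name. The conventions it encodes are invented for the example and are not Meetup’s actual ones: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TEAM&lt;/code&gt; environment variable, the tag names, the default region, and using the directory name as the service name are all assumptions.&lt;/p&gt;

```python
import os
from pathlib import Path


def create_stack_command(template, env=None, cwd=None):
    """Build an `aws cloudformation create-stack` command line.

    Only the template comes from the user. Everything else is inferred from
    hypothetical conventions: the region and team come from environment
    variables, and the service name is the current directory name.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    service = cwd.name
    # Stack names may not contain underscores, so normalize the file name.
    stack_name = service + "-" + Path(template).stem.replace("_", "-")
    return [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", "file://" + template,
        "--region", env.get("AWS_DEFAULT_REGION", "us-east-1"),
        "--tags",
        "Key=team,Value=" + env.get("TEAM", "unknown"),
        "Key=service,Value=" + service,
    ]
```

&lt;p&gt;Returning the argument list instead of running it right away keeps the inference easy to test, and it makes it trivial to print the exact command for the user before executing it.&lt;/p&gt;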

&lt;p&gt;We could take this a little further, and use more conventions and metadata to completely eliminate the need for engineers to write the CloudFormation templates (let’s be honest, they are mostly copied and pasted around). Even if this could streamline the workflow even further, we decided to simplify, but not hide, CloudFormation.&lt;/p&gt;

&lt;p&gt;One of the reasons for this decision was the educational argument discussed above: we had access to inexpensive or free educational resources and consulting on CloudFormation. Another big reason was that we realized that the more we hid away a fundamental tool like CloudFormation, the harder it would be for us to adopt new features from AWS. If we used our own abstractions for configuration management, every time AWS released a new feature our users would have to wait until the platform team added support for it to our tools.&lt;/p&gt;

&lt;p&gt;If we did not shy away from CloudFormation, we would be able to use new features as soon as AWS made them available. Granted, AWS is notorious for not adding new features to CloudFormation until after launch, but it would still take longer to do it ourselves.&lt;/p&gt;

&lt;h2 id=&quot;build-on-top-of-the-existing-user-experience-do-not-try-to-hide-it-away&quot;&gt;Build on top of the existing user experience, do not try to hide it away&lt;/h2&gt;
&lt;p&gt;SoundCloud started heavily investing in container technology around 2011. This was years before Docker was released, so we had to develop our own tooling. Like most people back then, we used cgroups, Linux namespaces, and SquashFS images to build our container infrastructure. Containers were used only in production. During development, folks would use their local machine’s environment, and upon deploy, using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git push&lt;/code&gt;, the platform would package the code as a container and deploy it. It was designed to offer an experience almost identical to Heroku’s, as &lt;a href=&quot;https://www.slideshare.net/pcalcado/evolutionary-architecture-at-work&quot;&gt;this slide from a presentation I gave in 2013 shows&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/bazzoka-ui-slide.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This system has served us for many years and during our most extreme hyper-growth stages. Eventually, though, it became clear that adopting the Docker toolset would be extremely beneficial to us, especially as it would allow engineers to run containers on their development machines smoothly.&lt;/p&gt;

&lt;p&gt;As we planned for this change, we faced the familiar challenge of keeping our engineers as productive as possible while we transitioned our platform to the new technology. One way we found to achieve that was to invest in automation, creating tools that would make it super easy for people to perform some of our most everyday tasks, even if they had no idea what Docker was or how to use it.&lt;/p&gt;

&lt;p&gt;As an example, here is the output of a tool we had that automatically created build pipelines in our Jenkins cluster:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/pacu.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pipeline&lt;/code&gt; tool shown at work above read a manifest file containing some metadata about the project and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Makefile.pipeline&lt;/code&gt;, which contains instructions about how to run the build, very similar to the role &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.travis.yml&lt;/code&gt; plays when using Travis CI.&lt;/p&gt;

&lt;p&gt;Something interesting about the tool is that not only does it use the Docker command line, but it also writes to STDOUT the full command line it invoked and the full output returned by the process it has spawned. At first, this was a debugging resource used by the platform team while developing these tools, but for some reason, it was never turned off before releasing the tool to our engineers.&lt;/p&gt;

&lt;p&gt;One massive positive impact that this had was that it was a great way for engineers to get acquainted with Docker. I think it would be correct to say that everything I learned about how to use Docker back then was by observing what the tool was doing and how Docker would react.&lt;/p&gt;

&lt;p&gt;This accidental feature was something that I have assimilated as a principle for all infrastructure tools.&lt;/p&gt;

&lt;p&gt;Every tool we built at Meetup had a built-in “verbose” mode that showed users what AWS commands were being issued and what was returned. For example, if you wanted to see a list of which AWS &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrganizationalUnits&lt;/code&gt; (basically a grouping of AWS accounts) belong to which teams, you would typically run this command:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/clouttoolsteamlist.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;However, if you added the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-v&lt;/code&gt; flag to the line above, it would output everything that has to do with the AWS commands:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/clouttoolsteamlistminusv.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Given how low-level the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws&lt;/code&gt; command-line commands tend to be, this tends to be very noisy. That is why it is not enabled by default.&lt;/p&gt;

&lt;p&gt;As discussed, exposing users to the ins and outs of the underlying platform is an efficient and inexpensive way to teach by example. Another benefit of this approach is that it is much easier for users to work around problems and get help when things go wrong—especially during partial failures. The user can see exactly what the tool was doing when things went wrong, which helps both them and the platform team understand what steps you need to take to fix the problem.&lt;/p&gt;
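
&lt;p&gt;A verbose mode like this takes very little code. Below is a minimal Python sketch, assuming a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run&lt;/code&gt; helper that every command in the tool goes through when it shells out; it is not the implementation we had at Meetup or SoundCloud.&lt;/p&gt;

```python
import shlex
import subprocess
import sys


def run(argv, verbose=False, log=sys.stderr):
    """Run an underlying command, optionally echoing it and its output."""
    if verbose:
        # Show the exact command so users can copy, paste, and learn from it.
        print("$ " + shlex.join(argv), file=log)
    result = subprocess.run(argv, capture_output=True, text=True)
    if verbose:
        log.write(result.stdout)
        log.write(result.stderr)
        print("(exit code " + str(result.returncode) + ")", file=log)
    return result
```

&lt;p&gt;Writing the echo to a separate stream (STDERR by default here) is a design choice: it keeps the tool’s regular output clean enough to be piped into other commands even when verbose mode is on.&lt;/p&gt;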

&lt;h2 id=&quot;rely-as-little-as-possible-on-what-is-installed-on-the-host-or-remote-servers&quot;&gt;Rely as little as possible on what is installed on the host or remote servers&lt;/h2&gt;
&lt;p&gt;A question teams typically have when they start getting serious about building platform tools is what programming language or runtime they should use to build these tools.&lt;/p&gt;

&lt;p&gt;I am not interested in lengthy debates about which programming language is the best one (well, not &lt;a href=&quot;https://www.slideshare.net/pcalcado/one-or-two-things-you-may-not-know-about-typesystems/pcalcado/one-or-two-things-you-may-not-know-about-typesystems&quot;&gt;anymore&lt;/a&gt;). I believe that an engineer should become familiar with as many programming languages and paradigms as possible and make a decision about which one to use for a specific project based on the constraints under which they work. In my experience, the primary constraint is always on the people side: picking something that your team can be productive in very quickly and for which you can find good candidates when you need to grow your team.&lt;/p&gt;

&lt;p&gt;When it comes to building platform tools, though, there is one other constraint that is always present: &lt;strong&gt;your tools should have as few moving parts as possible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Back to the work we’ve done at SoundCloud, we first built most of our Docker-based tools previously discussed here using Bash scripts. As it &lt;em&gt;always&lt;/em&gt; happens, at some point Bash became hard to scale and test, and we needed to pick a new platform. The team had experience in Python and Ruby, so we started building our tools in these two languages. At first, this worked well, as both are very productive and have a vast amount of libraries, testing tools, and real-world examples we could leverage.&lt;/p&gt;

&lt;p&gt;Soon enough, though, we started having some issues. Every engineer already had some version of both Python and Ruby installed on their laptops, but the same wasn’t necessarily true for our servers, build boxes, and the laptops from product managers, designers or any other non-engineering folks who might need to perform a small infrastructure task as part of their job.&lt;/p&gt;

&lt;p&gt;However, even engineers were having issues. They would need to keep and manage many different versions of these runtimes. Our legacy Rails application required a specific version of Ruby, some new services required their own versions, and our tools would run on another version. Even if tools like rbenv and RVM make it possible to manage these things, way too often people would report problems when using our tools that were caused by the user mistakenly running the tool against the wrong version of runtime or library.&lt;/p&gt;

&lt;p&gt;We tried solving this using package managers like APT and Homebrew, but it felt like adding more overhead and friction to our users. We then packaged all of our tools as Docker containers and made it such that every time a tool was invoked, it would just execute &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker run&lt;/code&gt; on a container image that we had baked. This setup worked ok enough for a while, but it was a massive performance hit for a tool that was supposed to run and finish quickly, and it also required a very long set of configurations and conventions to map networking and filesystem between localhost and the Docker container.&lt;/p&gt;

&lt;p&gt;When I was at DigitalOcean, we released &lt;a href=&quot;https://blog.digitalocean.com/introducing-doctl/&quot;&gt;a command-line tool to our customers distributed as a single binary&lt;/a&gt;. It was a natural choice for us back then, as DigitalOcean builds systems almost exclusively using Go, and this is how programs are distributed in that language. This distribution style was very successful, as all that our users needed to do to use our cloud was download this one executable file, as opposed to &lt;a href=&quot;https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html&quot;&gt;the multi-step process that AWS requires when installing their Python-based command-line tool&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Go isn’t the only modern language that can produce reasonably small executable binaries, and other options like Rust are getting more and more traction amongst platform teams. Irrespective of what programming language you pick, &lt;strong&gt;make sure that the resulting executable is self-contained, that it doesn’t require users to install any runtime or virtual machine on their computers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At Meetup, we followed this principle for all of our command-line tools, but we had a big challenge in that our tooling required the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws&lt;/code&gt; command-line tool to be installed by the user. We assumed that proper care and feeding of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws&lt;/code&gt; tool was a reasonable expectation to have of our engineers, and added a lot of checks to make sure that our tool would detect and let the user know when there was a problem with their local AWS installation—see the health check section below.&lt;/p&gt;

&lt;p&gt;It is very common for platform tools to interact with systems like CloudFormation, Terraform, and Kubernetes, which require their users to write configurations on files written in JSON, YAML, or another declarative language. These templates need to be stored somewhere. One approach is to keep them on a remote location, such as an S3 bucket or Maven-style repository. I have found this approach problematic for a few reasons.&lt;/p&gt;

&lt;p&gt;Firstly, it adds another moving part to your toolset, which is undesirable. This architecture requires this remote location to be always accessible, which implies high-availability needs, on-call support, incident management, etc.&lt;/p&gt;

&lt;p&gt;It also adds some overhead on versioning. Command-line tools make assumptions about the templates, such as which parameters they expect. If you change the template, you need to think about how this change could impact all possible versions of the command-line tool installed on laptops, build boxes, and elsewhere.&lt;/p&gt;

&lt;p&gt;Both issues arise from the fact that the command-line tool and the templates are highly coupled. In general, it is advisable to keep two highly coupled components together, as part of the same artifact. When it comes to platform tools, my suggestion is that you embed the templates within the command-line binaries, using tools like &lt;a href=&quot;https://github.com/gobuffalo/packr&quot;&gt;Packr&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;have-a-built-in-self-check&quot;&gt;Have a built-in self-check&lt;/h2&gt;
&lt;p&gt;When I was at Buoyant, our tiny engineering team split its time between working on &lt;a href=&quot;https://linkerd.io/&quot;&gt;Linkerd v2&lt;/a&gt; and supporting the hundreds of users of the first version of our &lt;a href=&quot;/2017/08/03/pattern_service_mesh.html&quot;&gt;Service Mesh&lt;/a&gt;. We had an on-call rotation for support, and engineers would take turns helping our community on forums and Slack with the issues they had found.&lt;/p&gt;

&lt;p&gt;Something that one finds out when doing user support for open-source products is that you spend most of the time trying to figure out if the issue is caused by something in the user’s environment or in your product. As we were an open-source project, we couldn’t ask for access to the users’ systems and had to rely on asking them questions on a public forum. This slow-paced interaction made the process take forever.&lt;/p&gt;

&lt;p&gt;That is why one of the first features we built for Linkerd v2, while it was still called &lt;em&gt;Project Conduit&lt;/em&gt;, was a self-check that would try to make sure that some basic requirements were in place. Inspired by &lt;a href=&quot;https://docs.brew.sh/Manpage#doctor-options&quot;&gt;Homebrew’s doctor command&lt;/a&gt;, we tried to give as much information to the user as possible so that they could maybe fix the problems themselves before asking on the forum.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/conduitcheck.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;But even when people couldn’t fix their own issues, the first thing we did when people had issues was to ask them to paste the output of this command. This gave the support engineer a lot of useful information from the beginning, instead of having to ask lots of questions over a long period of time.&lt;/p&gt;

&lt;p&gt;This &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;check&lt;/code&gt; feature is something that I like to have in my internal tools. Similarly to people working on open-source software, a platform team invests a lot of time helping its users understand issues they might experience. This is a built-in way to help users help themselves or, at least, give the platform team some more context on why a problem might be occurring.&lt;/p&gt;
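
&lt;p&gt;As a rough illustration of such a self-check, here is a minimal sketch in Python, used purely for brevity. It is not how the Linkerd or Meetup checks were implemented; the function names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;check_tool_on_path&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_checks&lt;/code&gt; are hypothetical, and the only requirement verified is that a dependency can be found on the PATH.&lt;/p&gt;

```python
import shutil


def check_tool_on_path(tool):
    """Return (ok, message) describing whether `tool` can be found on PATH."""
    location = shutil.which(tool)
    if location is None:
        return False, f"{tool}: not found on PATH"
    return True, f"{tool}: found at {location}"


def run_checks(checks):
    """Run every check, print one status line each, and return True if all passed."""
    all_ok = True
    for check in checks:
        ok, message = check()
        # One line per check, so users can paste the whole report in a support thread.
        print(("[ok]   " if ok else "[FAIL] ") + message)
        all_ok = all_ok and ok
    return all_ok
```

&lt;p&gt;A tool that depends on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws&lt;/code&gt; command could call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_checks([lambda: check_tool_on_path(&quot;aws&quot;)])&lt;/code&gt; and add further checks to the list over time.&lt;/p&gt;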

&lt;p&gt;One similar but maybe more important verification is making sure that the user has an up-to-date version of the tools. This is part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;check&lt;/code&gt; feature discussed above, but I recommend that this important check should be part of every command.&lt;/p&gt;

&lt;p&gt;As an example, here is a failed attempt at creating an AWS account using the tool we built at Meetup. You can see that as part of its normal operation, the tool also queries an S3 bucket that contains the latest version number so that it can compare against its own version:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/cli-tools/cloudtoolsversioncheck.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Ideally, the S3 bucket above should also inform what the minimum acceptable version is. If the current installation of the command-line tool is older than this version, it should reject any commands and ask the user to update the tool. If the current version is older than the most recent release but still higher than the minimal acceptable version, it should display a warning but still execute the command.&lt;/p&gt;
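
&lt;p&gt;The rule above can be sketched in a few lines. This is a minimal Python illustration, not the code of the Meetup tool: the names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse_version&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;check_version&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Verdict&lt;/code&gt; are hypothetical, versions are assumed to be plain dotted numbers such as 1.4.2, and fetching the two published version numbers is left out.&lt;/p&gt;

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"          # running the latest release
    WARN = "warn"      # outdated but still supported
    REJECT = "reject"  # older than the minimum acceptable version


def parse_version(text):
    """Turn a dotted version such as "1.4.2" into a tuple of ints: (1, 4, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def check_version(installed, minimum, latest):
    """Decide whether a command may run, given the three version strings."""
    current = parse_version(installed)
    # Tuples compare element by element, so (1, 10, 0) is newer than (1, 9, 0).
    if parse_version(minimum) > current:
        return Verdict.REJECT
    if parse_version(latest) > current:
        return Verdict.WARN
    return Verdict.OK
```

&lt;p&gt;Every command would call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;check_version&lt;/code&gt; before doing any work, refuse to run on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REJECT&lt;/code&gt;, and print a warning on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WARN&lt;/code&gt; before carrying on.&lt;/p&gt;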
</description>
        <pubDate>Tue, 30 Jul 2019 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2019/07/30/developer_tools_principles.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2019/07/30/developer_tools_principles.html</guid>
        
        <category>Microservices</category>
        
        <category>DigitalOcean</category>
        
        <category>SoundCloud</category>
        
        <category>Distributed Systems</category>
        
        <category>Meetup</category>
        
        <category>Platform Engineering</category>
        
        <category>Cloud</category>
        
        
      </item>
    
      <item>
        <title>Some thoughts on GraphQL vs. BFF</title>
        <description>&lt;p&gt;The &lt;a href=&quot;/2015/09/18/the_back_end_for_front_end_pattern_bff.html&quot;&gt;Back-end for Front-end (BFF)&lt;/a&gt; Pattern was originated at SoundCloud.  It takes its name from the internal framework we built to make application-specific APIs easier to write and maintain. Since then, it has taken a life of its own, with various articles, books, and open source software that teach, discuss, or implement it.&lt;/p&gt;

&lt;p&gt;More recently, another approach to API architecture and design comes in the form of GraphQL. Facebook first developed the technology, and it has quickly become so popular that many startups were created exclusively to build frameworks and tooling around it.&lt;/p&gt;

&lt;p&gt;Over the past year or so, I have been asked many times about the relationship between these two. This article is a write-up of my thoughts on the matter.&lt;/p&gt;

&lt;h2 id=&quot;what-is-a-bff-even&quot;&gt;What is a BFF, even?&lt;/h2&gt;
&lt;p&gt;I believe that a lot of the questions people have around this topic originate from some misunderstanding of what is the &lt;em&gt;actual&lt;/em&gt; goal of the BFF pattern. There is a lot of detail on the background and specifics of the BFF pattern on &lt;a href=&quot;/2015/09/18/the_back_end_for_front_end_pattern_bff.html&quot;&gt;the original article describing it&lt;/a&gt;, but let me try to summarize what I mean by this term.&lt;/p&gt;

&lt;p&gt;Let’s take a look at the diagram below:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/bffgraphql/a-to-b.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Option &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(a)&lt;/code&gt; is sometimes called a &lt;em&gt;One-Size-Fits-All (OSFA) API&lt;/em&gt;, where we have one (or a few) APIs that serve many applications and use cases. Option &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(b)&lt;/code&gt; is generally called &lt;em&gt;BFF&lt;/em&gt;, where each application or sometimes use-case has its own API.&lt;/p&gt;

&lt;p&gt;In the OSFA model, we usually have many different applications (sometimes built by third-party developers and business partners) share the same endpoints. Every time one of these endpoints needs to be changed, the engineers from the &lt;em&gt;API Team&lt;/em&gt; need to make sure that they won’t break any important use cases, integrations, etc. Sometimes people try to go around this challenge by strictly versioning the APIs, but this not only imposes overhead in terms of governance but also won’t prevent you from running multiple versions of the API at the same time, until every client application is able to update its usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of trying to apply some strict and more formal governance process to deal with these challenges, with the BFF approach we try to eliminate the problem altogether by giving the team that owns the client applications full control over the API they use&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Putting it in terms from &lt;a href=&quot;https://martinfowler.com/ieeeSoftware/published.pdf&quot;&gt;a dichotomy proposed by Martin Fowler&lt;/a&gt;, using a BFF means that even if your API might be a &lt;em&gt;Public&lt;/em&gt; interface, it isn’t &lt;em&gt;Published&lt;/em&gt;. Even if other applications can reach the API—because it is available on the Internet—they are not supposed to do so and this usage isn’t supported by the API owner. Each application then is free to build and evolve their API as it better suits them, with no need to worry about how this would impact other client applications as there will be none.&lt;/p&gt;

&lt;p&gt;Something often overlooked when people talk about BFFs is that this new ownership model fundamentally changes the boundaries around your subsystems. In the OSFA approach, the API is a discrete subsystem meant to be used by multiple applications. In contrast, when you have an architecture based on BFFs, the API becomes part of the client application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The defining characteristic of a BFF is that the API used by a client application is part of said application, owned by the same team that owns it, and it is not meant to be used by any other applications or clients.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is an illustration from the original article:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/bffgraphql/bff-is-the-app.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;where-does-graphql-fit-in-all-this&quot;&gt;Where does GraphQL fit in all this?&lt;/h2&gt;
&lt;p&gt;Notice that &lt;strong&gt;there isn’t anything in the description above that says that the endpoints provided by a BFF must be optimized for the client application they now belong to&lt;/strong&gt;. There is no fundamental reason for the API exposed by one BFF to look any different from your typical OSFA API. Nevertheless, when you make the API part of the application, some coupling with the client is not only expected but desired, as teams use the autonomy as leverage.&lt;/p&gt;

&lt;p&gt;At SoundCloud, we saw teams using their newfound control over APIs to perform optimizations that made sense for their specific use cases. For example, the Android team experimented with Protocol Buffers instead of JSON for their API payloads, the partnerships team was able to allow for much more generous rate limiting settings for the API used by the likes of Sonos and Apple, and various teams fine-tuned their caching and CDN usage to better serve their particular needs.&lt;/p&gt;

&lt;p&gt;So far, nothing discussed here prevents you from using any flavor of RPC you might prefer. You can follow the recipe above for REST, gRPC, GraphQL, SOAP, or any other combination of wire protocol and architectural style you might favor. Better yet, you can have each application using whatever technology suits them better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It follows then that it does not make much sense to compare BFFs and GraphQL. You can build your GraphQL APIs as many BFFs or as an OSFA API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I believe that the reason why people struggle with the relationship between these two related but not mutually exclusive concepts is due to one of the most interesting possibilities that BFFs give to client teams: how to optimize their endpoints and payloads.&lt;/p&gt;

&lt;p&gt;To recap, here is how the original article on BFFs explains the challenges teams faced with the OSFA approach we had at SoundCloud:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Below you can see how many requests we used to make in the monolithic days versus the number of those we make for the new web application:
&lt;img src=&quot;/img/2015-09-back-end-for-front-end-pattern/next-2013.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

  &lt;p&gt;To generate that single profile page, we would have to make many calls to different API endpoints, e.g.:&lt;/p&gt;

  &lt;ul&gt;
    &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GET /tracks/1234.json&lt;/code&gt; (the author of the track)&lt;/li&gt;
    &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GET /tracks/1234/related.json&lt;/code&gt; (the tracks to recommend as related)&lt;/li&gt;
    &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GET /users/86762.json&lt;/code&gt; (information about the track’s author)&lt;/li&gt;
    &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GET /users/me.json&lt;/code&gt; (information about the current user)&lt;/li&gt;
    &lt;li&gt;…&lt;/li&gt;
  &lt;/ul&gt;

  &lt;p&gt;…which the web application would then merge to create the user profile page. While this problem exists on all platforms, it was even worse for our growing mobile user base that often used unreliable and slow wireless networks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As we moved to BFFs and let client teams own their own APIs, they started working on ways to minimize the number of calls needed to do things
like render the user profile page mentioned above. Our architecture was heavily RESTful, and GraphQL wasn’t even available yet, so the way we dealt with the issue was to model the endpoints in our API following &lt;a href=&quot;https://martinfowler.com/eaaDev/PresentationModel.html&quot;&gt;a Design Pattern called Presentation Model&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When using this pattern, instead of assembling a page from many fine-grained calls to the API as described above, we would model user experience abstractions as their own REST resources. For example, we would have endpoints like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/track/123/player.json&lt;/code&gt; that returns all data needed to render &lt;a href=&quot;https://help.soundcloud.com/hc/en-us/articles/115003568008&quot;&gt;any of the multiple versions of our player&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/bffgraphql/player.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A page still had to make more than one call to fetch all the data it needed to render the whole screen, but the number of requests was drastically reduced, from hundreds to a dozen, and the new endpoints were much easier to manage and reuse.&lt;/p&gt;
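
&lt;p&gt;To make the idea concrete, here is a minimal sketch, in Python, of what a server-side handler for such a player resource might assemble. It is an assumption-laden illustration rather than SoundCloud’s implementation: the function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build_player_resource&lt;/code&gt; and every field name are hypothetical, and the three inputs stand in for whatever the fine-grained services return.&lt;/p&gt;

```python
def build_player_resource(track, author, viewer):
    """Merge fine-grained records into the one payload a player widget renders."""
    return {
        "track": {"id": track["id"], "title": track["title"]},
        "author": {"id": author["id"], "name": author["name"]},
        # Viewer-specific state the player needs, such as the like button.
        "viewer": {"liked": track["id"] in viewer["liked_track_ids"]},
    }
```

&lt;p&gt;The client then makes a single request for the player and receives exactly the fields it displays, instead of merging several responses itself.&lt;/p&gt;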

&lt;p&gt;Were GraphQL available back then and had we decided to use it, things would be quite different. In a RESTful API, the Presentation Model needs to be implemented on the server-side, so that we avoid making all those calls from the example above. When we use GraphQL, we don’t necessarily need a Presentation Model at all, and if we do use one, it can be implemented on the client application, as GraphQL makes it possible to get all data needed in a single request.&lt;/p&gt;

&lt;p&gt;One challenge in moving this responsibility back to the client is that it increases the amount of logic that you perform at this layer. It is notoriously hard to make sure that several feature teams are well-staffed when it comes to needs such as mobile development. This leads some organizations to prefer a strategy where they perform as much work server-side as possible, keeping the mobile clients simple and mostly dedicated to display logic. You might also find it difficult to push an urgent change when the deployment process for your app requires going through some kind of approval by an app store.&lt;/p&gt;

&lt;h2 id=&quot;do-we-even-need-bffs-with-graphql&quot;&gt;Do we even need BFFs with GraphQL?&lt;/h2&gt;
&lt;p&gt;But one more fundamental question that pops up when considering using GraphQL in BFFs is: &lt;em&gt;do we need BFFs at all&lt;/em&gt;? As discussed, BFFs are not about the shape of your endpoints, but about giving your client applications autonomy. Still, some GraphQL literature insists that this new technology gives so much freedom to the client by allowing them to perform ad-hoc queries that you can safely have a single OSFA API without the drawbacks from REST-based approaches.&lt;/p&gt;

&lt;p&gt;I don’t have enough first-hand experience with GraphQL at scale to have a strong opinion here, but two things about this worry me.&lt;/p&gt;

&lt;p&gt;The first friction point is that it is hard for me to believe that you can combine the needs of many different applications, owned by different teams, with different users and use cases, in a single schema. &lt;a href=&quot;https://medium.com/@__xuorig__/on-graphql-schema-stitching-api-gateways-5dcb579fa90f&quot;&gt;Marc-André Giroux, from GitHub, has a great article&lt;/a&gt; discussing the practical challenges of composing (“&lt;em&gt;stitching&lt;/em&gt;”) together schemas coming from different domains. Apollo has published some advanced tooling that aims at easing some of these challenges, but just by looking at this slide from &lt;a href=&quot;https://www.youtube.com/watch?v=Uw-Z1aUQvgg&quot;&gt;James Baxley’s excellent talk at GraphQL Conf 2019&lt;/a&gt; you can see that there are some non-trivial concepts that need to be applied:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/bffgraphql/apollo-schema-federation.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Even if someone comes with a simple technical solution for how to compose schemas, I am not sure that having a single schema is a good idea to begin with. Trying to derive a single schema that holds a complete-ish model of your data and can be queried by wildly different applications reminds me too much of an
&lt;em&gt;Enterprise Data Model&lt;/em&gt;, which enterprise software development was very fond of just a few decades ago.&lt;/p&gt;

&lt;p&gt;In this world, organizations would try to come up with one single database schema, often federated across many instances of Oracle and IBM relational databases, that would be the one source of truth for the whole company. Applications would be built around this enterprise schema, and there were documents that acted as &lt;em&gt;data dictionaries&lt;/em&gt;, explaining to developers what each field and type meant. &lt;a href=&quot;https://martinfowler.com/bliki/IntegrationDatabase.html&quot;&gt;Fowler wrote a few paragraphs on why these Integration Databases can be problematic&lt;/a&gt;, and I believe these same issues might arise when you have a single GraphQL schema for your API:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;An integration database needs a schema that takes all its client applications into account. The resulting schema is either more general, more complex or both - because it has to unify what should be separate BoundedContexts. The database usually is controlled by a separate organization to those that develop applications and database changes are more complex because they have to be negotiated between the database group and the various applications.&lt;/p&gt;

  &lt;p&gt;The benefit of this is that sharing data between applications does not require an extra layer of integration services on the applications. Any changes to data made in a single application are made available to all applications at the time of database commit - thus keeping the applications’ data use better synchronized.&lt;/p&gt;

  &lt;p&gt;On the whole, integration databases lead to serious problems because the database becomes a point of coupling between the applications that access it. This is usually a deep coupling that significantly increases the risk involved in changing those applications and making it harder to evolve them. As a result most software architects that I respect take the view that integration databases should be avoided.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I am looking forward to reading more experience reports on both BFF and OSFA APIs built using GraphQL. At the moment, based on my own experience and what I see from folks like Marc-André Giroux, I suggest that an organization currently invested in RESTful BFFs keep their separate APIs and migrate them to GraphQL, instead of trying to jump to an OSFA GraphQL API.&lt;/p&gt;
</description>
        <pubDate>Fri, 12 Jul 2019 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2019/07/12/some_thoughts_graphql_bff.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2019/07/12/some_thoughts_graphql_bff.html</guid>
        
        <category>Microservices</category>
        
        <category>GraphQL</category>
        
        <category>Front-end</category>
        
        <category>Edge</category>
        
        <category>BFF</category>
        
        <category>Patterns</category>
        
        
      </item>
    
      <item>
        <title>A Structured RFC Process</title>
        <description>&lt;p&gt;Maybe you are a new engineering leader at &lt;em&gt;red-hot startup&lt;/em&gt;. The founders hired you on account of your previous experience at a successful tech company, they brought you in to take engineering to the next level. After a few weeks of onboarding, you now have a list of changes you want to implement. How do you find a way to propose that without making the &lt;em&gt;old guard&lt;/em&gt; feel alienated from the process?&lt;/p&gt;

&lt;p&gt;Or maybe you are part of the old guard yourself. You have shown interest in stepping up and leading the engineering team from a scrappy group of people working 7 days a week to a more mature organization. You were promoted to a position where you finally have the ability to tackle the root cause for the growing pains you all are experiencing. One question still remains, though: how can you make sure that your fellow engineers don’t feel that you are imposing your views like a tyrant?&lt;/p&gt;

&lt;p&gt;Or it could be that those ideas aren’t even yours. You are a manager worried about the amount of technical debt and frequent production incidents caused by people rushing to implement their ideas without having them double-checked by a second pair of eyes. When you casually remind them about the benefits of collaboration, you hear about how they are afraid that a reviewer will waste everyone’s time pushing for the &lt;em&gt;perfect&lt;/em&gt; solution when the first iteration of this thing needs to be out as soon as possible.&lt;/p&gt;

&lt;p&gt;In situations like these, you are usually asking yourself &lt;em&gt;&lt;strong&gt;how can you foster a culture that is more accepting and kind towards change?&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In my experience, one of the most effective things one can do to achieve that is to establish a structured process for feedback on ideas, designs, and architectures. Honoring a long tradition in software engineering, I call this a &lt;strong&gt;Structured Request For Comment (RFC) Process&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 id=&quot;introducing-an-rfc-process&quot;&gt;Introducing an RFC process&lt;/h2&gt;
&lt;p&gt;Your organization already has various formal and informal ways to share ideas, from formal presentations to casual chatter over lunch or beers. Something I have observed in the various startups I have worked with is that these channels tend to break down when the organization reaches something like 70-100 engineers. At this size, people still reach out for feedback from those who they know—for example people who have worked at the organization for a long time or maybe people who joined at the same time and bonded during onboarding—but these networks are more like &lt;em&gt;cliques&lt;/em&gt; than peer groups.&lt;/p&gt;

&lt;p&gt;This is when, as a leader in the engineering organization, I tend to establish the structured RFC process. RFC stands for &lt;em&gt;Request For Comments&lt;/em&gt;. The term has a long history in engineering, but outside formal standard bodies it is normally used to refer to a document describing an idea, written by someone who expects feedback on it from their peers. This kind of interaction happens all the time amongst engineers, but I believe that a well-defined and &lt;em&gt;structured&lt;/em&gt; process helps set the expectation that it is a normal part of the engineering workflow. It also makes it easier for people to take part in the process, as they don’t have to second-guess if their opinions are welcomed or when to bother more senior people for feedback. Teams tend to use this process to gather feedback on the design for a new system, a strategy for upgrading shared libraries, new coding conventions, changes to the code review process, etc.&lt;/p&gt;

&lt;p&gt;After introducing such a process in various startups, I have compiled the lessons that my teams and I have learned into an RFC itself, so that the team can experience the process first hand while discussing whether to adopt it. &lt;a href=&quot;https://docs.google.com/document/d/1ngXeK4e50xsYCearLfCBD7yoHFHLeU-BOkoWjx58TfY/edit&quot;&gt;You can find it in full as a Google Document here&lt;/a&gt;. Please feel free to copy this format, make whatever changes make sense and use it in your organization.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/rfc/rfc.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the process described above, the author writes a document describing the proposal, following a template that aims at making sure that some fundamental questions are answered before inviting people to give feedback. They will then ask other engineers for written feedback, usually by sending an email to a well-known mailing list. People reviewing the document provide the author with their opinion, anecdotes from previous experience, and facts related to the proposal. This feedback is considered &lt;em&gt;informational&lt;/em&gt;, meaning that the authors of the RFC are free to incorporate it into their proposal or not.&lt;/p&gt;

&lt;p&gt;There is no guarantee that the feedback will be ultimately incorporated into the proposal, but we don’t want reviewers thinking that they have wasted their time commenting on it. That is why the process described here requires the authors to acknowledge every piece of feedback given. The authors must also commit to revisiting their final decision at some point in the future, sharing the lessons they have learned.&lt;/p&gt;

&lt;p&gt;We have recently introduced this process at Meetup, and in the first few weeks it was already clear that there was demand for something like it:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/rfc/rfcs-mailing-list.png&quot; alt=&quot;&quot; class=&quot;center-image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The document above should contain everything you need to start a structured RFC process on your own. The remainder of this article is an annotated version of it, adding some nuance and historical background that isn’t fully captured in the RFC. It adds some color and background on the key points of this process based on my experience implementing it at ThoughtWorks, SoundCloud, DigitalOcean, and now Meetup.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-annotated-rfc&quot;&gt;The annotated RFC&lt;/h2&gt;

&lt;h3 id=&quot;the-header&quot;&gt;The Header&lt;/h3&gt;

&lt;blockquote&gt;
  &lt;table&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td&gt;Authors:&lt;/td&gt;
        &lt;td&gt;Phil Calçado&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;To be reviewed by:&lt;/td&gt;
        &lt;td&gt;10/5/2018&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;Revisit Date:&lt;/td&gt;
        &lt;td&gt;04/17/2019&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;State:&lt;/td&gt;
        &lt;td&gt;Feedback Requested&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/blockquote&gt;

&lt;p&gt;A header like this might look antiquated, but I still find it incredibly useful. At a glance, it tells me who the authors are—important for accountability, which we will discuss later—and a few important dates to keep in mind.&lt;/p&gt;

&lt;h3 id=&quot;need&quot;&gt;Need&lt;/h3&gt;

&lt;p&gt;Most of this document was taken verbatim from the one we used to introduce the RFC process at Meetup. It is likely that your organization has some of the needs stated here, but you might want to be more specific about your needs.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A healthy engineering organization demands a culture of asking for and welcoming feedback on our work. In smaller organizations, sharing plans, designs, and decisions is much easier. As we grow, it has become clear that this organic process won’t suffice.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This paragraph acknowledges a common pain point in hyper-growth companies. When your team was small, there was a straightforward way to share ideas between engineering, product, and even founders—just have a conversation! As you hire more people, suddenly engineers find themselves with a feeling that we can summarize as &lt;em&gt;“I don’t know what’s going on anymore.”&lt;/em&gt; While RFCs won’t solve all of your problems, it establishes a well-defined process to share and consume information about engineering decisions and ideas.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Currently, various teams already write down their plans and designs in documents that could be usually called an RFCs (Request for Comments). Without shared and clear guidance or process, these vary drastically in format, contents, and objectives. There is also a lot of variance on how these are advertised to other engineers who would be good candidates for feedback givers. At the moment, there is no easy way for an engineer to know what topics are being discussed at a given time, or how could they give input on such decisions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your team is already sharing information one way or another. Unfortunately, the lack of a standard for how and where to ask for and give feedback means that these documents often don’t reach, in a timely fashion, the people who would be the most helpful or the most impacted by them.&lt;/p&gt;

&lt;p&gt;At Meetup, for example, our Web Architecture team was planning to build a GraphQL-based API to boost mobile productivity. They had a meeting with our mobile team to share the good news and talk about the project; the expectation was that the mobile team would &lt;em&gt;adore&lt;/em&gt; the idea. Instead, the GraphQL proposal was received with confusion, voiced as questions like &lt;em&gt;“So... does that mean we should stop our refactoring of the HTTP clients?”&lt;/em&gt; It turns out that the mobile engineers had decided to solve the productivity problem themselves by changing their HTTP client to make it super-productive to use our REST API. They had done some amazing thinking about how to improve the current state of things, and had even written the idea up as an RFC. Unfortunately, this RFC was never shared with any other team, and the people who owned the API platform had no idea that this initiative was going on.&lt;/p&gt;
&lt;blockquote&gt;

  &lt;p&gt;There is also no clarity on collaboration versus decision-making. An RFC process, by definition, is meant to collect feedback on a proposal. There will always be different opinions, and we must encourage people to expose their ideas and have them debated. Nevertheless, we operate in a very competitive landscape and we have no time to waste in analysis paralysis. We believe that speed of iteration beats quality of iteration, and to iterate quickly we need absolute clarity about who has the decision-making responsibility on a proposal.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;One of the most important aspects of any change management process, especially when trying to increase transparency and engagement, is to avoid &lt;a href=&quot;https://en.wikipedia.org/wiki/Design_by_committee&quot;&gt;design by committee&lt;/a&gt;. Whatever feedback gathering process you end up adopting, you must make sure that there is an explicit acknowledgment of who is the decision maker, the one accountable for the outcome and with veto power over it. Feedback givers must always keep in mind that their opinions will be taken into consideration, but there is no guarantee that they will be incorporated into the proposal.&lt;/p&gt;

&lt;blockquote&gt;

  &lt;p&gt;Moreover, existing RFCs and similar documents often get into too much detail about the “how” and not enough on the “what” of the proposal. There are many different ways to materialize an idea, and implementation details are better left to be decided by those who are actually doing the work.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;You want to both gather feedback from a diverse audience and make sure that the reviewers aren’t missing the big picture by focusing on implementation details. To achieve that, you need to make sure that your document doesn’t spend too much time on distractions and focuses on the most important aspects of the proposal. When I am coaching managers and leaders, I tend to summarize this as &lt;em&gt;don’t invite people to conversations you don’t want to have with them&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;One example of this going bad was when, at SoundCloud, my Platform team published an RFC describing the changes that application developers would face as we moved from our own datacenters to the cloud. The document was full of important and potentially contentious information about how application developers would have to change their mindset about latency, availability, and even simple things like trusting that there was a durable file system in their servers. Nevertheless, the one paragraph that everyone in the company decided to comment on was one that casually mentioned that we would write some tooling in Python because that is the de-facto canonical AWS SDK. This was an implementation detail, completely irrelevant to anyone who wasn’t in that team. Still, they were introducing a new programming language and that sparked a heated debate that went on over the weekend. At SoundCloud, our teams had autonomy to decide whatever tools made sense to them, and the mistake this team made in the RFC was to invite an engineering department full of very opinionated people to give feedback on their programming language preferences.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;In summary, we need a clear and simple process that allows people to share their ideas, receive feedback on them, and defines how the decision-making process works.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This sentence summarizes everything that this process tries to address. People need a safe space to get feedback on their ideas, and feedback givers must know how their input will be used.&lt;/p&gt;

&lt;h3 id=&quot;approach&quot;&gt;Approach&lt;/h3&gt;

&lt;blockquote&gt;

  &lt;p&gt;The recommended approach to fulfill the needs presented in the previous section is a structured RFC process. In this process, a person or group of people will author a document describing a proposal and asking for feedback on it from the rest of the organization.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;As mentioned before, in my own experience a well-managed RFC process can address the needs stated previously. The trick here is that the term &lt;em&gt;Request for Comments&lt;/em&gt; means different things to different people. To make it clear what we mean by it, this section tries to be prescriptive and opinionated about how to build such a process.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h4 id=&quot;feedback-vs-approval&quot;&gt;Feedback vs. Approval&lt;/h4&gt;
  &lt;p&gt;The RFC process is a tool that can be used during the decision-making process, and everyone is encouraged to share rough and early ideas and proposals as RFCs.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;The more polished a document looks, the &lt;em&gt;softer&lt;/em&gt; and less impactful reviews tend to be. When facing a well-written document, our brains enter into a &lt;a href=&quot;https://en.wikipedia.org/wiki/Sunk_cost&quot;&gt;&lt;em&gt;sunk cost fallacy&lt;/em&gt;&lt;/a&gt; mindset, thinking &lt;em&gt;“Ugh, I think this is a horrible idea, but this person has put so much effort into it…”&lt;/em&gt;. This leads us to focus on smaller, irrelevant details instead of addressing any elephants in the room. It is generally easier to give more candid and useful feedback on something in its earlier stages, when it is maybe just a list of bullet points and a back-of-the-napkin drawing.&lt;/p&gt;

&lt;p&gt;As a leader, it is probably common for people to share with you their plans and ideas over Slack, email, or meetings. When this happens to me, I spend some time listening and asking some preliminary questions about the proposal, but soon enough I say &lt;em&gt;“This sounds interesting, do you mind putting it in a two-page document using the RFC format?”&lt;/em&gt;. I tend to work with them for a few iterations on it, anticipating questions that I believe will come from the wider audience, and then ask them to send it to the wider group for peer feedback. If the person is resistant to sharing it widely, I coach them into sharing it first with people they are more comfortable with, and widening the circle until the whole organization is engaged.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Even if the proposed change on the RFC is extremely well-received, it doesn’t mean that it is approved to be worked on or that it will be prioritized. Authors of the RFC must make sure that they have whatever approval or sponsorship they need from management, leadership, stakeholders, collaborators, and their own team before any actual work is done.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The biggest caveat of a peer review process like the one described here is that just because something has gotten good feedback and people may be super-excited to see the change implemented, it doesn’t mean that it is the right thing to do, or that it should be prioritized.&lt;/p&gt;

&lt;p&gt;This is where the push for everyone to use RFCs and to publish early work can backfire. It is not uncommon for engineers to try and use the process as a way to sell an idea that hasn’t been approved by their stakeholders or managers. They try to gather support from the other members of staff, transforming something that should be a purely technical matter into a popularity contest.&lt;/p&gt;

&lt;p&gt;That is why one needs to make it absolutely clear that RFCs are not a decision-making process. RFCs are merely for feedback on a proposal, and there is no commitment that a well-received RFC will be implemented or that a poorly received one won’t.&lt;/p&gt;

&lt;p&gt;Just like with any other engineering problem, it is also helpful to be explicit about any constraints before asking people to find a solution. One way in which I have done this in the past is taking responsibility for writing the &lt;em&gt;Need&lt;/em&gt; section of the RFC. You should use that as an opportunity to make sure that not only the technical and functional aspects of what is needed are expressed there, but also explicit acknowledgment of the other constraints one is under. For example, you should make it clear that the desired solution needs to be delivered within a given timeframe, or under some budget.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;RFCs are expected for any change that extends beyond a team or department, as it gives the people who would be affected an opportunity to learn more about the change and give feedback.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Throughout the document, I reiterate over and over the idea that RFCs should be used whenever people can benefit from feedback on an idea. This section takes a more prescriptive stance, explicitly setting the expectation that RFCs will be used when a change impacts more than just a team or any other cohesive group of people. This draws a line on what autonomy means in practice, setting a safeguard that is triggered when a team’s decision might impact other individuals.&lt;/p&gt;

&lt;p&gt;In my experience, enforcing this rule is seldom necessary. In fact, it is more common that the problem is the other way around: it is not that people need to be told when to write an RFC; they need coaching in identifying when this is &lt;em&gt;not&lt;/em&gt; the best course of action.&lt;/p&gt;

&lt;p&gt;This might sound counter-intuitive. If we are so convinced that the RFC process brings value to the organization, why don’t we want to have RFCs for almost everything? In my experience, there are two main problems with this approach.&lt;/p&gt;

&lt;p&gt;The first problem is that it can create an avalanche of RFCs that spam our inboxes. As discussed in the introduction, the RFC process here won’t scale very well if there are too many changes to be reviewed at a given point in time. People get overwhelmed and will quickly disengage from the process.&lt;/p&gt;

&lt;p&gt;A second and more dangerous problem I have observed is something that can be mapped to the phenomenon called &lt;a href=&quot;https://en.wikipedia.org/wiki/Diffusion_of_responsibility&quot;&gt;diffusion of responsibility&lt;/a&gt;. That is when engineers start using the RFC process as a means to protect themselves from any bad consequences. &lt;em&gt;“Everybody reviewed it and gave their ‘ok’”&lt;/em&gt; feels like an efficient shield to use when asked hard questions. Autonomy doesn’t work without accountability, and if your engineers are using RFCs as an ass-covering tool you probably need to revisit how your culture deals with failure.&lt;/p&gt;

&lt;p&gt;One way to tackle this problem is with coaching. I expect my technical leadership, e.g. tech leads, architects, Staff/Principal engineers, etc., to invest a lot of their time in reviewing and helping prepare RFCs. To me, engineering leaders do their job when they are helping others with their RFCs like this, not when they are writing RFCs themselves.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h4 id=&quot;authorship-accountability-and-responsibility&quot;&gt;Authorship, Accountability, and Responsibility&lt;/h4&gt;
  &lt;p&gt;The authors of an RFC can be an individual, team, or any other group of people. Being an author means that a person or team sponsors the initiative and is accountable for it.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;I usually ask teams to assume &lt;a href=&quot;https://martinfowler.com/bliki/CodeOwnership.html&quot;&gt;collective ownership&lt;/a&gt; of the RFCs they produce. While it’s normal for one person or maybe a pair to take the lead on responding to feedback and managing the process, the team should treat ownership of an RFC the same way it treats ownership of code. Every now and then an RFC will be owned by a single person, but this shouldn’t be the norm.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;While non-authors may be responsible for implementing the results of an RFC, its authors are accountable for it, as per the definitions below:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;blockquote&gt;
    &lt;p&gt;The main difference between responsibility and accountability is that responsibility can be shared while accountability cannot. Being accountable not only means being responsible for something but also ultimately being answerable for your actions. Also, accountability is something you hold a person to only after a task is done or not done. Responsibility can be before and/or after a task.&lt;/p&gt;
  &lt;/blockquote&gt;
&lt;/blockquote&gt;

&lt;p&gt;This section makes it explicit that, while the authors might not be the ones actually doing the work of implementing the change, they are accountable for making sure that the RFC process is executed well and, more importantly, for the change being proposed.&lt;/p&gt;

&lt;p&gt;The concepts of accountability and responsibility are fundamental to a healthy organization and deserve their own article. If your organization hasn’t yet developed a good understanding of what these terms mean, you might want to expand this section and include some more details.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h4 id=&quot;collaboration&quot;&gt;Collaboration&lt;/h4&gt;
  &lt;p&gt;RFCs must be sent to a mailing list called rfcs@example.org. All engineers are automatically part of this list, and people from other groups are welcome to join and participate.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These days many organizations are trying to completely switch to real-time communication tools like Slack. I personally prefer an asynchronous tool, such as email, for the RFC process. I have also seen teams using GitHub issues and wiki pages for this.&lt;/p&gt;

&lt;blockquote&gt;

  &lt;p&gt;Comments and feedback should focus on the technical content. As long as they don’t impact the content, collaborators should avoid commenting on formatting, writing style and other maybe relevant, but not critical aspects. Such comments can be sent directly to the author to avoid polluting the comment and storming people with notifications.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;As a Brazilian citizen who has been working in English-speaking environments for more than ten years, I know first-hand the challenges of having English as a Second Language. While we always welcome feedback as a way to get better at expressing ourselves, an RFC isn’t the best forum for it. It is perfectly fine to ask authors and feedback givers to rephrase a sentence that is a little confusing, but please refrain from using this interaction as a way to find &lt;em&gt;teaching moments&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Similarly, don’t be obsessed with formatting. It is great when RFCs look the same, as it makes it easy to quickly parse one and check if you should invest time in it, but it isn’t mandatory.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Authors must address all comments written by the deadline. This doesn’t mean every comment and suggestion must be accepted and incorporated, but they must be carefully read and responded to. Comments written after the deadline may be addressed by the author, but they should be considered as a lower priority.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;This goes back to our desire to make sure that people who have invested their time in reading and commenting on the document don’t feel like they have wasted it, or that their opinions aren’t even going to be taken into consideration.&lt;/p&gt;

&lt;p&gt;Something to be aware of is that, in my experience, &lt;a href=&quot;https://twitter.com/pcalcado/status/1088817864491057154&quot;&gt;platforms with in-line commenting such as Google Docs or GitHub Pull Requests can create a habit of commenting-as-you-go&lt;/a&gt;. This can be extremely annoying to RFC authors, as they keep receiving notifications and scrolling through comments that would have been answered by the document itself had the reviewer just read a few more paragraphs. There are a few technology options that can help with this, such as GitHub Reviews, but to me this is a behavior better addressed by feedback and coaching.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Every RFC has a lifecycle. The lifecycle has the following phases:&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;Draft: The authors are working on the RFC before asking for wider feedback&lt;/li&gt;
    &lt;li&gt;Feedback Requested: The RFC has been sent to the mailing list and is waiting for feedback from stakeholders&lt;/li&gt;
    &lt;li&gt;Active: The deadline for comments on this RFC has passed and the authors have decided to go ahead with it&lt;/li&gt;
    &lt;li&gt;Abandoned: The authors have decided not to move forward with the changes proposed in this RFC&lt;/li&gt;
    &lt;li&gt;Retired: The changes proposed on this RFC aren’t in effect anymore, the document is kept for historical purposes&lt;/li&gt;
  &lt;/ul&gt;

&lt;/blockquote&gt;

&lt;p&gt;The lifecycle of an RFC is meant as a tool that people can use to enforce a window in which feedback is expected and create a discrete point when the authors can say &lt;em&gt;“Thanks everyone”&lt;/em&gt; and move on, either implementing the changes or deciding that it wasn’t such a great idea.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;draft&lt;/em&gt; stage is aimed at creating a safe space for people to gather early feedback on an idea. As mentioned before, engineers can be really resistant to sharing half-baked thoughts until they can defend their opinions and designs from criticism, and this might take a long time. People seem generally more comfortable with sharing something in its early stages if it is clearly marked as a draft, though, and this can lead to faster feedback cycles.&lt;/p&gt;

&lt;p&gt;I would generally recommend that once an RFC moves away from &lt;em&gt;Feedback Requested&lt;/em&gt;, it is considered a historical artifact, if not discarded completely. RFCs aren’t great as documentation; once the feedback period is over, I usually ask the authors to document any relevant parts somewhere else, like a wiki or even a different Google Doc.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Each RFC has a revisit date, by when the authors will update the mailing list on what they have learned since the feedback phase. This is a natural point for an RFC to be retired and a new approach proposed.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;The most important lesson that I have learned as a change agent in organizations is that people are much more welcoming to change if they know that the decisions and assumptions will be revisited at some point in the future.&lt;/p&gt;

&lt;p&gt;I love the way that Linda Rising describes this as a Pattern in her great book &lt;a href=&quot;https://amzn.to/2Ges2VV&quot;&gt;Fearless Change: Patterns for Introducing New Ideas&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;You’re getting worn out as you attempt to address the concerns people have about the new idea because it doesn’t look like the questions and objections are going to end anytime soon.&lt;/p&gt;

  &lt;p&gt;&lt;strong&gt;There are people in the organization who are expressing an endless supply of objections to the new idea. It would be a daunting, or even impossible, task to try to ease everyone’s worries before the new idea is adopted.&lt;/strong&gt;&lt;/p&gt;

  &lt;p&gt;Fear is often what keeps us talking and questioning but stops us from doing anything. However, even though people may be fearful of change, they usually love to experiment. Change means risk. An experiment is something you can undo and walk away from when you are all the wiser.&lt;/p&gt;

  &lt;p&gt;Ideas that can be tested on an installment plan are generally adopted more rapidly than those that are not. If people are offered a trial period, they will have the opportunity to experiment with the innovation under their own conditions. This is likely to ease their uncertainties and give meaning to something that was previously seen as only an abstract idea.&lt;/p&gt;

  &lt;p&gt;It’s more effective to let people convince themselves through sight and touch than to try to convince them with words and logic. For “test purposes” is a convenient label for temporarily transferring “unacceptable” ideas into an “acceptable” category, until such time that the idea can gain the persuasive power to become part of the established way of doing things.&lt;/p&gt;

  &lt;p&gt;&lt;em&gt;Therefore:&lt;/em&gt;&lt;/p&gt;

  &lt;p&gt;&lt;strong&gt;Suggest that the organization, or a segment of the organization, try the new idea for a limited period as an experiment.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Having an expiry date and a commitment from the authors to revisit the decision is one way to implement this Pattern in your organization.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h4 id=&quot;format&quot;&gt;Format&lt;/h4&gt;
  &lt;p&gt;The RFC document itself is where comments and decisions are recorded. It should be a Google Doc, and everyone should have access rights to comment on it.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;As mentioned previously, this process can be implemented using various publishing software, from Google Docs to GitHub Pull Requests. I personally like the idea of Google Docs because it makes it easier to apply the same RFC process outside engineering. Say, for example, that you want to propose a change to the job description for engineers in your organization. If you use a tool familiar to your HR folks you can keep the conversation in a single document, instead of having to translate back-and-forth between what engineers are giving feedback on and an endless email thread with your People Team.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A good RFC will describe the scope and the approach. It should not contain a list of specific tasks or project plan.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;This is a soft requirement, trying once more to reiterate that the &lt;em&gt;what&lt;/em&gt; is often more important than the &lt;em&gt;how&lt;/em&gt; for an RFC. It is perfectly fine to ask for feedback on a project plan, but I would suggest that the authors try to divorce the feedback on the objectives from the discussion about the project plan, its schedule, staffing, and resources; the latter should derive from the former once that is established.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;To avoid overloading the document with implementation details, RFCs should follow the Stanford Research Institute’s NABC model, making sure that they cover four points:&lt;/p&gt;
  &lt;blockquote&gt;
    &lt;p&gt;An NABC comprises the four fundamentals that define a project’s value proposition:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;Need: What are our client’s needs? A need should relate to an important and specific client or market opportunity, with market size and end customers clearly stated. With DARPA, for example, we are required to state a critical Department of Defense (DoD) need. The market should be large enough to merit the necessary investment and development time.&lt;/li&gt;
      &lt;li&gt;Approach: What is our compelling solution to the specific client need? Draw it, simulate it or make a mockup to help convey your vision. As the approach develops through iterations, it becomes a full proposal or business plan, which can include market positioning, cost, staffing, partnering, deliverables, a timetable and intellectual property (IP) protection. If we are developing a product, it must also include product specifications, manufacturing, distribution and sales. DARPA usually demands paradigm-shifting approaches that address a specific DoD need (e.g., a 10-times improvement).&lt;/li&gt;
      &lt;li&gt;Benefits: What are the client benefits of our approach? Each approach to a client’s need results in unique client benefits, such as low cost, high performance or quick response. At DARPA, the benefit might be an airplane that turns faster, goes higher, costs less or is safer. Success requires that the benefits be quantitative and substantially better - not just different. Why must we win?&lt;/li&gt;
      &lt;li&gt;Competition/alternatives: Why are our benefits significantly better than the competition? Everyone has alternatives. We must be able to tell our client or partner why our solution represents the best value. To do this, we must clearly understand our competition and our client’s alternatives. For a commercial customer, access to important IP is often a persuasive reason to work with us. At DARPA, our competition is usually other research laboratories and universities across the United States. But, whether to a commercial or government client, we must be able to clearly state why our approach is substantially better than that of the competition. Our answer should be short and memorable.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/blockquote&gt;
&lt;/blockquote&gt;

&lt;p&gt;The NABC format was introduced to SoundCloud by &lt;a href=&quot;https://twitter.com/gavinmbell?lang=en&quot;&gt;Gavin Bell&lt;/a&gt;, who had learned about it during his time in research labs. I was very skeptical of it at first, but nowadays it is my go-to format for proposals.&lt;/p&gt;

&lt;p&gt;One of my favorite features of this model is the requirement that authors give some thought to alternatives and how they compare to the proposal. Something I like to enforce myself is that every RFC must consider the alternative of doing nothing. Every change requires investment of time, energy, and resources, and before implementing anything new we should consider what happens if we don’t do anything at all.&lt;/p&gt;

&lt;h3 id=&quot;benefits&quot;&gt;Benefits&lt;/h3&gt;

&lt;blockquote&gt;
  &lt;p&gt;The first significant benefit of the approach described above is making a clearer distinction between decision-making and feedback gathering. With a clearly appointed accountable team, we can create a disagree and commit culture. We will carefully hear all positions and reckons from everyone, but ultimately a decision will be made by a specific person or team. Once the decision is made, everybody, irrespective of any differences in opinion during the RFC process, will commit to implementing and championing the decision. If it turns out that the decision wasn’t a good one, the revisit date on the RFC is there to make sure another discussion will be held in the near future.&lt;/p&gt;

  &lt;p&gt;Another important benefit of the proposed RFC process is openness. We have fantastic engineers, and we need to use our collective knowledge as leverage. None of us is as smart as all of us. To make collaboration work, we need to make it easy for all engineers to see what RFCs are being proposed and we need to make it a safe environment to collaborate, where comments focus on factual benefits and tradeoffs.&lt;/p&gt;

  &lt;p&gt;The NABC format is an industry tool used for making structured ‘pitches’. Using this tool will likely lead us to discuss the what without losing ourselves in the ocean of technical detail.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;These paragraphs summarize a lot of what I have discussed in this annotated version, but in a concise way aimed at the reviewer. I find myself referring back to them a lot when people start off-topic discussions on RFCs.&lt;/p&gt;

&lt;h3 id=&quot;competition-or-alternatives&quot;&gt;Competition (or Alternatives)&lt;/h3&gt;

&lt;blockquote&gt;
  &lt;h4 id=&quot;do-nothing&quot;&gt;Do Nothing&lt;/h4&gt;
  &lt;p&gt;We should consider the option of not making any change and keeping the ad-hoc model we currently have for RFCs.&lt;/p&gt;

  &lt;p&gt;The main issues with this option were described in the Need section of this document. Unless something changes, the problems there will remain.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;em&gt;“Do Nothing”&lt;/em&gt; option for the RFC process is highly contextual, but something that I believe most organizations will face is that, in order to keep communication manageable, people will either communicate in silos or stop discussing their ideas altogether. Even if you don’t adopt this process in particular, you should consider implementing an alternative.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h4 id=&quot;adopt-ieee-rfc-model-as-is&quot;&gt;Adopt IETF RFC Model as-is&lt;/h4&gt;
  &lt;p&gt;Although any collaborative development process will have feedback as a core component, the name RFC was made popular by the process used by the IETF to document fundamental standards for what eventually became the Internet. We could follow the IETF RFC model, and maybe even require authors to use terms like MUST, SHOULD, and MAY as formally specified by RFC2119 to avoid ambiguity.&lt;/p&gt;

  &lt;p&gt;The main reason to avoid this style is that IETF RFCs have evolved into “the Internet documents of record”, containing “very detailed technical information” about standards that browser vendors and network middleware need to implement. These documents will impact the whole industry and hence warrant a complex publishing workflow. The process we propose in this document, on the other hand, is about putting forward an idea as early as possible and receiving feedback on it from a wide audience. With this goal in mind, a less formal process like the one described here is preferred.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’ve always been fascinated with RFC2119, so much so that more than ten years ago &lt;a href=&quot;https://github.com/pcalcado/rtfspec&quot;&gt;I used it as a model when writing one of the first unit testing frameworks available for Clojure&lt;/a&gt;. After a few attempts at using the “official” RFC framework, though, I have found that even if you simplify the workflow it is very hard for people to be productive when bound by it. Moreover, there is often no need for an RFC to be so precise; the authors are often the only people implementing the change, and reviewers will benefit more from fluid, conversational prose than from strict use of keywords.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h4 id=&quot;use-architecture-decision-record&quot;&gt;Use Architecture Decision Record&lt;/h4&gt;
  &lt;p&gt;Michael Nygard published a model to document and manage change in software architecture called Architecture Decision Record (ADR). Its motivation, format, and lifecycle are very similar to what this document proposes.&lt;/p&gt;

  &lt;p&gt;Nygard’s model is specialized in software architecture work. This is reflected in its usage of engineering tools such as repositories and Markdown files, which only make sense in a software project. We want the RFC process to be a tool useful in areas other than software development, which makes it harder to implement some of the more specialized areas of the process. Nevertheless, ADRs can be used together with the RFC process described here when developing software systems.&lt;/p&gt;

&lt;/blockquote&gt;

&lt;p&gt;I haven’t personally used ADR as proposed by Michael Nygard, and I am very interested in hearing experience reports from folks who have tried it. At the moment, I am not convinced that it is a good replacement for the RFC process described here. People often bring it up when reviewing the RFC process, though, so I wanted to address it from the beginning.&lt;/p&gt;

&lt;h4 id=&quot;acknowledgments&quot;&gt;Acknowledgments&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/etelsverdlov&quot;&gt;Etel Sverdlov&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/pellegrino&quot;&gt;Vitor Pellegrino&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/muanis&quot;&gt;José Muanis&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/marzagao&quot;&gt;Thompson Marzagão&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/dtsato&quot;&gt;Danilo Sato&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/qmx&quot;&gt;Douglas Campos&lt;/a&gt;, and &lt;a href=&quot;https://vinibaggio.net/&quot;&gt;Vinícius Baggio Fuentes&lt;/a&gt; gave feedback on drafts of this article.&lt;/p&gt;

&lt;h4 id=&quot;revision-history&quot;&gt;Revision History&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;11/20/2018 - First published&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Mon, 19 Nov 2018 00:00:00 +0000</pubDate>
        <link>http://philcalcado.com/2018/11/19/a_structured_rfc_process.html</link>
        <guid isPermaLink="true">http://philcalcado.com/2018/11/19/a_structured_rfc_process.html</guid>
        
        <category>RFCs</category>
        
        <category>Change Management</category>
        
        <category>Engineering Management</category>
        
        <category>Meetup</category>
        
        
      </item>
    
  </channel>
</rss>
