What follows from the fact that LLMs are very good at writing code?
The first applications that took advantage of this were coding assistants like GitHub Copilot and Replit’s Ghostwriter. These assistants have proven remarkably skilled at code completion and are quickly expanding into other modes of support, like answering technical questions and suggesting bug fixes.
While I think of the first chapter of code-intelligent AI work as a story about developer productivity, I’m starting to think that the next chapter is about AI-based tools using their ability to program to solve higher-order problems: not just generating code, but running it.
I see this playing out in two important ways.
The first is that it’s going to become much easier for end users to create their own software. The second is that AI agents will be able to satisfy a surprisingly wide range of user requests.
LLMs are the holy grail of end-user programming
Geoffrey Litt wrote an excellent piece recently on what AI’s talent for writing code might mean for regular users. His basic view is that LLMs being able to write code means that tools are now malleable in a way that might live up to the wildest ideals of end-user programming: that users will be able to mold software without resorting to the complexity of traditional programming techniques.
I’d like to unpack what this might look like across a few different forms of software that end-users might build for themselves.
Scripting and automation
A few exciting areas where we’ll see LLM-assisted scripting play out:
Zapier-style automations.
It’s easy to imagine that the next generation of workflow automation will either replace or simply augment traditional workflow-builders with natural language.
Imagine this description for a simple automation: “I want to send a Slack message to #wins when we close new opportunities.”
From this description, the LLM should be able to deduce the intended logic: that when the stage property of a Salesforce Opportunity record moves to “closed”, a Slack message should be sent to the channel named “wins”.
Here’s a different example that shows how even complex automations might become much easier to build.
“I want to send a Slack message to the relevant account manager when a customer sends a help message in Zendesk.”
Natural language compresses complex ideas. While this description is easy enough to articulate, the logic required to make it happen is actually kind of convoluted. Setting up this automation requires deconstructing “relevant account manager” into precise logic and some cleverness in making the association between customer records in Zendesk and Salesforce.
Admittedly, it’s possible that the LLM won’t precisely translate this description into the proper logic. But I imagine that natural language will at least be a useful starting point for automations that are easy to describe but complex to construct. My hunch is that the product team at Zapier (if not someone else) will figure out how to leverage natural language input for much simpler workflow construction.
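To make this concrete, here’s a rough sketch of the logic an LLM might generate for the first example, assuming a webhook that delivers Salesforce Opportunity change events and the slack_sdk client. The payload shape is an illustrative assumption, not Zapier’s (or Salesforce’s) actual format:

```python
# Illustrative sketch: the kind of handler an LLM might generate for
# "send a Slack message to #wins when we close new opportunities".
# Assumes a webhook that delivers Salesforce Opportunity change events
# as dicts; the payload fields here are hypothetical.
import os

from slack_sdk import WebClient

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def handle_opportunity_change(event: dict) -> None:
    # Fire only when the Opportunity's stage transitions to a closed-won state.
    if event.get("field") == "StageName" and event.get("new_value") == "Closed Won":
        slack.chat_postMessage(
            channel="#wins",
            text=f"Opportunity closed: {event.get('name')} ({event.get('amount')})",
        )
```

The interesting part isn’t the handler itself, which is trivial, but that the LLM has to infer the trigger, the condition, and the action from one sentence.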
Natural language interfaces for applications with their own custom languages. Application-specific languages are already employed across a wide spectrum of applications: formulas in spreadsheets, codes in GDS systems for travel agents, SQL for data analysis, and so on.
While these application-specific languages augment the workflows of power users, they often impose a steep learning curve on new and less sophisticated users.
My sense is that LLM-powered assistants will emerge in these applications, making it easier for end-users to achieve the same kind of control over their tools as power users. Early examples include:
Notebook tools like Hex supporting natural language input.
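As a sketch of what might power an assistant like this, here’s one plausible shape for a natural-language-to-SQL layer, assuming the openai Python client (v1+) and a schema string the host application already knows; the model name and prompt are illustrative:

```python
# Minimal sketch of a natural-language-to-SQL assistant. The schema
# description stands in for whatever metadata the host app already has.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCHEMA = "orders(id, customer_id, total, created_at); customers(id, name, region)"

def nl_to_sql(question: str) -> str:
    # Ask the model to translate a plain-English question into SQL
    # against a schema the application already knows.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": f"Translate the user's question into one SQL query "
                        f"for this schema: {SCHEMA}. Return only SQL."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(nl_to_sql("total order value by region last month"))
```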
Extensibility
Customers often need bespoke solutions. The way that software providers accommodate these custom needs is by making their tools extensible. APIs for programmatic use and end-user-facing plugins are common approaches here.
The problem with extensibility is that it can be hard to take advantage of. Salesforce for instance is intensely configurable, but requires a fair degree of specialized knowledge that most users lack.
My sense is that with LLM-assisted coding, end-users will be able to build custom workflows and UI in Salesforce without needing to consult Salesforce Admins or possess any expertise in Apex code.
I can see this playing out similarly with all sorts of other extensible systems: think browser extensions, plugins for Google Docs, or blocks in Notion.
I wonder if, as extensibility becomes easier, users will reveal a preference for apps that are best suited for it. My sense is that they will, and that a long-term side effect is that developers will treat extensibility as a priority.
The ability to code amplifies the abilities of AI agents.
How are we to make sense of something like this demo from Greg Brockman, which features the hard-to-believe phenomenon of an agent writing a script and executing it in order to satisfy a user’s request (in this case, to pull out a 5-second clip from a video)? Is it end-user programming? Or is it just an AI following orders?
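For reference, here’s the kind of throwaway script such an agent might write and run for that request, shelling out to ffmpeg (assumed to be installed):

```python
# The kind of one-off script an agent might generate for "pull out a
# 5 second clip from this video": stream-copy a segment with ffmpeg.
import subprocess

def extract_clip(src: str, dst: str, start: float = 0.0, duration: float = 5.0) -> None:
    subprocess.run(
        ["ffmpeg",
         "-ss", str(start),      # seek to the start time
         "-t", str(duration),    # take this many seconds
         "-i", src,
         "-c", "copy",           # copy streams without re-encoding
         dst],
        check=True,
    )

extract_clip("input.mp4", "clip.mp4", start=42.0)
```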
One-off scripting.
The most obvious manifestation of agent programming is basically the same kind of end-user programming we discussed earlier except that the context of the request is task delegation rather than software creation. As I’ve noted, the line between these things is blurry, but here are a few examples of what I have in mind:
“I want a list of links that come from tweets I’ve liked in the last week. Please add them as a new sub-page inside of my notion reading directory.”
“Generate a list of every person I met with this week.”
“Generate a list of everyone I haven’t responded to in Slack or Gmail within the last week.”
Each of these requires some combination of reasoning about a problem and relevant API integrations. As long as agents are equipped with the appropriate credentials, these kinds of requests should be doable.
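As a sketch of what the first request might compile down to, assuming Tweepy’s v2 client and the official notion-client (the IDs, the one-week filter, and the page structure are all simplified for illustration):

```python
# Rough sketch: collect links from recently liked tweets and file them
# into a new Notion sub-page. Date filtering is omitted for brevity.
import os
import re

import tweepy
from notion_client import Client as Notion

twitter = tweepy.Client(bearer_token=os.environ["TWITTER_BEARER_TOKEN"])
notion = Notion(auth=os.environ["NOTION_TOKEN"])

def links_from_recent_likes(user_id: str) -> list[str]:
    # Pull the user's most recent likes and extract URLs from the text.
    likes = twitter.get_liked_tweets(id=user_id, max_results=100)
    urls: list[str] = []
    for tweet in likes.data or []:
        urls += re.findall(r"https?://\S+", tweet.text)
    return urls

def add_reading_subpage(parent_page_id: str, urls: list[str]) -> None:
    # Create a sub-page under the reading directory, one bookmark per link.
    notion.pages.create(
        parent={"page_id": parent_page_id},
        properties={"title": [{"text": {"content": "Links from liked tweets"}}]},
        children=[{"object": "block", "type": "bookmark", "bookmark": {"url": u}}
                  for u in urls],
    )
```

Nothing here is deep; the point is that an agent with the right credentials can write and run this glue code on demand.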
Autonomous agents in the background.
I’m excited by the possibility of agents residing in the background, observing the state of things, and engaging autonomously to advance our goals.
From a hackathon a few months ago, this project from Alistair Pullen contemplates an AI bug fixer that roams GitHub Issues and works to generate pull requests that attempt to resolve them.
I am confident we’ll see this pattern applied to other developer contexts. We might see new kinds of agent-intervening workflows in task management applications. You could imagine an agent volunteering to work on a Linear task that it thinks it is capable of completing, or suggesting ways of breaking a Linear task down into smaller tasks that it is suited to work on.
What are the contexts that might provoke new development work? I suspect that we’ll see autonomous software-building agents planting themselves in those contexts, waiting for new work to take on for themselves.
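A skeleton of that background pattern, using PyGithub, might look like the following; attempt_fix is a hypothetical stand-in for the agent’s actual reasoning and code-generation loop:

```python
# Skeleton of a background agent that watches a repo's open issues and
# volunteers pull requests for ones it thinks it can fix.
import os
import time

from github import Github

gh = Github(os.environ["GITHUB_TOKEN"])
repo = gh.get_repo("acme/widgets")  # placeholder repository

def attempt_fix(issue):
    """Hypothetical: try to produce a branch with a candidate fix.

    Returns the branch name, or None if the agent can't help."""
    ...

while True:
    for issue in repo.get_issues(state="open"):
        branch = attempt_fix(issue)
        if branch:
            repo.create_pull(
                title=f"Proposed fix for #{issue.number}",
                body=f"Automated attempt to resolve: {issue.title}",
                head=branch,
                base="main",
            )
    time.sleep(600)  # poll every ten minutes
```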
Agents use code to improve themselves.
Sometimes agents find that they don’t have the tools they need to solve a user request, as in this highly amusing example provided by Sam Whitmore.
One possible remedy for agents that find themselves otherwise ill-equipped is to build their own tools. Already, we’ve seen a few projects that are pointed in this direction.
My favorite of this genre is Daniel Gross’s LlamaAcademy. The idea behind LlamaAcademy is that LLMs can potentially learn to use APIs just by reading through documentation. We can see a horizon where AI agents are capable of integrating with any API, so long as the agent can retrieve the relevant documentation (and, critically, has access to user credentials).
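In that spirit, here’s a toy sketch of the pattern: fetch documentation, ask an LLM to write a client function against it, and load the result into the running agent. The openai client, the prompt, and the exec-based loading are all illustrative choices, not LlamaAcademy’s actual approach:

```python
# Toy sketch: teach an agent a new API by feeding its docs to an LLM
# and loading the generated client function into the current process.
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def learn_api(doc_url: str, task: str):
    # Truncate docs to fit a context window; real systems would chunk/retrieve.
    docs = requests.get(doc_url).text[:20_000]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": (f"Using these API docs:\n{docs}\n\n"
                        f"Write a Python function run(credentials) that {task}. "
                        f"Return only code."),
        }],
    )
    namespace: dict = {}
    exec(resp.choices[0].message.content, namespace)  # a real trust boundary
    return namespace["run"]
```

That exec call, of course, is exactly the kind of arbitrary-code-execution risk I discuss below.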
Taking codesmithing to production
Agents being able to execute code they’ve written themselves is an exciting development that I believe will transform the way we interact with software.
Nevertheless, there are a few issues (beyond any AI safety concerns) that I anticipate will need to be worked out before codesmithing is production-ready.
Access to API credentials. So much of the potential for AI agents writing code has to do with API integrations. As I’ve surveyed above, there is a great deal of work that points at the possibility of AI writing effective integrations. One limiting factor in making this code actually runnable, though, is access to user credentials. In a world where users have a range of AI agents at their disposal, I expect that privileged access to user credentials will become strategically important.
Safe execution of arbitrary code. Agent-generated code will pose some measure of security risk when executed. I expect that even as companies are incentivized to make their systems more extensible, they will need to take responsible precautions around the execution of arbitrary code.
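One common precaution, sketched below, is to run generated code in a separate subprocess with a hard timeout and a scrubbed environment rather than executing it inside the host process (real deployments would go further with containers or stricter sandboxes, and platform caveats apply):

```python
# Run untrusted, agent-generated code in an isolated child process
# instead of exec()ing it in the host.
import subprocess
import sys

def run_untrusted(code: str, timeout: int = 10) -> str:
    # -I puts Python in isolated mode; an empty env keeps host
    # credentials from leaking into the child process.
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        env={},
    )
    return result.stdout
```

The principle is to separate the agent’s privileges from those of the code it writes.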
If this piece resonates or you’re building something in this space, feel free to DM me on Twitter. I’d love to hear from you.