2025-06-22 08:00:00
One of the most common questions people ask my personal AI, Ask LukeW, is "how did you build this?" While I've written a lot about the high-level architecture and product design details of the service, I never published a more technical overview. Putting one together surfaced enough interesting generative publishing ideas that I decided to share a bit about the process.
First of all, Ask LukeW makes use of the thousands of articles I've written over the years to answer people's questions about digital product design. Yes, that's a lot of writing, but it's not enough to capture everything I've learned over the past 30 years. Which means people sometimes ask Ask LukeW questions that I could answer but haven't written about.
In the admin system I built for Ask LukeW, I can not only see the questions that don't get answered well but also add content to answer them better in the future. Over the last two years, I've added about 500 answers, substantially expanding the corpus Ask LukeW can respond from. So the next time similar questions get asked, people aren't left without answers.
That process is an interesting part of generative publishing that I've written about before, but it's also how I know that people regularly ask how I built Ask LukeW. They want technical details: what frameworks, what models, what services. I never wrote this up because I'm not that technical and several great engineers helped me build Ask LukeW. As a result, I didn't think I'd do a great job detailing the technical side of things.
But one day it occurred to me that I could use our AI for code company, Augment Code, which has a deep contextual understanding of codebases, to help me write up how Ask LukeW works. I opened the codebase in VS Code, asked Augment the questions people ask me: "how does the feature work?" "what is the codebase?" "what is the tech stack?" and got great, detailed responses.
Augment, however, doesn't answer questions the way I do. So I took Augment's detailed technical replies and dropped them into another one of our companies, Bench. A while back, I had Bench read a lot of my blog posts and create a prompt that writes articles the way I would. I've saved this prompt in Bench's agent library and can apply it anytime I want Bench to write the way I would.
Once Bench had rewritten Augment's technical details of how Ask LukeW works the way I'd explain them, I took the results and added them as saved answers to the Ask LukeW corpus. Now anytime someone asks these kinds of questions, they get much more detailed technical answers. In fact, this worked so well that I also asked Augment to write up the overall tech stack for my Website and ran it through the same process.
I, for one, found this a really enlightening look at where generative publishing is now. I can see what kinds of information I should be publishing by looking at the questions people ask my personal AI but don't get good answers to. I can use an AI for coding tool to turn code into prose. I can use an agentic workspace to rewrite that prose the way I would because I taught it to write like me. And finally, I can feed that content back into my overall corpus so it's available for any similar questions people ask in the future.
That doesn't look like the publishing of old to me. Of course, it's split between multiple tools, requires me to know what each one can do, and has a host of other issues. We're still early but it's exciting.
2025-06-20 08:00:00
At this point, almost every software domain has launched or explored AI features. Despite the wide range of use cases, most of these implementations have been the same ("let's add a chat panel to our app"). So the problems are the same as well.
Open-ended interfaces to AI models have the same problem as every "invisible" interface that came before them. Without a clear set of affordances, people don't know what they can do. The vision of these invisible UIs was always something like "Voice interfaces will work when you can ask them anything". Today it's "AI chat interfaces will work because you can tell them to do anything". Sounds great but...
In reality, even extremely capable systems (like extremely capable people) have limitations. They do some things well, some things ok, and other things poorly. How you ask them to do things also matters as different phrasings yield different results. But without affordances, these guideposts are as invisible as the UI.
I'm pretty certain this is the biggest problem in AI product interfaces today: because large-scale AI models can do so many things (but not all things or all things equally well), most people don't know what they can do nor how to best instruct/prompt them.
If capability awareness is knowing what an AI product can do, context awareness is knowing how it did it. The fundamental question here is "what information did an AI product use to provide an answer?" But there are lots of potential answers, especially as agents make use of an increasing number and variety of tools. Some examples of what could be in context (considered in an AI model's response):
You get the idea. There's a lot of stuff that could be in context at any given point, but not everything will be in context all the time because models have context limits. So when people get replies, they aren't sure whether or how much to trust them. Was the right information used or not (hallucinations)?
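To make the context limit issue concrete, here's a minimal sketch (in Python, purely illustrative and not any product's actual code) of how candidate information might get packed into a fixed token budget. Anything that doesn't fit gets silently dropped, which is exactly why people can't tell what a reply was actually based on.

```python
# A minimal, purely illustrative sketch of why context awareness is hard:
# candidate items compete for a fixed token budget, so some get silently
# dropped before the model ever sees them. Names and numbers are assumptions.

def build_context(candidates, token_budget):
    """Greedily pack already-ranked items until the token budget runs out."""
    included, used = [], 0
    for item in candidates:  # each item: {"source": str, "text": str, "tokens": int}
        if used + item["tokens"] > token_budget:
            continue  # dropped, and the person asking has no visibility into this
        included.append(item)
        used += item["tokens"]
    return included

# System prompts, retrieved documents, tool results, and chat history all
# compete for the same window.
candidates = [
    {"source": "system prompt", "text": "...", "tokens": 800},
    {"source": "retrieved article", "text": "...", "tokens": 3200},
    {"source": "tool result", "text": "...", "tokens": 1500},
    {"source": "chat history", "text": "...", "tokens": 6000},
]
context = build_context(candidates, token_budget=8000)
```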
While writing has done an enormous amount to enable communication, it's not the only medium for conveying information and, often, it may not be the best. Despite this, most AI products render the streams of text emitted by AI models as their primary output, and they render them in a linear "chat-like" interface. Unsurprisingly, people have a hard time extracting and recalling information by scrolling through long blocks of text.
As the novelty of AI models being able to write text wears off, people increasingly ask for visuals, tables, and other formats like slides and spreadsheets as output instead of just walls of text.
Yes, there are other issues with AI products. I'm not suggesting this is a complete list, but it is reflective of what I'm currently seeing over and over in user testing and across multiple domains. It's still early for AI products, so... more solutions and issues to come.
2025-06-09 08:00:00
As an increasing number of AI applications evolve into agents doing work for people, agent management becomes a critical part of these products' design. How can people start, steer, and stop multiple agents (and subagents) and stay on top of their results? Here are several approaches we've been building and testing.
Whenever a new technology emerges, user interfaces go through a balancing act between making the new technology approachable through common patterns and embodying what makes it unique. Make things too different and risk not having an onramp that brings people on board smoothly. Make things too familiar and risk limiting the potential of new capabilities within old models and interactions.
"Copy, extend, and finally, discovery of a new form. It takes a while to shed old paradigms." - Scott Jenson
As an example, Apple's VisionOS interface notably made use of many desktop and mobile interaction patterns to smooth the transition to spatial computing. But at the same time, it didn't take full advantage of spatial computing's opportunities because it boxed limitless 3D interactions within the windows, icons, menus, and pointers (WIMP) familiar from desktop interfaces.
Hence, the balancing act.
This context helps frame the way we've approached designing agent management interfaces. Are there high-level user interface patterns that are both familiar enough for people to intuit how they work and flexible enough to enable effective AI agent management? In an agent-centric AI application like Augment Code for software development or Bench for office productivity, people need to be able to:
To help people adapt to agent management, we explored how interface patterns like kanban boards, dashboards, inboxes, task lists, and calendars could fulfill many of these requirements by presenting the state of multiple agents and letting people access specific agents when they need to take further action.
Kanban boards visualize work as cards moving through distinct stages, typically arranged in columns from left to right to represent progress through a workflow. They could be used to organize agents as they transition between scheduled, running, complete, and reviewed states. Or within workflows specific to domains like sales or engineering.
This pattern seems like a straightforward way to give people a sense of the state of multiple agents. But in kanban boards, people also expect to be able to move cards between columns. How would that affect agents? Would they begin a new task defined by the card? Would that create a new agent or re-route an existing one?
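For what it's worth, here's a rough sketch of how agents might map onto a kanban data model, with columns derived from agent state rather than from manual card placement. It's purely illustrative (the names below are assumptions based on the states described above), but it shows why dragging a card would have to translate into an actual state change on the agent.

```python
# A minimal sketch (not any product's actual data model) of agents as kanban
# cards. Column membership is derived from agent state, which is why dragging
# a card around would have to map back to a real state transition.

from dataclasses import dataclass
from enum import Enum

class AgentState(Enum):
    SCHEDULED = "scheduled"
    RUNNING = "running"
    COMPLETE = "complete"
    REVIEWED = "reviewed"

@dataclass
class AgentCard:
    agent_id: str
    task: str
    state: AgentState

def board_columns(cards: list[AgentCard]) -> dict[AgentState, list[AgentCard]]:
    """Group agent cards into kanban columns by their current state."""
    columns: dict[AgentState, list[AgentCard]] = {state: [] for state in AgentState}
    for card in cards:
        columns[card.state].append(card)
    return columns
```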
Dashboards pull together multiple data sources into a unified monitoring interface through different visualizations like charts, graphs, and metrics. Unlike a kanban board, there's no workflow implied by the arrangement of the elements in a dashboard so you can pretty much represent agents anywhere and any way you like.
While that seems appealing, especially to those yearning for a "mission control" style interface to manage agents, it can quickly become problematic. When agents can be represented in different ways in different parts of a UI, it's hard to grasp both the big picture and details of what's happening.
The inbox pattern organizes items in a chronological stream that requires user action to process. Items are listed from newest to oldest with visual cues like unread counts so people can quickly assess and act on items without losing context. Most of us do so every day in our messaging and email apps so applying the same model to agents seems natural.
But if you get too much email or too many texts, your inbox can get away from you. So it's not an ideal pattern for applications with a high volume of agents to manage nor for those that require coordination of multiple, potentially inter-dependent agents.
For what it's worth, this is where we iterated to (for now) in Bench. So if you'd like to try this pattern out, fire off a few agents there.
Task lists present items as discrete, actionable units with clear completion states (usually a checkbox). Their vertical stack format lets people focus on specific tasks while still seeing the bigger picture. Task lists can be highly structured or pretty ad hoc lists of random to-dos.
Indented lists of subtasks can also display parallel agent processes and show the inter-dependencies of agents, but perhaps at the expense of simplicity. In a single linear list, like an inbox, it's much easier to see what's happening than in a hierarchical task list where some subtasks may be collapsed but still relevant.
Calendar interfaces use a grid structure that maps to our understanding of time, with consistent rows and columns representing dates and times. This allows people to make use of both temporal memory and spatial memory to locate and contextualize items. Calendars also typically provide high level (month) and detailed (day) views of what's happening.
When it comes to scheduling agents, a calendar makes a lot of sense: just add an agent's task the same way you'd add a meeting. It's also helpful for contextually grouping the work of agents with actual meetings. "These tasks were all part of this project's brainstorm meeting." "I ran that task right after our one-on-one meeting." Representing the work of agents on a calendar can be tricky, though, as agents can run for minutes or many hours. And where should event-triggered agents show up on a calendar?
Coming back to Scott Jenson's quote at the start of this article, it takes a while to shed old paradigms and discover new forms. So it's quite likely that as these interface patterns are adapted to agent management use cases, they'll evolve further and not end up looking much like their current selves. As David Hoang recently suggested, maybe agent management interfaces should learn from patterns found in Real-Time Strategy (RTS) games instead? Interesting...
2025-06-02 08:00:00
While chat interfaces to AI models aren't going away anytime soon, the increasing capabilities of AI agents are making the concept of chatting back and forth with an AI model to get things done feel archaic.
Let me first clarify that I don't mean open-ended text fields where people declare their intent are going away. As I wrote recently, there will be even more broad input affordances in software, whether for text, image, audio, video, or more. When I say chat AIs, I mean applications whose primary mode of getting things done is a back and forth messaging conversation with an AI model: you type something, the model responds, you type something... and on it goes until you get the output you need.
Anyone who's interacted with an application like this knows that the AI model's responses quickly get lost in conversation threads, and producing something from a set of chat replies can be painful. This kind of interface isn't optimal for tasks like authoring a document, writing code, or creating slides. To account for this, some applications now include a canvas or artifact area where the output of the AI model's work can go.
In these layouts, the chat interface usually goes from being a single-pane layout to a split-pane layout: roughly half the UI for input in the form of chat and half for output in the form of a canvas or artifact viewer. In these kinds of applications, we already begin to see the prominence of chat receding as people move between providing input and reviewing, editing, or acting on output.
In this model, however, the onus is still on the user to chat back and forth with a model until it produces their desired output in the artifact or canvas pane. Agents (AI models that make use of tools) change this dynamic. People state their objectives and the AI model(s) plans which tools to use and how to accomplish the task.
Instead of each step being a back and forth chat between a person and an AI model, the vast majority, if not all, of the steps are coordinated by the model(s) itself. This again reduces the role of chat. The model(s) takes care of the back and forth and in most cases simply lets people know when it's done so they can review and make use of its output.
When agents can use multiple tools, call other agents and run in the background, a person's role moves to kicking things off, clarifying things when needed, and making use of the final output. There's a lot less chatting back and forth. As such, the prominence of the chat interface can recede even further. It's there if you want to check the steps an AI took to accomplish your task. But until then it's out of your way so you can focus on the output.
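Here's a rough sketch of that shift, assuming a hypothetical tool-calling client (none of the object or function names below come from an actual product). The loop runs between the model and its tools; the person only appears at the start, when things need clarifying, and at the end.

```python
# A minimal sketch of an agent loop, assuming a hypothetical tool-calling
# client (model.complete, response.tool_calls, etc. are illustrative, not a
# real API). The back and forth happens between the model and its tools;
# the person only provides the objective and sees the final notification.

def run_agent(objective, model, tools, notify):
    messages = [{"role": "user", "content": objective}]
    while True:
        response = model.complete(messages, tools=tools)  # hypothetical call
        if response.tool_calls:
            for call in response.tool_calls:
                result = tools[call.name](**call.arguments)  # model-to-tool loop
                messages.append({"role": "tool", "name": call.name, "content": result})
        else:
            notify(response.content)  # person is pulled back in only at the end
            return response.content
```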
You can see this UI transition in the AI workspace, Bench. The first version was focused on back and forth instructions with models to get things done: a single-pane AI chat UI. Then a split-pane interface put more emphasis on the results of these instructions, with half the screen devoted to an output pane. Today Bench runs and coordinates agents in the background, so the primary interaction is kicking off tasks and reviewing results when they're ready.
In this UI, the chat interface is not only reduced to less than a fourth of the screen but also collapsed by default, hiding the model's back and forth conversations with itself unless people want to dig in.
When working with AI models this way, the process of chatting back and forth to create things within a messaging UI feels dated. AI that takes your instructions, figures out how to get things done using tools, multiple models, and changeable plans, and just tells you when it's finished feels a lot more like "the future". Of course, I put future in quotes because at the rate AI moves these days, the future will be here way sooner than any of us think. So... more UI changes to come!
2025-05-25 08:00:00
The last two weeks featured a flurry of new AI model announcements. Keeping up with these changes can be hard without some kind of personal benchmark. For me, that's been my personal AI feature, Ask LukeW, which lets me both quickly try new models and put them into production.
To start... what were all these announcements? On May 14th, OpenAI released three new models in their GPT-4.1 series. On May 20th at I/O, Google updated Gemini 2.5 Pro. On May 22nd, Anthropic launched Claude Opus 4 and Claude Sonnet 4. So clearly high-end model releases aren't slowing down anytime soon.
Many AI-powered applications develop and use their own benchmarks to evaluate new models when they become available. But there's still nothing quite like trying an AI model yourself in a domain or problem space you know very well to gauge its strengths and weaknesses.
To do this more easily, I added the ability to quickly test new models on the Ask LukeW feature of this site. Because Ask LukeW works with the thousands of articles I've written and hundreds of presentations I've given, it's a really effective way for me to see what's changed. Essentially, I know what good looks like because I know what the answers should be.
The Ask LukeW system retrieves as much relevant content as possible before asking a large language model (LLM) to generate an answer to someone's question (as seen in the system diagram). As a result, the LLM can have lots of content to make sense of when things get to the generation part of the pipeline.
Previously this resulted in a lot of "kitchen sink" style bullet point answers as frontier models mostly leaned toward including as much information as possible. These kinds of replies ended up using lots of words without clearly getting to the point. After some testing, I found Anthropic's Claude Opus 4 is much better at putting together responses that feel like they understood the essence of a question. You can see the difference in the before and after examples in this article. The responses to questions with lots of content to synthesize feel more coherent and concise.
It's worth noting I'm only using Opus 4 for the generation part of the Ask LukeW pipeline, which uses AI models to not only generate but also transform, clean, embed, retrieve, and rank content. So there are many other parts of the pipeline where testing new models matters, but in the final generation step at the end, Opus 4 wins. For now...
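For the curious, here's a simplified sketch of a pipeline shaped like the one described above. Every function in it is an illustrative stand-in rather than the actual Ask LukeW code, and the model name is just an assumption, but it shows why only the final generation step needs to change when a new frontier model comes along.

```python
# A simplified sketch of a retrieval-then-generation pipeline shaped like the
# one described above. Every function is an illustrative stand-in, not the
# actual Ask LukeW code, and the model name is an assumption.

def transform(docs):             # e.g. split articles into chunks
    return docs

def clean(docs):                 # e.g. strip markup and noise
    return [d.strip() for d in docs]

def embed(docs):                 # e.g. build a vector index
    return list(docs)

def retrieve(index, question):   # e.g. nearest-neighbor search over embeddings
    words = set(question.lower().split())
    return [d for d in index if words & set(d.lower().split())]

def rank(candidates, question):  # e.g. rerank and trim to fit the context window
    return candidates[:5]

def generate(model, question, context):
    # A real pipeline calls an LLM here; this stand-in just reports what it would do.
    return f"[{model} would answer '{question}' from {len(context)} passages]"

def answer_question(question, corpus, generation_model="claude-opus-4"):
    index = embed(clean(transform(corpus)))
    context = rank(retrieve(index, question), question)
    # Only this final generation step changes when a new model is swapped in.
    return generate(generation_model, question, context)
```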
2025-05-22 08:00:00
In his AI Speaker Series presentation at Sutter Hill Ventures, David Soria Parra of Anthropic shared insights on the Model Context Protocol (MCP), an open protocol designed to standardize how AI applications interact with external data sources and tools. Here are my notes from his talk: