Blog of Luke Wroblewski

More on Generative Publishing

2025-06-22 08:00:00

One of the most common questions people ask my personal AI, Ask LukeW, is "how did you build this?" While I've written a lot about the high-level architecture and product design details of the service, I never published a more technical overview. Putting one together surfaced enough interesting generative publishing ideas that I decided to share a bit about the process.

First of all, Ask LukeW makes use of the thousands of articles I've written over the years to answer people's questions about digital product design. Yes, that's a lot of writing but it's not enough to capture all the things I've learned over the past 30 years. Which means sometimes people Ask LukeW questions that I can answer but haven't written about.

Ask LukeW question with no reply

In the admin system I built for Ask LukeW, I can not only see the questions that don't get answered well but also add content to answer them better in the future. Over the last two years, I've added about 500 answers, substantially expanding the corpus Ask LukeW can respond from. So the next time similar questions get asked, people aren't left without answers.
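That admin loop can be sketched in code. Everything here — the class names, the scoring threshold, the idea of a review queue — is an illustrative assumption of mine, not the actual Ask LukeW implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Corpus:
    saved_answers: dict[str, str] = field(default_factory=dict)  # question -> curated answer
    review_queue: list[str] = field(default_factory=list)        # questions needing new content

    def flag_if_weak(self, question: str, top_retrieval_score: float,
                     threshold: float = 0.55) -> bool:
        """Queue a question for admin review when retrieval found nothing strong."""
        if top_retrieval_score < threshold and question not in self.saved_answers:
            self.review_queue.append(question)
            return True
        return False

    def add_saved_answer(self, question: str, answer: str) -> None:
        """Each curated answer expands what the assistant can respond from."""
        self.saved_answers[question] = answer
        if question in self.review_queue:
            self.review_queue.remove(question)

corpus = Corpus()
corpus.flag_if_weak("How did you build this?", top_retrieval_score=0.31)
corpus.add_saved_answer("How did you build this?", "A short technical overview...")
```

The point of the sketch: weak retrieval results become a to-do list, and each curated answer shrinks that list while growing the corpus.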

Ask LukeW add a saved question interface

That process is an interesting part of generative publishing that I've written about before, but it's also how I know that people regularly ask how I built Ask LukeW. They want technical details: what frameworks, what models, what services. I never wrote this up because I'm not that technical and several great engineers helped me build Ask LukeW. As a result, I didn't think I'd do a great job detailing the technical aspects of things.

But one day it occurred to me that I could use our AI for code company, Augment Code, which has a deep contextual understanding of codebases, to help me write up how Ask LukeW works. I opened the codebase in VS Code, asked Augment the questions people asked me: "how does the feature work?" "what is the codebase?" "what is the tech stack?" and got great, detailed responses.

Ask LukeW Augment Code response

Augment, however, doesn't answer questions the way I do. So I took Augment's detailed technical replies and dropped them into another one of our companies, Bench. A while back I had Bench read a lot of my blog posts and create a prompt that writes articles the way I would. I've saved this prompt in Bench's agent library and can apply it anytime I want it to write like I would.

Once Bench had rewritten Augment's technical details of how Ask LukeW works the way I'd explain them, I took the results and added them as saved answers to the Ask LukeW corpus. Now anytime someone asks these kinds of questions, they get much more detailed technical answers. In fact, this worked so well that I also asked Augment to write up the overall tech stack for my website and ran it through the same process.

Ask LukeW tech stack question

I, for one, found this a really enlightening look at where generative publishing is now. I can see what kinds of information I should be publishing by looking at the questions people ask my personal AI but don't get good answers to. I can use an AI coding tool to turn code into prose. I can use an agentic workspace to rewrite that prose the way I would because I taught it to write like me. And finally, I can feed that content back into my overall corpus so it's available for any similar questions people ask in the future.


That doesn't look like the publishing of old to me. Of course, the process is split across multiple tools, requires me to know what each one can do, and has a host of other issues. We're still early, but it's exciting.

Common AI Product Issues

2025-06-20 08:00:00

At this point, almost every software domain has launched or explored AI features. Despite the wide range of use cases, most of these implementations have been the same ("let's add a chat panel to our app"). So the problems are the same as well.

Capability Awareness

Open-ended interfaces to AI models have the same problem as every "invisible" interface that came before them. Without a clear set of affordances, people don't know what they can do. The vision of these invisible UIs was always something like "Voice interfaces will work when you can ask them anything". Today it's "AI chat interfaces will work because you can tell them to do anything". Sounds great but...

Open-ended AI chat UI

In reality, even extremely capable systems (like extremely capable people) have limitations. They do some things well, some things ok, and other things poorly. How you ask them to do things also matters as different phrasings yield different results. But without affordances, these guideposts are as invisible as the UI.

I'm pretty certain this is the biggest problem in AI product interfaces today: because large-scale AI models can do so many things (but not all things or all things equally well), most people don't know what they can do nor how to best instruct/prompt them.

Context Awareness

If capability awareness is knowing what an AI product can do, context awareness is knowing how it did it. The fundamental question here is "what information did an AI product use to provide an answer?" But there are lots of potential answers, especially as agents can make use of an increasing number and variety of tools. Some examples of what could be in context (considered in an AI model's response):

  • Its own training data? If so, when was the cutoff?
  • The history of your session with the model? If so, going how far back?
  • The history of all your sessions or a user profile? If so, which parts?
  • Specific tools like search or browse? If so, which of their results?
  • Specific connections to other services or accounts? If so...
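To make the context-limit point concrete, here's a toy sketch of packing candidate items into a fixed budget. The priority ordering, the chars-per-token estimate, and the budget number are all assumptions for illustration; real products use actual tokenizers and far more nuanced prioritization:

```python
def build_context(candidates: list[dict], budget_tokens: int) -> list[dict]:
    """Greedily pack highest-priority items until the context budget runs out."""
    included, used = [], 0
    for item in sorted(candidates, key=lambda c: c["priority"]):
        cost = len(item["text"]) // 4  # rough chars-per-token estimate (assumption)
        if used + cost <= budget_tokens:
            included.append(item)
            used += cost
    return included

# Hypothetical candidates mirroring the bullet list above.
candidates = [
    {"priority": 0, "text": "system prompt " * 50},
    {"priority": 1, "text": "recent session history " * 200},
    {"priority": 2, "text": "user profile summary " * 100},
    {"priority": 3, "text": "search tool results " * 400},
]
packed = build_context(candidates, budget_tokens=1500)
```

With this budget, the profile summary and search results simply don't fit — which is exactly why people can't assume everything they expect to be "in context" actually was.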

Context windows in AI chat interfaces

You get the idea. There's a lot of stuff that could be in context at any given point, but not everything will be in context all the time because models have context limits. So when people get replies, they aren't sure whether or how much they should trust them. Was the right information used or not (hallucinations)?

Walls of Text

While writing has done an enormous amount to enable communication, it's not the only medium for conveying information and, often, it may not be the best. Despite this, most AI products render the streams of text emitted by AI models as their primary output, and they render them in a linear "chat-like" interface. Unsurprisingly, people have a hard time extracting and recalling information by scrolling through long blocks of text.

Walls of text in AI products

As the novelty of AI models being able to write text wears off, people increasingly ask for visuals, tables, and other formats like slides and spreadsheets as output instead of just walls of text.

And More...

Yes, there are other issues with AI products. I'm not suggesting this is a complete list, but it is reflective of what I'm currently seeing over and over in user testing and across multiple domains. But it's still early for AI products so... more solutions and issues to come.

Agent Management Interface Patterns

2025-06-09 08:00:00

As an increasing number of AI applications evolve into agents doing work for people, agent management becomes a critical part of these products' design. How can people start, steer, and stop multiple agents (and subagents) and stay on top of their results? Here are several approaches we've been building and testing.

Whenever a new technology emerges, user interfaces go through a balancing act between making the new technology approachable through common patterns and embodying what makes it unique. Make things too different and risk not having an onramp that brings people on board smoothly. Make things too familiar and risk limiting the potential of new capabilities within old models and interactions.

"Copy, extend, and finally, discovery of a new form. It takes a while to shed old paradigms." - Scott Jenson

As an example, Apple's visionOS interface notably made use of many desktop and mobile interaction patterns to smooth the transition to spatial computing. But at the same time, it didn't take full advantage of spatial computing's opportunities, boxing limitless 3D interactions within the windows, icons, menus, and pointers (WIMP) paradigm familiar from desktop interfaces.

Hence, the balancing act.

Apple Vision OS user interface

This context helps frame the way we've approached designing agent management interfaces. Are there high level user interface patterns that are both familiar enough for people to intuit how they work and flexible enough to enable effective AI agent management at a high level? In an agent-centric AI application like Augment Code for software development or Bench for office productivity, people need to be able to:

  • Start new agents through a combination of instructions and context (files, connections, etc.)
  • Schedule agents to run at certain times or under certain conditions.
  • Scrutinize the work of agents to assess whether or not they're making the right kind of progress.
  • Steer agents when they go off course, require clarification, or uncover something that suggests they should take a different path.
  • Stop agents when they've either done enough or are no longer being effective.
  • See, share, and save the results or processes of agents.
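One way to think about these requirements is as a small lifecycle state machine that each interface pattern merely visualizes differently. The states and transitions here are my own assumptions mapped onto the start/steer/stop verbs above, not any particular product's model:

```python
from enum import Enum

class AgentState(Enum):
    SCHEDULED = "scheduled"      # created but waiting on a time or condition
    RUNNING = "running"
    NEEDS_INPUT = "needs_input"  # paused for clarification (steering)
    COMPLETE = "complete"
    STOPPED = "stopped"

# Legal transitions; kanban columns, inbox items, and calendar entries
# are all just different views over this graph.
TRANSITIONS = {
    AgentState.SCHEDULED: {AgentState.RUNNING, AgentState.STOPPED},
    AgentState.RUNNING: {AgentState.NEEDS_INPUT, AgentState.COMPLETE, AgentState.STOPPED},
    AgentState.NEEDS_INPUT: {AgentState.RUNNING, AgentState.STOPPED},
    AgentState.COMPLETE: set(),
    AgentState.STOPPED: set(),
}

def advance(state: AgentState, target: AgentState) -> AgentState:
    """Move an agent between states, rejecting transitions the UI shouldn't allow."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"cannot move {state.value} -> {target.value}")
    return target
```

Framing it this way makes the design question explicit: each pattern has to surface these states, and the actions between them, somewhere in its layout.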

To help people adapt to agent management, we explored how interface patterns like kanban boards, dashboards, inboxes, task lists and calendars could fulfill many of these requirements by presenting the state of multiple agents and allowing people to access specific agents when they need to take further action.

Kanban Board

Kanban boards visualize work as cards moving through distinct stages, typically arranged in columns from left to right to represent progress through a workflow. They could be used to organize agents as they transition between scheduled, running, complete, and reviewed states. Or within workflows specific to domains like sales or engineering.

Kanban board agent management interface

This pattern seems like a straightforward way to give people a sense of the state of multiple agents. But in kanban boards, people also expect to be able to move cards between columns. How would that affect agents? Would they begin a new task defined by the card? Would that create a new agent or re-route an existing one?

Dashboard

Dashboards pull together multiple data sources into a unified monitoring interface through different visualizations like charts, graphs, and metrics. Unlike a kanban board, there's no workflow implied by the arrangement of the elements in a dashboard so you can pretty much represent agents anywhere and any way you like.

Dashboard agent management interface

While that seems appealing, especially to those yearning for a "mission control" style interface to manage agents, it can quickly become problematic. When agents can be represented in different ways in different parts of a UI, it's hard to grasp both the big picture and details of what's happening.

Inbox

The inbox pattern organizes items in a chronological stream that requires user action to process. Items are listed from newest to oldest with visual cues like unread counts so people can quickly assess and act on items without losing context. Most of us do this every day in our messaging and email apps, so applying the same model to agents seems natural.

Inbox agent management interface

But if you get too much email or too many texts, your inbox can get away from you. So it's not an ideal pattern for applications with a high volume of agents to manage, nor for those that require coordination of multiple, potentially interdependent agents.

For what it's worth, this is where we iterated to (for now) in Bench. So if you'd like to try this pattern out, fire off a few agents there.

Task List

Task lists present items as discrete, actionable units with clear completion states (usually a checkbox). Their vertical stack format lets people focus on specific tasks while still seeing the bigger picture. Task lists can be highly structured or pretty ad hoc lists of random to-dos.

Tasklist agent management interface

Indented lists of subtasks can also display parallel agent processes and show the interdependencies of agents, but perhaps at the expense of simplicity. In a single linear list, like an inbox, it's much easier to see what's happening than in a hierarchical task list where some subtasks may be collapsed but still relevant.

Calendar

Calendar interfaces use a grid structure that maps to our understanding of time, with consistent rows and columns representing dates and times. This allows people to make use of both temporal memory and spatial memory to locate and contextualize items. Calendars also typically provide high level (month) and detailed (day) views of what's happening.

Calendar agent management interface

When it comes to scheduling agents, a calendar makes a lot of sense: just add one the same way you'd add a meeting. It's also helpful for contextually grouping the work of agents with actual meetings. "These tasks were all part of this project's brainstorm meeting." "I ran that task right after our one-on-one meeting." Representing the work of agents on a calendar can be tricky, though, as agents can run for minutes or many hours. And where should event-triggered agents show up on a calendar?

Real Time Strategy game user interfaces

Coming back to Scott Jenson's quote at the start of this article, it takes a while to shed old paradigms and discover new forms. So it's quite likely that as these interface patterns are adapted to agent management use cases, they'll evolve further and not end up looking much like their current selves. As David Hoang recently suggested, maybe agent management interfaces should learn from patterns found in Real-Time Strategy (RTS) games instead? Interesting...

The Receding Role of AI Chat

2025-06-02 08:00:00

While chat interfaces to AI models aren't going away anytime soon, the increasing capabilities of AI agents are making the concept of chatting back and forth with an AI model to get things done feel archaic.

Let me first clarify that I don't mean open-ended text fields where people declare their intent are going away. As I wrote recently there will be even more broad input affordances in software whether for text, image, audio, video, or more. When I say chat AIs, I mean applications whose primary mode of getting things done is through a back and forth messaging conversation with an AI model: you type something, the model responds, you type something... and on it goes until you get the output you need.

Chat User Interface Back and Forth

Anyone who's interacted with an application like this knows that the AI model's responses quickly get lost in conversation threads, and producing something from a set of chat replies can be painful. This kind of interface isn't optimal for tasks like authoring a document, writing code, or creating slides. To account for this, some applications now include a canvas or artifact area where the output of the AI model's work can go.

In these layouts, the chat interface usually goes from a single-pane layout to a split-pane layout: roughly half the UI for input in the form of chat and half for output in the form of a canvas or artifact viewer. In these kinds of applications, we already begin to see the prominence of chat receding as people move between providing input and reviewing, editing, or acting on output.

Chat User Interface with Artifacts

In this model, however, the onus is still on the user to chat back and forth with a model until it produces their desired output in the artifact or canvas pane. Agents (AI models that make use of tools) change this dynamic. People state their objectives, and the AI model(s) plan which tools to use and how to accomplish the task.

Chat User Interface Tool Calls

Instead of each step being a back and forth chat between a person and an AI model, the vast majority, if not all, of the steps are coordinated by the model(s) itself. This again reduces the role of chat. The model(s) take care of the back and forth and, in most cases, simply let people know when they're done so people can review and make use of the output.
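That coordination loop can be sketched minimally. `call_model`, `fake_model`, and the single `add` tool are stand-ins I've invented for illustration; a real agent would call an LLM API and a much richer toolset:

```python
def run_agent(objective: str, call_model, tools: dict, max_steps: int = 10) -> str:
    """Loop until the model declares it's done; the user only sees the final result."""
    transcript = [{"role": "user", "content": objective}]
    for _ in range(max_steps):
        action = call_model(transcript)                    # model plans the next step
        if action["type"] == "final":
            return action["content"]                       # the person reviews only this
        result = tools[action["tool"]](**action["args"])   # tool call, no user turn
        transcript.append({"role": "tool", "content": str(result)})
    return "stopped: step limit reached"

def fake_model(transcript):
    # Stand-in for an LLM: call the add tool once, then finish with its result.
    if transcript[-1]["role"] == "user":
        return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "content": transcript[-1]["content"]}

result = run_agent("add 2 and 3", fake_model, {"add": lambda a, b: a + b})
```

Note where the person appears: only at the start (the objective) and the end (the returned result). Every intermediate turn is model-to-tool, which is exactly why the chat pane can recede.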

Chat User Interface Agents

When agents can use multiple tools, call other agents and run in the background, a person's role moves to kicking things off, clarifying things when needed, and making use of the final output. There's a lot less chatting back and forth. As such, the prominence of the chat interface can recede even further. It's there if you want to check the steps an AI took to accomplish your task. But until then it's out of your way so you can focus on the output.

The Receding Role of AI Chat

You can see this UI transition in the AI workspace, Bench. The first version was focused on back and forth instructions with models to get things done: a single-pane AI chat UI. Then a split-paned interface put more emphasis on the results of these instructions with half the screen devoted to an output pane. Today Bench runs and coordinates agents in the background. So the primary interaction is kicking off tasks and reviewing results when they're ready.

In this UI, the chat interface is not only reduced to less than a fourth of the screen but also collapsed by default, hiding the model's back-and-forth conversations with itself unless people want to dig into them.

Bench UI evolution: single pane to two pane

Bench UI evolution: agent management

When working with AI models this way, the process of chatting back and forth to create things within a messaging UI feels dated. AI that takes your instructions, figures out how to get things done using tools, multiple models, and changeable plans, and just tells you when it's finished feels a lot more like "the future". Of course, I put future in quotes because at the rate AI moves these days, the future will be here way sooner than any of us think. So... more UI changes to come!

Ask LukeW: Generation Model Testing

2025-05-25 08:00:00

The last two weeks featured a flurry of new AI model announcements. Keeping up with these changes can be hard without some kind of personal benchmark. For me, that's been my personal AI feature, Ask LukeW, which allows me to both quickly try new models and put them into production.

To start... what were all these announcements? On May 14th, OpenAI released three new models in their GPT-4.1 series. On May 20th at I/O, Google updated Gemini 2.5 Pro. On May 22nd, Anthropic launched Claude Opus 4 and Claude Sonnet 4. So clearly high-end model releases aren't slowing down anytime soon.

Many AI-powered applications develop and use their own benchmarks to evaluate new models when they become available. But there's still nothing quite like trying an AI model yourself in a domain or problem space you know very well to gauge its strengths and weaknesses.

Ask LukeW Claude Opus 4 comparison question

To do this more easily, I added the ability to quickly test new models on the Ask LukeW feature of this site. Because Ask LukeW works with the thousands of articles I've written and hundreds of presentations I've given, it's a really effective way for me to see what's changed. Essentially, I know what good looks like because I know what the answers should be.

Ask LukeW system diagram

The Ask LukeW system retrieves as much relevant content as possible before asking a large language model (LLM) to generate an answer to someone's question (as seen in the system diagram). As a result, the LLM can have lots of content to make sense of when things get to the generation part of the pipeline.
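The retrieve-then-generate flow the diagram describes can be approximated in a few lines. The word-overlap scoring and the `generate` callback are deliberate simplifications I'm using as stand-ins for embedding-based retrieval and an actual LLM call:

```python
def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the question (toy scoring)."""
    q = set(question.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def answer(question: str, docs: list[str], generate) -> str:
    """Stuff the top-ranked content into a prompt, then hand off to the model."""
    context = retrieve(question, docs)
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return generate(prompt)

# Hypothetical mini-corpus standing in for thousands of articles.
docs = [
    "Touch targets should be large enough for fingers.",
    "Mobile first design prioritizes core content.",
    "Unrelated note about conference travel.",
]
top = retrieve("how large should touch targets be", docs, k=1)
```

The takeaway for model testing: the retrieval half stays fixed, so swapping the `generate` callback is all it takes to compare how different frontier models synthesize the same context.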

Ask LukeW Claude Opus 4 comparison

Previously this resulted in a lot of "kitchen sink" style bullet-point answers, as frontier models mostly leaned toward including as much information as possible. These kinds of replies ended up using lots of words without clearly getting to the point. After some testing, I found Anthropic's Claude Opus 4 is much better at putting together responses that feel like they understood the essence of a question. You can see the difference in the before and after examples in this article. The responses to questions with lots of content to synthesize feel more coherent and concise.

It's worth noting I'm only using Opus 4 for the generation part of the Ask LukeW pipeline, which uses AI models to not only generate but also transform, clean, embed, retrieve, and rank content. So there are many other parts of the pipeline where testing new models matters, but in the final generation step at the end, Opus 4 wins. For now...

MCP: Model-Context-Protocol

2025-05-22 08:00:00

In his AI Speaker Series presentation at Sutter Hill Ventures, David Soria Parra of Anthropic shared insights on the Model-Context-Protocol (MCP), an open protocol designed to standardize how AI applications interact with external data sources and tools. Here are my notes from his talk:

  • Models are only as good as the context provided to them, making it crucial to ensure they have access to relevant information for specific tasks
  • MCP standardizes how AI applications interact with external systems, similar to how the Language Server Protocol (LSP) standardized development tools
  • MCP is not a protocol between models and external systems, but between AI applications that use LLMs and external systems
  • Without MCP, AI development is fragmented with every application building custom implementations, custom prompts, and custom tool calls
  • MCP separates the concerns of providing data access from building applications
  • This separation allows application developers to focus on building better applications while data providers can focus on exposing their data effectively

David Soria Parra Speaker Series poster

How MCP Works

  • Two major components exist in an MCP system: client (implemented by the application using the LLM) and server (serves context to the client)
  • MCP servers offer: Tools (functions that perform actions), Resources (raw data content exposed by the server), Prompts (show how tools should be invoked)
  • Application developers can connect their apps to any MCP server in the ecosystem
  • API developers can expose their data to multiple AI applications by implementing an MCP server once
  • Allows different organizations within large companies to build components independently that work together through the protocol
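A toy version of this client/server split helps show why the separation works. This sketch mimics only the shape of MCP's JSON-RPC `tools/list` and `tools/call` messages; the real protocol adds initialization, capability negotiation, schemas, and transports, and the weather tool is invented for illustration:

```python
import json

class ToyMCPServer:
    """Minimal stand-in for an MCP server: registers tools, answers requests."""
    def __init__(self):
        self.tools = {}

    def tool(self, name: str, description: str):
        """Register a function as a tool that clients can discover and call."""
        def register(fn):
            self.tools[name] = {"description": description, "fn": fn}
            return fn
        return register

    def handle(self, raw: str) -> dict:
        """Dispatch a JSON-RPC style request from any client application."""
        req = json.loads(raw)
        if req["method"] == "tools/list":
            result = [{"name": n, "description": t["description"]}
                      for n, t in self.tools.items()]
        elif req["method"] == "tools/call":
            tool = self.tools[req["params"]["name"]]
            result = tool["fn"](**req["params"]["arguments"])
        else:
            raise ValueError(f"unknown method: {req['method']}")
        return {"jsonrpc": "2.0", "id": req["id"], "result": result}

server = ToyMCPServer()

@server.tool("get_weather", "Return weather for a city")
def get_weather(city: str) -> str:
    return f"Sunny in {city}"
```

The data provider writes the tool once; any number of client applications can discover and call it through the same message shapes — which is the separation of concerns the talk emphasized.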

Writing Good Tools for MCP

  • Tools should be simple and focused on specific tasks
  • Comprehensive descriptions help models understand when and how to use the tools
  • Error messages should be in natural language to facilitate better interactions
  • The goal is to create tools that are intuitive for both models and users

Future Directions for MCP

  • Remote MCP servers with proper authorization mechanisms
  • An official MCP registry to discover available servers and tools
  • Asynchronous execution for long-running tasks
  • Streaming data capabilities from servers to clients
  • Namespacing to organize tools and resources
  • Improved elicitation techniques for better interactions
  • There's a need for a structure to manage the protocol as it grows