Arguably, LLMs are bringing about a paradigm shift. Unlike the introduction of the GUI or the touchscreen, the popularization of LLMs entails change on a different dimension--we are not making pixel-based graphical interactions easier; we are replacing complex graphical interfaces with textual ones.
In the realm of programming, this shift is neither conspicuous nor inconvenient. A significant portion of software development already happens through textual interfaces, and developers are domain-language experts--their job, above all, is to be fluent and efficient in languages that resemble natural ones to varying degrees.
However, when designing general business-facing or consumer-facing products, this paradigm shift needs to be carefully considered. Most nontrivial work completed in an app does not take a purely textual input. The reason is twofold:
Human inputs to software are, strictly speaking, instructions with clear and unambiguous intents. How well a user's input actually matches their intent may vary. Nevertheless, it is good UI design practice to let users effortlessly give the right input.
Now the first problem with (unstructured) natural language input is that it can be ambiguous in its intent. Remember when Excel aggressively formatted phone numbers into scientific notation? Or gene names into dates? Excel misunderstood the intent of the input because no additional information was given--like manually setting a cell to the "Text" format.
The solution to this issue is simple: adding more information that clarifies an input's intent (like the aforementioned manual setting of the cell's format). Alternative approaches include requiring text inputs to follow a stringent syntax (e.g., programming languages), making use of keywords/symbols with special semantics (e.g., special tokens in raw LLM queries), adding contextual information (e.g., a text input field for First Name is semantically different from one for Last Name), or simply pushing information about the intent into the text input.
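The contextual-information approach can be made concrete with a small sketch. The names here (`FieldInput`, `semantic_type`, `normalize`) are hypothetical, not from any real product; the point is that once an input carries explicit intent metadata, the software never has to guess what the text means:

```python
from dataclasses import dataclass

@dataclass
class FieldInput:
    """A text input paired with explicit intent metadata."""
    text: str
    semantic_type: str  # e.g. "phone_number", "gene_name", "free_text"

def normalize(value: FieldInput) -> str:
    # With intent attached, ambiguity disappears: a phone number is
    # kept verbatim rather than coerced into scientific notation,
    # and a gene name is canonicalized rather than parsed as a date.
    if value.semantic_type == "phone_number":
        return value.text  # preserve leading zeros, no numeric parsing
    if value.semantic_type == "gene_name":
        return value.text.upper()  # canonical gene symbol, never a date
    return value.text.strip()
```

This is essentially what Excel's manual "Text" format does, lifted into the input schema itself: the disambiguation happens once, at design time, instead of being pushed onto the user at every keystroke.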
Unfortunately, the last approach is the most widely adopted one across AI-assisted applications. This leads to suboptimal user experiences. For instance, I've frequently encountered situations where Claude began crafting elaborate, detailed responses mixed with artifacts when I simply needed a concise overview. I often find myself interrupting the generation process and reformulating my question to explicitly request a high-level summary first. This pattern of having to clarify intent through increasingly verbose prompts creates friction in what should be a seamless interaction, highlighting the limitations of purely textual interfaces without structured guidance.
On the surface, the efficiency issue is simply about this: clicking a button is much easier than typing a long sentence.
But the problem runs deeper. When users perform recurring tasks, they develop muscle memory and mental shortcuts. A well-designed graphical interface with consistent buttons, dropdowns, and visual cues allows users to execute complex operations with minimal cognitive load. In contrast, formulating precise natural language instructions for each task requires continuous mental effort, even for routine operations. This cognitive overhead becomes particularly apparent in productivity applications where users perform the same actions hundreds of times daily. While LLMs offer unprecedented flexibility, they often sacrifice the efficiency that comes from purpose-built interfaces optimized for specific workflows.
This is why Cursor's Tab function is a genius design while Ctrl/Cmd K is less so. The former is a highly convenient input designed for the most common task in programming--continuing to edit the code based on existing code and recent edits; the latter is just a fallback to a generic UI element. While versatile, that fallback introduces friction exactly when efficiency matters most: during repeated, core workflows. The difference illustrates how specialized, context-aware shortcuts can dramatically outperform generic text-based interfaces for routine tasks.
An easy but effective approach is to offer a library of templates, each with a corresponding form for collecting the data to fill it in. This was Jasper AI's approach when it first rolled out.
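A minimal sketch of this template-plus-form pattern might look like the following. The template names, fields, and prompt wordings are all invented for illustration, not taken from Jasper AI or any other product:

```python
# Each template declares the fields its form must collect,
# plus a prompt skeleton those fields slot into.
TEMPLATES = {
    "cold_email": {
        "fields": ["recipient_name", "product", "pain_point"],
        "prompt": (
            "Write a short cold email to {recipient_name} introducing "
            "{product}, focusing on how it solves {pain_point}."
        ),
    },
    "blog_outline": {
        "fields": ["topic", "audience"],
        "prompt": "Outline a blog post about {topic} for {audience}.",
    },
}

def build_prompt(template_id: str, form_data: dict) -> str:
    """Validate the form, then fill the template."""
    template = TEMPLATES[template_id]
    missing = [f for f in template["fields"] if f not in form_data]
    if missing:
        raise ValueError(f"Form incomplete, missing: {missing}")
    return template["prompt"].format(**form_data)
```

The user only ever fills structured fields; the unambiguous, well-formed natural language instruction is assembled for them.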
A simple but powerful extension of this approach is to automate context-aware selection of a template, possibly aided by an LLM. Instead of having the user manually pick from an ever-expanding suite of templates, an agent does so intelligently based on the problem at hand. Note that the agent can be a hybrid one--its decision-making can combine rigid rules with the versatile information-processing capability of LLMs.
Of course, this requires that our AI-assisted apps be more closely knit together by shared data. For example, when I open Cursor in a project repo, it should understand the to-do items I have been assigned, and start thinking about how to address them after confirming with me. The whole process may require minimal text input from me, because all the relevant information is collected elsewhere or can be supplied with a few button clicks.
In fact, a lot of business use cases can follow this pattern, like sales agents making daily calls based on incoming leads, or customer support agents picking up Zendesk tickets. In the end, the most effective AI-assisted applications will be those that thoughtfully blend the strengths of both paradigms: the flexibility and natural language understanding of LLMs with the precision and efficiency of structured interfaces and organized data. While LLMs excel at processing diverse data types, their implementation shouldn't be limited to chat-based interactions with unstructured inputs.