Generative UI Isn't Always the Answer: The Five Ways an Interface Gets Built

Generated, conversational UI is treated as the future of every interface. Four decades of HCI research say it is one of five options, each right for different jobs. Here is the map and the evidence.

By Roshnee Sharma · CEO, Autessa

Generated, conversational UI is being treated as the end state for every interface. The research record going back forty years suggests it is one point on a longer spectrum, and the right point depends on the job. Here is the full map, with the evidence, and why an enterprise platform should support all of it rather than prescribe one.

In November 2025, Google began generating graphical interfaces directly inside Search. Its AI Mode now builds bespoke interactive tools, such as a custom loan calculator, in response to a single query, and the Gemini app does the same through an experiment called dynamic view (Google Research, 2025). A month later, Google published A2UI, an open protocol for agent-generated interfaces (Google Developers, 2025). Two years earlier, Vercel had launched v0, a tool that turns a text prompt into a working React component, and then open-sourced its generative UI approach in the AI SDK (Vercel, 2023; Vercel, 2024). The direction of travel is clear. Interfaces are increasingly composed in the moment rather than designed in advance.

This post takes the strongest version of that trend seriously and tests it. If the technology were perfect and latency were a nonissue, what would the best graphical interface actually be? The answer is not a single design. It is a spectrum of five distinct approaches, each correct for some jobs and wrong for others. Four decades of human-computer interaction research, and the newest results from the labs now building generative UI, point to the same conclusion. The question is not which approach wins, but which one fits the work in front of you.

Figure: The five ways a graphical interface gets built, laid out from predictable and human-owned on the left to dynamic and machine-owned on the right. Static UI is built once for everyone; a per-company interface is fitted by a consultant; a per-user interface is shaped by the user and then locked in; a per-page-load interface is generated by the machine on every sign-in; and a per-interaction interface is generated fresh for each request. The dashed line marks the accountability boundary, where ownership passes from the user to the machine.

Why is it fair to ignore latency?

It is fair to ignore latency because latency is the part of this problem that engineering reliably solves over time, while the question of which interface fits a job is the part that does not solve itself. Treating speed as a temporary constraint rather than a permanent limit is the more useful way to reason about where these approaches are heading.

The history of the web makes the point. When the consumer internet arrived, loading a single page over a dial-up modem could take the better part of a minute, and rich media was effectively impossible. Nobody concluded that web pages were a bad idea. They concluded that the page was the right unit and the speed was the problem to solve, and then two decades of broadband, caching, content delivery networks, and faster hardware solved it. The same pattern has repeated with streaming video, with 3D graphics, and with every interface that was once too slow to be practical.

Generated interfaces are at the dial-up stage now. They are slow because the model has to compose the interface on demand, and that cost is real today. But it is the kind of cost that gets engineered down once the architecture is settled. It can be reduced by caching generated components, by reusing layouts across similar requests, by generating ahead of the user rather than in front of them, and by faster inference. If we decide which approach is appropriate for a given job first, and pick the right components and the right execution model, then speed becomes a focused engineering target rather than a reason to avoid the approach entirely. Holding latency constant lets us reason about the durable question, which is fit, instead of the temporary one, which is speed.
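To make the first two of those levers concrete, here is a minimal sketch of caching a generated component and reusing it for similar requests. Everything in it is an assumption for illustration: `ComponentCache`, `normalize`, and `fake_generate` are hypothetical names, and the stand-in generator returns a string where a real system would call a model. Lower-casing and collapsing whitespace is a deliberately naive notion of "similar"; a production system would need something smarter.

```python
from typing import Callable, Dict


def normalize(request: str) -> str:
    """Collapse case and whitespace so near-identical requests share one key."""
    return " ".join(request.lower().split())


class ComponentCache:
    """Memoize generated components by normalized request text."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate        # the slow, model-backed step (assumed)
        self._store: Dict[str, str] = {}
        self.misses = 0                  # how many times we actually generated

    def get(self, request: str) -> str:
        key = normalize(request)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(key)
        return self._store[key]


def fake_generate(key: str) -> str:
    # Hypothetical stand-in for an expensive model call.
    return f"<component for '{key}'>"


cache = ComponentCache(fake_generate)
first = cache.get("Loan calculator")
second = cache.get("  loan   CALCULATOR ")
print(first == second, cache.misses)  # True 1
```

The second request pays no generation cost because it resolves to the same key as the first. That is the sense in which latency becomes an engineering target once the approach is chosen.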

Why do graphical interfaces still matter?

Graphical interfaces still matter because they let a person take in several things at once, compare them, and reach a conclusion by seeing data laid out in space rather than holding it all in working memory. They remain the gold standard for how people work with technology, and that is not an accident of habit.

A table reveals an outlier. A chart reveals a trend. A layout reveals a relationship between two numbers that prose would force the reader to reconstruct one sentence at a time. A well-designed interface does part of the thinking for the user simply by arranging information well. This is the core reason the field of human-computer interaction has treated visual structure as central since its inception (Mitchell and Shneiderman, 1989).

Conversational interfaces are powerful precisely because they remove this structure and let the user ask for anything. But removing structure has a cost, and the cost is exactly the thing a graphical interface is good at, which is showing many things at once, consistently, in a form the user can learn and return to. The question is not whether visual interfaces survive. It is who decides what they show, and how often that decision gets remade.

What is the real axis behind UI personalization?

The real axis behind UI personalization is not how personalized an interface is, but who owns the tailoring and how often it happens. Most discussions sort approaches by degree of personalization, as if that were a single dial running from generic to bespoke. That framing hides the decision that actually matters.

The useful axis has two questions. The first is who owns the tailoring, whether it is the people using the software, a hired specialist, the individual user, or the machine. The second is how often the tailoring happens, whether it is once at build time, once per engagement, once per session, or freshly on every request. The HCI literature has long drawn the central distinction here as adaptable versus adaptive, where an adaptable interface is changed by the user and an adaptive interface is changed by the system (Findlater and McGrenere, 2004). Sorting the five approaches along this axis makes their tradeoffs legible, because as ownership shifts toward the machine and the tailoring happens more often, the interface holds still less and less. Stability and responsiveness move in opposite directions, and no single point on the line is best for everything.

Figure: The axis that actually matters is not how personalized an interface is, but who owns the tailoring and how often it happens. The horizontal axis runs from user-owned on the left to machine-owned on the right; the vertical axis runs from a one-time decision at build time up to a fresh decision on every request. The user-owned approaches sit on the adaptable left half, where the result holds still; the machine-owned approaches sit on the adaptive right half, where the view can change without the user directing it.
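The two questions can be written down as a small data structure. This is one reading of the spectrum rather than a formal model: the `Approach` class, the field names, and the exact owner and cadence strings are assumptions made for illustration. In particular, the owner given for static UI and the cadence given for per-user are paraphrases.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Approach:
    name: str
    owner: str           # who owns the tailoring
    cadence: str         # how often the tailoring happens
    machine_owned: bool  # True on the adaptive side of the spectrum


# Ordered left to right, from predictable to dynamic.
SPECTRUM: List[Approach] = [
    Approach("static", "product team (assumed)", "once at build time", False),
    Approach("per-company", "hired specialist", "once per engagement", False),
    Approach("per-user", "individual user", "when the user chooses, then locked in", False),
    Approach("per-page-load", "machine", "once per session", True),
    Approach("per-interaction", "machine", "on every request", True),
]


def accountability_boundary(spectrum: List[Approach]) -> Tuple[str, str]:
    """Return the adjacent pair where ownership passes from a human to the machine."""
    for left, right in zip(spectrum, spectrum[1:]):
        if not left.machine_owned and right.machine_owned:
            return (left.name, right.name)
    raise ValueError("no human-to-machine boundary in this spectrum")


print(accountability_boundary(SPECTRUM))  # ('per-user', 'per-page-load')
```

Sorting by these two fields, rather than by a single "how personalized" dial, is what puts per-user and per-page-load on opposite sides of the boundary even though both produce a screen tailored to one person.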

What is a static UI?

A static UI is one interface designed at build time and shipped to everyone, with no tailoring to company, user, or context. This is the legacy web and classic SaaS, and it is still how most software works.

The defining property is that the product never bends, so the organization bends instead. Adopting the software means training programs, hiring people who already know it, and reshaping internal workflows to match how the product expects work to be done. In exchange, the interface is completely predictable. Every user sees the same screen, the layout never moves, and institutional knowledge about how to use it accumulates and transfers between people. Static UI fits no one in particular and everyone equally. It is the most learnable and the least personal, and for a large class of stable, high-volume, regulated workflows that tradeoff is exactly right.

What does a per-company interface look like?

A per-company interface keeps the interface fixed but fits it to a specific organization, usually through consultants or an internal implementation team. They study how the company actually works and tailor the software through configuration, custom modules, and bespoke workflows.

The result is more fitted than static UI and just as stable once it is in place. The cost is in the fitting. It is slow, it requires specialized labor, and the fit decays as the business changes, which means re-engaging the specialists whenever the process shifts. But within a stable process, this approach produces a highly fitted system that then holds steady, which is why it remains the default for large enterprise deployments where the cost of getting it wrong is high.

How does a per-user interface stay personal and stable?

A per-user interface stays personal and stable because the user, not the machine, controls the tailoring, and the result holds still once they set it. Everyone reads from one source of truth underneath, and each person shapes their own view on top of it.

This is the approach most often misunderstood, so the architecture is worth stating precisely. The data lives in one place, governed and consistent. The interface is a per-user surface that reads from that shared data. Personalizing the interface therefore never forks the underlying records, because the personalization lives entirely in the presentation layer. Two users can arrange, filter, and prioritize completely differently and still be looking at the same authoritative numbers.
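The separation described above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular product: `LEDGER`, `ViewConfig`, and `render` are hypothetical names, and the records are invented sample data. The point is only that each view is a function of the shared records and never writes back to them.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]

# One governed source of truth, shared by every user (sample data, invented).
LEDGER: Tuple[Record, ...] = (
    {"account": "A-100", "region": "EU", "balance": 1200},
    {"account": "A-200", "region": "US", "balance": 450},
    {"account": "A-300", "region": "EU", "balance": 90},
)


@dataclass(frozen=True)
class ViewConfig:
    """A per-user presentation choice. It holds no data of its own."""
    columns: Tuple[str, ...]
    sort_by: str
    descending: bool = False
    only_region: Optional[str] = None


def render(records: Tuple[Record, ...], view: ViewConfig) -> List[Record]:
    """Build a user's rows from the shared records without modifying them."""
    rows = [r for r in records
            if view.only_region is None or r["region"] == view.only_region]
    rows.sort(key=lambda r: r[view.sort_by], reverse=view.descending)
    return [{column: r[column] for column in view.columns} for r in rows]


alice = ViewConfig(columns=("account", "balance"), sort_by="balance", descending=True)
bob = ViewConfig(columns=("region", "account", "balance"), sort_by="account", only_region="EU")

alice_rows = render(LEDGER, alice)
bob_rows = render(LEDGER, bob)
print(alice_rows[0])  # {'account': 'A-100', 'balance': 1200}
print(len(bob_rows))  # 2
```

Both users see a balance of 1200 for account A-100, because the number comes from the same record. Only the arrangement differs, and the arrangement is a saved configuration that stays put until its owner changes it.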

The other defining property is that the user is the one deciding. The interface is not guessing and reshaping itself. The person directs it, and once it looks the way they want, it locks in and stays that way across sessions. In HCI terms this is an adaptable interface, and the evidence favors it on both preference and performance. In a controlled study comparing static, adaptable, and adaptive menus, users preferred the adaptable version they controlled themselves, and it performed comparably to a fixed layout while the system-driven adaptive version was measurably slower (Findlater and McGrenere, 2004). Per-user therefore produces something unusual on the spectrum: an interface that is personal and stable at the same time. The user keeps their learned layout and muscle memory, while the organization keeps a single trustworthy data layer.

What is a per-page-load interface?

A per-page-load interface is one the machine composes for each user at sign-in, from their data, role, history, or context, without the user directing it. The user does not choose the arrangement. The machine does.

The defining property is that the view can change between visits. That fluidity is the entire point and the entire risk, depending on the job. Because the machine chooses what to surface, it can put exactly the right thing in front of a given user in a given context, which is valuable when the goal is to guide attention. The same mechanism means the machine can also leave out something that was present last time, without the user knowing it was omitted. This is the property that decades of adaptive-interface research have repeatedly found to be double-edged. System-controlled adaptation can help users who never customize anything, but it costs predictability, and predictability is part of what makes an interface fast to use (Findlater and McGrenere, 2004; Gajos et al., 2008). Where the work depends on a complete and consistent view, that silent variability is a liability. Where the goal is to surface the most relevant subset, the same property is a strength.

What is a per-interaction interface?

A per-interaction interface has no standing screen at all. The user asks for something in language, and the system generates a view to answer that one request. This is the conversational, generated interface that the current wave of products is built around, from Vercel's v0 to Google's dynamic view and A2UI (Vercel, 2023; Google Research, 2025; Google Developers, 2025).

Its strength is on-demand synthesis. When the task is to gather, combine, and retrieve information that does not live together in any existing screen, generating a bespoke view for the question is extraordinarily powerful, because no one had to anticipate the question in advance. Google's own framing of A2UI uses exactly this case. Rather than an agent reading restaurant times aloud in a clunky back-and-forth, it composes a small interface with a date picker and a time selector for the one task at hand (Google Developers, 2025). The cost is the mirror image of that strength. Nothing is repeatable. Each view is generated fresh, so there is no stable layout to learn, no muscle memory to build, and no guarantee that asking the same thing twice produces the same arrangement. It is ideal for exploration and synthesis and poorly suited to the repeated, structured work that benefits from a consistent surface.

What do people get wrong about personalized UI?

The thing people most often get wrong about personalized UI is treating per-user and per-page-load as the same idea, because both produce a screen tailored to the individual. They are opposites in the property that matters.

Per-user is personal and fixed. The user decides, and the result holds still. Per-page-load is machine-chosen and fluid. The machine decides, and the result can change. This is precisely the adaptable-versus-adaptive distinction that HCI has studied for decades, and the two were found to behave differently in controlled testing, not just in theory (Findlater and McGrenere, 2004). One optimizes for the user's control and continuity. The other optimizes for the system's judgment and relevance. They feel similar from a screenshot and behave nothing alike in production, particularly around accountability, because a per-user surface can always be reproduced, while a machine-generated one may not be. Any serious decision about which approach to use has to separate these two, because choosing one when you needed the other is the kind of mistake that does not surface until the work is already depending on it.

What does the research say about whether generated UI is better?

The research says generated UI is genuinely preferred by users, but only once you set aside the cost it actually imposes in practice. This is the most important nuance in the entire debate, and it comes straight from the labs building the technology.

In its 2025 paper accompanying the Search and Gemini rollouts, Google reported that human raters strongly preferred its generated interfaces over standard model outputs, but stated the result held "when ignoring generation speed" (Google Research, 2025). That qualifier is the whole game. Generation latency is the tax that makes a freshly built interface worse than a standing one for repeated work, and the preference flips once you count it. The thought experiment this post runs, in which latency is a nonissue, is therefore the most generous possible case for generated UI, and even under that assumption the older findings do not disappear.

The deeper academic record is more cautious still. Automatically generated interfaces have demonstrably helped in the right setting. Gajos and Weld's SUPPLE system generated interfaces adapted to a user's motor and vision abilities and closed more than sixty percent of the performance gap between users with and without motor impairments (Gajos and Weld, 2004; Gajos et al., 2007; ACM, 2023). But that success was grounded in a precise model of the individual user's capabilities, an idea later formalized as ability-based design (Wobbrock et al., 2011). General-purpose adaptation without that grounding has a far more mixed record, which is why surveys spanning fifty-five years of the field describe adaptive interfaces as a persistent usability tradeoff rather than a settled win (Brdnik et al., 2022). The newest work continues to treat generated interfaces as complementary to these older grounded approaches, not a replacement for them (CHI 2026 generative UI workshop).

The takeaway from the literature is not that generated UI is bad. It is that generated UI is one tool whose value depends entirely on context, on whether the task rewards fresh relevance or rewards stability, and on whether the system has a real model of what this user needs or is guessing.

When does a machine-generated interface go wrong?

A machine-generated interface goes wrong in three documented ways: when its guesses are wrong, when its variability disorients the user, and when it optimizes for the provider rather than the person. These are not hypothetical risks. Each one has support in the research.

The first failure mode is inaccuracy. When the machine chooses what to show, the value of that choice depends entirely on the prediction being correct, and Gajos and colleagues showed that predictive accuracy has a significant effect on whether an adaptive interface helps or hurts (Gajos et al., 2006; Gajos et al., 2008). A machine-curated view that guesses wrong does not just fail to help. It actively hides the thing the user needed and presents something less useful in its place, and the user has no way to know what was omitted. In an operational setting, where someone is accountable for acting on complete information, a confidently wrong layout is worse than a generic one.

The second failure mode is unpredictability. The central tension in the field, as Gajos and colleagues framed it, is that proponents tout performance gains while critics argue that adaptation's unpredictability can disorient users and cause more harm than good (Gajos et al., 2008). The evidence supports the critics in a specific case. Fast-paced adaptation, where the interface changes frequently and substantially, produced negative results, while slow-paced adaptation that changed gently helped (Findlater and McGrenere, 2004; Sears and Shneiderman, 1994). A fully generated interface that rebuilds on every request is the fastest-paced adaptation possible, which is precisely the condition the research associates with disorientation and lost efficiency. Frequent change also reduces users' awareness of features over time, which undermines the long-term learning that a stable interface supports.

The third failure mode is the most concerning for enterprises, and it is a question of incentives rather than accuracy. When a machine decides what each user sees, it can be optimized to serve the operator's interest rather than the user's, and at machine speed and scale this becomes a new category of manipulation. The interface design literature calls these dark patterns, and the problem is now large and measured. A European Commission sweep of several hundred widely used digital products found that the overwhelming majority contained at least one dark pattern (European Commission, 2022). Recent work shows that AI-driven interfaces make this worse rather than better, because models trained on existing interfaces learn the deceptive patterns already present in that data and can then replicate and personalize them in subtler ways than a human designer would hand-craft (Pandey, 2026). A machine that generates the interface and also benefits from the user's choices is structurally positioned to nudge those choices, and unlike a fixed interface that can be audited once, a freshly generated one can present a different, individually optimized version of that nudge to every user.

These three failure modes share a root cause. The user is not in control of the interface, and the result is not fixed enough to inspect, learn, or hold to account. They are most dangerous exactly where the work is operational, repeated, and consequential, and least dangerous where the interface is exploratory and low-stakes. But the deepest problem is not any single failure. It is what happens after a failure occurs, which depends entirely on who is able to fix it. We return to that question below, because it is the one that most cleanly separates the five approaches.

What about accessibility when a machine generates the interface?

Accessibility is where machine generation shows both its greatest promise and its most concrete present-day failure, and the difference between the two comes down to whether the machine is working from a real model of the user or simply reproducing what it learned from the web.

The promise is real and demonstrated. The strongest historical evidence for automatic UI generation is an accessibility result. SUPPLE generated interfaces tailored to each user's specific motor and vision abilities and closed more than sixty percent of the performance gap between users with and without motor impairments, with no declarative knowledge of any particular condition, working purely from observed ability (Gajos et al., 2007; ACM, 2023). A machine that genuinely models an individual's capabilities can produce an interface better suited to that person than any single hand-crafted default, because one static design cannot fit the full range of human ability. This is the version of "the machine adapts with no human" that accessibility advocates have wanted for two decades.

The present-day reality of LLM-generated interfaces is the opposite, and for a structural reason. Today's generators are trained on the public web, and the public web is overwhelmingly inaccessible. Audits find detectable WCAG violations on more than ninety percent of public pages (Mowar et al., 2025). A model trained on that corpus learns those failures and reproduces them, so LLMs frequently generate inaccessible interfaces by default, omitting alternative text, semantic landmarks, and properly labeled form controls (A11yn, 2025; CodeA11y, 2025). The gap is fixable but not automatic. Explicit accessibility instructions in the prompt measurably improve compliance, and models can self-identify and remediate simpler issues such as contrast and semantic structure at high success rates, but they struggle with harder requirements like correct ARIA implementation (Web for All, 2025). Crucially, accessibility cannot be fully verified automatically. Tools like axe and WAVE catch only a subset of issues, and testing with disabled users remains the gold standard, because the most fundamental limitation is that the model does not experience disability and cannot judge whether an interface is meaningfully usable rather than merely technically compliant (arXiv, 2026).

Color contrast deserves singling out, because it is both the most common failure on the web and the one a generated interface is most likely to break. Insufficient text contrast (WCAG 1.4.3) has been the single most prevalent accessibility failure on the web for seven consecutive years, detected on roughly four in five homepages in the WebAIM Million audit (WebAIM, 2024). LLM-generated code inherits this directly. In one empirical study, contrast issues made up about eighty percent of all accessibility violations in LLM-generated interfaces, and ChatGPT specifically was found to mishandle contrast in roughly a quarter of its violations (Aljedaani et al., 2024; Suh et al., 2025). The reason is mechanical, and it matches what practitioners observe. When a model is styling an interface, it picks foreground and background colors to satisfy a visual or branding intent, such as a light brand color on a white panel, without computing the luminance ratio those two colors actually produce. The model is choosing colors blind to the rendered result. Tellingly, the fix that works best in the research is to give the model the rendered screenshot so it can reason about the surrounding components and judge contrast visually, rather than emitting color values it never sees displayed (From Code to Compliance, 2025). A recent study of generated game interfaces found that text contrast ratio was the single strongest structural predictor of overall UI quality, which means getting contrast wrong does not just fail an audit, it makes the whole interface read as lower quality (GameUIAgent, 2026). Contrast is the clearest case of the general accessibility problem. A default generator optimizing for visual style will quietly produce something that looks fine to it and is unreadable for a low-vision user, and it will do so on every regeneration unless the system constrains it.
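The luminance ratio the model skips is cheap to compute, which is part of what makes the failure avoidable. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio, along with the 4.5:1 minimum that success criterion 1.4.3 sets for normal-size text at level AA. The function names are ours, and the check deliberately ignores large text, transparency, and anything else that only shows up in the rendered result.

```python
def _linear(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light, per the WCAG definition."""
    c = channel_8bit / 255
    # Older WCAG text used 0.03928 as the cutoff; for 8-bit values the result is the same.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_body_text(foreground: str, background: str) -> bool:
    """True if the pair meets the 4.5:1 minimum for normal-size text (WCAG 1.4.3, AA)."""
    return contrast_ratio(foreground, background) >= 4.5


# A mid-gray on a white panel looks fine at a glance and still falls short.
print(round(contrast_ratio("#777777", "#ffffff"), 2))  # about 4.48, just under 4.5
print(passes_aa_body_text("#777777", "#ffffff"))       # False
print(passes_aa_body_text("#767676", "#ffffff"))       # True
```

A generator that ran a check like this on every foreground and background pair it emitted would catch the arithmetic failures. It would not replace the screenshot-based review the research favors, because the effective background behind a piece of text is often only known after rendering.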

The distinction that resolves this is the same one running through the whole spectrum. Machine generation grounded in a real model of the user, the SUPPLE case, is a genuine accessibility advance. Machine generation that guesses from generic training data, the default case for an unguided generator, tends to inherit the web's exclusions and re-emit them at scale, and it does so freshly on every page load, which means there is no single rendered interface to audit, certify, and stand behind. For an enterprise with legal accessibility obligations, that last point matters enormously. A fixed or per-user interface can be tested once and certified, while a per-interaction interface that regenerates each time would, in principle, need its accessibility re-verified on every generation. The responsible pattern is to constrain generation to vetted, accessible components and to keep a human-reviewable, auditable surface, rather than to let the model emit arbitrary markup unsupervised.
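One way to picture "constrain generation to vetted components" is a gate that every generated view has to pass before it renders. The sketch below is an assumption-laden illustration: the component names, the `VETTED` allow-list, and the nested-dictionary format are all hypothetical, and they do not represent A2UI or any other real protocol. It rejects components outside the allow-list and flags missing alternative text and form labels, two of the omissions noted earlier.

```python
from typing import Any, Dict, List

# Hypothetical allow-list: component type -> props that must be non-empty strings.
VETTED: Dict[str, List[str]] = {
    "Stack": [],
    "Text": ["text"],
    "Image": ["src", "alt"],
    "TextInput": ["name", "label"],
    "Button": ["label"],
}


def violations(node: Dict[str, Any], path: str = "root") -> List[str]:
    """Walk a generated component tree and list everything that breaks the rules."""
    problems: List[str] = []
    kind = node.get("type")
    if kind not in VETTED:
        problems.append(f"{path}: component {kind!r} is not vetted")
    else:
        props = node.get("props", {})
        for required in VETTED[kind]:
            value = props.get(required)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{path}: {kind} is missing {required!r}")
    for index, child in enumerate(node.get("children", [])):
        problems.extend(violations(child, f"{path}.children[{index}]"))
    return problems


# Invented example of what an unguided generator might emit.
generated = {
    "type": "Stack",
    "children": [
        {"type": "Image", "props": {"src": "chart.png"}},
        {"type": "TextInput", "props": {"name": "amount", "label": "Loan amount"}},
        {"type": "RawHtml", "props": {"html": "<div>...</div>"}},
    ],
}

for problem in violations(generated):
    print(problem)
# root.children[0]: Image is missing 'alt'
# root.children[2]: component 'RawHtml' is not vetted
```

A gate like this narrows the audit problem from "every possible generated page" to "a finite set of components plus the rules for composing them". It checks structure only, so it complements testing with disabled users and does not replace it.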

Who fixes it when a generated interface is broken?

The question that separates the five approaches is not whether a flaw like a contrast failure can occur, but who can fix it and whether the fix survives. This is where the failure modes above stop being abstract, because a broken interface is only as bad as the loop available to repair it, and that loop looks completely different at each point on the spectrum.

Figure: What separates the five approaches is not whether a flaw can occur, but who can fix it and whether the fix survives. When a human owns the interface, an accountable person fixes the flaw once and it stays fixed, so iteration converges. A per-page-load interface has no repair loop at all, because the machine generates the view and there is no shared artifact to correct, so the flaw ships silently. A per-interaction chat interface does have a loop, but it makes the user fix the same flaw by hand on every request, so it never converges.

In the three human-owned approaches, fixing a flaw is a one-time act by someone accountable, and it stays fixed. If a static UI ships text at 3:1 contrast, a designer corrects the stylesheet once and every user benefits forever. If a per-company interface has the problem, the implementation team fixes it in configuration and it holds. If a per-user surface is hard to read, the user adjusts their own view, and because per-user interfaces lock in, the adjustment persists across sessions. In every case there is a durable artifact to correct, a person with the authority and the incentive to correct it, and a result that does not silently revert. Iteration converges. The interface gets better and stays better.

The two machine-owned approaches break this loop in opposite but equally damaging ways.

A per-page-load interface has no iteration loop at all. The machine composes the view at sign-in, the user receives whatever it produced, and a low-contrast or otherwise broken layout simply is the experience. The user did not design it and cannot change it. Worse, because the interface was generated for that user in that moment, there may be no shared, inspectable artifact for anyone to file a bug against, and the next user may receive a different broken layout, or a working one, with no way to tell. The flaw the research documents, an interface that looks fine to the model and is unreadable to a low-vision person, ships silently and is never caught, because nothing in the architecture provides a moment to catch it. This is the fear in its purest form, a system that produces a fresh, unreviewed interface for every visit, optimizing for what the model thinks looks right, with no human in the loop to notice when it is wrong.

A per-interaction chat interface does have an iteration loop, but it puts it in the wrong place and makes it the user's job, repeatedly. Suppose you ask a question and the system generates a view to answer it, and the text is unreadable. Now, to get the answer you already asked for, you have to stop being someone with a question and become an unpaid quality tester, asking it to make the contrast darker, telling it the text is too light to read, and trying again. The research shows this can work, because models can fix contrast when told to, but it also shows why it is the wrong loop. The fix does not persist. Nothing is repeatable in a per-interaction interface by definition, so the next time you ask a similar question, the model may generate the same flaw again, and you are back to correcting it. You wanted an answer. Instead you got a design session that resets on every interaction. For a one-off exploratory question that is a tolerable cost. For anything you do repeatedly, it is a tax that never stops being charged.

This is the throughline connecting every failure mode in this analysis. Inaccuracy, unpredictability, dark patterns, and accessibility regressions are all survivable when the interface is owned by a human who can fix it once and make the fix stick. They become structural when the interface is owned by the machine, because then either no one can iterate, as on page load, or the user must iterate endlessly and from scratch, as in chat. The danger was never that a machine-generated interface might have a flaw. Every interface has flaws. The danger is that, in exactly the cases where the machine owns the interface, the flaw has no durable owner and no convergent path to repair.

Why is the right answer usually a hybrid?

The right answer is usually a hybrid because the five approaches are not a ranking but a set of tools, and a real application typically needs more than one of them at the same time. Laid out together, they form a palette rather than a leaderboard.

A finance platform might keep its core ledger views as fitted, stable, per-company interfaces, because the numbers must be complete, consistent, and accountable. It might let individual analysts arrange their own dashboards as per-user surfaces over that same governed data. It might use per-page-load logic to highlight the three items most likely to need attention this morning. And it might offer a per-interaction conversational view for the open-ended question that no standing screen was built to answer. Each component sits at a different point on the axis, and each is there because that point is correct for that component's job. This is also where the industry is heading at the protocol level, because Google positions A2UI not as a single winning paradigm but as one format among several, explicitly complementary to other UI frameworks (Google Developers, 2025).

Figure: A realistic application is a hybrid. One finance platform can keep its core ledger as a fitted, stable per-company interface, let analysts arrange their own per-user dashboards, use per-page-load logic to highlight what needs attention this morning, and offer a per-interaction conversational view for the open-ended question no standing screen was built to answer. Each component sits at the point on the spectrum that fits its job, and all four read from one governed source of truth.

This is the realistic future. It is not a single interface paradigm that wins everywhere, but hybrid systems whose components are drawn from different points along the spectrum, chosen deliberately according to how much the interface needs to hold still and who should decide what it shows.

How does Autessa support this without prescribing it?

Autessa supports the full spectrum because the choice between these approaches is genuinely situational, which means the worst thing a platform can do is decide it for you. A platform that only generates conversational interfaces has implicitly chosen the most fluid, least repeatable point on the axis for every problem, including the many problems the research suggests need the opposite.

Two parts of the Autessa platform matter here. Lens is the perception layer that lets AI understand and work with real applications and interfaces, not just APIs, which means generated and machine-driven views can sit on top of the systems an enterprise already runs. Forge is the build layer for creating and deploying AI-powered systems and applications on your terms, which means the full range of interfaces, from fixed and fitted, to per-user, to fully generated, can be built and shipped on the same infrastructure. At its core, Forge brings per-user configuration even to out-of-the-box applications, so an individual can shape their own view of an existing app without anyone forking the underlying system. This is the per-user pattern the research favors, made available on software the enterprise already owns rather than only on something built from scratch.
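The per-user pattern described above can be pictured as a saved view applied to shared records at render time. This is an illustrative sketch only and does not show how Forge works internally: `LEDGER`, `UserView`, and `render` are hypothetical names, and the sample rows are invented for the example.

```python
from dataclasses import dataclass

# Shared, governed records (hypothetical sample data). There is one copy,
# and no user ever receives a private, editable fork of it.
LEDGER = (
    {"account": "1000", "name": "Cash", "balance": 52000},
    {"account": "2000", "name": "Payables", "balance": 18500},
    {"account": "4000", "name": "Revenue", "balance": 120000},
)


@dataclass(frozen=True)
class UserView:
    """A user's saved presentation choices: which columns, in what order."""
    columns: tuple
    sort_by: str
    descending: bool = False


def render(records, view):
    """Build a fresh list of rows for display without mutating the records."""
    ordered = sorted(records, key=lambda row: row[view.sort_by],
                     reverse=view.descending)
    return [{col: row[col] for col in view.columns} for row in ordered]


# Two analysts shape their own views of the same underlying data.
analyst_a = UserView(columns=("name", "balance"), sort_by="balance", descending=True)
analyst_b = UserView(columns=("account", "name"), sort_by="account")

print(render(LEDGER, analyst_a))
print(render(LEDGER, analyst_b))
```

Because the view holds only presentation choices, deleting or changing it never touches the records, which is what lets personalization coexist with a single source of truth.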

Underneath both, the platform keeps one governed source of truth, with access control, auditability, and a complete record of every action. That is what makes the per-user pattern safe, because personalization stays in the presentation layer and never forks the data. It is also what makes the machine-driven patterns accountable, because even a freshly generated view is produced against governed data and logged like any other action, which directly addresses the reproducibility gap that worries researchers about system-generated interfaces. The same governed, component-based approach is what keeps generated interfaces accessible and auditable, because generation can be constrained to vetted components rather than arbitrary markup, and the resulting surface can be reviewed and certified rather than re-verified on every page load. The platform supplies the data integrity, the perception, and the build tooling. It does not supply an opinion about which interface your particular workflow should use, because that opinion belongs to you and to the evidence about your specific job.
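The idea of constraining generation to vetted components and logging the result can be sketched as a small gate in front of the renderer. This is a sketch under stated assumptions, not Autessa's implementation: the component names in `VETTED_COMPONENTS`, the nested `{"type": ..., "children": [...]}` spec shape, and the functions `unvetted_types` and `accept_generated_view` are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical allowlist of components that have been reviewed once,
# for example for accessibility, and can be reused by any generated view.
VETTED_COMPONENTS = {"stack", "table", "bar_chart", "text"}


def unvetted_types(spec):
    """Walk a generated view spec and collect every component type
    that is not on the allowlist."""
    found = []
    pending = [spec]
    while pending:
        node = pending.pop()
        if node["type"] not in VETTED_COMPONENTS:
            found.append(node["type"])
        pending.extend(node.get("children", []))
    return found


def accept_generated_view(spec, user, audit_log):
    """Accept the view only if every component is vetted. The decision is
    logged either way, together with the spec, so the exact view can be
    looked up and reproduced later."""
    rejected = unvetted_types(spec)
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "spec": spec,
        "accepted": not rejected,
        "rejected_types": rejected,
    })
    return not rejected


audit_log = []
safe_view = {"type": "stack", "children": [{"type": "table"}, {"type": "text"}]}
risky_view = {"type": "stack", "children": [{"type": "raw_html"}]}

print(accept_generated_view(safe_view, "analyst-1", audit_log))
print(accept_generated_view(risky_view, "analyst-1", audit_log))
```

Storing the spec in the log entry is the design choice that speaks to reproducibility: a view that was generated once can be re-rendered from the record instead of being regenerated and hoped to match.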

What is the strategic takeaway?

Generated conversational interfaces are real, valuable, and clearly part of where software is heading, and they are one point on a spectrum that runs from fully fixed to fully fluid. The approaches differ in who owns the tailoring and how often it happens, and that difference determines how much the interface holds still and whether a given view can be reproduced and audited later. Four decades of research, up to and including the latest results from Google's own generative UI work, point the same way. The value of any approach is conditional on the job, and the preference for generated interfaces is strongest exactly when you ignore the costs it imposes in real use.

Betting an entire product on the most fluid end of that spectrum is a bet that no workflow needs stability, repeatability, or user control, which is not true of most real enterprise work. The durable strategy is to treat the spectrum as a palette, build each part of the system at the point that fits its job, and run all of it on infrastructure that keeps the data governed and the choice in human hands. The interface paradigm will keep changing. The need to choose deliberately, per workflow, will not.

References

Mitchell, J. and Shneiderman, B. (1989). Dynamic versus static menus: an exploratory comparison. ACM SIGCHI Bulletin. Foundational work establishing adaptive interfaces as a central HCI concern.
Findlater, L. and McGrenere, J. (2004). A comparison of static, adaptive, and adaptable menus. Proceedings of CHI 2004. ACM. Controlled study in which static and user-adaptable menus outperformed the system-adaptive menu, and users preferred the adaptable one.
Gajos, K. and Weld, D. S. (2004). SUPPLE: automatically generating user interfaces. Proceedings of IUI 2004. ACM.
Gajos, K. Z., Wobbrock, J. O., and Weld, D. S. (2007). Automatically generating user interfaces adapted to users' motor and vision capabilities. Proceedings of UIST 2007. ACM.
Sears, A. and Shneiderman, B. (1994). Split menus: effectively using selection frequency to organize menus. ACM Transactions on Computer-Human Interaction. Slow-paced, frequency-based adaptation that improved on a non-adaptive baseline.
Gajos, K. Z., Czerwinski, M., Tan, D. S., and Weld, D. S. (2006). Exploring the design space for adaptive graphical user interfaces. Proceedings of AVI 2006. ACM. Found predictive accuracy to have a significant effect on user performance.
Gajos, K. Z., Everitt, K., Tan, D. S., Czerwinski, M., and Weld, D. S. (2008). Predictability and accuracy in adaptive user interfaces. Proceedings of CHI 2008. ACM. Frames the proponent-versus-critic tension and shows both predictability and accuracy drive usability.
Wobbrock, J. O., Kane, S. K., Gajos, K. Z., Harada, S., and Froehlich, J. (2011). Ability-based design: concept, principles and examples. ACM Transactions on Accessible Computing.
Brdnik, S., et al. (2022). Adaptive user interfaces and universal usability through plasticity of user interface design. Computer Science Review. Survey spanning 55 years of research.
European Commission (2022). Behavioural study on unfair commercial practices in the digital environment: dark patterns and manipulative personalisation. Sweep finding that the large majority of widely used digital products contained at least one dark pattern.
Pandey, D. (2026). Emergent dark patterns in AI-generated user interfaces. arXiv preprint. Argues that models trained on existing interfaces learn and can personalize the deceptive patterns present in that data.
Suh, A., et al. (2025). When LLM-generated code perpetuates user interface accessibility barriers, how can we break the cycle? Proceedings of the Web for All Conference (W4A 2025). ACM. Finds explicit accessibility prompts improve WCAG compliance but ARIA and meaningful usability remain hard, and reports contrast among the dominant violation categories.
Aljedaani, W., et al. (2024). Does ChatGPT generate accessible code? Investigating accessibility challenges in LLM-generated source code. Proceedings of the Web for All Conference (W4A 2024). ACM. Found the majority of ChatGPT-generated sites fail WCAG, with contrast among the most common issues, mirroring human-written code.
WebAIM (2024). The WebAIM Million: an annual accessibility analysis of the top 1,000,000 home pages. Low-contrast text was the most common detectable failure for seven consecutive years; roughly 96% of home pages had at least one detectable WCAG failure in 2023.
From Code to Compliance (2025). Assessing ChatGPT's utility in designing an accessible webpage: a case study. arXiv preprint. Found that supplying rendered screenshots lets the model reason about surrounding components and judge contrast, improving fixes.
GameUIAgent (2026). An LLM-powered framework for automated game UI design. arXiv preprint. Found text contrast ratio to be the single strongest structural predictor of generated-UI quality.
Mowar, P., et al. (2025). Reported in A11yn (2025): audits find detectable WCAG violations on more than 90% of public web pages.
A11yn (2025). Aligning LLMs for accessible web UI code generation. arXiv preprint. Shows LLMs replicate the web's accessibility failures by default and that accessibility can be optimized as a training objective.
CodeA11y (2025). Making AI coding assistants useful for accessible web development. arXiv preprint. Documents that LLMs frequently produce inaccessible code by default.
The Accessibility Capability Boundary (2026). arXiv preprint. Notes that AI-generated accessibility cannot be fully auto-verified and that user testing with disabled people remains the gold standard.
Vercel (2023). v0: generate UI from text prompts. Launched October 2023.
Vercel (2024). Introducing AI SDK 3.0 with Generative UI support.
Google Research (2025). Generative UI: LLMs are Effective UI Generators. Accompanying the dynamic view and AI Mode rollouts. Raters preferred generated interfaces "when ignoring generation speed."
Google Developers (2025). Introducing A2UI: an open project for agent-driven interfaces. Published December 2025.
CHI 2026 Workshop on Generative UI. Treats generative UI as complementary to ability-based and model-based approaches.

