How to harness powerful AI
We can apply powerful, steerable AI capabilities to large, consequential tasks, with no need for powerful, autonomous agents. State-of-the-art AI points the way.
Should we consider how to apply superintelligent-level AI capabilities1 to large, consequential tasks? Some would say: “No, AI will never be strong enough”. Others would say: “No, a superintelligent AI [agent] would do whatever it wants”. Let’s consider a third, more actionable expectation: that we will be able to apply superintelligent-level AI resources — enough AI to accomplish large, consequential tasks — and should consider how to apply these capabilities to achieve broadly appealing outcomes.
But how could we apply highly capable AI to large, consequential tasks — even transformative tasks — without losing control? Here, I’ll describe a general approach: organizing tasks in AI-agency role architectures.2 I think this approach is the second kind of obvious.3
In discussing prospects, I will assume that AI developers4 continue to produce flawed models that usually do what we want (at least approximately), and that can in aggregate perform an ever-wider range of tasks.
The AI agency concept:
People make tea by acting on a familiar plan that requires little thought, but NASA flew humans to the Moon by acting on an unprecedented plan of a depth and complexity that no single human could ever fully understand. Human-like AI agents could readily make tea, but how could AI resources be harnessed to implement something on the scale of a space program without employing a risky superhuman agent?5
To see what makes sense, let’s consider how large tasks get done in the world today, a process that can be chunked into planning, action, and correction.
Safer planning through competing proposals
Plans are enacted by agents, but generating plans is a task for a generative model.6 High-level planning, not task performance, is where we need strong, general intelligence.
Proposing plans
Generative AI begins with inputs that may include a question or a description, and perhaps sketches or examples of what we want.7 By varying inputs (including random-number seeds), we can use models to generate streams of images, prose, or Python code, typically discarding most or all of the outputs. Some generative models are compound AI systems: conversational models with access to tools like DALL-E 3, for example.
A superintelligent-level planning model would have access to multiple, expert generative models — models that produce not just images, but engineering designs, system architectures, and implementation plans with schedules, budgets, and so on. Advanced planning models can employ domain-expert models to refine and evaluate sub-plans, and can evolve plans through variation and selection.
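To make the division of labor concrete, here is a deliberately toy sketch of variation and selection over plan proposals. Every name and scoring rule below is a hypothetical stand-in: in practice, propose_plan, refine, and the “expert” scorers would call generative and domain-expert models, not toy heuristics.

```python
import random
from dataclasses import dataclass

@dataclass
class Plan:
    description: str
    params: dict

def propose_plan(seed: int) -> Plan:
    """Stand-in for a generative planning model: vary the seed, get a different proposal."""
    rng = random.Random(seed)
    return Plan(
        description=f"proposal-{seed}",
        params={"budget": rng.uniform(0.5, 2.0), "schedule": rng.uniform(0.5, 2.0)},
    )

def cost_expert(plan: Plan) -> float:
    """Stand-in for a domain-expert model scoring cost (higher is better)."""
    return 1.0 / plan.params["budget"]

def schedule_expert(plan: Plan) -> float:
    """Stand-in for a domain-expert model scoring schedule."""
    return 1.0 / plan.params["schedule"]

def refine(plan: Plan, seed: int) -> Plan:
    """Stand-in for asking the planner to revise a promising proposal."""
    rng = random.Random(seed)
    revised = dict(plan.params)
    revised["budget"] *= rng.uniform(0.9, 1.0)    # small, directed variations
    revised["schedule"] *= rng.uniform(0.9, 1.0)
    return Plan(description=plan.description + "-revised", params=revised)

def evolve_plans(rounds: int = 3, pool_size: int = 8, keep: int = 3) -> list[Plan]:
    """Variation and selection: generate, score with several expert roles, keep the best, refine."""
    score = lambda p: cost_expert(p) + schedule_expert(p)
    pool = [propose_plan(seed) for seed in range(pool_size)]
    for r in range(rounds):
        survivors = sorted(pool, key=score, reverse=True)[:keep]
        pool = survivors + [refine(p, seed=100 * r + i) for i, p in enumerate(survivors)]
    return sorted(pool, key=score, reverse=True)[:keep]

if __name__ == "__main__":
    for plan in evolve_plans():
        print(plan.description, plan.params)
```

The structural point: proposing, evaluating, and refining are separate roles, proposals are cheap to discard, and nothing in the loop acts on the world.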
Researchers have found that, with division of labor, there is no need for a monolithic, omni-competent model:
State-of-the-art AI results are increasingly obtained by compound systems with multiple components, not just monolithic models…. Compound AI systems will likely be the best way to maximize AI results in the future, and might be one of the most impactful trends in AI in 2024.
— The Shift from Models to Compound AI Systems (Berkeley AI Research)
Generating large, consequential plans
When I use a generative model, I often iterate prompts and sometimes use one model to evaluate or revise the output of another. Humans using generative models to produce consequential plans will likewise gather multiple proposals, evaluate, and iterate, and we should expect people to use multiple models for these tasks.
Workflows can include models that analyze, critique, and explain plans, or reject plans for lack of clarity. Models that consult models of human preferences can discard obviously inferior plans, and they can draw our attention to competing alternatives when we don’t know our preferences until we’ve compared our options.
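As a sketch of such a filtering workflow (the critic and preference models below are hypothetical stand-ins): proposals can be rejected for lack of clarity, obviously inferior options can be discarded, and the remaining trade-offs can be put in front of human reviewers.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    text: str
    cost: float      # estimated cost (arbitrary units)
    benefit: float   # estimated benefit according to a preference model

def clarity_critic(p: Proposal) -> bool:
    """Stand-in for a critic model: reject proposals too terse to analyze."""
    return len(p.text.split()) >= 5

def dominated(p: Proposal, others: list[Proposal]) -> bool:
    """A proposal is obviously inferior if some alternative is cheaper AND more beneficial."""
    return any(q.cost < p.cost and q.benefit > p.benefit for q in others if q is not p)

def shortlist(proposals: list[Proposal]) -> list[Proposal]:
    clear = [p for p in proposals if clarity_critic(p)]
    return [p for p in clear if not dominated(p, clear)]  # remaining trade-offs go to humans

if __name__ == "__main__":
    candidates = [
        Proposal("Do it.", cost=1.0, benefit=1.0),                           # rejected: unclear
        Proposal("Refit the existing facility over two quarters", 5.0, 4.0),
        Proposal("Build a new facility with a larger footprint", 9.0, 6.0),
        Proposal("Build a new facility, but on the costly site", 9.5, 5.0),  # dominated
    ]
    for p in shortlist(candidates):
        print(f"for human review: {p.text} (cost {p.cost}, benefit {p.benefit})")
```

Note that the preference model only prunes options that lose on every count; genuine trade-offs are left for humans to compare.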
After deep consideration, experts have concluded that it would be difficult or impossible to build a machine that can reliably predict human preferences in every circumstance. Fortunately, this isn’t necessary. Unfortunately, human preferences often conflict, but this is a different concern.8
Can superintelligent-level planning be transparent and interpretable? There are two distinct questions here, one about plans, and the other about models that propose plans. It’s a good bet that effective models will be substantially opaque, but it’s also a good bet that effective plans can be transparent — indeed, clear description and analysis can be made a criterion for their acceptance.
But can we trust “the AI”?
But what if “the AI” tries to fool us?
That’s the wrong question: “The AI” plays no role in the architecture. Instead, there are models playing many roles,9 some generating plans, some generating criticisms, all implicitly competing10 to perform their tasks well, promptly, and at minimal cost.
Note that the implicit goal of a planning model is to propose plans that would achieve our goals.11 Basic model goals are ingrained: In every application, and in every stage of development, models that produce less satisfactory results are discarded in favor of models that work better. Darwin is on our side here.
I worry more about AI that helps humans get what they want.12
Safer actions through bounded tasks
In the real world, plans for large, consequential tasks have features that are seldom mentioned in the AI safety literature, features like documentation, budgets, schedules, auditing, reporting, regulatory compliance, contingency plans, and mechanisms for ongoing review and revision in light of experience. A typical large, consequential task calls for coordinating agents and actions of many kinds, and so it includes extensive delegation and division of labor.13
Tasks, as generally understood, have bounded goals to be accomplished in bounded time and with bounded resources, simply because people want results without too much cost and delay. Large tasks are composed of smaller tasks: welding a seam to construct a tank to build a booster to launch a spacecraft on a mission to land a crew on the Moon. Coordinating bounded tasks is a bounded task, and tasks can be changed and repeated as necessary.
With enough AI capacity, tasks of all kinds and scales could be automated and performed under ongoing adaptive guidance. And in a large project, some subtasks may themselves be large, consequential, and novel, calling for rounds of finer-grained planning, review, and so on. It’s tasks and planning all the way down to the level of simple actions.
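As a minimal sketch of what “tasks and planning all the way down” can mean in practice (the names, budgets, and schedules below are purely illustrative), a task tree can make boundedness an explicit, checkable property:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A bounded task: a goal, a budget, a deadline, and (optionally) bounded subtasks."""
    goal: str
    budget: float          # e.g. millions of dollars
    deadline_days: int
    subtasks: list["Task"] = field(default_factory=list)

    def within_bounds(self) -> bool:
        """Coordinating bounded tasks is itself a bounded task: subtask budgets
        and schedules must fit inside the parent's allocation."""
        return (
            sum(t.budget for t in self.subtasks) <= self.budget
            and all(t.deadline_days <= self.deadline_days for t in self.subtasks)
            and all(t.within_bounds() for t in self.subtasks)
        )

# Tasks all the way down: mission -> booster -> tank -> welded seam.
seam    = Task("weld a seam", budget=0.01, deadline_days=2)
tank    = Task("build a tank", budget=5.0, deadline_days=90, subtasks=[seam])
booster = Task("build a booster", budget=400.0, deadline_days=700, subtasks=[tank])
mission = Task("land a crew on the Moon", budget=25_000.0, deadline_days=3000,
               subtasks=[booster])

assert mission.within_bounds()
```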
Applying POLA
Organizing work into bounded tasks aligns with a fundamental principle of computer security: POLA, the principle of least authority.14 The POLA approach gives each entity only the capabilities that it needs to perform its task. That way, when something goes wrong, damage is limited.
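As a minimal sketch of POLA applied to AI tool use (the agents and tools below are hypothetical), a task-performing agent receives only the capabilities its bounded task requires, so misuse of authority fails by construction:

```python
def read_sensor() -> float:
    return 42.0  # placeholder measurement

def write_report(text: str) -> None:
    print("report:", text)

def fire_thruster(seconds: float) -> None:
    print(f"thruster fired for {seconds}s")

class TaskAgent:
    """An agent that can invoke only the capabilities it was explicitly granted."""
    def __init__(self, name: str, capabilities: dict):
        self.name = name
        self._caps = dict(capabilities)  # no ambient authority beyond this

    def use(self, capability: str, *args):
        if capability not in self._caps:
            raise PermissionError(f"{self.name} was not granted '{capability}'")
        return self._caps[capability](*args)

# The monitoring agent can read and report, but cannot actuate anything.
monitor = TaskAgent("monitor", {"read_sensor": read_sensor, "write_report": write_report})
monitor.use("write_report", f"sensor value: {monitor.use('read_sensor')}")

# Misuse of authority fails by construction, limiting damage.
try:
    monitor.use("fire_thruster", 3.0)
except PermissionError as e:
    print(e)
```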
From this perspective, giving boundless capabilities to agents would be perverse, risking misuse of authority whether accidental or not. Intelligent machines can propose better, safer plans for organizing work.15
Safer results through ongoing correction
Monitoring and reporting progress, tracking anomalies, responding to human feedback: All these are (or should be) part of any plan for a consequential task. In the usual AI-agent story, “corrigibility” is a fundamental concern because a willful AI agent might resist attempts to change its goals. But this problem need not arise when agents perform bounded tasks and plans include plans for revising plans.
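As a sketch of this (with hypothetical names throughout), a bounded task can report progress and re-read its directive at every step, so a revised or halted plan takes effect through ordinary plumbing rather than through the goodwill of a willful agent:

```python
from dataclasses import dataclass

@dataclass
class Directive:
    target_steps: int
    stop: bool = False

def report(step: int, total: int) -> None:
    print(f"progress: step {step}/{total}")

def run_bounded_task(directive: Directive, get_feedback, max_steps: int = 100) -> None:
    """Perform a bounded task step by step, reporting progress and applying
    any revised directive from the oversight channel before each step."""
    step = 0
    while step < min(directive.target_steps, max_steps):
        revised = get_feedback(step, directive)
        if revised is not None:
            directive = revised
        if directive.stop:
            print("task halted by revised directive")
            return
        step += 1
        report(step, directive.target_steps)
    print("task complete within its bounds")

def human_oversight(step: int, directive: Directive):
    """Stand-in for human feedback routed through the plan's reporting channel:
    after two steps of progress reports, revise the plan to stop early."""
    if step == 2:
        return Directive(target_steps=directive.target_steps, stop=True)
    return None

run_bounded_task(Directive(target_steps=5), human_oversight)
```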
The concept condensed:
AI systems can perform multiple coordinated roles.
AI systems can help humans implement AI role-architectures.
Agency role-architectures include planning, action, and correction.
In large, consequential tasks, planning and action are distinct roles.
Planning is a task for generative models.
Models and proposals implicitly compete for approval.
AI advisors and critics help clarify proposals and highlight problems.
Large-scale plans include schedules, budgets, and oversight.
Large-scale plans employ agents to perform specific tasks.
Agents aim to complete their tasks on schedule and within budget.
Models can assist in monitoring and evaluating results.
Practical plans include plans for correcting plans based on results.
The agency framework scales to superintelligent-level AI.
Not all tasks call for human deliberation and oversight.
Routine and low-impact tasks can be performed more directly.
Agency architectures do not eliminate safety concerns:
Catastrophic AI risks remain real.
The concept diagrammed:
Large, consequential tasks are apt to be complex: Plans include sub-plans, tasks include sub-tasks, and feedback runs in loops through multiple channels. That said, here is a rough schematic diagram of functions and relationships in an AI-agency role architecture:
TL;DR: We can apply powerful, general AI resources to large, consequential tasks without employing powerful, general AI agents or sacrificing capabilities.
By “superintelligent-level AI capabilities” I mean AI capabilities that exceed human capabilities over a more-or-less comprehensive range of practical tasks. For present purposes, this means enough AI (eventually) to greatly expand general implementation capacity.
Note that an “agency” might act under the authority of a corporation, government, or NGO, but the agency pattern could be applied to any project worth human attention and guidance.
For a longer and deeper exploration of fundamental considerations, see “Reframing Superintelligence” (FHI Technical Report 2019-1), which according to Scott Alexander is one of the studies “that’s aged the best in our current LLM era.”
The more specific AI-agency concept dates from my post, “The Open Agency Model”, in the AI Alignment Forum. Since then, the AI agency concept has inspired the Atlas Computing initiative (pdf) and a research project (pdf) proposed by ARIA, the UK’s answer to DARPA.
Meaning “obvious once pointed out”.
Developers will increasingly be aided by AI tools (as we’re already seeing), a process that leads toward asymptotically-recursive improvement of the AI technology base.
In this series of publications, I focus on a potential future situation in which we have enough AI capacity to greatly facilitate large, consequential tasks. The question addressed here is how to apply that capacity with effective human guidance.
I find the term “model” helpful because it anchors discussion in today’s reality rather than legacy concepts of AI systems as creatures. What we see today is not the AI we’d been looking for.
Literal sketches and examples for image and text models, relatively abstract sketches and examples for planning models in the future. The concept of “inpainting” would correspond to filling a gap in an incomplete plan.
Questions of AI-facilitated deliberation and governance are beyond the scope of this article. What agency architectures offer is structural opportunities for deliberation and governance.
We already see a rapidly growing number of substantially different state-of-the-art AI models. But in addition, through iterated, task-specific prompting, even a single model can play multiple functionally independent roles in an architecture.
In the orthodox AI Doomer belief system, highly capable AI systems — though shaped by competition for human approval and performing different tasks — will inevitably collude with their competitors and decide to destroy us. For a dissection of this view from a game-theory perspective, see “Applying superintelligence without collusion”.
A powerful agent with world-changing goals would be something quite different from a planning model: Proposals can be discarded, world outcomes cannot.
Much of the AI safety literature has assumed a different approach: that humans would (should?) delegate large, consequential tasks to an omni-competent agent that has internal goals and no external constraints. The hope has been that we could build an agent that would, voluntarily and reliably, limit its actions and do what we want. How one might accomplish this, however, remains an open question.
For a deep and brilliant analysis of POLA and its generality, see “The Structure of Authority: Why Security Is not a Separable Concern” (pdf). Full disclosure: I know the authors.
But would division of labor be too difficult to organize? It seems strange to anticipate superhuman AI and then argue that division of labor would be a burden for humans. We can use AI to architect roles for AI.