Orchestrating Intelligence: How Comprehensive Specialization Transforms AI
In the emerging AI ecosystem, even being a generalist becomes a specialized role.
A transformation is underway in AI architectures. The proliferation of “agentic” systems — with orchestrator models directing specialized agents — reveals something fundamental: AI systems are coordinating other AI systems, breaking computational constraints while transforming the pace and scope of AI development.1 This differentiation and orchestration of models aligns with the structure of knowledge itself.
Knowledge of the world, both semantic and procedural, has a geometry: a compact core of general principles and knowledge surrounded by increasingly specialized domains — each with a core and surrounding layers, and so on, forming a branching, perhaps exponential pattern of substantial depth.2 Attempting to capture all knowledge in a single model would face quadratic scaling costs on top of this steep growth — an impractical approach. The emerging solution mirrors this structure: general cores that link and coordinate specialized components. Coordination and specialization may seem like a compromise, but they align with the deep semantic structure of knowledge and with the economics of model scaling.
The Economics of Orchestration
The quadratic scaling of training costs with model size3 has striking implications. The computational budget for training one 500-billion-parameter model could instead produce 100 models of 50 billion parameters or (considering only computational costs) 10,000 models of 5 billion parameters. These smaller, specialized models deliver 10 to 100 times greater inference throughput per dollar while collectively providing 10 to 100 times greater information capacity. Small models can be surprisingly strong4 and can handle tasks defined and delegated by larger models with broader understanding — and can work concurrently.5
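To make the arithmetic explicit, here is a back-of-envelope sketch in Python. It takes two assumptions as given: training compute scales as parameters squared (see note 3), and inference cost per token scales roughly linearly with parameters.

```python
# Back-of-envelope arithmetic behind the figures above.
# Assumptions: training compute ~ parameters^2 (see note 3);
# inference cost per token ~ parameters.

def small_models_per_big_model(big: float, small: float) -> float:
    """Number of small models trainable for one big model's compute budget."""
    return (big / small) ** 2

BIG = 500e9  # a 500-billion-parameter monolith

for small in (50e9, 5e9):
    n = small_models_per_big_model(BIG, small)
    throughput_gain = BIG / small    # inference throughput per dollar
    capacity_gain = n * small / BIG  # aggregate parameters vs. the monolith
    print(f"{small / 1e9:.0f}B models: {n:,.0f} trainable, "
          f"{throughput_gain:.0f}x throughput per dollar, "
          f"{capacity_gain:.0f}x total parameters")
```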
The opportunities offered by scaling and specialization are compelling and potentially transformative. The limited results to date reflect two constraints: non-computational costs of specialization, and non-computational costs of coordination.6
Specialization carries costs beyond computation. Creating effective specialized models requires domain expertise — identifying where specialization makes sense, curating training data, evaluating outputs, and recognizing limitations. This expertise lives in universities, hospitals, engineering departments, and research labs, not in the AI companies building foundation models. Domain experts rarely possess ML engineering skills, while developers in AI labs have largely focused on capabilities they can evaluate and use, notably software development. This mismatch has made scaling general models the path of least resistance. Training models to learn “everything” at once has proved unexpectedly powerful, and capital has been cheaper than time, knowledge, and human coordination.
This coordination barrier is now dissolving. Fine-tuning services let domain experts adapt foundation models without ML expertise. AI assistants guide specialists through dataset curation, evaluation design, and architecture selection. What once required months of collaboration between ML engineers and domain experts becomes a guided workflow within the expert community. As AI-assisted development gains power, what is now limited fine-tuning will extend to base-model training and even architectural innovation. Cardiologists can create cardiology models that could, with assistance, mature into multimodal models that merge image data with modeling of electrophysiology. Materials scientists can create materials models that could, with assistance, mature to become systems of models that interpret the literature in coordination with atomistic physical modeling. The democratization of model development unlocks knowledge previously blended into general-purpose systems or scattered in fragments across multiple research projects. This represents a high-order expression of the bypass principle — AI flowing around the bottleneck of centralized model development by enabling distributed specialization.
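For concreteness, here is a minimal sketch of the kind of adaptation now within a domain expert's reach, using LoRA adapters via the Hugging Face peft library. The model name, target modules, and hyperparameters are illustrative assumptions, not recommendations.

```python
# Minimal LoRA fine-tuning setup with Hugging Face transformers + peft.
# The model name and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# LoRA trains small low-rank adapter matrices instead of the full weights,
# so an expert-curated dataset can specialize a model at modest cost.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
# Training then proceeds with any standard loop over the curated dataset.
```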
Model coordination has posed parallel challenges. Selecting specialists to invoke, maintaining context across boundaries, managing workflows — even where specialization proves practical, coordination costs have undercut its value. But as AI systems gain the ability to orchestrate other AI systems, both human and model coordination barriers fall together. What required special engineering skills becomes an AI task, creating compounding returns: better coordination enables specialization, which demands sophisticated coordination, which rewards deeper specialization.
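A toy sketch conveys the shape of the coordination task. The specialist registry and keyword routing below are crude stand-ins for what would, in a real system, be learned behavior in an orchestrator model.

```python
# Toy orchestration loop: a general model routes queries to specialists.
from typing import Callable, Dict

# Stand-ins for specialized models behind a common interface.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "cardiology": lambda q: f"[cardiology model] {q}",
    "materials":  lambda q: f"[materials model] {q}",
    "software":   lambda q: f"[code model] {q}",
}

def route(query: str) -> str:
    """Stand-in for an orchestrator's choice of which specialist to invoke."""
    for domain in SPECIALISTS:
        if domain in query.lower():
            return domain
    return "software"  # fallback specialist

def orchestrate(query: str) -> str:
    result = SPECIALISTS[route(query)](query)
    # A production system would also carry shared context across calls,
    # verify outputs, and manage multi-step workflows.
    return result

print(orchestrate("Interpret this cardiology ECG trace"))
```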
Orchestration frameworks are a production reality. Major technology companies now deploy systems where AI models direct specialized agents through standardized protocols, while quantified efficiency gains — halving computational costs in some cases — drive rapid adoption. Development frameworks have democratized orchestration, enabling domain experts to build multi-model systems without engineering expertise. The architectural shift from monolithic to orchestrated isn’t gradual evolution but compressed transformation, with deployment timelines measured in months rather than years. These aren’t isolated experiments but coordinated infrastructure changes across the technology industry.7
Orchestrated systems for research…
While I was writing this post, a research group at Stanford presented Biomni, a system that uses an LLM to orchestrate 150 specialized biomedical tools, 105 software packages, and 59 databases. Some software packages wrap LLMs, vision models, or other products of deep learning. Note that Biomni was developed by a group with specialized knowledge, coordinating products of more deeply specialized knowledge, yielding aggregate capabilities far beyond those of any part. The Biomni system exposes an enormous surface for incremental upgrades, and an LLM-powered agent assisted in assembling the current software tools and data sources. (See “Biomni: A General-Purpose Biomedical AI Agent”, and the system access page.)
…and for developing new systems for research
A few days earlier, the notable Shanghai Artificial Intelligence Laboratory published “NovelSeek: When Agent Becomes the Scientist — Building Closed-Loop System from Hypothesis to Verification”, a multi-agent system that develops training methods and architectures for specific scientific tasks. This work suggests how systems for AI R&D automation can extend multi-agent systems like Biomni — and themselves. “RSI” should mean Recursive Systems Improvement, because intelligence isn’t a thing.
Deeper integration lies ahead. Today’s text-based approaches hint at further possibilities: models will communicate through latent-space channels — passing semantic vectors rather than token sequences. The technical infrastructure exists in research labs;8 costs, benefits, and technical maturity are tilting toward practical value.
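To illustrate the idea (a sketch, not any lab's actual implementation): a small trained adapter can map hidden states from one model's latent space into another's embedding space, letting meaning pass without a detour through tokens. The dimensions and adapter architecture below are assumptions chosen for illustration.

```python
# Sketch of a latent-space bridge between separately trained models:
# an adapter maps model A's hidden states into model B's embedding space.
import torch
import torch.nn as nn

D_A, D_B = 4096, 2048  # hidden sizes of models A and B (assumed)

bridge = nn.Sequential(  # lightweight adapter, trained on paired activations
    nn.Linear(D_A, D_B),
    nn.GELU(),
    nn.Linear(D_B, D_B),
)

h_a = torch.randn(1, 16, D_A)  # 16 latent vectors from model A (random here)
h_b = bridge(h_a)              # now shaped as soft inputs for model B
print(h_b.shape)               # torch.Size([1, 16, 2048])
# Model B would consume h_b in place of token embeddings, e.g. through an
# inputs_embeds-style interface in transformer implementations.
```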
Generality is a Specialty
In the emerging ecosystem of AI models, generality itself becomes a specialty. Large models don’t disappear — they evolve into innovators and orchestrators. They specialize in comprehension, context-holding, and high-level reasoning while smaller models handle domain depth and throughput.9
This pattern of generality and specialization mirrors familiar patterns in complex systems. In brains, higher brain centers coordinate with specialized sensory and motor cortex (integrated through a kind of latent-space coupling). CPUs orchestrate specialized processors. In economies, manufacturers of systems like computers and aircraft draw on supply chains that orchestrate the work of specialized producers. In each case, some components specialize in coordination itself — the prefrontal cortex, the CPU, the prime contractor. Large language models are filling a similar niche, a pattern that suggests AGI won’t emerge as a monolithic superintelligence, but as an ecosystem in which generality itself is a specialized role — coordinating rather than containing all capabilities.
Transformation Dynamics
The shift from monolithic to orchestrated architectures is transformative. When one model’s compute budget can instead train hundreds of specialists, computational constraints relax. When AI systems can assist in training specialists, human bottlenecks on aggregate scope relax. When AI systems can coordinate these specialists, new dynamics emerge: specialized models can be updated and adapted independently, enabling rapid iteration without system-wide retraining. AI systems optimized for research can accelerate AI development — including their own — through focused experimentation at increasing cadence.10
This architectural shift broadens participation while making capabilities more legible. Organizations with domain expertise but limited resources can contribute specialized models. Distributed architectures replace opaque monoliths with specialized components and defined interfaces, where critical capabilities can be isolated, monitored, and constrained. Diverse models from different groups can cross-check each other, adding robustness through multiplicity.11
As specialized AI components develop in parallel and integrate dynamically, barriers to transformative applications begin to dissolve.12 The detailed complexity of the world can increasingly be matched by AI systems adapted to specific tasks and domains, accelerating progress toward a hypercapable world. Yet this same modularity that promises acceleration also demands new thinking about coordination, safety, and the nature of intelligence itself.
The Architecture of Intelligence
The capabilities that made large models dominant — understanding context, managing abstractions, coordinating information — now enable them to orchestrate specialized systems that collectively surpass monolithic approaches. This architectural evolution reflects the structure of knowledge itself: specialized domains require specialized processing, while generalization becomes the specialty of coordination.
DeepSeek’s founder captures this vision: “Our destination is AGI,” yet anticipates “specialized companies providing foundation models and services, achieving extensive specialization in every node.”13
Current text-based coordination will yield to latent-space communication.14 Human-designed workflows will give way to AI-designed architectures. As these advances converge, modular systems offer a critical advantage: specialized components remain comprehensible even when their collective capabilities (like those of society itself) exceed our understanding. AI Agency architectures can organize these resources for large, consequential tasks — humans conducting while AI systems perform — an approach that scales to superintelligent-level capabilities without requiring monolithic, potentially uncontrollable entities.15
Strategic Implications
These technical transformations — from monolithic to orchestrated, from centralized to distributed, from opaque to legible — reshape not just AI capabilities but the landscape of AI governance and safety.
For AI governance, distributed architectures offer both promise and challenge. Capabilities become more transparent when isolated in specialized modules, but coordination itself becomes critical: The modularity that supports safety through constrained, understandable components can also create new risks through novel combinations. A model specialized in orchestration could direct enormous AI resources while itself having minimal capabilities.
This transformation compresses timelines. When existing models can coordinate new, specialized components, when AI R&D can iterate faster, when economics relax constraints on both training and inference, when AI can manage its own coordination — the path to transformative AI will shorten and bypass obstacles.
The implications ripple outward. Computational restrictions lose force as training costs fall and orchestration multiplies efficiency. Concerns about opaque, uncontrollable systems give way to architectures where capabilities are distributed, interfaces explicit, and risky capabilities need not be embodied in large models developed by highly visible actors. Achieving AI safety by constraining development becomes less promising; achieving AI security by strategic applications of AI becomes essential.
How can we guide the AI transition? Understanding the nature of prospective AI is essential for governance, yet even the concept of “an AI system” can be misleading. Traditional regulatory approaches assume stable artifacts; orchestrated AI systems are better regarded as dynamic constellations of capabilities. Today’s emerging reality demands new frameworks for thought, policy, and action.
The term “agentic” has become fashionable but often obscures more than it illuminates. What matters isn’t that AI systems act as agents, but that they can coordinate — invoking tools, routing to specialists, managing workflows. This coordination capability, not agency per se, drives the transformation discussed here. “AI Safety Without Trusting AI” discusses how multiplying and specializing AI systems changes the AI safety problem.
The geometry of knowledge — general principles at the core, specialized domains expanding outward — explains why monolithic model scaling hits soft limits. The way forward involves external knowledge resources: retrieval-augmented generation, specialist models that mirror human knowledge communities, and prospective latent-space knowledge stores that bridge the gap between text-based search and expert delegation.
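As a sketch of the retrieval-augmented pattern, with a toy hash-based embedding standing in for a real embedding model:

```python
# Skeleton of retrieval-augmented generation: embed, retrieve, prepend.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: a pseudo-random unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

DOCS = [
    "General principles of thermodynamics.",
    "Specialized protocols for cardiac MRI interpretation.",
    "Grain-boundary effects in perovskite solar cells.",
]
DOC_VECS = np.stack([embed(d) for d in DOCS])

def retrieve(query: str, k: int = 1) -> list:
    sims = DOC_VECS @ embed(query)  # cosine similarity (unit vectors)
    return [DOCS[i] for i in np.argsort(-sims)[:k]]

query = "How should I read a cardiac MRI?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}"
print(prompt)  # a generator model would now answer using this context
```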
Quadratic scaling has a simple basis: compute requirements grow with (parameters × training tokens), and compute-optimal training scales tokens in proportion to parameters, so training costs scale roughly as (parameters)². Note that MoE LLM architectures reduce computation by a constant factor without changing how computation scales with size.
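In symbols, with N parameters, D training tokens, and training compute C:

```latex
C \propto N \cdot D, \qquad D_{\mathrm{opt}} \propto N
\quad\Longrightarrow\quad C \propto N^{2}
```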
See, for example, the open-weight Qwen3-8B, an LLM reasoning model with respectable performance in coding and mathematics, and the much stronger (reportedly useful) Qwen2.5 32B coder, or ether0, a 24B model that excels in chemistry tasks. Different classes of models differ widely in scale: The strongest multimodal LLMs contain hundreds of billions, even trillions, of parameters; the strongest models for vision (≲ 2B), image generation (≲ 10B), speech recognition (≲ 2B), robotic control (≲ 0.1B), and most (all?) other applications are small by comparison.
Advantages of concurrent processing can include lower latency, higher throughput, and (without sacrificing these) the ability to compare results from multiple trials and multiple models undertaking identical tasks.
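A minimal sketch of this pattern: fan a task out to several models concurrently, then vote on the answers. The model calls here are stubs.

```python
# Concurrent trials with simple self-consistency voting.
import asyncio
from collections import Counter

async def query_model(name: str, task: str) -> str:
    await asyncio.sleep(0.1)  # stands in for network/inference latency
    return f"answer-{hash((name, task)) % 3}"  # fake answers, vary by model

async def run_trials(task: str, models: list) -> str:
    # gather() runs all calls concurrently: latency ~ one call, not the sum.
    answers = await asyncio.gather(*(query_model(m, task) for m in models))
    # Multiplicity lets us compare results and take a majority vote.
    return Counter(answers).most_common(1)[0][0]

result = asyncio.run(run_trials("balance this equation",
                                ["m1", "m2", "m3", "m4", "m5"]))
print(result)
```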
Specialization by fine-tuning avoids some of these costs and can overlay specialized capabilities on models of any size. Fine-tuning is widely used to adapt stand-alone models to specific tasks when training data can be found.
Warning: out-of-distribution text follows.
“The scope and speed of deployment reveals the transformation underway. Microsoft’s Magentic-One orchestrates a GPT-4 controller with specialized agents for web browsing, file retrieval, coding, and terminal operations. Industry-wide adoption of protocols like Anthropic’s Model Context Protocol (Microsoft, Google, OpenAI) and Google’s Agent2Agent standard demonstrates architectural convergence. Quantified gains include FrugalGPT’s 58% cost reduction through cascading strategies and MIT’s Co-LLM accuracy improvements via specialist coupling. IBM’s Watsonx Orchestrate manages thousands of micro-AI skills in production, while frameworks like LangChain and Google’s Vertex AI Agent Engine enable rapid deployment by non-specialists. These aren’t vendor-specific experiments but a coordinated shift from monolithic to orchestrated architectures, with deployment timelines measured in months rather than years.”
Disclosure: This footnote reproduces, verbatim, a Claude-synthesis of an OpenAI Deep Research report examining 100+ sources. Yes, I’m violating the cardinal rule of “always verify LLM output.” My reasoning: even noisy signals can be informative when they consistently point in the same direction. The specific numbers might be wrong, the company names might be garbled, but the pattern—orchestration everywhere, standards emerging, efficiency gains driving adoption—rings true across sources. Don’t cite these as facts; do take seriously the overall impression of an industry rapidly reorganizing around orchestrated architectures. (Also, this article is about architectural evolution, not a June 2025 industry report. The trend matters more than the specifics.)
Further disclosure: I directed Claude to write the above disclosure based on a sketch of what I wanted to say. Claude ignored the “…also, I'm lazy” part of the prompt.
From “LLMs and Beyond: All Roads Lead to Latent Space”:
Fine-tuning and lightweight adapters can build latent-space connections between separately trained models. Here are some examples from recent work; OpenAI’s o3 model did the heavy lifting with light review and correction:
A subtle but crucial point: “asking the right questions” becomes more important than knowing answers. In an orchestrated architecture with externalized knowledge and skills, the orchestrator needs wisdom about what to ask or request, not encyclopedic knowledge or omni-competence. This has implications for AI safety: We can constrain what an orchestrator knows by filtering and rewriting training data, yet maintain (and constrain) system capabilities by giving the orchestrator selective access to external resources.
The acceleration dynamic deserves emphasis. Current AI development cycles are measured in months; cycle times for specialized model updates can shrink to days. When AI systems can identify needed specializations, create training data, train models, and integrate them into existing systems, the development loop could shorten from months to hours when performance metrics are good and risks are low.
The safety implications of orchestrated architectures extend beyond transparency. When capabilities are factored across specialists, we gain new intervention points: constraining orchestrator knowledge, limiting specialist interactions, monitoring coordination patterns. Distinct “cognitive” actions become more visible. This transforms AI safety from preventing deception by monolithic systems to managing information flows in distributed ones — a more tractable problem. See “AI Safety Without Trusting AI”.
This architectural shift connects directly to themes explored throughout this series. “The Platform: General Implementation Capacity” examined how AI expands our ability to create complex systems; orchestrated AI multiplies this capacity by enabling parallel development of a myriad of specialized capabilities. This suggests that timeline estimates based on monolithic scaling are apt to be dramatically wrong — the relevant metric isn’t how fast single models improve, but how fast ecosystems of coordinated specialists can emerge.
DeepSeek’s vision of AGI linked to specialization aligns with the views expressed here. For more context, see Liang Wenfeng’s interview in Chinese or in English translation.
“All Roads Lead to Latent Space” explores why communication between models will tend to shift from token sequences to latent representations. Humans think primarily in concepts yet must communicate in words; AI systems have no such constraint. The result is a qualitative shift in how intelligence can be organized.
I anticipated this architectural evolution in “Reframing Superintelligence: Comprehensive AI Services as General Intelligence” (FHI Technical Report, 2019), and proposed AI R&D automation as the key accelerator for system-level, asymptotically recursive self-improvement. The report argued that general intelligence can be factored into coordinated services, and now we can see increasingly general services and coordination mechanisms emerging in concrete form. I had no idea of the potential strength and generality of LLMs (the report predated GPT-2!), but the broader picture remains intact: The structure of knowledge and skills at a civilizational scale will be reflected in the structure of any practical system that might embody them (for a deep and general analysis, see Herbert A. Simon, “The Architecture of Complexity”).