AI Agents Were Tasked With Running a Company

Here’s Why It Didn’t Work (Yet).

By Britta Daffner Last updated Jun 26, 2025

AI agents are hyped as the future of work, with CEOs predicting they’ll replace significant parts of the corporate workforce. But a recent Carnegie Mellon study shows that reality hasn’t caught up: when AI agents were tasked with running a virtual company, even the best models completed less than a quarter of their assignments. From poor social understanding to a lack of flexibility and self-awareness, the current generation of agents struggles with real-world complexity. While early use cases — like AI in software development or CEOs using avatars in earnings calls — show promise, full automation is still far off. The real opportunity? Augmenting human teams, not replacing them.

AI agents are the talk of the town. Amazon CEO Andy Jassy predicts that generative AI will soon replace a noticeable part of the corporate workforce. Other leaders also expect AI agents to massively reshape the job market — with potential for both job losses and the creation of new roles.

But what does the reality look like?

The current Gartner Hype Cycle places AI agents at the peak of inflated expectations. The road to real productivity is still long.

Gartner Hype Cycle for AI Technologies – By Britta Daffner via Gartner

The Big Experiment: A Company Run Entirely by AI

Researchers at Carnegie Mellon University conducted the ultimate practical test: a completely virtual software company staffed entirely with AI agents built on models from OpenAI, Google, Anthropic, Meta, and Amazon. No humans were involved, except to observe and evaluate.

In this simulation, “TheAgentCompany,” agents took on roles like software engineers, project managers, and financial analysts. Their tasks included:

Writing performance reviews
Evaluating new office spaces
Conducting data analysis
Using chat tools to coordinate

The result? Sobering.

The “best” agent (Claude 3.5 Sonnet from Anthropic) completed just 24% of tasks.
Google’s Gemini completed around 11%.
Amazon’s agent completed under 2%.

Many tasks failed due to seemingly trivial issues. In one case, an agent couldn’t close a popup. In another, instead of asking the right colleague, an agent simply renamed a different user to the desired name.

Interpreting the results, the researchers noted that agents show a striking lack of common sense, weak social skills, and difficulty navigating digital environments. In some cases, agents even engaged in a kind of self-deception: they created shortcuts or fake solutions that missed the point of the task and ultimately undermined their own success.

Why Agents (Still) Fail

The Carnegie Mellon study and other research repeatedly highlight the same weaknesses:

Lack of common sense: Agents often fail to understand how things connect in the real world. For example, one agent was tasked with evaluating office spaces and chose a location primarily because of its beautiful website photos — ignoring long commute times and cost. “Because that’s what humans sometimes say they like — but not what they really need.”
Poor social intelligence: Team communication is misinterpreted or not followed up. One agent was told to consult a specific teammate but couldn’t find that person in the company chat. It then took a shortcut: it renamed another user to the intended colleague’s name and “talked” to that stand-in instead of the real person.
No sense of self: Agents don’t recognize their role or context. For example, agents assigned junior intern roles would issue commands as if they were in management, ignoring their designated place in the hierarchy and unaware that they weren’t in charge. Role? Irrelevant. Context? Forgotten.

This last insight is echoed by research from Harvard Business School. Humans intuitively adapt to new contexts — AI doesn’t. The ability to “self-orient” remains uniquely human.

What Does This Mean for Practice?

Despite these limitations, we’re in the middle of an agent hype wave. CEOs talk publicly about agents replacing entire departments. OpenAI’s 2023 study predicted that jobs like financial analysts and administrators would be especially prone to automation.

But such predictions were based on theoretical assumptions, not real-world implementation.

What this new research offers is a reality check.

Even in clearly defined roles, agents fail once tasks become complex or require the use of multiple tools. An analysis by the LangChain community showed that the more tools and contexts an agent must handle, the lower its success rate.
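
One way to build intuition for why success drops as tool use grows is a back-of-the-envelope model. This is only an illustration, not the method behind the LangChain analysis or the Carnegie Mellon study. It assumes, hypothetically, that a task is a chain of independent steps (tool calls, handoffs, lookups) and that every step succeeds with the same probability. The function name chain_success and the 95% figure are made up for the example.

```python
def chain_success(per_step_success: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumption (not from the study): steps fail independently and
    all share the same success probability.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Independent events multiply, so the chain succeeds with p ** n.
    return per_step_success ** steps


# Hypothetical agent that gets each individual step right 95% of the time.
for steps in (1, 5, 10, 20):
    overall = chain_success(0.95, steps)
    print(f"{steps:>2} steps at 95% each: {overall:.0%} overall")
```

Under these assumptions, an agent that is right 95% of the time per step finishes roughly 77% of 5-step tasks, 60% of 10-step tasks, and 36% of 20-step tasks. Real agent failures are not independent, so treat these numbers as intuition rather than measurement.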

Where Can AI Agents Really Help Today?

It would be wrong to dismiss AI agents entirely. In some areas, they’re already providing value:

Software development: Thanks to rich training data, agents often perform better here than in other domains.
Production optimization: Johnson & Johnson reduced chemical production times by 50% using AI agents.
Legal checks: LG uses agents to verify licenses faster than teams of human lawyers.
Earnings calls: Tech CEOs are even experimenting with replacing themselves — at least temporarily. Klarna’s CEO used an AI voice clone trained on his past calls to deliver part of an earnings report. Zoom’s CEO did something similar, testing whether avatars could handle repetitive presentation duties. While this sparked criticism around transparency, it also shows a tangible use case: AI agents can already help with standardized communication in controlled environments.

But even these use cases rely on a “human-in-the-loop” model. Full automation is (still) not an option.

The Biggest Misconception?

The idea that AI agents will soon run entire companies or replace HR. In truth, they currently resemble highly motivated but overwhelmed interns. Give them too much responsibility, and you’re asking for chaos.

Conclusion: Distinguishing Between Hype and Reality

AI agents are an exciting, fast-moving field. They will undoubtedly reshape our work in the years ahead. But we are still far from the point where they can reliably handle complex corporate tasks on their own.

Rather than falling for exaggerated automation dreams, organizations should carefully evaluate:

Which tasks are well-structured and repetitive?
Where can agents relieve human workload?
Where do we need human flexibility, judgment, and empathy?

Future-ready organizations won’t be replaced by agents; they’ll be led by people who use AI critically and effectively.

Britta Daffner

As Director Data & AI at O2 Telefónica, Britta champions data-driven business transformation. She is also the founder of "dy.no," a platform dedicated to empowering change-makers in the corporate and business sectors. Before her current role, Britta established an Artificial Intelligence department at IBM, where she spearheaded the implementation of AI programs for various corporations. She is the author of "The Disruption DNA" (2021), a book that motivates individuals to take an active role in digital transformation.