Late nights with a newborn can sometimes lead to unexpected ideas. That’s what happened to Josh Bickett, a developer at OthersideAI. While feeding his daughter in the middle of the night, he was struck with an idea for a new “self-operating computer framework.”
In an interview with VentureBeat, Bickett shared his experience of caring for his four-week-old daughter and having moments of inspiration about AI. His latest project, influenced by demos of GPT-4 vision, came to life during one of those quiet, late-night feedings. With his baby in one arm, he started sketching the idea on his computer. Although the initial implementation wasn’t perfect, it marked the beginning of defining a new way for computers to operate themselves.
When Matt Shumer, OthersideAI co-founder and CEO, saw Bickett’s framework, he immediately recognized its potential. He likened it to self-driving technology, but for computers rather than cars, and emphasized that the hard part is building the intelligence such a system needs.
As Bickett explained, the framework allows AI to control both mouse clicks and keyboard inputs based on visual data, functioning similarly to a human. Shumer pointed out that this approach is a significant upgrade from methods that rely solely on APIs, which can’t handle every task. Allowing the AI to work visually, like a person, opens up broader possibilities for automation.
The framework operates by taking screenshots as input and then outputting mouse clicks and keyboard commands. While the current version is basic, the real potential will be unlocked by integrating more advanced vision and reasoning models. Bickett noted that as better models are developed, they can be easily plugged into the framework to enhance its capabilities.
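The loop described above, with a screenshot going in and a mouse or keyboard action coming out, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's actual code: every name here (`VisionModel`, `ScriptedModel`, `run_agent`, `Click`, `TypeText`, `Done`) is hypothetical, and screen capture and input execution are passed in as plain callables so the sketch runs without any GUI library.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Union


# Hypothetical action types; the real framework's action format may differ.
@dataclass(frozen=True)
class Click:
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Done:
    summary: str


Action = Union[Click, TypeText, Done]


class VisionModel(Protocol):
    """Anything that maps (objective, screenshot) to the next action."""

    def next_action(self, objective: str, screenshot: bytes) -> Action: ...


class ScriptedModel:
    """Stand-in for a real vision model: replays a fixed list of actions."""

    def __init__(self, script: List[Action]) -> None:
        self._script = list(script)

    def next_action(self, objective: str, screenshot: bytes) -> Action:
        if self._script:
            return self._script.pop(0)
        return Done("script exhausted")


def run_agent(
    objective: str,
    model: VisionModel,
    capture: Callable[[], bytes],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Capture the screen, ask the model for an action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = model.next_action(objective, capture())
        history.append(action)
        if isinstance(action, Done):
            break  # the model reports the objective as finished
        execute(action)
    return history


# Example run with fake capture and a recording executor.
performed: List[Action] = []
demo_model = ScriptedModel([Click(100, 200), TypeText("hello"), Done("finished")])
demo_history = run_agent(
    "write a greeting", demo_model, capture=lambda: b"", execute=performed.append
)
```

Because `run_agent` depends only on the `next_action` method, a stronger model could be swapped in without touching the loop, which is one way to read Bickett's point about plugging in better models as they appear.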
Looking ahead, Shumer envisions a future where this technology will revolutionize how we interact with computers. He foresees specialized AI models handling various tasks, ranging from simple to complex, and designed for different users, from enterprises to consumers. The ultimate goal is to create agents that make computing so intuitive that even those with minimal computer skills can use them effortlessly.
Bickett believes that making the framework open source will accelerate its development, allowing global developers to innovate further. Shumer agrees, predicting that this approach will open up opportunities for many players, models, and applications in the industry.
Developing truly intelligent computer agents will require significant resources, and other companies are investing accordingly. Imbue, formerly Generally Intelligent, has partnered with Dell to build an AI training platform around a large cluster of Nvidia GPUs. The setup is intended to improve foundation models for reasoning, which is essential for creating effective AI agents.
Imbue’s approach focuses on developing AI capable of human-like reasoning, handling complex decisions, and adapting to real-world situations. Their work combines foundation model training, experimental prototyping, robust toolkit creation, and theoretical research, all aimed at achieving advanced AI.
While the self-operating computer framework is only a first step, Bickett and Shumer see it paving the way for a future where AI agents replace traditional computer interfaces. Late-night ideas may spark innovation, but realizing the full potential of intuitive, language-based computing will require dedicated effort and continued advances in the underlying models.