If you’re a company leader or IT decision-maker who has been hearing the buzz about generative AI and is ready to deploy a large language model (LLM) chatbot for your employees or customers, you may be wondering how to launch it and what it will cost.
Enter DeepInfra, a new startup founded by former engineers of IMO Messenger. It aims to answer those questions by running these models on its private servers on behalf of clients and charging $1 per million tokens, significantly less than competitors such as OpenAI’s GPT-4 Turbo at $10 per million tokens and Anthropic’s Claude 2 at $11.02 per million tokens.
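To make the price gap concrete, the short script below converts the per-million-token rates quoted above into a cost for a given token volume. It is a back-of-the-envelope sketch: the 50-million-token monthly workload is a hypothetical figure, and it assumes a single flat rate per provider, ignoring any difference between input and output token pricing, volume discounts, or other fees.

```python
# Per-million-token rates quoted in the article (USD). Assumes one flat rate
# per provider; real pricing may differ for input vs. output tokens.
RATES_PER_MILLION = {
    "DeepInfra": 1.00,
    "OpenAI GPT-4 Turbo": 10.00,
    "Anthropic Claude 2": 11.02,
}


def monthly_cost(tokens: int, rate_per_million: float) -> float:
    """Return the USD cost of processing `tokens` tokens at the given rate."""
    return tokens / 1_000_000 * rate_per_million


def compare(tokens: int) -> dict[str, float]:
    """Cost of the same token volume at each provider's quoted rate, rounded to cents."""
    return {name: round(monthly_cost(tokens, rate), 2) for name, rate in RATES_PER_MILLION.items()}


if __name__ == "__main__":
    workload = 50_000_000  # hypothetical workload: 50 million tokens per month
    for name, cost in compare(workload).items():
        print(f"{name}: ${cost:,.2f}")
```

At that hypothetical volume, the quoted rates work out to $50 for DeepInfra versus $500 for GPT-4 Turbo and $551 for Claude 2.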
DeepInfra recently emerged from stealth and announced an $8 million seed round led by A.Capital and Felicis. The company plans to offer inference on a range of open-source models, including Meta’s Llama 2 and CodeLlama, as well as custom-tuned versions of these and other models.
CEO Nikola Borisov said in an interview that the company’s focus is on providing CPUs and a cost-effective way to deploy trained machine learning models. He noted that while much attention goes to training models, running them, known as “inferencing,” also demands considerable computational power and resources.
Borisov explained that the challenge lies in fitting many concurrent users onto the same hardware and the same model. Because generating each token demands significant computation and memory bandwidth, overlapping users’ requests so that they share that work is crucial to avoiding redundant operations.
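Why overlapping users matters can be illustrated with a toy model of batching. The sketch below is not DeepInfra’s scheduler; it assumes that each decoding step streams the model’s weights from memory once, no matter how many requests share that step, and it ignores per-request costs such as attention over each user’s context. It counts those weight passes as a rough proxy for memory-bandwidth use, comparing requests served one at a time with a simplified form of iteration-level (continuous) batching. The names `Request`, `passes_unbatched`, and `passes_batched` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user: str
    tokens_to_generate: int


def passes_unbatched(requests: list[Request]) -> int:
    """Serve each request on its own: every generated token costs one full pass over the weights."""
    return sum(r.tokens_to_generate for r in requests)


def passes_batched(requests: list[Request], max_batch: int) -> int:
    """Toy iteration-level batching: at each decoding step, up to `max_batch`
    unfinished requests each receive one token from a single shared pass."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    remaining = [r.tokens_to_generate for r in requests]
    passes = 0
    while any(n > 0 for n in remaining):
        # Pick the first `max_batch` requests that still need tokens.
        active = [i for i, n in enumerate(remaining) if n > 0][:max_batch]
        for i in active:
            remaining[i] -= 1  # one token generated for this user
        passes += 1  # one shared pass over the weights for the whole batch
    return passes


if __name__ == "__main__":
    users = [Request(f"user{i}", 100) for i in range(4)]
    print("unbatched:", passes_unbatched(users))
    print("batched (max 8):", passes_batched(users, max_batch=8))
    print("batched (max 2):", passes_batched(users, max_batch=2))
```

With four users who each need 100 tokens, serving them one by one takes 400 passes, while batching up to eight requests per step takes 100 and a batch limit of two takes 200. Under these assumptions, the more users the hardware can overlap, the less redundant work each token costs.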
To address this challenge, Borisov and his co-founders draw on their experience at IMO Messenger, where they managed large server fleets with optimal connectivity.
According to Aydin Senkut of Felicis, one of DeepInfra’s backers, the team’s exceptional experience building efficient infrastructure that serves vast numbers of users was a key reason for the investment. He believes this server efficiency is what allows DeepInfra to offer such competitive pricing, which could be a game-changer in the AI market.
DeepInfra is initially targeting small and medium-sized businesses (SMBs), which tend to be especially cost-sensitive. The company plans to keep pace with advances in the open-source AI community and offer state-of-the-art models for tasks including text generation, summarization, computer vision, and coding.
Borisov anticipates a flourishing open-source ecosystem in which more efficient and specialized models keep emerging. He believes open-source models such as Llama will spawn many variants tailored to specific needs, which would lower computation costs.
Importantly, DeepInfra’s service is designed with data privacy in mind: user prompts are not stored and are discarded once the session ends. This combination of privacy and cost-efficiency makes DeepInfra an attractive option for businesses looking to adopt AI affordably and securely.