Writer, a three-year-old startup based in San Francisco, raised $100 million in September 2023 to expand the reach of its proprietary, enterprise-focused large language models. Although it doesn't grab headlines as often as OpenAI, Anthropic, Meta, or even buzzy startups like France's Mistral AI, Writer's suite of in-house LLMs, known as Palmyra, is making waves in the enterprise market. Companies such as Accenture, Vanguard, HubSpot, and Pinterest use Writer's creativity and productivity platform, which is built on the Palmyra models.
Stanford HAI's Center for Research on Foundation Models recently added new models to its benchmarking effort and introduced a new benchmark called HELM Lite, which incorporates in-context learning. This type of learning allows LLMs to pick up new tasks from a small set of examples given in the prompt at inference time.
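In practice, in-context learning amounts to packing a few worked examples into the prompt itself so the model infers the task on the fly. The sketch below illustrates the idea with a hypothetical English-to-French translation task; the example pairs and the stubbed-out model call are placeholders for illustration, not part of HELM Lite or any particular vendor's API.

```python
# Minimal sketch of in-context (few-shot) learning: a handful of worked examples
# are placed directly in the prompt, and the model infers the task at inference
# time. The example pairs and the `call_llm` stub are hypothetical placeholders.

FEW_SHOT_EXAMPLES = [
    ("The weather is lovely today.", "Il fait très beau aujourd'hui."),
    ("Where is the train station?", "Où est la gare ?"),
    ("I would like a coffee, please.", "Je voudrais un café, s'il vous plaît."),
]

def build_few_shot_prompt(query: str) -> str:
    """Assemble a prompt that teaches the task purely through examples."""
    lines = ["Translate English to French."]
    for english, french in FEW_SHOT_EXAMPLES:
        lines.append(f"English: {english}\nFrench: {french}")
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

if __name__ == "__main__":
    prompt = build_few_shot_prompt("The meeting starts at noon.")
    print(prompt)
    # In a real pipeline the prompt would be sent to an LLM, e.g.:
    # response = call_llm(prompt)  # `call_llm` is a stand-in for any completions client
```

No weights are updated here; the "learning" happens entirely through the examples embedded in the prompt, which is what HELM Lite's scenarios exercise.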
Writer's LLMs performed surprisingly well on the new benchmark. While GPT-4 topped the leaderboard, the Palmyra X V2 and X V3 models also excelled despite being smaller, and Palmyra notably led in machine translation. Writer's CEO, May Habib, highlighted in a LinkedIn post that Palmyra X performed strongly across multiple benchmarks, not just the traditional ones but also the new translation benchmarks, where it ranked first.
In an interview with VentureBeat, Habib emphasized the economic challenges enterprises face when trying to run a model the size of GPT-4, reportedly on the order of 1.2 trillion parameters. She argued that generative AI use cases need to be economically viable, and noted that enterprises often build applications on a GPT model only to find that, after a few months, their prompts no longer work as well because the model has been refined to reduce serving costs. She pointed out that the GPT-4 version evaluated in HELM Lite (0613) is rate-limited and likely to be distilled, while plans for GPT-4 Turbo remain uncertain.
Habib praised Stanford HAI's benchmarking as more aligned with real enterprise use cases than other leaderboards, such as Hugging Face's, saying Stanford's scenarios mirror actual usage more closely.
Writer, co-founded by Habib and Waseem AlShikh in mid-2020, began as a tool for marketing teams. The two previously ran Qordoba, an NLP and machine translation company founded in 2015. In February 2023, Writer launched three versions of Palmyra: Small with 128 million parameters, Base with 5 billion, and Large with 20 billion. In May 2023, targeting the enterprise market, Writer introduced Knowledge Graph, which lets businesses connect their own data sources to Palmyra, and began allowing customers to self-host the models.
"Our full stack includes the model plus a built-in RAG solution," Habib explained, adding that AI guardrails at the application layer are just as important. Users, she said, are tired of sending all their data to an embedding model only to have it routed through a vector database. Writer's graph-based approach to RAG instead aims to build digital assistants that are grounded directly in a customer's own data.
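For context, the sketch below shows the generic retrieve-then-generate pattern that any RAG system, vector- or graph-based, builds on: fetch the most relevant passages, then stitch them into the prompt so the answer is grounded in the customer's documents. It is a toy illustration, not Writer's Knowledge Graph or its graph-based retrieval; a simple word-overlap score stands in for a real embedding model or graph traversal, and the document snippets are invented.

```python
# Generic retrieval-augmented generation (RAG) sketch -- a toy stand-in, not
# Writer's implementation. A word-overlap score substitutes for embeddings or
# graph traversal; retrieved passages are placed in the prompt to ground the answer.

from collections import Counter

DOCUMENTS = [  # hypothetical enterprise snippets
    "Refunds are processed within 14 days of the return request.",
    "Enterprise customers can self-host Palmyra models in their own environment.",
    "Support tickets are answered within one business day.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: shared word count between query and document."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k most relevant documents for the query."""
    return sorted(DOCUMENTS, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_grounded_prompt(query: str) -> str:
    """Build a prompt that instructs the model to answer only from retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

if __name__ == "__main__":
    print(build_grounded_prompt("How long do refunds take?"))
    # The resulting grounded prompt would then be sent to the LLM of choice.
```

Writer's pitch is that replacing the embed-everything-into-a-vector-database step with a knowledge graph changes how that context is retrieved, while the grounding step itself stays the same.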
On the question of model size, Habib believes smaller models with curated, regularly updated training data are the better fit for enterprises. She responded to a LinkedIn post by Wharton professor Ethan Mollick, who had shared a paper about BloombergGPT arguing that generalist models can excel even in specialized domains. Habib countered that the HELM Lite leaderboard shows specialized medical LLMs outperforming GPT-4, and that once a model clears the state-of-the-art threshold, factors like inference speed and cost become decisive for enterprises: a specialized model is easier to manage and cheaper to run.