Crafting the Ideal Gen AI Data Layer: Lessons from Intuit

In VentureBeat’s reporting on generative AI, one company has stood out for how quickly and capably it has deployed the technology at scale in the enterprise: Intuit. In September, Intuit introduced an LLM-powered assistant called Intuit Assist across all of its products, including TurboTax, QuickBooks, Credit Karma, and Mailchimp. Earlier, in June, it announced its own Gen AI operating system, which coordinates LLM activity across the entire company, a vision it likely realized ahead of other major companies.

I recently spoke with Alon Amit, Intuit’s VP of Product Management, about a crucial aspect of achieving success with Gen AI: building a robust data management layer. Amit explained that Intuit spent several years refining this data layer to ensure that the data is well-integrated, accurate, properly governed, and not replicated. Only after this groundwork was in place could LLMs leverage the data for personalized interactions with Intuit’s 100 million small business and consumer customers.

During our conversation, Amit shared a slide illustrating Intuit’s data layer, demonstrating their best practices in data management. If you’re a data leader in an enterprise, I highly recommend watching the video linked above, where Amit details the key areas the company is focusing on, including improvements they aim to make in 2024.

Here are some key takeaways from Amit’s insights:

1. The Data Map Registry: Intuit created a comprehensive repository for all data assets, both real-time and batch, produced within the company. This registry includes all data schemas and ensures proper governance, including clearly defined ownership and purposes for each asset. Amit acknowledged that while this process isn’t perfect yet, Intuit expects to get very close to perfection by the end of next year.

2. Culture of Valuing “Data as a Product”: Leveraging the data map, Intuit has cultivated a culture where developers, product managers, engineers, and others see any generated data as a product, even beyond what’s included in customer-facing products.

3. Uniform Governance of Data Schema Changes: Any change to a data schema, whether for click-stream data or third-party data flowing into Intuit’s ecosystem, is governed in the same way to prevent disruptions in downstream data systems, which are crucial for supporting generative AI. The process also covers events such as a developer creating a real-time data bus, and all of that data is automatically populated within Intuit’s data lake.

4. Governed Data Derivation: Derivation refers to any transformation of data beyond its source, such as analytics computations, AI model feature extraction, and marketing campaign attributes. When a developer attempts to derive an already existing feature, the system notifies them to prevent duplication.

5. Real-time Data Derivation: Slated for development in 2024, this initiative focuses on creating “real-time paved paths for data derivation.” This will enable developers to ensure that when a customer asks a question or an expert provides support, Intuit can track the user’s actions almost in real time.
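
To make the registry, schema governance, and derivation takeaways more concrete, here is a minimal Python sketch of how such checks could fit together. It is not Intuit's implementation: the class names (`DataAsset`, `DataMapRegistry`), the method names, the asset and team names, and the rule that removed or retyped fields count as breaking changes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Hypothetical registry entry: an asset with an owner, purpose, and schema."""
    name: str
    owner: str
    purpose: str
    schema: dict  # field name -> type name, e.g. {"user_id": "string"}


@dataclass
class DataMapRegistry:
    """Hypothetical in-memory stand-in for a data map registry."""
    assets: dict = field(default_factory=dict)
    derivations: dict = field(default_factory=dict)  # signature -> derived name

    def register(self, asset):
        # Assumed governance rule: every asset needs an owner and a purpose.
        if not asset.owner or not asset.purpose:
            raise ValueError(f"{asset.name}: owner and purpose are required")
        self.assets[asset.name] = asset

    def breaking_changes(self, name, new_schema):
        """List schema changes that could disrupt downstream consumers."""
        old_schema = self.assets[name].schema
        problems = []
        for field_name, type_name in old_schema.items():
            if field_name not in new_schema:
                problems.append(f"removed field: {field_name}")
            elif new_schema[field_name] != type_name:
                problems.append(f"type changed: {field_name}")
        return problems

    def register_derivation(self, derived_name, sources, transform):
        """Record a derivation; return the existing name if it is a duplicate."""
        signature = (frozenset(sources), transform)
        existing = self.derivations.get(signature)
        if existing is not None:
            return existing  # caller can notify the developer
        self.derivations[signature] = derived_name
        return None


registry = DataMapRegistry()
registry.register(DataAsset(
    name="clickstream_events",       # illustrative asset name
    owner="web-platform-team",       # illustrative owner
    purpose="product analytics",
    schema={"user_id": "string", "event": "string"},
))

# A proposed schema change that retypes one field and drops another.
problems = registry.breaking_changes(
    "clickstream_events", {"user_id": "int", "page": "string"}
)

# Two developers try to derive the same feature under different names.
first = registry.register_derivation(
    "weekly_active_flag", ["clickstream_events"], "count_events_7d > 0"
)
duplicate = registry.register_derivation(
    "is_active_weekly", ["clickstream_events"], "count_events_7d > 0"
)
```

In this sketch, `problems` lists the two changes a reviewer would need to approve before the schema ships, and `duplicate` comes back as the name of the feature that already exists, which is the point where a real system could notify the developer instead of creating a second copy.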
