Enhancing Apache Airflow: Astronomer’s Contribution to AI Data Orchestration

Moving data between systems is usually the job of a data orchestration tool. One of the most popular is Apache Airflow, an open-source project originally created at Airbnb.

Today, Astronomer, the main commercial sponsor behind Apache Airflow, released an update to its Astro platform with enhanced support, security, and management features for enterprises. Although Airflow initially focused on orchestrating data pipelines for analytics and business intelligence, it is increasingly used for AI and machine learning workloads.

Julian LaNeve, CTO at Astronomer, noted that Airflow is particularly effective for creating and running data pipelines. Airflow lets users define pipelines as code, so a pipeline can do essentially anything the code itself can do.
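
To make the pipelines-as-code idea concrete, here is a minimal sketch in plain Python. It deliberately does not use Airflow's API: the task names (extract, transform, load) and the run_pipeline helper are hypothetical, and the standard-library graphlib module stands in for a real scheduler. The point is only that tasks and their dependencies are ordinary code that can be versioned, reviewed, and tested.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each receives a shared context dict and updates it.
def extract(ctx):
    ctx["rows"] = [1, 2, 3]

def transform(ctx):
    ctx["rows"] = [r * 10 for r in ctx["rows"]]

def load(ctx):
    ctx["loaded"] = sum(ctx["rows"])

TASKS = {"extract": extract, "transform": transform, "load": load}

# The pipeline structure is declared in code: each task maps to the
# set of tasks that must finish before it can start.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run_pipeline(tasks, dependencies):
    """Run every task once, after all of its upstream tasks have finished."""
    ctx = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx

order, ctx = run_pipeline(TASKS, DEPENDENCIES)
print(order)
print(ctx["loaded"])
```

An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this basic idea of a dependency graph declared in code.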

Airflow’s growing popularity stems from how easily it lets teams define, build, and deploy data pipelines. It integrates with major data platforms and cloud providers, including Snowflake, Databricks, AWS, Microsoft, and Google Cloud. LaNeve explained that while the open-source project is straightforward for a single team to run, managing it at enterprise scale is more challenging. This is where Astronomer’s managed service for Apache Airflow comes in, adding capabilities beyond the core open-source technology.

Astronomer has developed a layer called the Astronomer runtime, which optimizes the performance of Airflow. Additionally, the Astro platform offers tools that simplify writing data pipelines. The Astro Cloud IDE, for instance, provides a notebook-based tool for this purpose. Astronomer is also working on observability tools to better understand data flow across ecosystems.

The latest Astro platform update includes several new features. One major improvement is in connection management, which provides a central point for governance, visibility, and security for data pipelines. Administrators can now define connections to platforms like Snowflake and Databricks directly within the Astro platform.

The update also simplifies the process of upgrading and rolling back data pipeline configurations. If a data pipeline fails, users can easily revert to a previous configuration. Additionally, the platform performs checks to ensure any updates are compatible and will function correctly.
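
The upgrade-and-rollback behavior described above can be illustrated with a small sketch. This is not Astro's implementation: the DeploymentHistory class, the runtime_version field, and the compatibility rule are all hypothetical, chosen only to show the pattern of checking an update before applying it and keeping earlier configurations available for restoration.

```python
class DeploymentHistory:
    """Keeps every applied configuration so an earlier one can be restored."""

    def __init__(self, initial, supported_versions):
        self._history = [dict(initial)]
        self._supported = set(supported_versions)

    @property
    def current(self):
        return self._history[-1]

    def upgrade(self, new_config):
        # Hypothetical compatibility check: reject unknown runtime versions
        # before they are applied, so a bad update never becomes current.
        version = new_config["runtime_version"]
        if version not in self._supported:
            raise ValueError(f"unsupported runtime: {version}")
        self._history.append(dict(new_config))

    def rollback(self):
        # Drop the latest configuration, but never the original one.
        if len(self._history) == 1:
            raise RuntimeError("nothing to roll back to")
        self._history.pop()
        return self.current


history = DeploymentHistory({"runtime_version": "1.0"}, {"1.0", "1.1"})
history.upgrade({"runtime_version": "1.1"})
restored = history.rollback()
print(restored)
```

Keeping the full history, rather than only the previous entry, means repeated rollbacks can step back through several deployments.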

Astronomer is increasingly focusing on AI workloads. The company recently announced integrations with AI vendors and tools such as OpenAI, Cohere, Pinecone, OpenSearch, Weaviate, and pgvector. Astronomer has also created a reference architecture to help organizations build and deploy large language model (LLM) applications. The ask.astronomer.io application showcases this architecture by pulling documentation from over a dozen sources using a retrieval-augmented generation (RAG) approach.
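
For readers unfamiliar with RAG, the sketch below shows the core idea: retrieve the most relevant documents for a question, then place them in the prompt sent to a language model. It is a toy illustration, not the ask.astronomer.io architecture. The document snippets and function names are hypothetical, simple word counts stand in for learned embeddings and a vector database, and no model is actually called.

```python
import math
from collections import Counter

# Hypothetical documentation snippets standing in for real sources.
DOCS = [
    "Airflow pipelines are defined as Python code",
    "Connections store credentials for external systems such as Snowflake",
    "A rollback restores the previous deployment configuration",
]

def vectorize(text):
    """Bag-of-words term counts; a real system would use learned embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question."""
    q = vectorize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs):
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How are Airflow pipelines defined"
prompt = build_prompt(question, DOCS)
print(prompt)
```

In a production setting, keeping the document index fresh is itself a recurring data pipeline, which is where an orchestrator fits in.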

LaNeve emphasized that Astronomer’s tools are designed to train AI models reliably and on the latest data. Reliability and data freshness are exactly what Astronomer and Airflow aim to provide.