How Integrating Endpoint Data in LLM Training Enhances Cybersecurity

Capturing weak signals across many endpoints and predicting potential intrusion patterns is a challenge well suited to large language models (LLMs). The aim is to mine attack data for new threat patterns and correlations, and to use those findings to fine-tune LLMs and other models.

Top endpoint detection and response (EDR) and extended detection and response (XDR) vendors are taking on this challenge. Nikesh Arora, chairman and CEO of Palo Alto Networks, said the company's XDR collects around 200 megabytes of data per endpoint, significantly more than most industry participants gather. Palo Alto Networks uses this raw data to enhance its firewalls and to automate attack surface management with XDR.

CrowdStrike’s co-founder and CEO, George Kurtz, said at the company’s annual Fal.Con event that CrowdStrike pioneered linking weak signals from different endpoints to detect novel threats. The company is now extending this capability to third-party partners so that weak signals can be correlated across domains to produce new detections.
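To make the idea of linking weak signals concrete, here is a minimal Python sketch. It is not any vendor's method: the `Event` fields, the thresholds, and the scoring rule (treating each endpoint's signal as independent evidence) are all assumptions chosen for illustration. It shows how several low-confidence events that share an indicator across endpoints can add up to one detection.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    endpoint: str   # hypothetical endpoint identifier
    indicator: str  # hypothetical shared attribute, e.g. a file hash or domain
    score: float    # per-event suspicion score in [0, 1]; low means "weak"


def correlate_weak_signals(events, min_endpoints=3, alert_threshold=0.6):
    """Flag indicators whose weak signals recur across several endpoints.

    The combined score assumes independent evidence: 1 - prod(1 - score).
    """
    by_indicator = defaultdict(dict)
    for ev in events:
        # Keep only the strongest event per endpoint for each indicator.
        best = by_indicator[ev.indicator].get(ev.endpoint, 0.0)
        by_indicator[ev.indicator][ev.endpoint] = max(best, ev.score)

    alerts = {}
    for indicator, per_endpoint in by_indicator.items():
        if len(per_endpoint) < min_endpoints:
            continue  # seen on too few endpoints to correlate
        miss = 1.0
        for score in per_endpoint.values():
            miss *= 1.0 - score
        combined = 1.0 - miss
        if combined >= alert_threshold:
            alerts[indicator] = combined
    return alerts


# Illustrative data: no single event would cross the 0.6 threshold alone.
events = [
    Event("host-a", "hash-123", 0.30),
    Event("host-b", "hash-123", 0.25),
    Event("host-c", "hash-123", 0.35),
    Event("host-a", "hash-999", 0.40),
]
alerts = correlate_weak_signals(events)
print(sorted(alerts))
```

In this toy data, only the indicator seen on three endpoints is flagged; the stronger but isolated signal is not. A production system would also have to handle time windows and correlated noise, which this sketch ignores.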

XDR has been effective in reducing noise and improving signal quality. Leading XDR platform providers include Broadcom, Cisco, CrowdStrike, Fortinet, Microsoft, Palo Alto Networks, SentinelOne, Sophos, TEHTRIS, Trend Micro, and VMware.

Why LLMs are the future of endpoint security
The future of endpoint security involves enhancing LLMs with telemetry and human-annotated data. According to the latest Gartner Hype Cycle for Endpoint Security, innovations in this field focus on faster, automated threat detection, prevention, and remediation, using integrated XDR to correlate data from sources such as endpoints, networks, web, email, and identity solutions.

Investment in EDR and XDR is outpacing the broader information security and risk management market, leading to increased competition among vendors. Gartner projects the endpoint protection platform market to grow from $14.45 billion today to $26.95 billion in 2027, with a compound annual growth rate (CAGR) of 16.8%. Similarly, the global information security and risk management market is expected to grow from $164 billion in 2022 to $287 billion in 2027, achieving an 11% CAGR.
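For readers who want to check growth figures like these, the standard compound annual growth rate formula is (end / start) ^ (1 / years) - 1. The short Python sketch below applies it to the information security and risk management figures, the only pair above with both years stated; the function name is an illustrative choice.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted above, in billions of US dollars, 2022 to 2027 (5 years).
infosec_cagr = cagr(164.0, 287.0, 2027 - 2022)
print(f"{infosec_cagr:.1%}")
```

On these nominal figures the result is a little under 12%, in the neighborhood of the quoted 11%; the gap may come from rounding or from a different calculation basis, which the source figures do not specify.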

CrowdStrike’s CTO on strengthening cybersecurity with LLMs
VentureBeat recently interviewed Elia Zaitsev, CTO of CrowdStrike, to discuss how training LLMs with endpoint data can enhance cybersecurity. Zaitsev explained that CrowdStrike was founded as a cloud-native company so it could apply AI and ML to complex customer problems. The plan was to use the cloud to gather vast amounts of information, train classifiers there, and deploy them at the edge to make intelligent decisions.

Zaitsev emphasized that LLMs and generative AI tools are not intended to replace cybersecurity professionals but to augment their capabilities. The aim is to use AI to handle routine tasks, allowing human experts to focus on more complex issues. He pointed out that adversaries will also use AI to automate basic threats, creating an arms race where savvy human defenders will always be needed to counter smart human attackers.

Zaitsev also shared that the company has found it more effective to train multiple small LLMs on specific use cases rather than relying on one large, generalized model. This approach reduces errors and increases accuracy. They use a method known as “mixture of experts,” where several specialized LLMs collaborate, outperforming single, monolithic models.
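As a rough illustration of splitting work across specialized models, the Python sketch below routes each request to one of several task-specific handlers. Everything in it is hypothetical: the handler functions are stubs standing in for separately trained small models, and the keyword router is a deliberately simple substitute. It is not CrowdStrike's implementation, and it is not the learned gating network that "mixture of experts" usually denotes in machine learning literature.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for small, task-specific models.
def summarize_incident(text: str) -> str:
    return f"[incident-summary] {text[:60]}"


def explain_vulnerability(text: str) -> str:
    return f"[vuln-explainer] {text[:60]}"


def draft_query(text: str) -> str:
    return f"[query-builder] {text[:60]}"


EXPERTS: Dict[str, Callable[[str], str]] = {
    "incident": summarize_incident,
    "vulnerability": explain_vulnerability,
    "query": draft_query,
}

# Illustrative routing keywords; a real router would likely be a classifier.
KEYWORDS = {
    "incident": ("alert", "detection", "incident"),
    "vulnerability": ("cve", "patch", "vulnerab"),
    "query": ("search", "find", "show"),
}


def route(request: str) -> str:
    """Pick the expert whose keywords match the request most often."""
    lowered = request.lower()
    scores = {
        name: sum(kw in lowered for kw in kws)
        for name, kws in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the incident summarizer when nothing matches.
    return best if scores[best] > 0 else "incident"


def answer(request: str) -> str:
    return EXPERTS[route(request)](request)


print(answer("Which machines still need the patch for this CVE?"))
```

The design point is that each handler only has to be good at one narrow task, which is the property the interview credits with reducing errors.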

To ensure trust in its AI systems, CrowdStrike validates LLM output by cross-referencing it with the platform’s telemetry and API data. This validation grounds AI decisions in verified data.
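One simple form such cross-referencing could take is checking that every entity an LLM mentions actually appears in recorded telemetry. The Python sketch below assumes a hypothetical `host-...` naming scheme, SHA-256 file hashes, and a telemetry store reduced to two sets; none of these details come from CrowdStrike.

```python
import re

# Assumed entity formats, for illustration only.
HOST_PATTERN = re.compile(r"\bhost-[a-z0-9]+\b")
SHA256_PATTERN = re.compile(r"\b[a-f0-9]{64}\b")


def validate_against_telemetry(llm_text, telemetry):
    """Split entities mentioned by the model into grounded and ungrounded.

    telemetry is a dict with "hosts" and "hashes" sets of known values.
    """
    mentioned = set(HOST_PATTERN.findall(llm_text))
    mentioned |= set(SHA256_PATTERN.findall(llm_text))
    known = telemetry["hosts"] | telemetry["hashes"]
    return mentioned & known, mentioned - known


sample_hash = "ab" * 32  # placeholder 64-character hex string
telemetry = {"hosts": {"host-a1", "host-b2"}, "hashes": {sample_hash}}
llm_text = f"File {sample_hash} ran on host-a1 and spread to host-z9."

grounded, ungrounded = validate_against_telemetry(llm_text, telemetry)
print("grounded:", sorted(grounded))
print("ungrounded:", sorted(ungrounded))
```

Anything in the ungrounded set (here, a host the telemetry has never seen) could be withheld or flagged for human review before the model's answer is shown.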

Expert human teams play a crucial role in developing and training AI systems. According to Zaitsev, having high-quality, human-annotated datasets is essential for teaching AI models to perform specific tasks effectively. This approach has allowed CrowdStrike to build a valuable repository of data for creating generative AI models tailored to cybersecurity.

Advances in LLM training are paying off in the development of current and future products. CrowdStrike uses a multi-modal system that combines various LLMs with non-LLM technologies to perform tasks accurately. For example, their LLMs might summarize vulnerability data for laypeople without directly exposing user-specific data, thus maintaining privacy.
