Meta Launches LlamaFirewall Framework to Stop AI Jailbreaks, Injections, and Insecure Code

Avatar
Meta on Tuesday announced LlamaFirewall, an open-source framework designed to secure artificial intelligence (AI) systems against emerging cyber risks such as prompt injection, jailbreaks, and insecure code, among others. The framework, the company said, incorporates three guardrails, including PromptGuard 2, Agent Alignment Checks, and CodeShield. PromptGuard 2 is designed to detect direct

Meta on Tuesday announced LlamaFirewall, an open-source framework designed to secure artificial intelligence (AI) systems against emerging cyber risks such as prompt injection, jailbreaks, and insecure code, among others.

The framework, the company said, incorporates three guardrails, including PromptGuard 2, Agent Alignment Checks, and CodeShield.

PromptGuard 2 is designed to detect direct jailbreak and prompt injection attempts in real-time, while Agent Alignment Checks is capable of inspecting agent reasoning for possible goal hijacking and indirect prompt injection scenarios.

CodeShield refers to an online static analysis engine that seeks to prevent the generation of insecure or dangerous code by AI agents.

“LlamaFirewall is built to serve as a flexible, real-time guardrail framework for securing LLM-powered applications,” the company said in a GitHub description of the project.

“Its architecture is modular, enabling security teams and developers to compose layered defenses that span from raw input ingestion to final output actions – across simple chat models and complex autonomous agents.”

Alongside LlamaFirewall, Meta has made available updated versions of LlamaGuard and CyberSecEval to better detect various common types of violating content and measure the defensive cybersecurity capabilities of AI systems, respectively.

CyberSecEval 4 also includes a new benchmark called AutoPatchBench, which is engineered to evaluate the ability of a large language model (LLM) agent to automatically repair a wide range of C/C++ vulnerabilities identified through fuzzing, an approach known as AI-powered patching.

“AutoPatchBench provides a standardized evaluation framework for assessing the effectiveness of AI-assisted vulnerability repair tools,” the company said. “This benchmark aims to facilitate a comprehensive understanding of the capabilities and limitations of various AI-driven approaches to repairing fuzzing-found bugs.”

Lastly, Meta has launched a new program dubbed Llama for Defenders to help partner organizations and AI developers access open, early-access, and closed AI solutions to address specific security challenges, such as detecting AI-generated content used in scams, fraud, and phishing attacks.

The announcements come as WhatsApp previewed a new technology called Private Processing to allow users to harness AI features without compromising their privacy by offloading the requests to a secure, confidential environment.

“We’re working with the security community to audit and improve our architecture and will continue to build and strengthen Private Processing in the open, in collaboration with researchers, before we launch it in product,” Meta said.

Found this article interesting? Follow us on Twitter and LinkedIn to read more exclusive content we post.

 The Hacker News 

Total
0
Shares
Previous Post

Indian Court Orders Action to Block Proton Mail Over AI Deepfake Abuse Allegations

Next Post

RansomHub Went Dark April 1; Affiliates Fled to Qilin, DragonForce Claimed Control

Related Posts

North Korean Hackers Deploy BeaverTail Malware via 11 Malicious npm Packages

The North Korean threat actors behind the ongoing Contagious Interview campaign are spreading their tentacles on the npm ecosystem by publishing more malicious packages that deliver the BeaverTail malware, as well as a new remote access trojan (RAT) loader. "These latest samples employ hexadecimal string encoding to evade automated detection systems and manual code audits, signaling a variation
Avatar
Read More

WhatsApp Adds Advanced Chat Privacy to Blocks Chat Exports and Auto-Downloads

WhatsApp has introduced an extra layer of privacy called Advanced Chat Privacy that allows users to block participants from sharing the contents of a conversation in traditional chats and groups. "This new setting available in both chats and groups helps prevent others from taking content outside of WhatsApp for when you may want extra privacy," WhatsApp said in a statement. The optional feature
Avatar
Read More

Germany Shuts Down eXch Over $1.9B Laundering, Seizes €34M in Crypto and 8TB of Data

Germany's Federal Criminal Police Office (aka Bundeskriminalamt or BKA) has seized the online infrastructure and shutdown linked to the eXch cryptocurrency exchange over allegations of money laundering and operating a criminal trading platform. The operation was carried out on April 30, 2025, authorities said, adding they also confiscated 8 terabytes worth of data and cryptocurrency assets
Avatar
Read More