Researchers Find Serious AI Bugs Exposing Meta, Nvidia, and Microsoft Inference Frameworks

Cybersecurity researchers have uncovered critical remote code execution vulnerabilities impacting major artificial intelligence (AI) inference engines, including those from Meta, Nvidia, Microsoft, and open-source PyTorch projects such as vLLM and SGLang. “These vulnerabilities all traced back to the same root cause: the overlooked unsafe use of ZeroMQ (ZMQ) and Python’s pickle deserialization,”
[[{“value”:”

Cybersecurity researchers have uncovered critical remote code execution vulnerabilities impacting major artificial intelligence (AI) inference engines, including those from Meta, Nvidia, Microsoft, and open-source PyTorch projects such as vLLM and SGLang.

“These vulnerabilities all traced back to the same root cause: the overlooked unsafe use of ZeroMQ (ZMQ) and Python’s pickle deserialization,” Oligo Security researcher Avi Lumelsky said in a report published Thursday.

At its core, the issue stems from what has been described as a pattern called ShadowMQ, in which the insecure deserialization logic has propagated to several projects as a result of code reuse.

The root cause is a vulnerability in Meta’s Llama large language model (LLM) framework (CVE-2024-50050, CVSS score: 6.3/9.3) that was patched by the company last October. Specifically, it involved the use of ZeroMQ’s recv_pyobj() method to deserialize incoming data using Python’s pickle module.

This, coupled with the fact that the framework exposed the ZeroMQ socket over the network, opened the door to a scenario where an attacker can execute arbitrary code by sending malicious data for deserialization. The issue has also been addressed in the pyzmq Python library.

DFIR Retainer Services

Oligo has since discovered the same pattern recurring in other inference frameworks, such as NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, vLLM, and SGLang.

“All contained nearly identical unsafe patterns: pickle deserialization over unauthenticated ZMQ TCP sockets,” Lumelsky said. “Different maintainers and projects maintained by different companies – all made the same mistake.”

Tracing the origins of the problem, Oligo found that in at least a few cases, it was the result of a direct copy-paste of code. For example, the vulnerable file in SGLang says it’s adapted by vLLM, while Modular Max Server has borrowed the same logic from both vLLM and SGLang, effectively perpetuating the same flaw across codebases.

The issues have been assigned the following identifiers –

  • CVE-2025-30165 (CVSS score: 8.0) – vLLM (While the issue is not fixed, it has been addressed by switching to the V1 engine by default)
  • CVE-2025-23254 (CVSS score: 8.8) – NVIDIA TensorRT-LLM (Fixed in version 0.18.2)
  • CVE-2025-60455 (CVSS score: N/A) – Modular Max Server (Fixed)
  • Sarathi-Serve (Remains unpatched)
  • SGLang (Implemented incomplete fixes)

With inference engines acting as a crucial component within AI infrastructures, a successful compromise of a single node could permit an attacker to execute arbitrary code on the cluster, escalate privileges, conduct model theft, and even drop malicious payloads like cryptocurrency miners for financial gain.

“Projects are moving at incredible speed, and it’s common to borrow architectural components from peers,” Lumelsky said. “But when code reuse includes unsafe patterns, the consequences ripple outward fast.”

The disclosure comes as a new report from AI security platform Knostic has found that it’s possible to compromise Cursor’s new built-in browser via JavaScript injection techniques, not to mention leverage a malicious extension to facilitate JavaScript injection in order to take control of the developer workstation.

CIS Build Kits

The first attack involves registering a rogue local Model Context Protocol (MCP) server that bypasses Cursor’s controls to allow an attacker to replace the login pages within the browser with a bogus page that harvests credentials and exfiltrates them to a remote server under their control.

“Once a user downloaded the MCP server and ran it, using an mcp.json file within Cursor, it injected code into Cursor’s browser that led the user to a fake login page, which stole their credentials and sent them to a remote server,” security researcher Dor Munis said.

Given that the AI-powered source code editor is essentially a fork of Visual Studio Code, a bad actor could also craft a malicious extension to inject JavaScript into the running IDE to execute arbitrary actions, including marking harmless Open VSX extensions as “malicious.”

“JavaScript running inside the Node.js interpreter, whether introduced by an extension, an MCP server, or a poisoned prompt or rule, immediately inherits the IDE’s privileges: full file-system access, the ability to modify or replace IDE functions (including installed extensions), and the ability to persist code that reattaches after a restart,” the company said.

“Once interpreter-level execution is available, an attacker can turn the IDE into a malware distribution and exfiltration platform.”

To counter these risks, it’s essential that users disable Auto-Run features in their IDEs, vet extensions, install MCP servers from trusted developers and repositories, check what data and APIs the servers access, use API keys with minimal required permissions, and audit MCP server source code for critical integrations.

Found this article interesting? Follow us on Google News, Twitter and LinkedIn to read more exclusive content we post.

“}]] The Hacker News 

Total
0
Shares
Previous Post

Iranian Hackers Launch ‘SpearSpecter’ Spy Operation on Defense & Government Targets

Next Post

North Korean Hackers Turn JSON Services into Covert Malware Delivery Channels

Related Posts

Hackers Turn Velociraptor DFIR Tool Into Weapon in LockBit Ransomware Attacks

Threat actors are abusing Velociraptor, an open-source digital forensics and incident response (DFIR) tool, in connection with ransomware attacks likely orchestrated by Storm-2603 (aka CL-CRI-1040 or Gold Salem), which is known for deploying the Warlock and LockBit ransomware. The threat actor's use of the security utility was documented by Sophos last month. It's assessed that the attackers
Read More

Stealit Malware Abuses Node.js Single Executable Feature via Game and VPN Installers

Cybersecurity researchers have disclosed details of an active malware campaign called Stealit that has leveraged Node.js' Single Executable Application (SEA) feature as a way to distribute its payloads. According to Fortinet FortiGuard Labs, select iterations have also employed the open-source Electron framework to deliver the malware. It's assessed that the malware is being propagated through
Read More

China-Linked UAT-7290 Targets Telecoms with Linux Malware and ORB Nodes

A China-nexus threat actor known as UAT-7290 has been attributed to espionage-focused intrusions against entities in South Asia and Southeastern Europe. The activity cluster, which has been active since at least 2022, primarily focuses on extensive technical reconnaissance of target organizations before initiating attacks, ultimately leading to the deployment of malware families such as RushDrop
Read More