OpenAI and Broadcom Unveil Jalapeño, a Custom Chip for LLM Inference

OpenAI and Broadcom have introduced Jalapeño, a custom AI accelerator built specifically for large language model inference. It is OpenAI’s first in-house Intelligence Processor and marks a notable step in the company’s effort to control more of the infrastructure behind its AI products.

What Jalapeño Does

The chip was designed from the ground up for inference, rather than being adapted from an existing accelerator. OpenAI says engineering samples are already running machine learning workloads in the lab, including GPT-5.3-Codex-Spark, and early tests suggest the chip delivers strong performance per watt.

This matters because inference accounts for a large share of the compute that AI systems use in production. A chip tailored to that task can improve response speed, lower operating costs and make large deployments more efficient.

Platform Strategy

Jalapeño is the first product in a multi-generation compute platform being developed by OpenAI and Broadcom. The companies plan to begin gigawatt-scale data centre deployments in 2026, with Microsoft among the partners involved.

OpenAI said the chip was designed using insights from its model roadmap, serving systems and product requirements. Broadcom handled silicon implementation and networking technologies, while Celestica worked on board, rack and system integration.

Why It Matters

The announcement shows how leading AI companies are trying to reduce dependence on third-party accelerators. By designing its own hardware, OpenAI can optimize more of the stack, including models, software, networking and infrastructure, instead of adapting its products to general-purpose chips.

OpenAI also said the chip moved from initial design to tape-out in nine months, which it described as one of the fastest ASIC development cycles in advanced semiconductors. The company added that its own AI models helped accelerate parts of the design and optimization process.

Industry Impact

If Jalapeño performs as expected, it could improve ChatGPT responsiveness, lower API costs and make access more reliable during periods of high demand. More broadly, it reflects a shift in the AI industry: companies are no longer building only models, but also the compute layer underneath them.
