
Thu Sep 26 13:00:00 UTC 2024

## Cloudflare Speeds Up Workers AI with New Hardware and Software Optimizations
**San Francisco, CA – [Date]** – Cloudflare, the leading global network for the Internet, today announced significant performance upgrades for its Workers AI platform. The upgrades speed up inference for customers running large language models (LLMs), improving the experience of interactive chat, agents, and content generation.
The improvements fall into three areas:
* **Upgraded Hardware:** Cloudflare’s network now runs 12th-generation compute servers with more powerful GPUs that can handle larger models and deliver faster inference. Customers can run models such as Meta Llama 3.2 11B and Llama 3.1 70B, with throughput two to three times faster than on the previous hardware.
* **KV Cache Compression:** Cloudflare has developed and open-sourced a new technique for compressing the KV cache, the per-token attention state that an LLM keeps in GPU memory while it generates. The technique, which builds on the existing PagedAttention memory-management scheme, reduces the cache's memory usage by up to 64 times, so more tokens can be processed concurrently, and boosts overall throughput by up to 5.18 times.
* **Speculative Decoding:** Workers AI now uses prompt-lookup decoding, a form of speculative decoding that drafts several tokens at once and has the model verify them together, speeding up inference by up to 70% for some models. The drafts are copied from text patterns that already appear in the prompt and in the output so far, which improves speed without significantly compromising output quality.
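
To make the speculative decoding item more concrete, here is a minimal Python sketch of the general prompt-lookup idea. It is an illustration under stated assumptions, not Cloudflare's implementation: the function names `propose_draft` and `generate`, the `ngram_size` and `num_draft` parameters, and the toy model are all hypothetical. The model is also called once per draft token here for readability, whereas a real inference engine would score every draft position in a single batched forward pass, which is where the speedup comes from.

```python
from typing import Callable, List, Sequence, Tuple


def propose_draft(tokens: Sequence[int], ngram_size: int = 3,
                  num_draft: int = 5) -> List[int]:
    """Copy the tokens that followed the latest earlier occurrence of the
    trailing n-gram. Returns an empty list when there is no earlier match."""
    if len(tokens) <= ngram_size:
        return []
    tail = list(tokens[-ngram_size:])
    # Scan right to left, starting just before the trailing n-gram itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if list(tokens[start:start + ngram_size]) == tail:
            follow = start + ngram_size
            return list(tokens[follow:follow + num_draft])
    return []


def generate(prompt: Sequence[int],
             next_token: Callable[[List[int]], int],
             max_new_tokens: int,
             ngram_size: int = 3,
             num_draft: int = 5) -> Tuple[List[int], int]:
    """Greedy decoding with prompt-lookup drafts.

    `next_token` is a hypothetical stand-in for the target model: it maps a
    token context to the model's greedy next token. Returns the full token
    list and the number of verification rounds that were needed.
    """
    tokens = list(prompt)
    target_len = len(tokens) + max_new_tokens
    rounds = 0
    while len(tokens) < target_len:
        rounds += 1
        accepted: List[int] = []
        for candidate in propose_draft(tokens, ngram_size, num_draft):
            # Keep a draft token only if the model would have produced it.
            if candidate != next_token(tokens + accepted):
                break
            accepted.append(candidate)
        # Always add one token from the model so every round makes progress.
        accepted.append(next_token(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:target_len], rounds


def toy_next_token(context: List[int]) -> int:
    """Toy 'model' (an assumption for the demo): counts 0, 1, 2, 3, 0, ..."""
    return (context[-1] + 1) % 4


if __name__ == "__main__":
    output, rounds = generate([0, 1, 2, 3, 0, 1, 2], toy_next_token,
                              max_new_tokens=8)
    print(output, rounds)
```

Because every draft token is checked against the model's own greedy choice, this sketch produces the same tokens as plain greedy decoding and only reduces the number of rounds. Repetitive inputs such as code, summaries, or retrieval-augmented prompts tend to benefit most, since their continuations often already appear in the context.
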
“We are dedicated to providing the fastest and most efficient inference platform for our customers,” said [Cloudflare Executive Name], [Title]. “These new upgrades demonstrate our commitment to delivering continuous innovation and empowering developers with cutting-edge AI capabilities.”
Cloudflare presents the Workers AI speedups alongside its broader commitments to a robust Free service tier and to a platform for running containers across its network. The company also continues to invest in AI-powered features such as WAF rule generation, bot traffic insights, and bot blocking.
With these advancements, Cloudflare continues to solidify its position as a leader in the AI-driven Internet, giving developers and businesses a powerful, accessible platform for building with LLMs.