I love to learn and share the amazing hardware and services being built to enable Artificial Intelligence, the next big thing in technology.
Most of the investment buzz in AI hardware concentrates on the amazing accelerator chips that crunch the math required for neural networks, like Nvidia’s GPUs. But what about the rest of the story? The CPUs and NICs that pre- and post-process each query add significant cost and are not designed for AI; they are general-purpose devices that can add tens of thousands of dollars per server.
What if someone re-imagined those servers with a clean-sheet design to efficiently handle the AI task at hand? An Israeli startup called NeuReality, led by Moshe Tanach, has done just that, and the results are impressive. Instead of a “CPU-centric” architecture, the company front-ends each deep learning accelerator (DLA) with dedicated silicon. NeuReality describes its approach as “Network Addressable Processing Units” and has measured the potential performance and cost savings. Instead of trying to compete with deep learning chips like those from Nvidia and Qualcomm, NeuReality takes care of all the other “stuff” needed to feed data to those chips and to coordinate clusters of them, and could conceivably support virtually any PCIe-based accelerator.

Let’s start with the high-level view of the NeuReality NR1. Instead of the conventional architecture, where a NIC feeds a CPU which then feeds a PCIe switch which distributes work across a box of accelerators, the NR1 combines those functions at a lower cost and manages the workflow across a series of DLAs. To preprocess the query, the NR1 includes processing units for vision and audio, a DSP, and a CPU, along with on- and off-chip memory, security, networking, and management. It also has an “AI-over-fabric” controller and an AI Hypervisor that handles the task of distributing AI work across a network of DLAs. Obviously, software is needed to utilize these various processing units and the DLAs, and NeuReality says it is ready to roll.
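To make the dataflow concrete, here is a minimal sketch (hypothetical, not NeuReality’s actual software) of what an AI-Hypervisor-style dispatcher does: preprocess an incoming request on-chip and hand it straight to one of a pool of DLAs, rather than funneling everything through a host CPU. The class and method names are illustrative assumptions.

```python
from collections import deque

class Dispatcher:
    """Toy AI-Hypervisor-style dispatcher (illustrative only)."""

    def __init__(self, num_dlas):
        # Round-robin over the accelerator pool; real hardware would also
        # track per-DLA queue depth, model placement, power state, etc.
        self.dlas = deque(range(num_dlas))

    def preprocess(self, request):
        # Stands in for the NR1's on-chip vision/audio/DSP engines,
        # which normalize the query before inference.
        return request.lower().strip()

    def dispatch(self, request):
        query = self.preprocess(request)
        dla = self.dlas[0]
        self.dlas.rotate(-1)  # next request goes to the next DLA
        return dla, query

d = Dispatcher(num_dlas=10)
assigned = [d.dispatch(f"  Query {i}  ")[0] for i in range(12)]
print(assigned)  # wraps around after the tenth DLA
```

The point of the sketch is the topology, not the scheduling policy: no general-purpose host CPU sits in the request path, which is exactly the component the NR1 design removes.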
One of the more challenging aspects will be the AI Hypervisor and, of course, the compiler that selects the best compute engine for a specific AI workflow. Note that the “CPU” in these diagrams is not a full-fledged CPU; rather, it is an 8-core Arm Neoverse complex used primarily for management flow and as a compute element of last resort when the compiler cannot determine which compute element should be used.

NeuReality’s blog provides far more detail on the architecture, beyond the scope of this article, and concludes with some pretty astounding benchmarks. First, the company measured the performance of a system with one to ten Qualcomm DLAs, and easily beat a CPU-driven Nvidia L40S system by over 2X. And given the low power and space requirements of the Qualcomm DLA, NeuReality can scale to ten DLAs per server, nearly tripling the performance of the 8-way L40S.

Next, the company looked at costs, and claims a 90% savings versus the DGX-H100. A fairer comparison is against the more affordable Nvidia L40S: the 10-way AI 100 NeuReality server is 50-67% lower in cost and twice the performance. That is some four-fold better performance per dollar. As for energy efficiency, where the Qualcomm AI 100 Ultra excels, the charts below show that the new platform is not only cost-effective but energy-efficient as well.

This new approach, eliminating two expensive CPUs and a NIC, represents the very first attempt to redefine the server architecture specifically for AI workloads. Of course, the performance and benchmark claims require third-party validation, and the company needs to broaden its ecosystem to include server OEMs and/or ODMs. As for the relatively small models used in the benchmarks, NeuReality’s strategic roadmap prioritizes support for small language models with up to 100 billion parameters, leveraging single-card and single-node configurations.
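The performance-per-dollar claim can be checked with one line of arithmetic: double the performance at 50-67% lower cost works out to a four- to six-fold gain, so the “four-fold” figure above corresponds to the conservative end of the cost range.

```python
# Sanity check of the claim: 2x the performance of an L40S server
# at 50-67% lower cost. Ratios are taken from the article's figures.

def perf_per_dollar_gain(perf_ratio, cost_reduction):
    # perf_ratio: NeuReality performance relative to the L40S baseline
    # cost_reduction: fraction by which the server cost is lower
    return perf_ratio / (1.0 - cost_reduction)

low = perf_per_dollar_gain(2.0, 0.50)   # conservative end: 4.0x
high = perf_per_dollar_gain(2.0, 0.67)  # optimistic end: ~6.1x
print(f"{low:.1f}x to {high:.1f}x better performance per dollar")
```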
The company’s focus extends to other generative AI pipelines like RAG, Mixture of Experts, and multimodal embedding models.

This article should not be taken as advice to purchase from or invest in the companies mentioned. Cambrian-AI Research is fortunate to have many, if not most, semiconductor firms as our clients, including Blaize, BrainChip, Cadence Design, Cerebras, D-Matrix, Eliyan, Esperanto, GML, Groq, IBM, Intel, NVIDIA, Qualcomm Technologies, SiFive, SiMa.ai, Synopsys, Ventana Microsystems, Tenstorrent, and scores of investment clients. We have no investment positions in any of the companies mentioned in this article and do not plan to initiate any in the near future. For more information, please visit our website.