SIGforum.com    Main Page  Hop To Forum Categories  The Lounge    AI models and the absurd machines that run them
AI models and the absurd machines that run them
Member
Picture of maladat
posted
There have been a number of threads recently about the generative AI systems that have been getting so much media attention.

There has been some confusion and discussion regarding how the models work and what sort of systems they run on. I thought I might offer some information for the tech-interested of the forum.

All of these models are types of neural networks. In computing, neural networks are a type of algorithm that was very vaguely inspired by biological neural systems, but an AI neural network has virtually nothing in common with a biological neural system and is in NO WAY remotely an emulation of a biological neural system.

You can think of a neural network as a giant flowchart with nodes arranged in layers. Each node does a simple computation on its inputs, and then sends the result to a bunch of nodes in the next layer. There's an input layer at the beginning and an output layer at the end. Because there are so many connections between layers, the model needs very fast access to all of its data - moving data around inside one machine is fast, while moving it between multiple machines over an ordinary network is much, much slower and would cripple the model's performance. That's why it matters that the entire model run on a single machine (or on something that behaves like one).

Taking ChatGPT as an example, the underlying AI model is a text completion model. You feed the input layer a piece of text, and what comes out at the output layer is a continuation of the input text that more or less "looks like" what a human would have written next.

There are two separate things that define a neural network. First, its organization or architecture - how many layers, how many nodes in each layer, which nodes in the next layer each node connects to, and what mathematical function each node performs. The other part is the "parameters" - the specific coefficients of the mathematical functions in each node. Say we have a node that takes one input from the previous layer, multiplies it by a constant, and sends the result to the next layer: output = constant * input. The value of that constant is a parameter of the node.

This setup entails a huge number of simple calculations that can be done in parallel (every node in a layer can be computed at once, since each depends only on the previous layer), much like computer graphics (every pixel can be computed independently). So over time, AI algorithms have evolved to run more efficiently on graphics cards, and graphics cards have evolved to run AI algorithms more efficiently.
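The node-and-parameter idea above can be sketched in a few lines of Python. This is a toy illustration, not any real model's code: the weights are made up, and each node here is just a weighted sum (real networks also apply a nonlinear function on top):

```python
# Toy sketch of one neural-network layer. Each node multiplies the
# previous layer's outputs by its own learned constants (the
# "parameters") and sums them up.

def node(inputs, weights):
    # One node: a weighted sum of the previous layer's outputs.
    return sum(i * w for i, w in zip(inputs, weights))

def layer(inputs, weight_rows):
    # One layer: each row of weights defines one node. These calls are
    # independent of each other, so they could all run in parallel.
    return [node(inputs, row) for row in weight_rows]

prev_layer = [1.0, 2.0]              # outputs from the previous layer
weights = [[0.5, 0.5], [2.0, -1.0]]  # parameters: one row per node
print(layer(prev_layer, weights))    # -> [1.5, 0.0]
```

Note that each node() call depends only on the previous layer, never on its neighbors - that independence is exactly the parallelism GPUs exploit.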

Now virtually all AI models run on specialized graphics cards.

But how? ChatGPT runs on a version of the GPT-3 model that has 175 billion 16-bit parameters. That means that JUST the parameters, not including the data about the architecture of the network, would take up 350 GB of GPU RAM.
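That 350 GB figure is just the parameter count times the parameter size:

```python
# Sanity check on the memory figure: 175 billion parameters,
# 16 bits (2 bytes) each.
params = 175e9
bytes_per_param = 2   # 16-bit
total_gb = params * bytes_per_param / 1e9
print(total_gb)       # -> 350.0 GB, just for the parameters
```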

The best gaming graphics card currently available, the NVIDIA RTX 4090, has a $1,600 MSRP and only 24 GB of RAM. You'd need 15 of them just to hold the GPT-3 parameters, no ordinary computer can run 15 of them, and the RTX 4090 isn't really designed for multi-GPU systems anyway.

Well, it turns out that there is a whole separate product category of "graphics cards" designed for industrial AI applications. The current top of the heap is the NVIDIA H100, which costs around $30,000 and has 80 GB of RAM (and VASTLY more processing power than an RTX 4090).

But that still isn't enough to run GPT-3, so what gives? Industrial AI servers are typically multi-GPU systems. An H100-based server can have up to 8 H100 GPUs - so $240,000 worth of "graphics cards" with 640 GB of total GPU RAM. These AI servers use a special NVLink bridge between the GPUs that gives each GPU direct access to the RAM of all the other GPUs at 900 GB/s (that's not a typo - gigaBYTES per second, not gigaBITS per second). Now we can run GPT-3.
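The 8-GPU totals work out like this (using the per-card figures above; the $30,000 is the rough per-card price mentioned, not an official list price):

```python
# Totals for a fully populated 8x H100 server.
h100_ram_gb = 80        # RAM per H100
h100_price = 30_000     # rough per-card price, per the post
gpus = 8
print(gpus * h100_ram_gb)  # -> 640 GB of pooled GPU RAM
print(gpus * h100_price)   # -> $240,000 in GPUs alone
```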

Models are going to keep getting bigger, though, and GPT-3 already takes up most of that 640 GB of GPU RAM, so where do we go from there?

That was actually the impetus for writing this post - seeing NVIDIA's announcement about their next-gen system. The GH200 is a combined CPU/GPU "superchip." The CPU has 480 GB of RAM and the GPU has 96 GB of RAM. They can still be used in groups of up to 8 (in total, almost 4 TB of CPU RAM and 768 GB of GPU RAM) with 900 GB/s bandwidth direct memory access between GPUs.

CPUs aside, that isn't THAT big a bump... except that NVIDIA has developed a second layer of NVLink that extends the 900 GB/s direct memory access between GPUs to up to THIRTY-TWO 8-GPU groups. That's 256 of the chips, all in one machine, all with 900 GB/s direct memory access to each other.

That's effectively ONE SERVER with 256 CPUs with 122 TB of RAM and 256 GPUs with 24 TB of VRAM (over 1,000 times the VRAM of an RTX 4090). Based on the prices of current-gen chips, the 256-chip machine would probably run you in the neighborhood of $15,000,000.
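The 256-chip totals follow directly from the per-chip numbers above:

```python
# Totals for a full 32-group (256-chip) GH200 machine.
chips = 256
cpu_ram_gb = 480   # CPU RAM per GH200 superchip
gpu_ram_gb = 96    # GPU RAM per GH200 superchip
rtx4090_gb = 24    # VRAM on one RTX 4090, for comparison

cpu_total_tb = chips * cpu_ram_gb / 1000
gpu_total_tb = chips * gpu_ram_gb / 1000
print(cpu_total_tb)                     # -> 122.88 TB of CPU RAM
print(gpu_total_tb)                     # -> 24.576 TB of GPU RAM
print(chips * gpu_ram_gb / rtx4090_gb)  # -> 1024.0x one RTX 4090
```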
 
Posts: 6319 | Location: CA | Registered: January 24, 2011
Member
posted
Interesting post. When the AI threads were staying on page 1, I did some research on what it took to stand up a large-scale system. Insane amounts of hardware, software, and data. ChatGPT 3's data sets are 2-3 years old? ChatGPT 4's training sets were even larger, with the addition of optical and aural training. I knew NVIDIA was a player in this space, but I didn't realize how big until this thread. All you've posted about hardware (and more) will be required to keep the training sets current and more real-time. Something I hadn't really thought about until now.
 
Posts: 7570 | Registered: October 31, 2008
Alea iacta est
Picture of Beancooker
posted
Very interesting. Long ago I built and overclocked computers. It’s pretty incredible how the hardware has developed.

I find it interesting that the graphics cards have become one of the more important pieces in computing. When I was building them, they were merely graphics cards.



quote:
Originally posted by parabellum: You must have your pants custom tailored to fit your massive balls.
The “lol” thread
 
Posts: 4031 | Location: Staring down at you with disdain, from the spooky mountaintop castle. | Registered: November 20, 2010
Member
Picture of 229DAK
posted
quote:
That's effectively ONE SERVER with 256 CPUs with 122 TB of RAM and 256 GPUs with 24 TB of VRAM (literally 1,000 times more VRAM than an RTX 4090). Based on the prices of current-gen chips, the 256-chip machine would probably run you in the neighborhood of $15,000,000.
Gotta wonder how much power it takes to run all this?


_________________________________________________________________________
“A man’s treatment of a dog is no indication of the man’s nature, but his treatment of a cat is. It is the crucial test. None but the humane treat a cat well.”
-- Mark Twain, 1902
 
Posts: 9058 | Location: Northern Virginia | Registered: November 04, 2005
Animis Opibusque Parati
posted
quote:
Originally posted by 229DAK:
quote:
That's effectively ONE SERVER with 256 CPUs with 122 TB of RAM and 256 GPUs with 24 TB of VRAM (literally 1,000 times more VRAM than an RTX 4090). Based on the prices of current-gen chips, the 256-chip machine would probably run you in the neighborhood of $15,000,000.
Gotta wonder how much power it takes to run all this?


I took a stab at what ChatGPT says if asked the question about the power requirements for the info above: "Total Power Consumption:
Adding up the power requirements for the CPUs, RAM, and GPUs, we get an estimated total power consumption of around 177.2 kilowatts (38.4 kW + 62 kW + 76.8 kW). Keep in mind that this is a rough estimate, and the actual power requirements may vary depending on the specific hardware and usage patterns."




"Prepared in mind and resources"
 
Posts: 1353 | Location: SC | Registered: October 28, 2011
Member
posted
Interesting. I took Neural Networks in EE grad school in the early '90s. Your description of NNs matches what I recall learning, in general. The processing power available was, and still is, not up to the challenge for any prime-time use on a large scale.
 
Posts: 3956 | Location: UNK | Registered: October 04, 2009
Member
Picture of maladat
posted
The GH200 "superchip" is rated at 1000W (that includes CPU, CPU RAM, GPU, and GPU VRAM).

So without any of the supporting hardware, just the chips on a full 256-chip machine would max out at 256kW.
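That ceiling is just the per-chip rating times the chip count (supporting hardware like networking, storage, and cooling would add more on top):

```python
# Worst-case power draw for the chips alone, using the 1000 W
# GH200 superchip rating quoted above.
chips = 256
watts_each = 1000
total_kw = chips * watts_each / 1000
print(total_kw)   # -> 256.0 kW
```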
 
Posts: 6319 | Location: CA | Registered: January 24, 2011
Ignored facts
still exist
posted
quote:
Originally posted by maladat:
The GH200 "superchip" is rated at 1000W (that includes CPU, CPU RAM, GPU, and GPU VRAM).

So without any of the supporting hardware, just the chips on a full 256-chip machine would max out at 256kW.


Which is enough power for 100 to 150 houses using normal planning numbers. Eek Eek


----------------------
Let's Go Brandon!
 
Posts: 10946 | Location: 45 miles from the Pacific Ocean | Registered: February 28, 2003
Member
Picture of maladat
posted
quote:
Originally posted by Jimineer:
Interesting. I took Neural Networks in EE grad school in the early '90s. Your description of NNs matches what I recall learning, in general. The processing power available was, and still is, not up to the challenge for any prime-time use on a large scale.


It depends on your definitions. If we are talking about language models like ChatGPT, then I pretty much agree with you. With current language model performance, the amount of computing power required to run a model that gives pretty good responses in a reasonable amount of time really prohibits widespread, large-scale use - at least in the sense of everybody having full-time personal AI assistants.

There are some lower-density use cases, like customer service chat bots, that will be in production use very soon if they aren't already. There are some surprisingly effective techniques for taking a good general-purpose language model that required a VAST amount of training to produce (hundreds of thousands or millions of dollars of GPU time) and then doing a very small amount of additional training (hundreds, maybe thousands of dollars of GPU time) to "fine-tune" it to perform a specific task with a specific knowledge base like that.

Obviously if you do a high-volume, large-scale contract or build your own data center the prices would be lower, but if Joe Schmo goes to one of the big cloud server providers, time on a multi-GPU server that can run one instance of ChatGPT gets billed at around $20-30 PER HOUR ($15-20k per month).
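A quick sanity check on those cloud numbers (the hourly rate is the ballpark from this post, not any specific provider's price list):

```python
# Monthly cost of renting a multi-GPU server around the clock at the
# rough $20-30/hour rate mentioned above.
hours_per_month = 730   # average hours in a month
low = 20 * hours_per_month
high = 30 * hours_per_month
print(low, high)        # -> 14600 21900, i.e. roughly $15-22k/month
```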

However, smaller, less complex neural networks definitely ARE in widespread, mainstream use in many different areas.

It's not just weird behind-the-scenes industry stuff, either. A couple of big public-facing examples:

The Tesla full-self-driving software uses 48 separate neural networks (a bunch that process data from individual sensors, some that aggregate those outputs to generate a model of what's happening around the car, some to decide what the car will do next, etc).

Starting with the iPhone X in 2017, iPhones have had hardware designed to efficiently run neural networks for speech and image processing (for stuff like Siri voice recognition, face unlock, facial recognition in photos, etc). Android phones have been using neural networks for the same kind of stuff for a while, too.
 
Posts: 6319 | Location: CA | Registered: January 24, 2011
Ammoholic
Picture of Skins2881
posted
Very strange... I just finished reading an article about this same new supercomputer.



Jesse

Sic Semper Tyrannis
 
Posts: 20851 | Location: Loudoun County, Virginia | Registered: December 27, 2014
Ammoholic
Picture of Skins2881
posted
quote:
Originally posted by Minnow:
quote:
Originally posted by 229DAK:
quote:
That's effectively ONE SERVER with 256 CPUs with 122 TB of RAM and 256 GPUs with 24 TB of VRAM (literally 1,000 times more VRAM than an RTX 4090). Based on the prices of current-gen chips, the 256-chip machine would probably run you in the neighborhood of $15,000,000.
Gotta wonder how much power it takes to run all this?


I took a stab at what ChatGPT says if asked the question about the power requirements for the info above: "Total Power Consumption:
Adding up the power requirements for the CPUs, RAM, and GPUs, we get an estimated total power consumption of around 177.2 kilowatts (38.4 kW + 62 kW + 76.8 kW). Keep in mind that this is a rough estimate, and the actual power requirements may vary depending on the specific hardware and usage patterns."


See the article link I just posted - up to 326 kW in only 16 racks. Pretty damn dense.



Jesse

Sic Semper Tyrannis
 
Posts: 20851 | Location: Loudoun County, Virginia | Registered: December 27, 2014
Do the next
right thing
Picture of bobtheelf
posted
Pretty amazing that a wet lump of flesh weighing just a few pounds does so much.
 
Posts: 3666 | Location: Nashville | Registered: July 23, 2012
Ignored facts
still exist
posted
quote:
Originally posted by bobtheelf:
Pretty amazing that a wet lump of flesh weighing just a few pounds does so much.


that's a darn good point. Maybe Silicon isn't the right approach Smile Smile Smile


----------------------
Let's Go Brandon!
 
Posts: 10946 | Location: 45 miles from the Pacific Ocean | Registered: February 28, 2003
Member
Picture of IntrepidTraveler
posted
I read somewhere a while back that the human brain runs on about 25 watts of power. A pretty dim bulb by comparison.




Thus the metric system did not really catch on in the States, unless you count the increasing popularity of the nine-millimeter bullet.
- Dave Barry

"Never go through life saying 'I should have'..." - quote from the 9/11 Boatlift Story (thanks, sdy for posting it)
 
Posts: 3302 | Location: Carlsbad NM/ Augusta GA | Registered: July 15, 2007
Ammoholic
Picture of Skins2881
posted
Holy cow! Moore's law be damned.




Jesse

Sic Semper Tyrannis
 
Posts: 20851 | Location: Loudoun County, Virginia | Registered: December 27, 2014
Get my pies
outta the oven!

Picture of PASig
posted
...We have only bits and pieces of information but what we know for certain is that at some point in the early twenty-first century all of mankind was united in celebration. We marveled at our own magnificence as we gave birth to AI.

AI? You mean artificial intelligence?

A singular consciousness that spawned an entire race of machines. We don’t know who struck first, us or them. But we know that it was us that scorched the sky...


 
Posts: 33901 | Location: Pennsylvania | Registered: November 12, 2007

© SIGforum 2024