Calculating GPU Requirements for LLMs
Inference
Initially, let’s start with the memory required for a single parameter, which is 4 bytes at FP32 precision.
1 Parameter (Weights) = 4 Bytes (FP32)
To calculate the memory required for 1 billion parameters, we multiply 4 bytes by one billion, which gives us roughly 4 GB.
1 Billion Parameters = 4 Bytes * 10^9 = 4 GB
The following table shows the memory requirements for different model precisions per 1 billion parameters.
Parameters | Full Precision (32bits) | Half Precision (16bits) | 8bits |
---|---|---|---|
1 Billion Parameters | 4 GB | 2 GB | 1 GB |
Accordingly, you can now multiply the VRAM figure in the table above by the model's parameter count in billions, based on its precision. The table below shows the minimum memory required just to load each model for inference, without accounting for the additional memory consumed by requests hitting the model (e.g., activations and the KV cache).
Model Name | Full Precision (32bits) | Half Precision (16bits) | 8bits |
---|---|---|---|
Falcon (7B) | 28 GB | 14 GB | 7 GB |
Llama2 (7B) | 28 GB | 14 GB | 7 GB |
Jais (13B) | 52 GB | 26 GB | 13 GB |
Jais (30B) | 120 GB | 60 GB | 30 GB |
Falcon (40B) | 160 GB | 80 GB | 40 GB |
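The calculation behind the table can be sketched as a small helper function. This is a minimal sketch (the function name and signature are my own, not from the article):

```python
def inference_memory_gb(params_billions: float, precision_bits: int) -> float:
    """Estimate the minimum VRAM (in GB) needed to load model weights.

    Memory = number of parameters * bytes per parameter,
    using 1 GB = 10^9 bytes, matching the tables above.
    """
    bytes_per_param = precision_bits / 8
    return params_billions * bytes_per_param

# Reproduce a table row, e.g. Falcon (40B) at half precision:
print(inference_memory_gb(40, 16))  # 80.0 GB
```

Note this covers weights only; serving overhead (activations, KV cache) comes on top.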
How much more memory you need beyond this minimum depends on your system requirements, such as the number of concurrent user queries, caching, and so on; stress testing is recommended to find out.
Finetuning
To fine-tune a model, we need to load all of the following components into memory, which means we need roughly 6x the minimum memory required for inference. The following table shows the memory required for a full-precision model per 1 billion parameters.
Model Component | Full Precision Memory |
---|---|
Model Weights | 4 GB |
Optimizer States | 8 GB |
Gradients | 4 GB |
Activations | 8 GB |
Total | 24 GB |
However, this makes fine-tuning very large models infeasible in full precision. It is recommended to use reduced precision, either half precision or 8-bit, while fine-tuning.
The following table shows the differences between the minimum memory requirements for different precision types.
Model Name | Full Precision (32bits) | Half Precision (16bits) | 8bits | 4bits |
---|---|---|---|---|
Falcon (7B) | 168 GB | 84 GB | 42 GB | 21 GB |
Llama2 (7B) | 168 GB | 84 GB | 42 GB | 21 GB |
Jais (13B) | 312 GB | 156 GB | 78 GB | 39 GB |
Jais (30B) | 720 GB | 360 GB | 180 GB | 90 GB |
Falcon (40B) | 960 GB | 480 GB | 240 GB | 120 GB |
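The fine-tuning table follows directly from the 6x rule above. A minimal sketch (the function name and default multiplier argument are my own additions):

```python
def finetuning_memory_gb(params_billions: float, precision_bits: int,
                         multiplier: float = 6.0) -> float:
    """Estimate the minimum VRAM (in GB) for full fine-tuning.

    Weights + optimizer states + gradients + activations add up to
    roughly 6x the memory needed to just load the weights
    (24 GB per billion parameters at full precision, per the table above).
    """
    return multiplier * params_billions * (precision_bits / 8)

# Reproduce a table row, e.g. Jais (13B) at full precision:
print(finetuning_memory_gb(13, 32))  # 312.0 GB
```

Treat these as lower bounds: actual usage also depends on batch size, sequence length, and the optimizer used.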