Efficiently Serving LLMs
Exploring techniques such as vectorization, KV caching, continuous batching, and LoRA.
Generative AI with LLMs
Exploring how large language models work, along with best practices for training, tuning, and deploying them.
Calculate LLM GPU Requirements
How much VRAM do you actually need?
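As a quick back-of-the-envelope answer to the VRAM question, inference memory is roughly the parameter count times the bytes per parameter, plus some overhead for activations and the KV cache. The sketch below is an assumption-laden rule of thumb, not a substitute for the resource above; the 20% overhead factor in particular is a rough placeholder.

```python
def estimate_vram_gb(num_params_b: float,
                     bytes_per_param: int = 2,
                     overhead: float = 1.2) -> float:
    """Rough inference VRAM estimate in GB.

    num_params_b: model size in billions of parameters.
    bytes_per_param: 2 for FP16/BF16, 1 for INT8, 4 for FP32.
    overhead: assumed ~20% extra for activations and KV cache.
    """
    return num_params_b * bytes_per_param * overhead

# A 7B model in FP16 under these assumptions: 7 * 2 * 1.2 = 16.8 GB.
print(f"{estimate_vram_gb(7):.1f} GB")
```

For example, the same 7B model quantized to INT8 (`bytes_per_param=1`) would come in at roughly half that figure.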