Deploy ANY Open-Source LLM with Ollama on an AWS EC2 + GPU in 10 Min (Llama-3.1, Gemma-2 etc.)
In this video, I demonstrate how to set up and deploy a Llama 3.1, Phi, Mistral, or Gemma 2 model using Ollama on an AWS EC2 instance with a GPU. Starting from scratch, I guide you through the entire process on AWS, including instance setup, selecting the appropriate AMI, configuring the instance, and setting up the environment with CUDA drivers. We also cover installing Go, cloning a simple Go server, configuring API keys, and securing the server for persistent deployment. By the end, you'll have a functional, customizable setup to run your own AI models efficiently and economically. Steps include selecting the appropriate instance type, setting up SSH, installing dependencies, running Ollama, and securing the web service. Whether you're a developer looking to integrate AI or just getting started, this tutorial will help you achieve a smooth deployment.
Repo: https://github.com/developersdigest/aws-ec2-cuda-ollama
Ollama: https://ollama.com/
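The video's Go server supports API-key authentication. As a minimal, hypothetical sketch of how bearer-token protection can look in a Go server (the actual repo may implement it differently):

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// requireAPIKey rejects requests whose Authorization header does not
// match the expected bearer token. The API_KEY environment variable and
// the "Bearer" scheme are illustrative assumptions, not the repo's
// confirmed design.
func requireAPIKey(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+os.Getenv("API_KEY") {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, `{"ok":true}`) // placeholder handler; a real server would proxy to Ollama
	})
	http.ListenAndServe(":8080", requireAPIKey(mux))
}
```

Wrapping the whole mux keeps every route behind the same check; per-route wrapping works the same way.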
00:00 Introduction to Deploying Llama 3.1, Phi, Mistral, and Gemma 2
00:52 Setting Up Your EC2 Instance
02:25 Configuring Your Instance and Storage
03:28 Connecting to Your Instance via SSH
04:08 Installing Dependencies and Cloning the Repository
05:05 Running the Model and Setting Up the Server
05:58 Configuring Security and Testing the Endpoint
07:33 Ensuring Server Persistence
08:53 Conclusion and Final Thoughts
Video Transcript
In this video I'm going to show you how you can deploy Llama 3.1, Phi, Mistral, and Gemma 2, all through Ollama, on a GPU-enabled EC2 instance on AWS. I'm going to show you completely from scratch how to set this up in AWS and what we're going to be leveraging, and by the end of the video you'll have a nice, clean Go script, so whether you want to add API keys within this or you want to build on top of it, you'll be able to do all of that.

I'll just show you quickly how it will work. Through our Go script we're going to have a really basic OpenAI-compatible setup where we'll be able to pass in our base URL, the model, the messages, as well as the stream. So by the end of the video you'll have a base URL, you'll be able to set up your authentication with your API key, and then we'll have a simple OpenAI-compatible schema for how we interact with our API. That's just a really quick demonstration of how it works. Without further ado, let's get into it.
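As a quick illustration of that schema, here is a minimal Go client sketch. The base URL, port, and API key are placeholders you'd swap for your own; the /v1/chat/completions path is the standard OpenAI-compatible route (Ollama exposes the same one), though the repo's server may mount it differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	baseURL := "http://<your-ec2-public-ip>:8080" // placeholder address and port
	apiKey := "your-api-key"                      // placeholder key

	// OpenAI-compatible request body: model, messages, and stream flag.
	body, err := json.Marshal(map[string]any{
		"model": "llama3.1",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello!"},
		},
		"stream": false, // set to true for streamed responses
	})
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey) // the API key auth set up later in the video

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```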
To get started, once you're in the console, you can search for EC2 in the search box if you don't have it on your homepage. From there, we're going to go ahead and click Launch instance. In this case we can just call it Ollama GPU server, or whatever you want, really. Then what we're going to do here is browse the AMIs (a scripted alternative is sketched below).
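The video does all of this in the console. If you'd rather script the launch, a rough equivalent with the AWS SDK for Go v2 looks like the sketch below; the AMI ID, key pair name, and g4dn.xlarge instance type are placeholders and assumptions, not values taken from the video (the AMI choice is explained next).

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := ec2.NewFromConfig(cfg)

	out, err := client.RunInstances(context.TODO(), &ec2.RunInstancesInput{
		ImageId:      aws.String("ami-0123456789abcdef0"), // placeholder: a Deep Learning Base AMI ID in your region
		InstanceType: types.InstanceTypeG4dnXlarge,        // assumption: any GPU instance type will do
		KeyName:      aws.String("my-key-pair"),           // placeholder key pair for SSH
		MinCount:     aws.Int32(1),
		MaxCount:     aws.Int32(1),
		TagSpecifications: []types.TagSpecification{{
			ResourceType: types.ResourceTypeInstance,
			Tags:         []types.Tag{{Key: aws.String("Name"), Value: aws.String("Ollama GPU server")}},
		}},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("launched:", *out.Instances[0].InstanceId)
}
```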
What we're going to be searching for is deep learning. The reason we're using this AMI is that it makes it really easy to set up all the different CUDA drivers and everything else you need to leverage the GPU that's attached to your EC2 instance. If we didn't do this you could still set this all up, but there would be a handful more steps, since you'd have to install all of the different drivers yourself and make sure that's all configured. The nice thing with this is there's less room for error. You can just search for the Deep Learning Base AMI; it should be the one at the top, but just to