This step-by-step guide covers deploying deepseek-ai/DeepSeek-R1-Distill-Llama-8B to AWS SageMaker, with both interactive and YAML-based deployment options.
We will be using Magemaker, a Python tool that simplifies deploying open-source AI models to cloud providers like AWS, GCP, and Azure.
Prerequisites
Before starting, ensure you have:
Python 3.11+
AWS account with SageMaker access
Appropriate quotas for ml.g5.2xlarge instances (AWS accounts come with a pre-approved quota of 2 of these instances)
AWS Setup
First, set up your AWS credentials:
Go to AWS IAM
Create a new user or select existing one
Attach these policies:
AmazonSageMakerFullAccess
IAMFullAccess
ServiceQuotasFullAccess
Create access key for CLI use
Save both Access Key and Secret Access Key
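With the keys saved, you can store them locally through the AWS CLI (assuming the AWS CLI is installed; tools like Magemaker and boto3 read credentials from this default profile):

```bash
# Store the access keys in the default AWS CLI profile;
# the command prompts for each value interactively.
aws configure
# AWS Access Key ID [None]: <paste your Access Key>
# AWS Secret Access Key [None]: <paste your Secret Access Key>
# Default region name [None]: us-east-1
# Default output format [None]: json
```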
If you still face configuration issues, follow the steps in the Cloud Config section.
Option 1: Interactive Deployment
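To launch the interactive flow, install Magemaker and point it at AWS. (The `pip` package name and the `--cloud` flag below follow Magemaker's documented usage; double-check against `magemaker --help` on your installed version.)

```bash
# Install Magemaker and start the interactive deployment menu for AWS
pip install magemaker
magemaker --cloud aws
```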
When prompted:
Select "Deploy a model endpoint"
Choose "Deploy a Hugging Face model"
Enter: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
For instance type, enter: "ml.g5.2xlarge"
Option 2: YAML Configuration (Recommended)
Create `deploy-deepseek.yaml`:
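A minimal configuration might look like the following. The schema (the `!Deployment`/`!Model` tags and field names) is based on Magemaker's examples; verify it against the config files your Magemaker version generates:

```yaml
deployment: !Deployment
  destination: aws
  endpoint_name: deepseek-r1-distill-llama-8b
  instance_count: 1
  instance_type: ml.g5.2xlarge

models:
- !Model
  id: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  source: huggingface
```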
Deploy using:
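Assuming Magemaker's documented `--deploy` flag, which takes a path to the YAML file:

```bash
# Deploy the endpoint described in the YAML file
magemaker --deploy deploy-deepseek.yaml
```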
If you get a file-not-found error, pass the full path to the YAML file.
Querying the Model
Once deployed, query your endpoint from anywhere:
Using curl
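SageMaker endpoints require SigV4-signed requests, so a plain curl call will be rejected; the `awscurl` tool (`pip install awscurl`) is a curl-compatible wrapper that handles the signing. The endpoint name and region below are placeholders, and the request body assumes a Hugging Face text-generation container:

```bash
# Send a SigV4-signed POST to the SageMaker runtime invocation URL
awscurl --service sagemaker --region us-east-1 \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is machine learning?", "parameters": {"max_new_tokens": 256}}' \
  https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/deepseek-r1-distill-llama-8b/invocations
```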
Using Python
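A minimal client sketch using boto3's `invoke_endpoint`. The endpoint name and region are placeholders (substitute whatever your deployment printed), and the payload format assumes a Hugging Face text-generation container:

```python
import json


def build_payload(prompt: str, max_new_tokens: int = 256) -> str:
    """Build the JSON body expected by a Hugging Face text-generation endpoint."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    )


def query_endpoint(endpoint_name: str, prompt: str, region: str = "us-east-1") -> dict:
    """Invoke the SageMaker endpoint and return the decoded JSON response."""
    import boto3  # requires `pip install boto3` and configured AWS credentials

    client = boto3.client("sagemaker-runtime", region_name=region)
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(prompt),
    )
    return json.loads(response["Body"].read())


if __name__ == "__main__":
    # Placeholder endpoint name; use the one created by your deployment
    print(query_endpoint("deepseek-r1-distill-llama-8b", "What is machine learning?"))
```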
Important Notes
1. Instance Selection
ml.g5.2xlarge provides 1 NVIDIA A10G GPU
24 GiB GPU memory
8 vCPUs
32 GiB system memory
2. Costs
g5.2xlarge instances cost approximately $1.21/hour
Remember to delete endpoints when not in use:
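One way to tear an endpoint down is the AWS CLI (the endpoint name is a placeholder; use the name from your deployment):

```bash
# Delete the endpoint so the instance stops billing
aws sagemaker delete-endpoint --endpoint-name deepseek-r1-distill-llama-8b
```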
3. Common Issues
If you get quota errors, request an increase for g5.2xlarge instances
Ensure your IAM user has sufficient permissions
Check that your region supports g5 instances
Monitoring
Monitor your endpoint through:
AWS SageMaker Console
CloudWatch metrics
SageMaker logs
Next Steps
Once deployed, consider:
Setting up auto-scaling
Implementing proper error handling
Adding monitoring and alerting
Testing different prompt templates
Load testing your endpoint
Remember to always monitor your AWS costs and shut down unused endpoints!
Troubleshooting
If you encounter issues:
Verify AWS credentials are properly configured
Check SageMaker service quotas
Ensure sufficient IAM permissions
Review CloudWatch logs for detailed error messages
Best Practices
Use YAML configuration for reproducible deployments
Implement proper error handling in your client code
Set up monitoring and alerts
Keep track of costs
Version control your deployment configurations
We are open-sourcing Magemaker next week! Stay tuned!
As always, Happy coding! 🚀
If you have any questions, please do not hesitate to reach out at faizan|jneid@slashml.com.