

Host DeepSeek R1 Distill Llama-8B on AWS SageMaker

Jneid Jneid

@jjneid94

Published on Jan 28, 2025

In this tutorial, we deploy deepseek-ai/DeepSeek-R1-Distill-Llama-8B on AWS SageMaker using Magemaker, a Python tool that simplifies deploying open-source AI models to cloud providers like AWS, GCP, and Azure.

This step-by-step guide covers both interactive and YAML-based deployment options.

Prerequisites

Before starting, ensure you have:

  1. Python 3.11+

  2. AWS account with SageMaker access

  3. Appropriate quota for ml.g5.2xlarge instances (AWS pre-approves a quota of 2 of these instances); a quota-check sketch follows this list
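If you want to confirm that quota programmatically, here is a minimal boto3 sketch using the Service Quotas API. The quota-name filter is an assumption about how AWS labels the quota; verify it in the Service Quotas console:

import boto3

# List SageMaker quotas and print any that mention g5.2xlarge endpoints.
# The matching strings below are assumptions; check the console for exact names.
client = boto3.client("service-quotas", region_name="us-east-1")  # adjust region

paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        name = quota["QuotaName"]
        if "g5.2xlarge" in name and "endpoint" in name.lower():
            print(name, "->", quota["Value"])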

AWS Setup

First, set up your AWS credentials:

  1. Go to AWS IAM

  2. Create a new user or select an existing one

  3. Attach these policies:

    • AmazonSageMakerFullAccess

    • IAMFullAccess

    • ServiceQuotasFullAccess

  4. Create an access key for CLI use

  5. Save both Access Key and Secret Access Key

If you still run into configuration issues, follow the steps in the Cloud Config section.
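Once your credentials are configured locally (for example via aws configure), you can sanity-check that boto3 picks them up with a quick STS call. This is a minimal sketch, independent of Magemaker:

import boto3

# Ask STS who we are; this fails fast if credentials are missing or invalid.
sts = boto3.client('sts')
identity = sts.get_caller_identity()
print("Authenticated as:", identity["Arn"])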

Option 1: Interactive Deployment

# Install Magemaker
pip install magemaker
# Start deployment
magemaker --cloud aws

When prompted:

  1. Select "Deploy a model endpoint"

  2. Choose "Deploy a Hugging Face model"

  3. Enter: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

  4. For instance type, enter: "ml.g5.2xlarge"

Option 2: YAML Configuration (Recommended)

Create `deploy-deepseek.yaml`:

deployment: !Deployment
  destination: aws
  endpoint_name: deepseek-llama
  instance_count: 1
  instance_type: ml.g5.2xlarge
  num_gpus: null
  quantization: null
models:
  - !Model
    id: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    location: null
    predict: null
    source: huggingface
    task: text-generation
    version: null

Deploy using:

magemaker --deploy deploy-deepseek.yaml

If you get a file-not-found error, include the full path to the YAML file.

Querying the Model

Once deployed, query your endpoint from anywhere:

Using curl

curl -X POST \
  https://runtime.sagemaker.[REGION].amazonaws.com/endpoints/deepseek-llama/invocations \
  -H "Content-Type: application/json" \
  -H "Authorization: AWS4-HMAC-SHA256 ..." \
  -d '{
    "inputs": "Write a Python function to sort a dictionary by values"
  }'

Note that SageMaker requires SigV4-signed requests, so constructing the Authorization header by hand is error-prone; the Python client below handles signing for you.

Using Python

import json

import boto3

# Create a SageMaker runtime client; it uses your configured AWS credentials.
runtime = boto3.client('sagemaker-runtime')

response = runtime.invoke_endpoint(
    EndpointName='deepseek-llama',
    ContentType='application/json',
    Body=json.dumps({"inputs": "Write a Python function to sort a dictionary by values"}),
)
print(response['Body'].read().decode())
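Hugging Face text-generation endpoints usually accept a parameters object alongside inputs. The exact fields depend on the serving container Magemaker uses, so treat this as a hedged sketch:

import json

import boto3

runtime = boto3.client('sagemaker-runtime')

# 'parameters' follows the Hugging Face text-generation interface;
# supported fields depend on the serving container, so verify before relying on them.
payload = {
    "inputs": "Write a Python function to sort a dictionary by values",
    "parameters": {"max_new_tokens": 256, "temperature": 0.6},
}
response = runtime.invoke_endpoint(
    EndpointName='deepseek-llama',
    ContentType='application/json',
    Body=json.dumps(payload),
)
print(json.loads(response['Body'].read()))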

Important Notes

1. Instance Selection

  • ml.g5.2xlarge provides 1 NVIDIA A10G GPU

  • 24 GiB GPU memory

  • 8 vCPUs

  • 32 GiB system memory

2. Costs

  • ml.g5.2xlarge instances cost approximately $1.21/hour (verify current SageMaker pricing for your region)

  • Remember to delete endpoints when not in use:

     magemaker --cloud aws
     # Select "Delete a model endpoint"
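You can also clean up programmatically with boto3. The endpoint config name below is an assumption; check the SageMaker console for the name Magemaker actually created:

import boto3

# Deleting the endpoint stops billing for the underlying instance.
sagemaker = boto3.client('sagemaker')
sagemaker.delete_endpoint(EndpointName='deepseek-llama')

# Optionally remove the endpoint configuration too (name is an assumption;
# verify it in the SageMaker console before deleting).
sagemaker.delete_endpoint_config(EndpointConfigName='deepseek-llama')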

3. Common Issues

  • If you get quota errors, request an increase for g5.2xlarge instances

  • Ensure your IAM user has sufficient permissions

  • Check that your region supports g5 instances

Monitoring

Monitor your endpoint through:

  1. AWS SageMaker Console

  2. CloudWatch metrics (see the sketch after this list)

  3. SageMaker logs
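For item 2, here is a minimal boto3 sketch that pulls invocation counts from CloudWatch; 'AllTraffic' is the default variant name and an assumption about Magemaker's setup:

import datetime

import boto3

# Sum the endpoint's invocations over the last hour in 5-minute buckets.
cloudwatch = boto3.client('cloudwatch')
now = datetime.datetime.now(datetime.timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/SageMaker',
    MetricName='Invocations',
    Dimensions=[
        {'Name': 'EndpointName', 'Value': 'deepseek-llama'},
        {'Name': 'VariantName', 'Value': 'AllTraffic'},  # default variant name
    ],
    StartTime=now - datetime.timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=['Sum'],
)
for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Sum'])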

Next Steps

Once deployed, consider:

  1. Setting up auto-scaling (see the sketch after this list)

  2. Implementing proper error handling

  3. Adding monitoring and alerting

  4. Testing different prompt templates

  5. Load testing your endpoint
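For item 1, SageMaker endpoints scale through the Application Auto Scaling API. The sketch below uses illustrative capacity and target values, and assumes the default 'AllTraffic' variant name:

import boto3

# Register the endpoint's production variant as a scalable target, then attach
# a target-tracking policy on invocations per instance.
autoscaling = boto3.client('application-autoscaling')
resource_id = 'endpoint/deepseek-llama/variant/AllTraffic'  # variant name assumed

autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=2,  # illustrative; bounded by your ml.g5.2xlarge quota
)
autoscaling.put_scaling_policy(
    PolicyName='deepseek-llama-invocations',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 100.0,  # illustrative: target invocations per instance
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
        },
    },
)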

Remember to always monitor your AWS costs and shut down unused endpoints!

Troubleshooting

If you encounter issues:

  1. Verify AWS credentials are properly configured

  2. Check SageMaker service quotas

  3. Ensure sufficient IAM permissions

  4. Review CloudWatch logs for detailed error messages

Best Practices

  1. Use YAML configuration for reproducible deployments

  2. Implement proper error handling in your client code (see the sketch after this list)

  3. Set up monitoring and alerts

  4. Keep track of costs

  5. Version control your deployment configurations
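For item 2, here is a minimal sketch of client-side error handling around invoke_endpoint. The error codes shown are common botocore codes; adapt them to what you actually observe:

import json

import boto3
from botocore.exceptions import ClientError

runtime = boto3.client('sagemaker-runtime')

def query_endpoint(prompt: str) -> str:
    """Invoke the endpoint, surfacing throttling and model errors clearly."""
    try:
        response = runtime.invoke_endpoint(
            EndpointName='deepseek-llama',
            ContentType='application/json',
            Body=json.dumps({"inputs": prompt}),
        )
        return response['Body'].read().decode()
    except ClientError as err:
        code = err.response['Error']['Code']
        if code == 'ThrottlingException':
            raise RuntimeError('Endpoint is throttled; retry with backoff') from err
        if code == 'ModelError':
            raise RuntimeError('Model failed to process the request') from err
        raise

print(query_endpoint("Write a Python function to sort a dictionary by values"))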

We are open-sourcing Magemaker next week, so stay tuned!

As always, Happy coding! 🚀


If you have any questions, please do not hesitate to reach out at faizan|jneid@slashml.com.

If you are self-hosting, try out our dashboard.