

Host DeepSeek R1 Distill Llama-8B on AWS SageMaker

Jneid Jneid

@jjneid94

Published on Jan 28, 2025

In this tutorial, we deploy deepseek-ai/DeepSeek-R1-Distill-Llama-8B on AWS SageMaker using Magemaker, a Python tool that simplifies deploying open-source AI models to cloud providers like AWS, GCP, and Azure.

This step-by-step guide covers both interactive and YAML-based deployment options.

Prerequisites

Before starting, ensure you have:

  1. Python 3.11+

  2. AWS account with SageMaker access

  3. Appropriate quota for ml.g5.2xlarge instances (AWS pre-approves a quota of 2 of these instances); a quota-check sketch follows this list
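If you want to confirm that quota programmatically, here is a minimal boto3 sketch using the Service Quotas API. The quota-name filter is an assumption about how AWS labels the quota; verify it in the Service Quotas console:

import boto3

# List SageMaker quotas and print any that mention g5.2xlarge endpoints.
# The matching strings below are assumptions; check the console for exact names.
client = boto3.client("service-quotas", region_name="us-east-1")  # adjust region

paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        name = quota["QuotaName"]
        if "g5.2xlarge" in name and "endpoint" in name.lower():
            print(name, "->", quota["Value"])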

AWS Setup

First, set up your AWS credentials:

  1. Go to AWS IAM

  2. Create a new user or select an existing one

  3. Attach these policies:

    • AmazonSageMakerFullAccess

    • IAMFullAccess

    • ServiceQuotasFullAccess

  4. Create an access key for CLI use

  5. Save both Access Key and Secret Access Key

If you still run into configuration issues, follow the steps in the Cloud Config section.
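Once your credentials are configured locally (for example via aws configure), you can sanity-check that boto3 picks them up with a quick STS call. This is a minimal sketch, independent of Magemaker:

import boto3

# Ask STS who we are; this fails fast if credentials are missing or invalid.
sts = boto3.client('sts')
identity = sts.get_caller_identity()
print("Authenticated as:", identity["Arn"])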

Option 1: Interactive Deployment

# Install Magemaker
pip install magemaker
# Start deployment
magemaker --cloud aws

When prompted:

  1. Select "Deploy a model endpoint"

  2. Choose "Deploy a Hugging Face model"

  3. Enter: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

  4. For instance type, enter: "ml.g5.2xlarge"

Option 2: YAML Configuration (Recommended)

Create `deploy-deepseek.yaml`:

deployment: !Deployment
  destination: aws
  endpoint_name: deepseek-llama
  instance_count: 1
  instance_type: ml.g5.2xlarge
  num_gpus: null
  quantization: null
models:
  - !Model
    id: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    location: null
    predict: null
    source: huggingface
    task: text-generation
    version: null

Deploy using:

magemaker --deploy deploy-deepseek.yaml

If you get a file-not-found error, include the full path to the YAML file.

Querying the Model

Once deployed, query your endpoint from anywhere:

Using curl

curl -X POST \
  https://runtime.sagemaker.[REGION].amazonaws.com/endpoints/deepseek-llama/invocations \
  -H "Content-Type: application/json" \
  -H "Authorization: AWS4-HMAC-SHA256 ..." \
  -d '{
    "inputs": "Write a Python function to sort a dictionary by values"
  }'

Note that SageMaker requires SigV4-signed requests, so constructing the Authorization header by hand is error-prone; the Python client below handles signing for you.

Using Python

import json

import boto3

# Create a SageMaker runtime client; it uses your configured AWS credentials.
runtime = boto3.client('sagemaker-runtime')

response = runtime.invoke_endpoint(
    EndpointName='deepseek-llama',
    ContentType='application/json',
    Body=json.dumps({"inputs": "Write a Python function to sort a dictionary by values"}),
)
print(response['Body'].read().decode())
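Hugging Face text-generation endpoints usually accept a parameters object alongside inputs. The exact fields depend on the serving container Magemaker uses, so treat this as a hedged sketch:

import json

import boto3

runtime = boto3.client('sagemaker-runtime')

# 'parameters' follows the Hugging Face text-generation interface;
# supported fields depend on the serving container, so verify before relying on them.
payload = {
    "inputs": "Write a Python function to sort a dictionary by values",
    "parameters": {"max_new_tokens": 256, "temperature": 0.6},
}
response = runtime.invoke_endpoint(
    EndpointName='deepseek-llama',
    ContentType='application/json',
    Body=json.dumps(payload),
)
print(json.loads(response['Body'].read()))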

Important Notes

1. Instance Selection

  • ml.g5.2xlarge provides 1 NVIDIA A10G GPU

  • 24 GiB GPU memory

  • 8 vCPUs

  • 32 GiB system memory

2. Costs

  • ml.g5.2xlarge instances cost approximately $1.21/hour (verify current SageMaker pricing for your region)

  • Remember to delete endpoints when not in use:

     magemaker --cloud aws
     # Select "Delete a model endpoint"
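You can also clean up programmatically with boto3. The endpoint config name below is an assumption; check the SageMaker console for the name Magemaker actually created:

import boto3

# Deleting the endpoint stops billing for the underlying instance.
sagemaker = boto3.client('sagemaker')
sagemaker.delete_endpoint(EndpointName='deepseek-llama')

# Optionally remove the endpoint configuration too (name is an assumption;
# verify it in the SageMaker console before deleting).
sagemaker.delete_endpoint_config(EndpointConfigName='deepseek-llama')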

3. Common Issues

  • If you get quota errors, request an increase for g5.2xlarge instances

  • Ensure your IAM user has sufficient permissions

  • Check that your region supports g5 instances

Monitoring

Monitor your endpoint through:

  1. AWS SageMaker Console

  2. CloudWatch metrics (see the sketch after this list)

  3. SageMaker logs
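For item 2, here is a minimal boto3 sketch that pulls invocation counts from CloudWatch; 'AllTraffic' is the default variant name and an assumption about Magemaker's setup:

import datetime

import boto3

# Sum the endpoint's invocations over the last hour in 5-minute buckets.
cloudwatch = boto3.client('cloudwatch')
now = datetime.datetime.now(datetime.timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/SageMaker',
    MetricName='Invocations',
    Dimensions=[
        {'Name': 'EndpointName', 'Value': 'deepseek-llama'},
        {'Name': 'VariantName', 'Value': 'AllTraffic'},  # default variant name
    ],
    StartTime=now - datetime.timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=['Sum'],
)
for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Sum'])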

Next Steps

Once deployed, consider:

  1. Setting up auto-scaling (see the sketch after this list)

  2. Implementing proper error handling

  3. Adding monitoring and alerting

  4. Testing different prompt templates

  5. Load testing your endpoint
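For item 1, SageMaker endpoints scale through the Application Auto Scaling API. The sketch below uses illustrative capacity and target values, and assumes the default 'AllTraffic' variant name:

import boto3

# Register the endpoint's production variant as a scalable target, then attach
# a target-tracking policy on invocations per instance.
autoscaling = boto3.client('application-autoscaling')
resource_id = 'endpoint/deepseek-llama/variant/AllTraffic'  # variant name assumed

autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=2,  # illustrative; bounded by your ml.g5.2xlarge quota
)
autoscaling.put_scaling_policy(
    PolicyName='deepseek-llama-invocations',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 100.0,  # illustrative: target invocations per instance
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
        },
    },
)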

Remember to always monitor your AWS costs and shut down unused endpoints!

Troubleshooting

If you encounter issues:

  1. Verify AWS credentials are properly configured

  2. Check SageMaker service quotas

  3. Ensure sufficient IAM permissions

  4. Review CloudWatch logs for detailed error messages

Best Practices

  1. Use YAML configuration for reproducible deployments

  2. Implement proper error handling in your client code (see the sketch after this list)

  3. Set up monitoring and alerts

  4. Keep track of costs

  5. Version control your deployment configurations
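For item 2, here is a minimal sketch of client-side error handling around invoke_endpoint. The error codes shown are common botocore codes; adapt them to what you actually observe:

import json

import boto3
from botocore.exceptions import ClientError

runtime = boto3.client('sagemaker-runtime')

def query_endpoint(prompt: str) -> str:
    """Invoke the endpoint, surfacing throttling and model errors clearly."""
    try:
        response = runtime.invoke_endpoint(
            EndpointName='deepseek-llama',
            ContentType='application/json',
            Body=json.dumps({"inputs": prompt}),
        )
        return response['Body'].read().decode()
    except ClientError as err:
        code = err.response['Error']['Code']
        if code == 'ThrottlingException':
            raise RuntimeError('Endpoint is throttled; retry with backoff') from err
        if code == 'ModelError':
            raise RuntimeError('Model failed to process the request') from err
        raise

print(query_endpoint("Write a Python function to sort a dictionary by values"))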

We are open-sourcing Magemaker next week, so stay tuned!

As always, Happy coding! 🚀


If you have any questions, please do not hesitate to reach out at faizan|jneid@slashml.com.

If you are self-hosting, try out our dashboard.