Jneid Jneid

@jjneid94

Published on Jan 29, 2025

Host DeepSeek-R1 Distilled Llama-8B on GCP

For this tutorial, we are using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, one of the smaller distills in the DeepSeek-R1 family; the same steps apply to the larger Llama-8B variant with an appropriately sized instance.

This is a step-by-step guide to deploying the model with Magemaker, covering both its interactive and YAML-based deployment options.

Step 1: GCP Setup

1. Create a Google Cloud account if you haven't already

2. Install gcloud CLI:

   # Follow instructions at cloud.google.com/sdk/docs/install-sdk
   gcloud init

3. Enable Vertex AI API in your project:

  • Go to Google Cloud Console

  • Search for "Vertex AI API"

  • Click "Enable"

For a more detailed, step-by-step configuration walkthrough, use this.
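
If you prefer the CLI over the console, the Vertex AI API can also be enabled with gcloud (a sketch, assuming `gcloud init` has already set your active project):

```shell
# Enable the Vertex AI API for the active project
gcloud services enable aiplatform.googleapis.com

# Confirm it is enabled
gcloud services list --enabled --filter="name:aiplatform.googleapis.com"
```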

Step 2: Authentication

# Login and set application default credentials
gcloud auth application-default login
# Verify your configuration
gcloud config list

Step 3: Create YAML Configuration

Create a file named `deploy-deepseek-gcp.yaml`:

deployment: !Deployment
  destination: gcp
  endpoint_name: deepseek-r1-distill
  accelerator_count: 1
  instance_type: g2-standard-12
  accelerator_type: NVIDIA_L4
  num_gpus: null
  quantization: null
models:
- !Model
  id: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  location: null
  predict: null
  source: huggingface
  task: text-generation
  version: null

Step 4: Deploy

# Install Magemaker if you haven't already
pip install magemaker
# Deploy using the YAML file
magemaker --deploy deploy-deepseek-gcp.yaml

Step 5: Verify Deployment

  1. Go to Google Cloud Console

  2. Navigate to Vertex AI → Model Registry

  3. Check your endpoint status

Step 6: Test the Endpoint

Use this Python code to test your deployment:

from google.cloud import aiplatform

# Replace {project}, {location}, and {endpoint_id} with the values
# from your deployment (visible under Vertex AI in the console)
endpoint = aiplatform.Endpoint(
    endpoint_name="projects/{project}/locations/{location}/endpoints/{endpoint_id}"
)

# Send a single text-generation request to the deployed model
response = endpoint.predict(
    instances=[{
        "inputs": "Write a Python function to calculate fibonacci numbers"
    }]
)
print(response)
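
The shape of `response.predictions` depends on the serving container; with Hugging Face text-generation containers it is typically a list of dicts with a `generated_text` field, though that is an assumption worth checking against your own output. A small best-effort helper for pulling the text out (the `extract_generated_text` name is ours, not part of any SDK):

```python
def extract_generated_text(predictions):
    """Best-effort extraction of generated text from a Vertex AI
    prediction payload. Handles two shapes commonly returned by
    text-generation containers: a list of dicts carrying a
    'generated_text' key, or a plain list of strings."""
    if not predictions:
        return ""
    first = predictions[0]
    if isinstance(first, dict):
        return first.get("generated_text", "")
    return str(first)

# Example with a mocked payload (the real one comes from endpoint.predict):
sample = [{"generated_text": "def fib(n): ..."}]
print(extract_generated_text(sample))  # → def fib(n): ...
```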

Common Issues and Solutions

Quota Issues

If you encounter quota errors:

  1. Go to IAM & Admin → Quotas

  2. Search for "NVIDIA L4 GPUs"

  3. Request quota increase

Authentication Issues

# Verify your credentials
gcloud auth list
# Reset if needed
gcloud auth login
gcloud auth application-default login


Instance Availability

  • Check if g2-standard-12 is available in your region

  • Try different regions if needed
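
Availability can also be checked from the CLI (a sketch; the zone name below is only an example):

```shell
# List zones where the g2-standard-12 machine type is offered
gcloud compute machine-types list --filter="name=g2-standard-12"

# Or inspect a specific zone directly
gcloud compute machine-types describe g2-standard-12 --zone=us-central1-a
```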

Monitoring Your Deployment

Monitor through Google Cloud Console:

  1. Vertex AI → Endpoints

  2. Cloud Monitoring

  3. Cloud Logging

Cost Management

Pricing Breakdown

  • g2-standard-12 with NVIDIA L4: ~$1 per hour

  • Additional costs:

    • Network egress

    • API calls

    • Storage for model artifacts

Cost Optimization Tips

1. Delete endpoints when not in use:

   magemaker --cloud gcp
   # Select "Delete a model endpoint"

2. Use batch processing when possible

3. Monitor usage patterns

4. Set up billing alerts

5. Consider scheduled shutdowns for non-critical workloads

Monthly Cost Estimates

  • 24/7 running: ~$720/month

  • 8 hours/day: ~$240/month

  • 4 hours/day: ~$120/month
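
The estimates above follow directly from the ~$1/hour rate; a quick sketch for computing your own (rates are approximate and region-dependent, and this ignores egress, API-call, and storage charges):

```python
def monthly_cost(hourly_rate_usd, hours_per_day, days_per_month=30):
    """Rough monthly compute cost for an endpoint that runs
    hours_per_day hours each day at a flat hourly rate."""
    return hourly_rate_usd * hours_per_day * days_per_month

# ~$1/hour for g2-standard-12 with NVIDIA L4 (approximate)
for hours in (24, 8, 4):
    print(f"{hours:>2} h/day: ~${monthly_cost(1.0, hours):.0f}/month")
# → 24 h/day: ~$720/month, 8 h/day: ~$240/month, 4 h/day: ~$120/month
```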

Next Steps

  1. Set up monitoring alerts

  2. Configure auto-scaling if needed

  3. Implement proper error handling

  4. Test with different prompts
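
For the error-handling item, one common pattern is retrying transient failures with exponential backoff. A minimal sketch (the `predict_fn` callable stands in for `endpoint.predict`; this wrapper is our illustration, not part of the Vertex AI SDK):

```python
import time

def predict_with_retry(predict_fn, instances, max_attempts=3, base_delay=1.0):
    """Call predict_fn(instances=...), retrying on failure with
    exponential backoff (base_delay, 2x, 4x, ...) before giving up."""
    for attempt in range(max_attempts):
        try:
            return predict_fn(instances=instances)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)

# Usage sketch against a live endpoint:
# predict_with_retry(endpoint.predict, [{"inputs": "Hello"}])
```

In production you would narrow the `except Exception` to the transient error types you actually see (timeouts, 429s, 503s) so that genuine bugs fail fast.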



We are open-sourcing Magemaker!! Stay tuned!!!

As always, happy coding!!!


If you have any questions, please do not hesitate to reach out at faizan|jneid@slashml.com.

Try out our dashboard


Deploy any model in your private cloud or on SlashML Cloud


©2024 – Made with ❤️ & ☕️ in Montreal
