Jneid Jneid

@jjneid94

Published on Jan 29, 2025

Host DeepSeek R1 Distilled Llama-8b on GCP Vertex AI


For this tutorial, we are using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. The same steps apply to the larger DeepSeek-R1-Distill-Llama-8B variant; just swap the model id in the configuration.

This step-by-step guide covers both interactive and YAML-based deployment options.

We will be using Magemaker, a Python tool that simplifies deploying open-source AI models to cloud providers like AWS, GCP, and Azure.

Step 1: GCP Setup

1. Create a Google Cloud account if you haven't already

2. Install gcloud CLI:

   # Follow instructions at cloud.google.com/sdk/docs/install-sdk
   gcloud init

3. Enable Vertex AI API in your project:

  • Go to Google Cloud Console

  • Search for "Vertex AI API"

  • Click "Enable"

For a more detailed, step-by-step configuration guide, use this.

Step 2: Authentication

# Login and set application default credentials
gcloud auth application-default login
# Verify your configuration
gcloud config list

Step 3: Create YAML Configuration

Create a file named `deploy-deepseek-gcp.yaml`:

deployment: !Deployment
  destination: gcp
  endpoint_name: deepseek-r1-distill
  accelerator_count: 1
  instance_type: g2-standard-12
  accelerator_type: NVIDIA_L4
  num_gpus: null
  quantization: null
models:
- !Model
  id: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  location: null
  predict: null
  source: huggingface
  task: text-generation
  version: null
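Before deploying, you can sanity-check the configuration with PyYAML. The `!Deployment` and `!Model` tags are custom, so standard `yaml.safe_load` would reject them; mapping them to plain dicts lets you inspect the file. This is an illustrative sketch for inspection only, not Magemaker's own loader:

```python
import yaml

# Same content as deploy-deepseek-gcp.yaml, inlined here for a self-contained example
CONFIG = """\
deployment: !Deployment
  destination: gcp
  endpoint_name: deepseek-r1-distill
  accelerator_count: 1
  instance_type: g2-standard-12
  accelerator_type: NVIDIA_L4
models:
- !Model
  id: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  source: huggingface
  task: text-generation
"""

class ConfigLoader(yaml.SafeLoader):
    """SafeLoader extended to accept Magemaker's custom tags."""

def _mapping(loader, node):
    # Turn a tagged YAML mapping into an ordinary Python dict
    return loader.construct_mapping(node)

ConfigLoader.add_constructor("!Deployment", _mapping)
ConfigLoader.add_constructor("!Model", _mapping)

config = yaml.load(CONFIG, Loader=ConfigLoader)
print(config["deployment"]["endpoint_name"])  # deepseek-r1-distill
```

To check the file you actually created, load it with `yaml.load(open("deploy-deepseek-gcp.yaml"), Loader=ConfigLoader)` instead of the inline string.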

Step 4: Deploy

# Install Magemaker if you haven't already
pip install magemaker
# Deploy using the YAML file
magemaker --deploy deploy-deepseek-gcp.yaml

Step 5: Verify Deployment

  1. Go to Google Cloud Console

  2. Navigate to Vertex AI → Model Registry

  3. Check your endpoint status

Step 6: Test the Endpoint

Use this Python code to test your deployment:

from google.cloud import aiplatform

# Replace {project}, {location}, and {endpoint_id} with your own values
# (shown under Vertex AI → Endpoints in the console)
endpoint = aiplatform.Endpoint(
    endpoint_name="projects/{project}/locations/{location}/endpoints/{endpoint_id}"
)

# Send a single text-generation request to the deployed model
response = endpoint.predict(
    instances=[{
        "inputs": "Write a Python function to calculate fibonacci numbers"
    }]
)
print(response)
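The shape of `response.predictions` depends on the serving container: some return bare strings, others dicts with a `generated_text` key. A small helper to normalize both shapes (an assumption on my part; verify against your endpoint's actual output schema):

```python
def extract_texts(predictions):
    """Normalize Vertex AI text-generation predictions to a list of strings.

    Handles the two shapes text-generation containers commonly return:
    bare strings, or dicts with a 'generated_text' key. Which one you
    get depends on the serving container, so check your own endpoint.
    """
    texts = []
    for p in predictions:
        if isinstance(p, str):
            texts.append(p)
        elif isinstance(p, dict) and "generated_text" in p:
            texts.append(p["generated_text"])
        else:
            texts.append(str(p))  # fall back to a string representation
    return texts

# Example with a mocked response payload:
mock = [{"generated_text": "def fib(n): ..."}, "plain string output"]
print(extract_texts(mock))
```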

Common Issues and Solutions

Quota Issues

If you encounter quota errors:

  1. Go to IAM & Admin → Quotas

  2. Search for "NVIDIA L4 GPUs"

  3. Request quota increase

Authentication Issues

# Verify your credentials
gcloud auth list
# Reset if needed
gcloud auth login
gcloud auth application-default login


Instance Availability

  • Check if g2-standard-12 is available in your region

  • Try different regions if needed

Monitoring Your Deployment

Monitor through Google Cloud Console:

  1. Vertex AI → Endpoints

  2. Cloud Monitoring

  3. Cloud Logging

Cost Management

Pricing Breakdown

  • g2-standard-12 with NVIDIA L4: ~$1 per hour

  • Additional costs:

    • Network egress

    • API calls

    • Storage for model artifacts

Cost Optimization Tips

1. Delete endpoints when not in use:

   magemaker --cloud gcp
   # Select "Delete a model endpoint"

2. Use batch processing when possible

3. Monitor usage patterns

4. Set up billing alerts

5. Consider scheduled shutdowns for non-critical workloads

Monthly Cost Estimates

  • 24/7 running: ~$720/month

  • 8 hours/day: ~$240/month

  • 4 hours/day: ~$120/month
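These estimates are simple hourly-rate arithmetic, which you can adapt to your own schedule. The ~$1/hour figure is the approximate rate quoted above; check current GCP pricing for your region:

```python
# Approximate g2-standard-12 + 1x NVIDIA L4 rate from the breakdown above;
# verify against current GCP pricing for your region
HOURLY_RATE_USD = 1.00

def monthly_cost(hours_per_day, rate=HOURLY_RATE_USD, days_per_month=30):
    """Rough compute-only estimate; excludes egress, API calls, and storage."""
    return hours_per_day * days_per_month * rate

for hours in (24, 8, 4):
    print(f"{hours} hours/day: ~${monthly_cost(hours):.0f}/month")
```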

Next Steps

  1. Set up monitoring alerts

  2. Configure auto-scaling if needed

  3. Implement proper error handling

  4. Test with different prompts
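For step 3, a generic retry wrapper with exponential backoff is a reasonable starting point. This is a sketch: in production you would catch the specific transient errors (e.g. from `google.api_core.exceptions`) rather than bare `Exception`:

```python
import time

def predict_with_retries(call, attempts=3, base_delay=1.0):
    """Call a zero-argument function, retrying with exponential backoff.

    `call` would typically wrap your endpoint request, e.g.
    lambda: endpoint.predict(instances=[...]). In production, catch
    specific transient errors instead of bare Exception.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage: `predict_with_retries(lambda: endpoint.predict(instances=[{"inputs": "..."}]))`.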



We are open-sourcing Magemaker!! Stay tuned!!!

As always, Happy Coding!!!

If you have any questions, please do not hesitate to ask faizan|jneid@slashml.com.


If you want to chat with any GitHub codebase, please visit CodesALot.

If you want to chat with data and generate visualizations, please visit SirPlotsAlot.


©2025 – Made with ❤️ & ☕️ in Montreal
