For this tutorial, we are using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
This is a step-by-step guide covering both the interactive and YAML-based deployment options in Magemaker.
Step 1: GCP Setup
1. Create a Google Cloud account if you haven't already
2. Install the gcloud CLI (see the commands after this list)
3. Enable the Vertex AI API in your project (via the console as below, or with the gcloud command after this list):
Go to Google Cloud Console
Search for "Vertex AI API"
Click "Enable"
For a more detailed, step-by-step configuration guide, use this.
Step 2: Authentication
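Magemaker picks up your local Google Cloud credentials, so make sure you are logged in and pointed at the right project. A sketch, with `my-project-id` as a placeholder:

```bash
# Log in with your Google account
gcloud auth login

# Create Application Default Credentials, which client libraries and tools pick up
gcloud auth application-default login

# Point gcloud (and anything reading its config) at your project
gcloud config set project my-project-id
```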
Step 3: Create YAML Configuration
Create a file named `deploy-deepseek-gcp.yaml`:
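The exact schema is defined by Magemaker, so treat the field names and the `!Deployment` / `!Model` tags below as an assumption and verify them against the Magemaker documentation for your version; the values match the machine type and model used in this tutorial:

```yaml
deployment: !Deployment
  destination: gcp                      # deploy to Vertex AI
  endpoint_name: deepseek-r1-distill-qwen-1-5b
  instance_type: g2-standard-12         # 1x NVIDIA L4
  instance_count: 1
  accelerator_count: 1

models:
  - !Model
    id: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    source: huggingface
```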
Step 4: Deploy
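With the config in place, deployment is a single Magemaker command. The flags below reflect our recollection of the Magemaker CLI and may differ by version, so check `magemaker --help` if they don't match your install:

```bash
# Interactive deployment (pick the model and instance type from menus)
magemaker --cloud gcp

# Or deploy from the YAML file created above
magemaker --deploy deploy-deepseek-gcp.yaml
```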
Step 5: Verify Deployment
Go to Google Cloud Console
Navigate to Vertex AI → Model Registry
Check your endpoint status
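Alternatively, you can confirm from the command line; `us-central1` here is an example, so use whichever region you deployed to:

```bash
# List Vertex AI endpoints in your region and confirm the new one is up
gcloud ai endpoints list --region=us-central1 --project=my-project-id
```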
Step 6: Test the Endpoint
Use this Python code to test your deployment:
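A hedged example using the `google-cloud-aiplatform` client (`pip install google-cloud-aiplatform`). The project, region, and endpoint ID are placeholders, and the exact instance format depends on the serving container used for the model, so adjust the payload if your endpoint expects a different shape:

```python
from google.cloud import aiplatform

# Placeholders -- fill in your own project, region, and endpoint ID
PROJECT_ID = "my-project-id"
REGION = "us-central1"
ENDPOINT_ID = "1234567890"  # numeric ID from the console or `gcloud ai endpoints list`

aiplatform.init(project=PROJECT_ID, location=REGION)
endpoint = aiplatform.Endpoint(endpoint_name=ENDPOINT_ID)

# Hugging Face text-generation containers typically expect
# {"inputs": ..., "parameters": {...}}; other containers may differ
response = endpoint.predict(
    instances=[
        {
            "inputs": "Explain the difference between supervised and unsupervised learning.",
            "parameters": {"max_new_tokens": 256, "temperature": 0.7},
        }
    ]
)

print(response.predictions)
```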
Common Issues and Solutions
Quota Issues
If you encounter quota errors:
Go to IAM & Admin → Quotas
Search for "NVIDIA L4 GPUs"
Request quota increase
Authentication Issues
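If deployment fails with credential or permission errors, refreshing your Application Default Credentials and double-checking the active account and project usually resolves it (a sketch; the exact IAM roles you need depend on your setup):

```bash
gcloud auth application-default login   # refresh local credentials
gcloud auth list                        # see which accounts are authenticated
gcloud config list                      # confirm the active account and project
```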
Instance Availability
Check if g2-standard-12 is available in your region (see the commands below)
Try different regions if needed
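You can check availability from the CLI; the zone below is just an example:

```bash
# Check whether the machine type exists in a zone of your target region
gcloud compute machine-types list --filter="name=g2-standard-12" --zones=us-central1-a

# Check where NVIDIA L4 accelerators are offered
gcloud compute accelerator-types list --filter="name:nvidia-l4"
```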
Monitoring Your Deployment
Monitor through Google Cloud Console:
Vertex AI → Endpoints
Cloud Monitoring
Cloud Logging
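You can also pull recent endpoint logs from the CLI. The filter below assumes the standard Vertex AI endpoint resource type; adjust it if your logs live under a different resource:

```bash
gcloud logging read 'resource.type="aiplatform.googleapis.com/Endpoint"' --limit=20
```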
Cost Management
Pricing Breakdown
g2-standard-12 with NVIDIA L4: ~$1 per hour
Additional costs:
Network egress
API calls
Storage for model artifacts
Cost Optimization Tips
1. Delete endpoints when not in use (see the commands after this list)
2. Use batch processing when possible
3. Monitor usage patterns
4. Set up billing alerts
5. Consider scheduled shutdowns for non-critical workloads
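A sketch of cleaning up with gcloud (Magemaker may also offer its own teardown option); the endpoint and deployed-model IDs are placeholders you can read from the first two commands:

```bash
# Find the endpoint and deployed model IDs
gcloud ai endpoints list --region=us-central1
gcloud ai endpoints describe ENDPOINT_ID --region=us-central1

# Undeploy the model, then delete the endpoint so you stop paying for the GPU
gcloud ai endpoints undeploy-model ENDPOINT_ID \
  --deployed-model-id=DEPLOYED_MODEL_ID --region=us-central1
gcloud ai endpoints delete ENDPOINT_ID --region=us-central1
```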
Monthly Cost Estimates
24/7 running: ~$720/month
8 hours/day: ~$240/month
4 hours/day: ~$120/month
Next Steps
Set up monitoring alerts
Configure auto-scaling if needed
Implement proper error handling
Test with different prompts
We are open-sourcing Magemaker! Stay tuned!
As always, happy coding!
If you have any questions, please do not hesitate to reach out at faizan|jneid@slashml.com.