Deploying Llama 3.1 on Amazon SageMaker

Faizan Khan

@faizan10114

Published on Jan 3, 2025

In this guide, we'll walk through the process of deploying Meta's Llama 3.1 model on Amazon SageMaker. We'll cover everything from setting up your AWS environment to deploying and testing the model.

Prerequisites

Before we begin, make sure you have:

  • An AWS account

  • Python 3.11 or later installed

  • Basic familiarity with Python and AWS concepts

Step 1: Setting Up Your AWS Environment

1.1 Create an IAM User

First, you'll need an IAM user with appropriate permissions:

  1. Go to AWS IAM Console (https://console.aws.amazon.com/iam/)

  2. Click "Users" → "Add user"

  3. Set a username and enable "Access key - Programmatic access"

  4. Attach the following policies:

    • AmazonSageMakerFullAccess

    • AmazonS3FullAccess

Remember to save your access key ID and secret access key securely.

1.2 Install Required Python Packages

Create a new Python virtual environment and install the necessary packages:
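For example (the environment name is arbitrary, and any recent releases of sagemaker and boto3 should work):

python -m venv llama-env
source llama-env/bin/activate
pip install sagemaker boto3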


1.3 Configure AWS Credentials

Set up your AWS credentials using one of these methods:
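The simplest is the AWS CLI, using the keys you saved in step 1.1 (values shown are placeholders):

aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: us-east-1
# Default output format [None]: json

Or, equivalently, via environment variables:

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_DEFAULT_REGION=us-east-1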


Step 2: Preparing the Deployment Code

Create a new Python script named deploy_llama.py. It serves the model with the Hugging Face LLM inference container (text-generation-inference, or TGI):

import json
import os

import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

def create_llama_model(
    role_arn,
    model_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    instance_type="ml.g5.2xlarge",
):
    # Get the Hugging Face LLM (TGI) container image for this region
    image_uri = get_huggingface_llm_image_uri("huggingface")

    # Define the model; the gated Llama repository requires a Hugging Face
    # token for an account that has accepted Meta's license terms
    huggingface_model = HuggingFaceModel(
        image_uri=image_uri,
        role=role_arn,
        env={
            "HF_MODEL_ID": model_id,
            "SM_NUM_GPUS": "1",  # ml.g5.2xlarge has a single A10G GPU
            "HUGGING_FACE_HUB_TOKEN": os.environ["HUGGING_FACE_HUB_TOKEN"],
        },
    )

    # Deploy the model to a real-time endpoint; the health check timeout
    # is raised because downloading the weights takes several minutes
    predictor = huggingface_model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        endpoint_name="llama3-1-endpoint",
        container_startup_health_check_timeout=600,
    )

    return predictor

def get_sagemaker_role():
    """Get or create SageMaker execution role"""
    iam = boto3.client('iam')
    
    # Try to get existing SageMaker role
    try:
        role = iam.get_role(RoleName='SageMakerExecutionRole')
        return role['Role']['Arn']
    except iam.exceptions.NoSuchEntityException:
        # Create new role if it doesn't exist
        role_policy = {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "sagemaker.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
        
        iam.create_role(
            RoleName='SageMakerExecutionRole',
            AssumeRolePolicyDocument=json.dumps(role_policy)
        )
        
        # Attach necessary policies
        policies = [
            'arn:aws:iam::aws:policy/AmazonSageMakerFullAccess',
            'arn:aws:iam::aws:policy/AmazonS3FullAccess'
        ]
        
        for policy in policies:
            iam.attach_role_policy(
                RoleName='SageMakerExecutionRole',
                PolicyArn=policy
            )
        
        role = iam.get_role(RoleName='SageMakerExecutionRole')
        return role['Role']['Arn']

if __name__ == "__main__

Step 3: Deploying the Model

  1. Make sure you have access to Llama 3.1. You'll need to:

    • Request access to the gated meta-llama/Meta-Llama-3.1-8B-Instruct repository on Hugging Face and accept Meta's license terms

    • Create a Hugging Face access token so the deployment script can download the weights

  2. Run the deployment script:
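
The script expects your Hugging Face token in the HUGGING_FACE_HUB_TOKEN environment variable, as written in deploy_llama.py above:

export HUGGING_FACE_HUB_TOKEN=<your-hf-token>
python deploy_llama.py

Deployment can take ten minutes or more while SageMaker provisions the instance and the container downloads the model weights.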

Step 4: Querying the Model

Create a script named query_llama.py:

import boto3
import json

def query_endpoint(text_input, endpoint_name="llama3-1-endpoint"):
    # Create a SageMaker runtime client
    client = boto3.client('sagemaker-runtime')
    
    # Prepare the input; these generation parameters follow the
    # text-generation-inference (TGI) schema used by the endpoint
    payload = {
        "inputs": text_input,
        "parameters": {
            "max_new_tokens": 100,
            "temperature": 0.7,
            "top_p": 0.9,
        }
    }
    
    # Query the endpoint
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=json.dumps(payload)
    )
    
    # Parse and return the response
    result = json.loads(response['Body'].read().decode())
    return result

if __name__ == "__main__

Important Considerations

  1. Costs: Keep in mind that running a SageMaker endpoint incurs costs. The ml.g5.2xlarge instance type costs approximately $1.52 per hour in the US East region, which works out to roughly $36 per day if the endpoint is left running.

  2. Model Size: This guide uses the 8B parameter version of Llama 3.1. For the larger 70B and 405B versions, you'll need more powerful instance types.

  3. Instance Types:

    • 8B model: ml.g5.2xlarge

    • 70B model: ml.g5.48xlarge or ml.p4d.24xlarge

    • 405B model: ml.p5.48xlarge or larger

  4. Cleanup: To avoid unnecessary charges, delete endpoints when not in use:
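
If you still have the predictor object, predictor.delete_endpoint() removes both the endpoint and its configuration. Otherwise, use the boto3 client; the endpoint and its config share a name here because the SageMaker SDK names the config after the endpoint:

import boto3

client = boto3.client('sagemaker')
client.delete_endpoint(EndpointName='llama3-1-endpoint')
client.delete_endpoint_config(EndpointConfigName='llama3-1-endpoint')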


Troubleshooting

Common issues and solutions:

  1. Model Access Error: Make sure you've been granted access to Llama 3.1 on Hugging Face, accepted the license terms, and passed a valid access token to the endpoint.

  2. Instance Limit Error: You might need to request a service quota increase for your chosen instance type.

  3. Memory Issues: If you see out-of-memory errors, try a larger instance type or reduce the batch size in your requests.

Conclusion

You now have a working Llama 3.1 deployment on Amazon SageMaker! Remember to monitor your costs and delete unused endpoints. For production deployments, consider adding error handling, monitoring, and auto-scaling configurations.
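
As a starting point for auto-scaling, here is a minimal sketch using the Application Auto Scaling API. It assumes the endpoint name from this guide and SageMaker's default variant name, AllTraffic; the target value is illustrative and should be tuned to your workload:

import boto3

autoscaling = boto3.client('application-autoscaling')

# Register the endpoint variant as a scalable target (1 to 2 instances)
resource_id = 'endpoint/llama3-1-endpoint/variant/AllTraffic'
autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=2,
)

# Scale on the built-in invocations-per-instance metric
autoscaling.put_scaling_policy(
    PolicyName='llama3-invocations-scaling',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 50.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'
        },
    },
)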

For more information, refer to:

  • Amazon SageMaker documentation: https://docs.aws.amazon.com/sagemaker/

  • Hugging Face on SageMaker: https://huggingface.co/docs/sagemaker

  • Llama 3.1 model card: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
