Python in Plain English

New Python content every day. Follow to join our 3.5M+ monthly readers.

Creating a Llama2 Managed Endpoint in Azure ML and Using it from Langchain

Part 1: A step-by-step guide to creating a Llama2 model in Azure ML.

7 min read · Aug 3, 2023


Today, we are going to show, step by step, how to create a Llama2 model (from Meta), or any other model you select from Azure ML Studio, and, most importantly, how to use it from LangChain.

Once you are logged in to Azure ML, click on Model Catalog on the left.

Then, under Introducing Llama 2, click on View models.


Select the model you want; in my case, I selected the smallest one, Llama2 7B.

On the summary page, click on Deploy and select Real-time endpoint.


Then choose to deploy with Azure AI Content Safety, which can detect and filter harmful content:


All of these steps essentially lead you to a Jupyter Notebook in which we will replace some variables to deploy the Azure ML managed endpoint.

Click on Clone this notebook, and let's start reviewing the code.

Let's start with some basic variables. With this piece of code we define the registry we will get the model from, the name of the model, the name of the endpoint, the deployment name, the compute instance size, and the severity threshold for content to be filtered by Azure AI Content Safety.
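As a sketch, the variables look roughly like the following. The names mirror the pattern used in Azure's sample notebooks, but every value here is a placeholder assumption; substitute your own:

```python
# Placeholder values -- replace with your own. Variable names follow the
# pattern of Azure ML's Llama 2 sample notebook, but all values below
# are assumptions for illustration.
registry_name = "azureml-meta"            # registry hosting Meta's models
model_name = "Llama-2-7b"                 # model selected in the catalog
endpoint_name = "llama2-7b-endpoint"      # managed online endpoint name
deployment_name = "llama2-7b-deployment"  # deployment within the endpoint
instance_type = "Standard_NC24s_v3"       # GPU compute size for the deployment
content_safety_threshold = 2              # severity at or above which content is filtered
```

The endpoint name must be unique within your Azure region, so it is common to append a random suffix to it.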


Please note that the selected standard SKU is a GPU VM; here you can find a list of the supported sizes:
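As a quick sanity check before deploying, you could verify your chosen size against a few of the GPU SKUs Azure offers for managed online endpoints. The list below is illustrative only; the authoritative list is in the linked documentation and varies by region and quota:

```python
# A few GPU VM SKUs commonly available for Azure ML managed online
# endpoints (illustrative only -- check the Azure docs for the
# authoritative, region-specific list).
GPU_SKUS = {
    "Standard_NC6s_v3",         # 1x V100
    "Standard_NC12s_v3",        # 2x V100
    "Standard_NC24s_v3",        # 4x V100
    "Standard_NC24ads_A100_v4", # 1x A100
}

def is_gpu_sku(size: str) -> bool:
    """Cheap local check that the chosen compute size is a known GPU SKU."""
    return size in GPU_SKUS

print(is_gpu_sku("Standard_NC24s_v3"))  # True
```

A check like this only catches typos early; the deployment itself will still fail if your subscription lacks quota for the chosen SKU in that region.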

