In this article we’ll walk through the operations needed to get ChatGPT chat completions via Azure endpoints. While the primary emphasis is on chat completions, we will also briefly touch on some of the other operations available through the API. This example is meant as a quick way to try out simple operations and should not be considered a comprehensive tutorial.
import openai
Setup
To make the following sections work properly, we first need to do some setup. Let’s begin by defining the api_base and api_version. To find your api_base, go to the Microsoft Azure portal, locate your resource, and then under “Resource Management” -> “Keys and Endpoints” look for the “Endpoint” value.
openai.api_version = '2023-05-15'
openai.api_base = ''  # Please add your endpoint here
Next, we need to set the api_type and api_key. The key can be obtained either from the Azure portal or via Microsoft Active Directory Authentication, and the api_type should be set to 'azure' or 'azure_ad' accordingly, depending on which method you use.
Setup: Azure Portal
First, let’s look at how to retrieve the key from the portal. Go to https://portal.azure.com, locate your resource, and then under the “Resource Management” -> “Keys and Endpoints” section, copy one of the “Keys” values.
openai.api_type = 'azure'
openai.api_key = ''  # Please add your api key here
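Alternatively, if you prefer Microsoft Active Directory Authentication, you can exchange a credential for an access token and use that as the key. The snippet below is a minimal sketch, assuming the azure-identity package is installed and that your identity has access to the resource; the token scope shown is the standard Cognitive Services scope.

from azure.identity import DefaultAzureCredential

# Acquire an access token for the Cognitive Services scope (assumed setup)
default_credential = DefaultAzureCredential()
token = default_credential.get_token("https://cognitiveservices.azure.com/.default")

openai.api_type = 'azure_ad'
openai.api_key = token.token  # Note: tokens expire and need to be refreshed periodically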
Deployments
In this section, we will create a deployment of the gpt-35-turbo model, which will then be used to generate chat completions.
Deployments: Create manually
Let’s create a deployment of the gpt-35-turbo model. Navigate to https://portal.azure.com, locate your resource, and then under “Resource Management” -> “Model deployments”, create a new gpt-35-turbo deployment.
deployment_id = '' # Fill in the deployment id from the portal here
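If you’d rather confirm your deployments programmatically, the legacy openai Python SDK (0.x) also exposed a Deployment resource for Azure endpoints. The sketch below assumes that SDK version and that the fields shown are returned; treat it as illustrative rather than definitive.

# List the deployments on the resource (legacy openai 0.x SDK; assumed API)
deployments = openai.Deployment.list()
for deployment in deployments.data:
    print(f"{deployment.id}: {deployment.model}")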
Create chat completion
Now let’s send a sample chat completion request to the deployment.
# For all possible arguments see https://platform.openai.com/docs/api-reference/chat-completions/create
response = openai.ChatCompletion.create(
    deployment_id=deployment_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
        {"role": "assistant", "content": "Who's there?"},
        {"role": "user", "content": "Orange."},
    ],
    temperature=0,
)
print(f"{response.choices[0].message.role}: {response.choices[0].message.content}")
We can also stream the response.
response = openai.ChatCompletion.create(
    deployment_id=deployment_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
        {"role": "assistant", "content": "Who's there?"},
        {"role": "user", "content": "Orange."},
    ],
    temperature=0,
    stream=True
)

for chunk in response:
    delta = chunk.choices[0].delta

    if "role" in delta.keys():
        print(delta.role + ": ", end="", flush=True)
    if "content" in delta.keys():
        print(delta.content, end="", flush=True)
You may also be interested in Best practices for GPT fine-tuning.