Azure OpenAI: MultiModal Chat AI action

The Azure OpenAI: MultiModal Chat AI action lets you integrate the vision capabilities of the Azure OpenAI GPT-4o and GPT-4 models into your workflows. This means your automations can process and answer questions about images, going beyond text-only interactions.

Prerequisites

  • You must have the Bot creator role to use the Azure OpenAI: MultiModal Chat AI action in an automation.
  • Ensure that you have the necessary credentials to send a request, and include the Azure OpenAI: Authenticate action before calling any Microsoft Azure OpenAI actions.
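
For orientation, authenticating against Azure OpenAI outside the Control Room would look roughly like the sketch below; the Authenticate action plays the equivalent role inside the bot. The resource name, key, and API version shown are placeholders.

  # Illustrative only: the Azure OpenAI: Authenticate action handles this for you in the bot.
  from openai import AzureOpenAI

  client = AzureOpenAI(
      azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder resource name
      api_key="<your-azure-openai-key>",                          # placeholder key
      api_version="2024-02-01",                                   # use a version your deployment supports
  )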

This example shows how to send a natural language message using the Azure OpenAI: MultiModal Chat AI action and get an appropriate response.

Procedure

  1. In the Automation Anywhere Control Room, navigate to the Actions pane, select Generative AI > Microsoft Azure OpenAI, drag Azure OpenAI: MultiModal Chat AI, and place it on the canvas.
  2. Enter or select the following fields:

    Azure OpenAI MultiModal Chat AI

    1. Enter the Deployment ID from Azure OpenAI. The Deployment ID is associated with the large language model (LLM) that you want to use for your prompt and can be copied from the Azure OpenAI Studio.
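
      For orientation, in a direct call to the service the Deployment ID is what selects the model deployment that answers the request. A minimal sketch, reusing the client from the earlier sketch; the ID is a placeholder:

        # Illustrative only: the Deployment ID selects which model deployment answers the request.
        response = client.chat.completions.create(
            model="<deployment-id>",                             # Deployment ID (placeholder)
            messages=[{"role": "user", "content": "Hello"}],
        )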
    2. Enter a chat Message for the model to use when generating a response.
      Note: Chat actions retain the result of the previous chat action within the same session. If you call chat actions consecutively, the model can understand subsequent messages and relate them to the previous message. However, all chat history is deleted after the session ends.
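
      As an illustration of this retained-history behavior, a direct call would resend the earlier turns so that the model can relate a follow-up message to them. A minimal sketch, not the action's implementation:

        # Sketch only: the action maintains this history for you within a session.
        history = []

        def chat(user_text):
            history.append({"role": "user", "content": user_text})
            reply = client.chat.completions.create(model="<deployment-id>", messages=history)
            answer = reply.choices[0].message.content
            history.append({"role": "assistant", "content": answer})
            return answer

        chat("List three benefits of automating invoice processing.")   # first turn
        chat("Explain the second benefit in one sentence.")             # follow-up understood in context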
    3. Select an image: You can either choose Image link and enter an image URL or select Upload image to upload an image.
      Example:

      In the provided example, an image of a violet flower is associated with the first instance of the Azure OpenAI: MultiModal Chat AI action. The second instance uses a cheetah image, while the third instance features three dogs sitting in a field surrounded by white flowers (as shown in the picture below).

      [Image: azure openai multimodal chatai dogs sample]
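
      For orientation, this is roughly how the two image options map onto the underlying chat completions request: an Image link is passed as an HTTPS URL, while an uploaded image is sent as base64-encoded data. The URL, file name, and deployment ID below are placeholders:

        # Sketch only: the action builds this message for you from the selected option.
        import base64

        # Option 1: Image link - pass the URL directly.
        link_content = [
            {"type": "text", "text": "What do you see in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/dogs.jpg"}},  # placeholder URL
        ]

        # Option 2: Upload image - embed the file as a base64 data URL.
        with open("dogs.jpg", "rb") as f:                                                 # placeholder file
            encoded = base64.b64encode(f.read()).decode("utf-8")
        upload_content = [
            {"type": "text", "text": "What do you see in this image?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
        ]

        response = client.chat.completions.create(
            model="<deployment-id>",
            messages=[{"role": "user", "content": link_content}],
        )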

    4. Enter the maximum number of tokens to generate. If you do not enter a value, the maximum is set automatically so that the prompt and the generated response together stay within the maximum context length of the selected model.
    5. Enter a Temperature. This value refers to the randomness of the response. Values close to zero make the response more focused and deterministic; higher values make the response more random.
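
      In a direct call, these two fields correspond to the max_tokens and temperature parameters of the chat completions request; a minimal sketch with placeholder values:

        # Sketch only: placeholder values shown for illustration.
        response = client.chat.completions.create(
            model="<deployment-id>",
            messages=[{"role": "user", "content": "Summarize this workflow in one sentence."}],
            max_tokens=300,     # upper bound on generated tokens
            temperature=0.2,    # low value -> focused, deterministic; higher -> more random
        )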
    6. Enter the name of the session to limit the chat to the current session. Use the same session name that you used in the Azure OpenAI: Authenticate action. You can use a variable instead.
    7. To manage the optional parameters, select Yes under Show more options to add other parameters such as: Maximum chat message count, Top P, Stop, Presence Penalty, Frequency Penalty, User, Logit bias, Response format, and Image fidelity. For information about these optional parameters, see Azure OpenAI chat completions.
      Note:
      • Maximum chat message count:

        This setting controls how many messages are kept in the chat history for the Multimodal Chat AI action. This is especially important when working with multiple images, as each image can significantly increase the message size. By setting a limit (between 0 and 10), you can optimize the chat session size and prevent performance issues.

        • 0: No chat history is maintained, similar to a Prompt action.
        • 1-10: The specified number of messages (including the current prompt and the most recent responses) is retained.

        In the example above, the value is set to 4. This means the chat history will include the current prompt and the responses from the previous 3 interactions.

      • Detail parameter (Image fidelity): This field gives you control over how the model processes the image and generates its textual understanding. For more information, see Azure OpenAI Service REST API reference.
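
      To make these two options concrete, here is a hedged sketch of a direct call: Image fidelity maps to the detail field of the image, and the message-count limit corresponds to resending only the most recent messages. The values and trimming logic are illustrative, not the action's internals, and the sketch continues the client and history list from the earlier sketches:

        # Sketch only: illustrative values; the action applies these settings for you.
        MAX_CHAT_MESSAGES = 4                                    # e.g. current prompt plus 3 prior messages

        user_turn = {
            "role": "user",
            "content": [
                {"type": "text", "text": "How many animals are in this picture?"},
                {"type": "image_url", "image_url": {
                    "url": "https://example.com/dogs.jpg",       # placeholder URL
                    "detail": "low",                             # image fidelity: "low", "high", or "auto"
                }},
            ],
        }
        history = (history + [user_turn])[-MAX_CHAT_MESSAGES:]   # keep only the most recent messages
        response = client.chat.completions.create(model="<deployment-id>", messages=history)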
    8. Save the response to a variable. In this example, the response is saved to AzureOpenAI-MultiModalChat-Response.
  3. Click Run to start the automation. You can view the saved response by printing it with a Message box action. In this example, str_chatai-response prints the response.
    Tip: To maintain multiple chats in the same bot, you will need to create multiple sessions with different names or variables.
The response of the above automation is as follows:

[Image: Azure OpenAI MultiModal Chat AI Response]
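
For comparison, in a direct call the text that the action saves to the response variable corresponds to the message content of the first choice in the API response:

  # Sketch only: the action extracts and stores this text in the response variable for you.
  answer = response.choices[0].message.content
  print(answer)   # comparable to printing the saved variable with a Message box action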