Automation 360

OpenAI: MultiModal Chat AI action

Download as PDF

OpenAI: MultiModal Chat AI action

Download as PDF

Updated: 2024/07/10

The OpenAI: MultiModal Chat AI action allows you to integrate OpenAI gpt-4o and OpenAI's vision capabilities into your workflows. This means your automations can now process and answer questions about images, going beyond just text-based interactions.

Prerequisites

You must have the Bot creator role to use the OpenAI MultiModal Chat AI action in a bot.
Ensure that you have the necessary credentials to send a request and have included OpenAI: Authenticate action before calling any OpenAI actions.

This example shows how to send multiple images using the OpenAI MultiModal Chat AI actions and ask questions about what is present in the images.

Procedure

In the Automation Anywhere Control Room, navigate to the Actions pane, select Generative AI > OpenAI, drag OpenAI: MultiModal Chat AI, and place it in the canvas.
Enter or select the following fields:
1. Select a large language model (LLM) to use for your multimodal chat from the Model dropdown. You can select the following models:
  - gpt-4o (default)
  - gpt-4-turbo
  - gpt-4-turbo-2024-04-09
  - gpt-4-vision-preview
  - gpt-4-1106-vision-preview
  - Other supported version to input a supported model. In addition to the models listed above, you can explore a variety of other supported text-based preview models from OpenAI other supported versions.
2. Enter a chat Message to use by the model to generate a response.
  
  Note: Chat actions retain the result of the previous chat action within the same session. If you call chat actions consecutively, the model can understand subsequent messages and relate them to the previous message. However, all chat history is deleted after the session ends.
3. Select an image: You can either choose Image link and enter an image url or select Upload image to upload an image. In this example: An image of an violet flower is attached to the first instance of the OpenAI MultiModal Chat AI action and a dog image is attached to the second instance of the same action.
4. Enter the maximum number of tokens to generate. By default, if you do not enter a value, then the maximum number of tokens generated is automatically set to keep it within the maximum context length of the selected model by considering the length of generated response.
5. Enter a Temperature. This value refers to the randomness of the response. As the temperature approaches zero, it makes the response more focused and deterministic. The higher the value, the more random is the response.
6. Enter the name for the session to limit the session to the current session. Use the same name used in the Authentication action. You can use a variable instead.
7. To manage the optional parameters, select Yes under Show more options to add other parameters such as: Maximum chat message count, Top P, Stop, Presence Penalty, Frequency Penalty, User, Logit bias, Response format, and Image fidelity. For information about these optional parameters, see OpenAI create chat and OpenAI Vision.
  Note:
  - Maximum chat message count: This field allows you to limit the number of messages stored in the chat history for the Multimodal Chat AI action. This is particularly useful when working with multiple images, as each message containing an image can significantly increase the payload size. By setting a limit (between 0-10), you can optimize the chat session size and ensure subsequent requests run smoothly. A value of 0 will function identically to a Prompt action, where no chat history is maintained. In the above example, the value is set to 3. This means the chat history will retain the current prompt, the response from the previous interaction, and the request from the previous interaction.
  - Image fidelity: This field allows you to control over how the model processes the image and generates its textual understanding. For more information, see OpenAI Vision.
8. Save the response to a variable. In this example, the response is saved to OpenAI-Response.
Click Run to start the bot. You can read the value of the field by printing the response in a Message box action. In this example, OpenAI-Response prints the response.

Tip: To maintain multiple chats in the same bot, you will need to create multiple sessions with different names or variables.

The response of the above automation is as follows:

OpenAI MultiModal Chat AI Response