OpenAI: MultiModal Chat AI action

The OpenAI: MultiModal Chat AI action allows you to integrate OpenAI gpt-4o and OpenAI's vision capabilities into your workflows. This means your automations can now process and answer questions about images, going beyond text-based interactions.

Prerequisites

  • You must have the Bot creator role to use the OpenAI MultiModal Chat AI action in a bot.
  • Ensure that you have the necessary credentials to send a request and have included the OpenAI: Authenticate action before calling any OpenAI actions.
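
For orientation, the credential supplied through the OpenAI: Authenticate action corresponds, outside of the Control Room, to constructing an authenticated API client. The sketch below is illustrative only and assumes the OpenAI Python SDK with an API key stored in an OPENAI_API_KEY environment variable; the action handles this step for you within the named session.

    import os
    from openai import OpenAI

    # The OpenAI: Authenticate action stores the credential for the named session.
    # The equivalent step when calling the API directly is creating a client with an
    # API key (assumed here to live in the OPENAI_API_KEY environment variable).
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])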

This example shows how to send multiple images using the OpenAI MultiModal Chat AI action and ask questions about what is present in the images.

Procedure

  1. In the Automation Anywhere Control Room, navigate to the Actions pane, select Generative AI > OpenAI, drag OpenAI: MultiModal Chat AI, and place it on the canvas.
  2. Enter or select the following fields:

    OpenAI MultiModal Chat AI

    1. Select a large language model (LLM) to use for your multimodal chat from the Model dropdown. You can select the following models:
      • gpt-4o (default)
      • gpt-4-turbo
      • gpt-4-turbo-2024-04-09
      • gpt-4-vision-preview
      • gpt-4-1106-vision-preview
      • Other supported version: Select this option to enter the name of another supported model. In addition to the models listed above, you can use a variety of other supported text-based preview models from OpenAI.
    2. Enter a chat Message that the model uses to generate a response.
      Note: Chat actions retain the result of the previous chat action within the same session. When you call chat actions one after another, the model can understand the subsequent messages and relate them to the previous message. However, the entire chat history is deleted when the session ends.
    3. Select an image: Choose Image link and enter an image URL, or select Upload image to upload an image. In this example, an image of a violet flower is attached to the first instance of the OpenAI MultiModal Chat AI action and a dog image is attached to the second instance of the same action.
    4. Enter the maximum number of tokens to generate. If you do not enter a value, the maximum number of tokens is set automatically so that the generated response stays within the maximum context length of the selected model.
    5. Enter a Temperature. This value controls the randomness of the response: as the temperature approaches zero, the response becomes more focused and deterministic; higher values make the response more random.
    6. Enter a session name to confine the chat to that session. Use the same session name that you used in the OpenAI: Authenticate action. You can use a variable instead.
    7. To manage the optional parameters, select Yes under Show more options to add other parameters such as: Maximum chat message count, Top P, Stop, Presence Penalty, Frequency Penalty, User, Logit bias, Response format, and Image fidelity. For information about these optional parameters, see OpenAI create chat and OpenAI Vision. For a conceptual sketch of how these settings map to the underlying OpenAI request, see the example after this procedure.
      Anmerkung:
      • Maximum chat message count: This field allows you to limit the number of messages stored in the chat history for the Multimodal Chat AI action. This is particularly useful when working with multiple images, because each message containing an image can significantly increase the payload size. By setting a limit (between 0 and 10), you can optimize the chat session size and ensure that subsequent requests run smoothly. A value of 0 functions identically to a Prompt action, where no chat history is maintained. In the above example, the value is set to 3, which means the chat history retains the current prompt, the response from the previous interaction, and the request from the previous interaction. A minimal sketch of this trimming behavior appears at the end of this section.
      • Image fidelity: This field gives you control over how the model processes the image and generates its textual understanding. For more information, see OpenAI Vision.
    8. Save the response to a variable. In this example, the response is saved to OpenAI-Response.
  3. Click Run to start the bot. You can read the value of the field by printing the response in a Message box action. In this example, OpenAI-Response prints the response.
    Tip: To manage multiple chats in the same bot, create multiple sessions with different names or variables.
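
For reference, the request that the configured action sends is conceptually similar to the following OpenAI Chat Completions call, shown here with the OpenAI Python SDK. The prompt text, image URL, and parameter values are placeholders that mirror the example above, and the mapping is illustrative only: the action itself manages authentication, the session, and the chat history, and the second action instance would add the dog image along with the retained history.

    from openai import OpenAI

    # The OpenAI: Authenticate action supplies the credential in the bot; a client
    # is created here only so the sketch is self-contained (it reads OPENAI_API_KEY
    # from the environment by default).
    client = OpenAI()

    # Placeholder values mirroring the fields configured in the procedure above.
    response = client.chat.completions.create(
        model="gpt-4o",                    # Model dropdown
        max_tokens=300,                    # Maximum number of tokens (placeholder)
        temperature=0.5,                   # Temperature (placeholder)
        messages=[
            {
                "role": "user",
                "content": [
                    # Message field
                    {"type": "text", "text": "What is in this image?"},
                    # Image link field (placeholder URL) and Image fidelity option
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://example.com/violet-flower.jpg",
                            "detail": "high",
                        },
                    },
                ],
            }
        ],
    )

    # Comparable to the value saved in the OpenAI-Response variable.
    print(response.choices[0].message.content)
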
The response of the above automation is as follows:

OpenAI MultiModal Chat AI Response
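
As noted in the Maximum chat message count description above, limiting the stored history keeps the request payload small when images are involved. The following is a minimal sketch of that idea, assuming the history is a plain list of chat messages with the current prompt appended last; the function name and the exact trimming rule are illustrative, not the action's internal implementation.

    def trim_history(messages, max_messages):
        """Trim the chat history before sending the next request.

        messages: the full message list, with the current prompt as the last entry.
        max_messages: the Maximum chat message count value (0 to 10). With 0, no
        history is kept and only the current prompt is sent, which behaves like a
        standalone Prompt action.
        """
        keep = max(max_messages, 1)   # always send at least the current prompt
        return messages[-keep:]


    # With the limit set to 3, as in the example above, the request carries the
    # previous request, the previous response, and the current prompt.
    history = [
        {"role": "user", "content": "What is in the first image?"},
        {"role": "assistant", "content": "A violet flower."},
        {"role": "user", "content": "What is in the second image?"},
        {"role": "assistant", "content": "A dog."},
        {"role": "user", "content": "Which of the two images shows an animal?"},
    ]
    print(trim_history(history, 3))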