OpenAI: MultiModal Chat AI アクション

The OpenAI: MultiModal Chat AI アクション allows you to integrate OpenAI gpt-4o and OpenAI's vision capabilities into your workflows. This means your automations can now process and answer questions about images, going beyond just text-based interactions.

前提条件

  • You must have the Bot creator role to use the OpenAI MultiModal Chat AI アクション in a bot.
  • Ensure that you have the necessary credentials to send a request and have included OpenAI: [認証] アクション before calling any OpenAI actions.

This example shows how to send multiple images using the OpenAI MultiModal Chat AI アクションs and ask questions about what is present in the images.

手順

  1. In the Automation Anywhere Control Room, navigate to the Actions pane, select Generative AI > OpenAI, drag OpenAI: MultiModal Chat AI, and place it in the canvas.
  2. Enter or select the following fields:

    OpenAI MultiModal Chat AI

    1. Select a large language model (LLM) to use for your multimodal chat from the Model dropdown. You can select the following models:
      • gpt-4o (default)
      • gpt-4-turbo
      • gpt-4-turbo-2024-04-09
      • gpt-4-vision-preview
      • gpt-4-1106-vision-preview
      • Other supported version to input a supported model. In addition to the models listed above, you can explore a variety of other supported text-based preview models from OpenAI のその他のサポートされているバージョン.
    2. Enter a chat Message to use by the model to generate a response.
      注: チャット アクションは、同じセッション内で前のチャット アクションの結果を保持します。チャット アクションを連続して呼び出すと、モデルは後続のメッセージを理解し、前のメッセージに関連付けることができます。ただし、セッションが終了すると、チャット履歴はすべて削除されます。
    3. Select an image: You can either choose Image link and enter an image url or select Upload image to upload an image. In this example: An image of an violet flower is attached to the first instance of the OpenAI MultiModal Chat AI アクション and a dog image is attached to the second instance of the same action.
    4. Enter the maximum number of tokens to generate. By default, if you do not enter a value, then the maximum number of tokens generated is automatically set to keep it within the maximum context length of the selected model by considering the length of generated response.
    5. Enter a Temperature. This value refers to the randomness of the response. As the temperature approaches zero, it makes the response more focused and deterministic. The higher the value, the more random is the response.
    6. Enter the name for the session to limit the session to the current session. Use the same name used in the Authentication action. You can use a variable instead.
    7. To manage the optional parameters, select Yes under Show more options to add other parameters such as: Maximum chat message count, Top P, Stop, Presence Penalty, Frequency Penalty, User, Logit bias, Response format, and Image fidelity. For information about these optional parameters, see OpenAI 作成チャット and OpenAI Vision.
      注:
      • Maximum chat message count: This field allows you to limit the number of messages stored in the chat history for the Multimodal Chat AI action. This is particularly useful when working with multiple images, as each message containing an image can significantly increase the payload size. By setting a limit (between 0-10), you can optimize the chat session size and ensure subsequent requests run smoothly. A value of 0 will function identically to a Prompt action, where no chat history is maintained. In the above example, the value is set to 3. This means the chat history will retain the current prompt, the response from the previous interaction, and the request from the previous interaction.
      • Image fidelity: This field allows you to control over how the model processes the image and generates its textual understanding. For more information, see OpenAI Vision.
    8. Save the response to a variable. In this example, the response is saved to OpenAI-Response.
  3. Click Run to start the Bot. You can read the value of the field by printing the response in a Message box アクション. In this example, OpenAI-Response prints the response.
    ヒント: 同じ Bot で複数のチャットを維持するには、異なる名前や変数で複数のセッションを作成する必要があります。
The response of the above automation is as follows:

OpenAI MultiModal Chat AI Response