OpenAI: MultiModal Chat AI 작업

The OpenAI: MultiModal Chat AI 작업 allows you to integrate OpenAI gpt-4o and OpenAI's vision capabilities into your workflows. This means your automations can now process and answer questions about images, going beyond just text-based interactions.

전제 조건

  • You must have the Bot creator role to use the OpenAI MultiModal Chat AI 작업 in a bot.
  • Ensure that you have the necessary credentials to send a request and have included OpenAI: 인증 작업 before calling any OpenAI actions.

This example shows how to send multiple images using the OpenAI MultiModal Chat AI 작업s and ask questions about what is present in the images.

프로시저

  1. In the Automation Anywhere Control Room, navigate to the Actions pane, select Generative AI > OpenAI, drag OpenAI: MultiModal Chat AI, and place it in the canvas.
  2. Enter or select the following fields:

    OpenAI MultiModal Chat AI

    1. Select a large language model (LLM) to use for your multimodal chat from the Model dropdown. You can select the following models:
      • gpt-4o (default)
      • gpt-4-turbo
      • gpt-4-turbo-2024-04-09
      • gpt-4-vision-preview
      • gpt-4-1106-vision-preview
      • Other supported version to input a supported model. In addition to the models listed above, you can explore a variety of other supported text-based preview models from OpenAI other supported versions.
    2. Enter a chat Message to use by the model to generate a response.
      주: 채팅 작업은 동일한 세션 내에서 이전 채팅 작업의 결과를 유지합니다. 채팅 작업을 연속적으로 호출하면 모델이 후속 메시지를 이해하고 이전 메시지와 연관시킬 수 있습니다. 그러나 세션이 종료되면 모든 채팅 기록이 삭제됩니다.
    3. Select an image: You can either choose Image link and enter an image url or select Upload image to upload an image. In this example: An image of an violet flower is attached to the first instance of the OpenAI MultiModal Chat AI 작업 and a dog image is attached to the second instance of the same action.
    4. Enter the maximum number of tokens to generate. By default, if you do not enter a value, then the maximum number of tokens generated is automatically set to keep it within the maximum context length of the selected model by considering the length of generated response.
    5. Enter a Temperature. This value refers to the randomness of the response. As the temperature approaches zero, it makes the response more focused and deterministic. The higher the value, the more random is the response.
    6. Enter the name for the session to limit the session to the current session. Use the same name used in the Authentication action. You can use a variable instead.
    7. To manage the optional parameters, select Yes under Show more options to add other parameters such as: Maximum chat message count, Top P, Stop, Presence Penalty, Frequency Penalty, User, Logit bias, Response format, and Image fidelity. For information about these optional parameters, see OpenAI create chat and OpenAI Vision.
      주:
      • Maximum chat message count: This field allows you to limit the number of messages stored in the chat history for the Multimodal Chat AI action. This is particularly useful when working with multiple images, as each message containing an image can significantly increase the payload size. By setting a limit (between 0-10), you can optimize the chat session size and ensure subsequent requests run smoothly. A value of 0 will function identically to a Prompt action, where no chat history is maintained. In the above example, the value is set to 3. This means the chat history will retain the current prompt, the response from the previous interaction, and the request from the previous interaction.
      • Image fidelity: This field allows you to control over how the model processes the image and generates its textual understanding. For more information, see OpenAI Vision.
    8. Save the response to a variable. In this example, the response is saved to OpenAI-Response.
  3. Click Run to start the . You can read the value of the field by printing the response in a Message box 작업. In this example, OpenAI-Response prints the response.
    팁: 동일한 봇에서 여러 개의 채팅을 유지하려면, 서로 다른 이름이나 변수를 사용하여 여러 세션을 만들어야 합니다.
The response of the above automation is as follows:

OpenAI MultiModal Chat AI Response