Automation 360

OpenAI: MultiModal Chat AI acción

Descargar como PDF

Contenidos

OpenAI: MultiModal Chat AI acción

Descargar como PDF

Última actualización2024/11/11

OpenAI: MultiModal Chat AI acción

The OpenAI: MultiModal Chat AI acción allows you to integrate OpenAI gpt-4o and OpenAI's vision capabilities into your workflows. This means your automations can now process and answer questions about images, going beyond just text-based interactions.

Antes de empezar

You must have the Bot creator role to use the OpenAI MultiModal Chat AI acción in a bot.
Ensure that you have the necessary credentials to send a request and have included OpenAI: Acción Autenticar before calling any OpenAI actions.

This example shows how to send multiple images using the OpenAI MultiModal Chat AI accións and ask questions about what is present in the images.

Procedimiento

In the Automation Anywhere Control Room, navigate to the Actions pane, select Generative AI > OpenAI, drag OpenAI: MultiModal Chat AI, and place it in the canvas.
Enter or select the following fields:
1. Select a large language model (LLM) to use for your multimodal chat from the Model dropdown. You can select the following models:
  - gpt-4o (default)
  - gpt-4-turbo
  - gpt-4-turbo-2024-04-09
  - gpt-4-vision-preview
  - gpt-4-1106-vision-preview
  - Other supported version to input a supported model. In addition to the models listed above, you can explore a variety of other supported text-based preview models from OpenAI other supported versions.
2. Enter a chat Message to use by the model to generate a response.
  
  Nota: Las acciones de chat conservan el resultado de la acción de chat anterior dentro de la misma sesión. Si activa las acciones de chat consecutivamente, el modelo puede comprender los mensajes posteriores y relacionarlos con el mensaje anterior. Sin embargo, todo el historial de chat se elimina una vez finalizada la sesión.
3. Select an image: You can either choose Image link and enter an image url or select Upload image to upload an image. In this example: An image of an violet flower is attached to the first instance of the OpenAI MultiModal Chat AI acción and a dog image is attached to the second instance of the same action.
4. Enter the maximum number of tokens to generate. By default, if you do not enter a value, then the maximum number of tokens generated is automatically set to keep it within the maximum context length of the selected model by considering the length of generated response.
5. Enter a Temperature. This value refers to the randomness of the response. As the temperature approaches zero, it makes the response more focused and deterministic. The higher the value, the more random is the response.
6. Enter the name for the session to limit the session to the current session. Use the same name used in the Authentication action. You can use a variable instead.
7. To manage the optional parameters, select Yes under Show more options to add other parameters such as: Maximum chat message count, Top P, Stop, Presence Penalty, Frequency Penalty, User, Logit bias, Response format, and Image fidelity. For information about these optional parameters, see OpenAI create chat and OpenAI Vision.
  Nota:
  - Maximum chat message count: This field allows you to limit the number of messages stored in the chat history for the Multimodal Chat AI action. This is particularly useful when working with multiple images, as each message containing an image can significantly increase the payload size. By setting a limit (between 0-10), you can optimize the chat session size and ensure subsequent requests run smoothly. A value of 0 will function identically to a Prompt action, where no chat history is maintained. In the above example, the value is set to 3. This means the chat history will retain the current prompt, the response from the previous interaction, and the request from the previous interaction.
  - Image fidelity: This field allows you to control over how the model processes the image and generates its textual understanding. For more information, see OpenAI Vision.
8. Save the response to a variable. In this example, the response is saved to OpenAI-Response.
Click Run to start the bot. You can read the value of the field by printing the response in a Message box acción. In this example, OpenAI-Response prints the response.

Consejo: Para mantener múltiples chats en el mismo bot, deberá crear múltiples sesiones con diferentes nombres o variables.

The response of the above automation is as follows:

OpenAI MultiModal Chat AI Response

Ningún tema anterior

No hay tema siguiente

Ningún tema anterior

No hay tema siguiente