Automation 360

Vertex AI: Multimodal Prompt AI action

Download as PDF

Vertex AI: Multimodal Prompt AI action

Download as PDF

Updated: 2025/12/08

The Vertex AI: Multimodal Prompt AI action uses Google's multimodal model that is capable of processing information from multiple modalities, including images, videos, and text. This capability allows it to handle complex tasks, such as describing the content of an image and a video provided as inputs.

Prerequisites

You must have the Bot creator role to use the Vertex AI: Multimodal Prompt AI action in an automation.
Ensure that you have the necessary credentials to send a request and have included Vertex AI: Connect action before calling any Google Cloud actions.

This example shows how to send this model a photo of a plate of cookies and ask it to generate a recipe for those cookies using the Vertex AI: Multimodal Prompt AI action and to get an appropriate response.

Procedure

In the Automation Anywhere Control Room, navigate to the Actions pane, select Generative AI > Google, drag Vertex AI: MultiModal Prompt AI, and place it in the canvas.
Enter or select the following fields:
1. Enter the Project Number/Name. This is the unique Project ID from the GCP. For more information on Project ID, see Google Cloud Project's Project ID.
2. Enter the Location. For more information on Vertex AI location, see Vertex AI locations.
3. Click Publisher drop-down and select Google; or select 3rd Party to enter a third-party publisher.
4. Select a large language model (LLM) to use for your prompt from the Model dropdown. You can select the following models:
  - Gemini Pro Vision (Deprecated)
  - Gemini 2.0 Flash-Lite
  - Gemini 2.0 Flash
  - Gemini 2.5 Flash-Lite
  - Gemini 2.5 Flash
  - Gemini 2.5 Pro
5. Enter a Prompt for the model to generate a response.
6. Upload up to ten images or a video. In this example, an image of a plate with several cookies is uploaded. To upload the images, do one of the following:
  - Select Image link and enter an image url or
  - Select Upload image to upload an image via file stream (refer File streaming using file variable) or Control Room file or Desktop file.
7. Select Yes to upload multiple images. You can upload up to ten images.
8. Enter the maximum number of tokens (Max tokens) to generate. By default, if you do not enter a value, then the maximum number of tokens generated is automatically set to keep it within the maximum context length of the selected model by considering the length of the generated response.
9. Enter a Temperature. This value refers to the randomness of the response. As the temperature approaches zero, the response becomes more focused and deterministic. The higher the value, the more random is the response.
10. Enter Vertex-Default as the session name to limit the session to the current session. The entered name should match with the session established while connecting to GCP.
11. To manage the optional parameters, select Yes under Show more options to add other parameters such as:Top K and Top P. For information about these optional parameters, see Learn Models.
12. Save the response to a variable. In this example, the response is saved to VertexMultiModelResponse.
Click Run to start the bot. You can read the value of the field by printing the response in a Message box action. In this example, VertexMultiModelResponse prints the response.

See how Vertex AI's Multimodal Prompt AI action unlocks new possibilities! Watch this video showcasing a real-world use case.

When the following image is provided as input alongside the prompt, the generated response is shown in the table below:


Prompt	Response
Generate a recipe.	Ingredients: 1 cup all-purpose flour 1/2 teaspoon baking powder 1/4 teaspoon salt 1/2 cup (1 stick) unsalted butter, softened 1/2 cup granulated sugar 1 large egg 1 teaspoon vanilla extract 1 cup semisweet chocolate chips Instructions: Preheat oven to 375 degrees F (190 degrees C). Line a baking sheet with parchment paper. In a medium bowl, whisk together the flour, baking powder, and salt. In a large bowl, cream together the butter and sugar until light and fluffy. Beat in the egg and vanilla extract. Gradually add the dry ingredients to the wet ingredients, mixing until just combined. Fold in the chocolate chips. Drop the dough by rounded tablespoons onto the prepared baking sheet, spacing them about 2 inches apart. Bake for 10-12 minutes, or until the edges are golden brown and the centers are set. Let cool on the baking sheet for a few minutes before transferring to a wire rack to cool completely.