Vertex AI: Multimodal Prompt AI action
- Updated: 2024/05/30
Vertex AI: Multimodal Prompt AI action
The Vertex AI: Multimodal Prompt AI action uses Google's multimodal model that is capable of processing information from multiple modalities, including images, videos, and text. This capability allows it to handle complex tasks, such as describing the content of an image and a video provided as inputs.
Prerequisites
- You must have the Bot creator role to use the Vertex AI: Multimodal Prompt AI action in an automation.
- Ensure that you have the necessary credentials to send a request and have included Vertex AI: Connect action before calling any Google Cloud actions.
This example shows how to send this model a photo of a plate of cookies and ask it to generate a recipe for those cookies using the Vertex AI: Multimodal Prompt AI action and to get an appropriate response.
Procedure
See how Vertex AI's Multimodal Prompt AI action unlocks new possibilities! Watch this video showcasing a real-world use case.
When the following image is provided as input alongside the prompt, the generated response is shown in the table below:
Prompt | Response |
---|---|
Generate a recipe. |
Ingredients:
Instructions:
|