Anthropic: MultiModal AI action

The Anthropic: Multimodal AI action connects your automation to Amazon Bedrock Anthropic's Claude 3 models that can handle complex tasks, such as describing the content of images provided as inputs.

Prerequisites

  • You must have the Bot creator role to use the Anthropic: Multimodal AI action in a bot.
  • Ensure that you have the necessary credentials to send a request. For more information on acquiring the credentials, see Amazon Bedrock: Authenticate action.

This example showcases how to send Claude 3 model a specific image and ask targeted questions, generating relevant answers based on the content.

Procedure

  1. In the Control Room, navigate to the Actions pane, select Generative AI > Amazon Bedrock, drag Anthropic: MultiModal AI and place it in the canvas.
  2. Enter or select the following fields:

    Anthropic-multimodal

    1. Enter the Region. For information on Region, see Amazon Bedrock GA regions.
    2. Select a large language model (LLM) to use for your prompt from the Model dropdown. You can select the following models:
      • Claude 3 Sonnet v1
      • Claude 3 Haiku v1
      • Other supported version to input other supported models.
      In this example, the Claude 3 Sonnet v1 is selected.
    3. Enter a Prompt for the model to generate a response.
    4. Upload up to five images. This example showcases an image with 3 dogs in a grassy area.
    5. Select Yes to upload multiple images. You can upload up to five images.
    6. Enter the Maximum length.
      By default, if you do not enter a value, then the maximum length is automatically set to keep it within the maximum context length of the selected model by considering the length of the generated response.
    7. Enter a Temperature. This value refers to the randomness of the response. As the temperature approaches zero, the response becomes specific. The higher the value, the more random is the response.
    8. Enter the name for the session to limit the session to the current session. Use the same name used in the Authentication action.
    9. To manage the optional parameters, click Show more options and select Yes. If you select Yes, you can add other parameters such as: System Prompt, Top P, Top K, Add instructions, Stop sequences, or enter an Anthropic version. For information about these optional parameters, see Learn Models.
      Note: Claude 3 models accepts System Prompts. Unlike traditional instructions, system prompts provide a structured way to guide Claude 3. This is because Claude 3 is trained to understand the intent behind your prompt and generate responses that fulfill that goal, rather than simply following a set of commands.
    10. Save the response to a variable.
      In this example, the response is saved to multiModalResponse
  3. Click Run to start the bot.
    You can read the value of the field by printing the response in a Message box action. In this example, multiModalResponse prints the response.

When the following image is provided as input alongside the prompt, the generated response is shown in the table below:

Prompt Response
Explain the Image

Prompt Image - Dogs on Grass

The image shows three adorable puppy dogs running together outdoors on a grassy field. They appear to be of a herding breed, possibly Australian Shepherds or a similar type. Their coats are a mix of black, white, and tan colors. The puppies have floppy ears and are full of energy, captured in a playful moment as they race across the open space. The background has a slightly blurred setting, allowing the focus to be on the lively and joyful puppies in the foreground. The image conveys a sense of happiness, youthfulness, and the pure fun associated with playful young dogs.