The fastest way to run an evaluation is to perform it automatically. This approach uses an LLM as a judge to compare data, rather than relying on human effort.

Procedure

  1. Start the evaluation from one of two entry points.
    • From the AI Evaluations page, select Run Evaluation.
    • From the AI Skill page, select Evaluate > Run Evaluation. This entry point automatically populates the next step.
    Either option opens a new page where you configure the evaluation.
  2. Select the skill to evaluate.
  3. Click Next.
  4. Select the Evaluate automatically method to use an LLM as a judge and NLP metrics in the evaluation.
  5. Add your data set
    • Select Upload file to enter a name and choose the file to use in the evaluation. The file must be in CSV format and no larger than 100 KB.
    • Select Use existing data to pick a data set that has previously been uploaded.
    • Select Enter data manually to provide a name and create a data set manually by providing input variables and optional expected outputs.
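For the Upload file option, a data set is a CSV file whose columns hold the input variables and, optionally, the expected outputs. The sketch below builds such a file in Python; the column names `input` and `expected_output` are illustrative assumptions, so substitute the input variable names your skill actually defines.

```python
import csv
import io

# Hypothetical columns: replace "input" with your skill's input variable
# names; "expected_output" is optional and used for judged comparisons.
rows = [
    {"input": "What is the return policy?",
     "expected_output": "Items can be returned within 30 days."},
    {"input": "How do I reset my password?",
     "expected_output": "Use the 'Forgot password' link on the sign-in page."},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["input", "expected_output"])
writer.writeheader()
writer.writerows(rows)
data = buf.getvalue()

# Stay under the documented 100 KB upload limit.
assert len(data.encode("utf-8")) <= 100 * 1024

with open("evaluation_dataset.csv", "w", newline="") as f:
    f.write(data)
```

The size check mirrors the 100 KB limit noted above, so oversized data sets are caught before upload rather than at submission time.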
  6. Click Run evaluation.
    The evaluation saves your data and begins running. Processing can take some time, depending on the size of the data set. Upon completion, you receive a notification that includes a link to the evaluation.
  7. Open the results from the Evaluation tab or by clicking the link in the notification.