This package supports the following audio file formats: flac, mpeg,
mp3, ogg, pcm, wav, and webm. The following languages are supported: Arabic, Brazilian
Portuguese, Chinese (Mandarin), English (United Kingdom and United States), French, German,
Japanese, Korean, and Spanish (Argentinian, Castilian, Chilean, Colombian, Mexican, and
Peruvian).
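
As a quick illustration of the format list above, the following sketch checks a file name against the listed formats before submitting it. It is not part of the package: the helper name `is_supported_audio` is hypothetical, and it assumes each format corresponds to a file extension of the same name.

```python
from pathlib import Path

# Formats taken from the list above. Treating each one as a file
# extension of the same name is an assumption made for this sketch.
SUPPORTED_FORMATS = {"flac", "mpeg", "mp3", "ogg", "pcm", "wav", "webm"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    # Path.suffix yields e.g. ".WAV"; normalize case and drop the dot.
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_FORMATS


print(is_supported_audio("meeting.WAV"))  # True
print(is_supported_audio("meeting.aac"))  # False
```

A check like this only inspects the file name, so it cannot confirm that the audio inside the file is actually encoded in the expected format.
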
| Feature | Description |
| --- | --- |
| Detect speakers | Identifies the individuals in a conversation between multiple people. Supports English, Japanese, and Spanish. Intended for conversations between two people; supports a maximum of six people. For best results, use an audio file at least a minute long. The output contains the words spoken by each speaker and the timestamp. |
| Keyword spotting | Detects specific strings in the transcript. The output contains the timestamp(s) for each keyword and a confidence score. |
| Smart formatting | Converts the following types of strings into more conventional representations to make the transcript easier to read: dates, times, series of digits and numbers, phone numbers, currency values, and email and web addresses. For examples, see Smart formatting results. This feature supports English, Japanese, and Spanish. |
| Profanity filter | Obscures profanity by replacing it with asterisks in the transcript. |
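
To make the keyword spotting and profanity filter rows more concrete, here is a minimal sketch of the behavior they describe, applied to a transcript that has already been produced. It does not call the package and is not its implementation. The `Word` record, the function names, and the blocklist are hypothetical, keywords are assumed to be single words, and masking with one asterisk per character is an assumption (the table only says that profanity is replaced with asterisks).

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level transcript entry (not the package's schema)."""
    text: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # score between 0 and 1


def spot_keywords(words, keywords):
    """Map each keyword to the (start, end, confidence) of every match."""
    hits = {keyword.lower(): [] for keyword in keywords}
    for word in words:
        key = word.text.lower()
        if key in hits:
            hits[key].append((word.start, word.end, word.confidence))
    return hits


def mask_profanity(transcript, blocklist):
    """Replace each blocklisted word with asterisks of the same length."""
    if not blocklist:
        return transcript
    alternatives = "|".join(re.escape(term) for term in blocklist)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: "*" * len(match.group(0)), transcript)


words = [
    Word("please", 0.0, 0.4, 0.98),
    Word("cancel", 0.5, 0.9, 0.91),
    Word("my", 1.0, 1.1, 0.99),
    Word("order", 1.2, 1.6, 0.95),
]
print(spot_keywords(words, ["cancel", "refund"]))
# {'cancel': [(0.5, 0.9, 0.91)], 'refund': []}
print(mask_profanity("That darn printer", ["darn"]))
# That **** printer
```

In the package itself, the confidence score comes from the speech recognizer; in this sketch it is simply copied from the hypothetical input data.
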