Using the /transcribe API with AWS S3

Overview

Amazon Web Services (AWS) Simple Storage Service (S3) is a common location for archiving audio files and metadata together in zip files. If you already have such files stored in S3, you can use the /transcribe API's support for S3 to process them from that location, which can save upload time because Conversations typically must upload your files to S3 for processing.
To use the /transcribe API with S3 from the command line, pass your AWS_ACCESS_KEY (referred to in the /transcribe API as aws_id) and AWS_SECRET_KEY (referred to as aws_secret) as form fields, using curl's -F option.

The following is the general format of a cURL command that calls the /transcribe API to transcribe a file or directory that is stored in S3:
    curl -F token=AUTH_TOKEN \
         -F aws_id=AWS_ACCESS_KEY \
         -F aws_secret=AWS_SECRET_KEY \
         -F s3key=s3://BUCKET/path/to/file/or/directory \
         -F region=S3_REGION \
         -X POST https://URL_BASE_PATH/transcribe/ORG_SHORT/FOLDER
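
If you call the API repeatedly, the general format above can be wrapped in a small shell function. The following is a minimal sketch, not part of the Conversations tooling: the function name transcribe_s3 is hypothetical, and it assumes that you have exported environment variables named after the placeholders (AUTH_TOKEN, AWS_ACCESS_KEY, AWS_SECRET_KEY, and URL_BASE_PATH) so that the token and keys do not have to be typed into each command.

```shell
# Sketch only: transcribe_s3 is a hypothetical helper, not a Conversations tool.
# Assumes AUTH_TOKEN, AWS_ACCESS_KEY, AWS_SECRET_KEY and URL_BASE_PATH are set
# in the environment; ${VAR:?message} aborts with the message if one is missing.
transcribe_s3() {
    # $1 = s3key (s3://BUCKET/path), $2 = S3 region, $3 = ORG_SHORT, $4 = FOLDER
    s3key=$1
    region=$2
    org=$3
    folder=$4

    # POSIX sh has no arrays, so reuse the positional parameters as the
    # curl argument list. Quoting keeps each form field a single argument.
    set -- \
        -F "token=${AUTH_TOKEN:?set AUTH_TOKEN}" \
        -F "aws_id=${AWS_ACCESS_KEY:?set AWS_ACCESS_KEY}" \
        -F "aws_secret=${AWS_SECRET_KEY:?set AWS_SECRET_KEY}" \
        -F "s3key=$s3key" \
        -F "region=$region" \
        -X POST "https://${URL_BASE_PATH:?set URL_BASE_PATH}/transcribe/$org/$folder"

    curl "$@"
}
```

With the variables exported, a call would look like: transcribe_s3 s3://BUCKET/path/to/file.zip S3_REGION ORG_SHORT FOLDER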

Fields to Provide

The user-specific fields that you need to provide are the following:

AUTH_TOKEN
The authorization token that you are using to retrieve information.
A company's authorization token is found on the Conversations Accounts page, in the Company section.

AWS_ACCESS_KEY
The AWS access key (sent as aws_id) for the bucket in which the file that you want to transcribe is stored.
AWS_SECRET_KEY
The AWS secret key (sent as aws_secret) for the bucket in which the file that you want to transcribe is stored.
BUCKET
The Amazon S3 bucket in which the file that you want to transcribe is stored.
path/to/file/or/directory
The path to one of the following: the file that you want to process; a zip file that contains the audio file that you want to process (and an optional metadata file); or a directory that contains a hierarchy of files that you want to process. If you specify a directory, all of the files under that directory are queued for transcription. Submitted files that are not in a format that Conversations supports are not processed, and are listed as UNSUPPORTED in the Conversations folder's process log.
S3_REGION
The Amazon S3 region of the S3 bucket. This field is required. It specifies which regional endpoint to use for the request, which reduces request latency.
URL_BASE_PATH
The base URL for the Conversations environment that you are using, for example uk-conversations.awaken.io.
ORG_SHORT
The short name of the organization that you are using.
An organization's short name is found on the Conversations Accounts page, in the Organizations section.

FOLDER
The Conversations folder in which you want the transcript and audio output that is produced by Conversations to be stored.

The following is a specific example of calling the /transcribe API to transcribe a zip file that is stored in S3:

    curl -F token=0123456789abcde0123456789abcde01 \
         -F aws_id=012345678901234567890 \
         -F aws_secret=01234567890123456789012345678901234567890 \
         -F s3key=s3://example.company.com/documentation-TEST.zip \
         -F region=us-east-1 \
         -X POST https://uk-conversations.awaken.io/transcribe/Test-Testing/Test01

This example transcribes the audio in the zip file named documentation-TEST.zip in the bucket example.company.com and puts the results of that transcription in the Test01 folder of the organization Test-Testing. As with other calls to the /transcribe API, it returns the request ID for your transcription request, which you can subsequently use with the /request API.
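
Because the request ID is needed later for the /request API, a script will usually want to keep the response rather than let it scroll past. The following is a minimal sketch of one way to do that. The helper name submit_and_capture is hypothetical, and the exact layout of the response body is not described in this article, so the sketch simply returns the raw body on a 2xx status and reports an error otherwise.

```shell
# Sketch only: submit_and_capture is a hypothetical helper.
# All arguments are passed straight through to curl, so call it with the
# same -F fields and URL that you would give curl directly.
submit_and_capture() {
    # -sS: hide the progress meter but still show errors.
    # -w '\n%{http_code}': append the HTTP status code on its own final line.
    response=$(curl -sS -w '\n%{http_code}' "$@") || return 1

    status=$(printf '%s\n' "$response" | tail -n 1)   # last line = status code
    body=$(printf '%s\n' "$response" | sed '$d')      # everything before it

    case $status in
        2??) printf '%s\n' "$body" ;;
        *)   printf 'transcribe request failed with HTTP %s\n' "$status" >&2
             return 1 ;;
    esac
}
```

For example, result=$(submit_and_capture -F token=AUTH_TOKEN ... -X POST https://URL_BASE_PATH/transcribe/ORG_SHORT/FOLDER) stores whatever the API returned in result. Check a real response to see how the request ID appears in it before extracting it.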

By default, any zip file in S3 that you have identified for transcription using the /transcribe API remains stored on S3 after its contents have been transcribed. Keeping such files in S3 after their content has been transcribed may not be necessary, so the /transcribe API includes a "delete=true" option that you can pass to delete a file after its content has been transcribed. In an application, you would pass this as an additional parameter to the /transcribe API call. In a curl command, you would add the -F delete=true option to your command line.
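
As a sketch of the curl form, the following wraps the general-format command with the extra -F delete=true field. The function name transcribe_s3_then_delete is hypothetical, and the shell variables (AUTH_TOKEN, AWS_ACCESS_KEY, AWS_SECRET_KEY, S3KEY, S3_REGION, URL_BASE_PATH, ORG_SHORT, and FOLDER) are assumed stand-ins for the placeholders described under Fields to Provide. Set them before calling the function.

```shell
# Sketch only: transcribe_s3_then_delete is a hypothetical wrapper.
# The shell variables stand in for the placeholders in the general format;
# S3KEY is the full s3://BUCKET/path value.
transcribe_s3_then_delete() {
    curl -F "token=$AUTH_TOKEN" \
         -F "aws_id=$AWS_ACCESS_KEY" \
         -F "aws_secret=$AWS_SECRET_KEY" \
         -F "s3key=$S3KEY" \
         -F "region=$S3_REGION" \
         -F delete=true \
         -X POST "https://$URL_BASE_PATH/transcribe/$ORG_SHORT/$FOLDER"
}
```

The only difference from the general format is the delete=true field. Because the file is removed from S3 after transcription, it is worth confirming that you no longer need it there before adding this option.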
