VoiceBot Applet

Configuring a Bidirectional Stream VoiceBot Applet

You can enable bidirectional streaming on a call flow using the VoiceBot applet. The applet takes 6 parameters.

1. URL

This is the URL to which Exotel streams the voice media. You can specify either a wss endpoint or an https endpoint. If you specify an http/https endpoint, Exotel expects that endpoint to return a wss URL in its response. This lets you:

a. use dynamic endpoints for the same call flow
b. pass dynamic custom parameters to the WebSocket endpoint to handle any application-specific customisation

There are two ways to set the URL:

a. Static method: enter a ws(s) endpoint. It remains the same for every call made using this flow. E.g. ws://127.0.0.1:5001/media
b. Dynamic method: enter an http(s) URL that can return different ws(s) endpoints based on the use case. E.g. https://yourdomain.com. This URL must return a ws(s) endpoint.

2. Authentication options for wss endpoints

When configuring your wss endpoint for the VoiceBot applet, Exotel supports the following authentication methods.

IP whitelisting (basic setup): secure your wss connections by whitelisting Exotel's outbound IP ranges. This avoids the need for credential exchange. Reach out to hello@exotel.com to get the IP ranges.

Basic authentication: developers specify credentials in the wss URL, but Exotel transmits them securely in headers during the connection.

wss URL configuration (developer input):

    wss://<api_key>:<api_token>@stream.yourdomain.com/<stream_path>

What Exotel sends (header):

    Authorization: Basic base64(api_key:api_token)

This approach ensures compatibility and prevents credentials from being exposed in transit URLs.

3. wss sample rate parameters

Your wss endpoint should support configurable sample rates. Exotel's VoiceBot applet can define the required rate as a query parameter:

- 8 kHz (standard PSTN quality, default): wss://your-domain.com/?sample-rate=8000
- 16 kHz (enhanced quality): wss://your-domain.com/?sample-rate=16000
- 24 kHz (HD quality): wss://your-domain.com/?sample-rate=24000

Default behaviour: if no sample rate is explicitly defined, Exotel uses 8 kHz.

Best practice: use 16 kHz for most voicebot integrations (balanced quality vs bandwidth). Use 8 kHz only when interworking with legacy PSTN, and 24 kHz for HD audio experiences.

4. Custom parameters (optional)

You can pass custom params along with the endpoint. The following validations apply:

a. A maximum of 3 custom params is allowed.
b. The params take the format ws://127.0.0.1:5001/media?param1=value1&param2=value2&param3=value3 (in the dynamic case, the http(s) URL should return the ws(s) URL in this format).
c. The total length of the params (the param1=value1&param2=value2&param3=value3 part of the URL above) must not exceed 256 characters.

5. Record

The checkbox gives the option to record the conversation and generate a recording URL, which is available in a Passthru applet placed after the VoiceBot applet.

6. Next applet

A bidirectional stream can end if the call is disconnected, the WebSocket is closed, or the stream is explicitly stopped by the client. You do not need to add an explicit "Stop" stream applet for a bidirectional stream: the stream is closed automatically, and the call flow then proceeds to the next applet configured.

Video walkthrough

You can find a quick walkthrough of a sample flow here.

Protocol

Communication between Exotel and the customer endpoint happens over a WebSocket connection.

WebSocket messages from Exotel

Each message on the WebSocket is sent/received as a JSON string. The following types of messages are sent:

- connected
- start
- media
- dtmf
- stop
- mark (only in bidirectional)
- clear

Connected message

This message is sent after the WebSocket connection is established.

    {"event": "connected"}

Start message

The start message contains information about the stream parameters. It is sent only once, right after the connected message. The custom parameters are picked from the URL configured in the stream applet. If you had mentioned the URL as wss://yourstream.service.com?queuename=premium&product=radio, then queuename and product would be passed in as keys with premium and radio as values.

    {
      "event": "start",
      "sequence_number": 1,
      "stream_sid": "<stream_sid>",
      "start": {
        "stream_sid": "<>",
        "call_sid": "",
        "account_sid": "",
        "from": "",
        "to": "",
        "custom_parameters": {
          "key1": "value1",
          "key2": "value2"
        },
        "media_format": {
          "encoding": "<>",
          "sample_rate": "<>",
          "bit_rate": "<>"
        }
      }
    }

Media message

This message encapsulates the audio packets.

    {
      "event": "media",
      "sequence_number": 3,
      "stream_sid": "<stream_sid>",
      "media": {
        "chunk": 2,
        "timestamp": "10",
        "payload": "<>"
      }
    }

- media.chunk: chunk of the message
- media.timestamp: timestamp in milliseconds from the start of the stream

Media in the payloads is sent as raw/slin (16-bit, 8 kHz, mono PCM, little endian) encoded in base64. The same is expected from the client in the case of bidirectional streams (for example, a voice bot) for audio to be played back to the human.

DTMF message

The DTMF message is sent when the user presses digits after the WebSocket connection is established. This is supported only for bidirectional streaming in the VoiceBot applet.

    {
      "event": "dtmf",
      "sequence_number": 1,
      "stream_sid": "<stream_sid>",
      "dtmf": {
        "duration": "<duration_in_ms>",
        "digit": "<>"
      }
    }

Stop message

The stop message is sent when the stream is stopped or the call has ended.

    {
      "event": "stop",
      "sequence_number": 10,
      "stream_sid": "<stream_sid>",
      "stop": {
        "call_sid": "<>",
        "account_sid": "<>",
        "reason": "stopped or callended"
      }
    }

Mark message

The mark message is used only in bidirectional streaming, to track media when it is completed.

    {
      "event": "mark",
      "sequence_number": 15,
      "stream_sid": "<stream_sid>",
      "mark": {
        "name": "<label>"
      }
    }

WebSocket messages to Exotel

These messages are used only in bidirectional streaming.

Mark message

The mark message is used only in bidirectional streaming, to track media when it is completed. You can send a mark event message after sending a media message to request a notification when the audio you have sent has been processed. You will receive a mark event message with a matching name from Exotel when the audio is processed.

    {
      "event": "mark",
      "sequence_number": 15,
      "stream_sid": "<stream_sid>",
      "mark": {
        "name": "<label>"
      }
    }

Media message

This message encapsulates the audio packets.

    {
      "event": "media",
      "sequence_number": 3,
      "stream_sid": "<stream_sid>",
      "media": {
        "chunk": 2,
        "timestamp": "10",
        "payload": "<>"
      }
    }

Clear message

The clear message clears audio data that was sent earlier but has not yet been played. One situation where this is useful is building human-like bots that guess what the caller is going to say and send audio accordingly, even before the caller finishes speaking. When the guess goes wrong, that audio can be cleared using a clear message.

    {
      "event": "clear",
      "stream_sid": "<stream_sid>"
    }

Note: for the clear event to work effectively, it is advisable to send media in smaller chunks. For instance, if two media messages of 5 seconds each are sent to Exotel, followed by a clear event at the 3rd second, the first message has already been picked up and played, so the clear event applies only to the second media message. Sending media in smaller chunks helps prevent confusion and ensures that the clear event works as intended.

Media format

Media in the payloads is sent as raw/slin (16-bit, 8 kHz, mono PCM, little endian) encoded in base64. The same is expected from the client in the case of bidirectional streams, for audio to be played back to the caller.

Event struct field reference table

| Field name | Type | JSON key | Optional? | Description / notes |
|---|---|---|---|---|
| Event | string | event | No | Type of the event: "start", "media", "stop", "dtmf", etc. |
| StreamSid | string | stream_sid | Yes | Unique identifier for the stream session |
| SequenceNumber | string | sequence_number | Yes | Ordering number for incoming media chunks |
| Start | Start | start | Yes | Present when the event is a start event |
| Media | Media | media | Yes | Present when the event is a media event (contains audio data) |
| Stop | Stop | stop | Yes | Present when the event is a stop event |
| Mark | Mark | mark | Yes | Present when the event is a mark event |
| DTMF | DTMF | dtmf | Yes | Present when the event is a DTMF (key press) event |

Sample code

https://github.com/exotel/agent-stream and https://github.com/exotel/agent-stream-echobot

Simulator to make streaming calls with a dummy bot:
https://github.com/exotel/voice-streaming/commit/d4696f3cb13fb5d75ac46bed2aaaa2afababa10f#diff-217ed82f87b78b399e304b07a159b27dff327c0f83adf4a2fc30b03bcbf84b01

Chunk size window for bidirectional streaming use case

- Minimum chunk size: 3.2k [100ms data]
- Maximum chunk size: 100k
- Chunk size should always be a multiple of 320 bytes.

1. If the size is less than the minimum, you may face audio issues due to network jitter.
2. If the size is greater than 100k, it might result in timeouts.
3. If the size (for example, 4096) is not a multiple of 320 bytes, the last packet will be smaller than 320 bytes, and the platform will wait for 20 ms before sending the next chunk. Because less data is sent than the wait time covers, this might result in audio gaps.

Limitations

1) When you start a unidirectional stream, the stream is forked immediately. If a Connect applet follows, the audio stream is relayed even while Exotel dials out to multiple agents (ringing). The client will have to filter out only the relevant parts of the stream. This limitation will be fixed in future releases.
2) The number of custom parameters that can be passed in the start message is limited to 3.
3) The stream is sent as mono-channel raw audio, and the client will have to handle speaker diarization.

If you have any questions or concerns, please connect with us using the chat widget on your Exotel dashboard or WhatsApp us on 08088919888.
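As an illustration of the basic authentication option described above, the following minimal sketch computes the header value a server would expect for a given configured URL. It assumes the credentials are separated by a colon in the URL and joined as api_key:api_token before base64 encoding. The function names and the sample credentials are hypothetical and not part of any Exotel SDK.

```python
import base64
import hmac
from urllib.parse import urlsplit


def expected_authorization_header(configured_url: str) -> str:
    """Derive the Authorization header value from credentials in the wss URL.

    Assumption: the URL has the form wss://<api_key>:<api_token>@host/path.
    """
    parts = urlsplit(configured_url)
    if parts.username is None or parts.password is None:
        raise ValueError("URL does not carry <api_key>:<api_token> credentials")
    credentials = f"{parts.username}:{parts.password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def is_authorized(header_value: str, configured_url: str) -> bool:
    """Compare a received header with the expected one in constant time."""
    expected = expected_authorization_header(configured_url)
    return hmac.compare_digest(header_value.encode("utf-8"), expected.encode("utf-8"))


# Hypothetical credentials, for illustration only.
url = "wss://my_key:my_token@stream.yourdomain.com/media"
header = expected_authorization_header(url)
```

A WebSocket server could call is_authorized with the Authorization header of the incoming handshake request and reject the connection when it returns False.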
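The custom parameter validations listed under the applet parameters can be checked before a URL is saved in the flow or returned from a dynamic endpoint. This is only a sketch: it assumes the 256-character limit applies to the query string after the "?", and check_custom_params is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_CUSTOM_PARAMS = 3    # documented maximum number of custom params
MAX_PARAMS_LENGTH = 256  # documented limit, assumed here to cover the query string


def check_custom_params(url: str) -> dict:
    """Return the custom params of a ws(s) URL, or raise if a limit is exceeded."""
    query = urlsplit(url).query
    pairs = parse_qsl(query, keep_blank_values=True)
    if len(pairs) > MAX_CUSTOM_PARAMS:
        raise ValueError(f"{len(pairs)} custom params given, at most {MAX_CUSTOM_PARAMS} allowed")
    if len(query) > MAX_PARAMS_LENGTH:
        raise ValueError(f"params are {len(query)} characters, limit is {MAX_PARAMS_LENGTH}")
    return dict(pairs)


params = check_custom_params(
    "ws://127.0.0.1:5001/media?param1=value1&param2=value2&param3=value3"
)
```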
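To make the message flow from Exotel concrete, here is a minimal sketch of a handler that consumes the JSON text frames described in the protocol section. It assumes the JSON keys use underscores (stream_sid, custom_parameters), which is how the samples in this document are read. StreamSession is a hypothetical class, and the WebSocket transport itself is left out.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class StreamSession:
    """Accumulates state for one stream from the messages Exotel sends."""
    stream_sid: str = ""
    custom_parameters: dict = field(default_factory=dict)
    audio: bytearray = field(default_factory=bytearray)  # decoded PCM bytes
    digits: list = field(default_factory=list)           # DTMF digits pressed
    stop_reason: str = ""

    def handle(self, raw: str) -> str:
        """Process one JSON text frame and return its event name."""
        msg = json.loads(raw)
        event = msg.get("event", "")
        if event == "start":
            start = msg["start"]
            self.stream_sid = start.get("stream_sid", msg.get("stream_sid", ""))
            self.custom_parameters = start.get("custom_parameters", {})
        elif event == "media":
            # The payload is base64-encoded raw PCM audio.
            self.audio.extend(base64.b64decode(msg["media"]["payload"]))
        elif event == "dtmf":
            self.digits.append(str(msg["dtmf"]["digit"]))
        elif event == "stop":
            self.stop_reason = msg["stop"].get("reason", "")
        # "connected", "mark" and unknown events need no state change here.
        return event


session = StreamSession()
session.handle('{"event": "connected"}')
```

Each text frame received on the socket would be passed to session.handle, and the return value can drive any further application logic, such as closing the socket on "stop".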
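The chunk size rules for bidirectional streaming can be sketched as a small helper that splits PCM audio into media messages whose payloads are multiples of 320 bytes. The sketch assumes 16-bit, 8 kHz mono audio (16 bytes per millisecond, so 320 bytes is 20 ms), reads "3.2k" and "100k" as 3,200 and 100,000 bytes, and uses underscore JSON keys. All function names are hypothetical.

```python
import base64
import json

FRAME_BYTES = 320          # chunk sizes must be a multiple of this
MIN_CHUNK_BYTES = 3_200    # documented minimum ("3.2k"), as read here
MAX_CHUNK_BYTES = 100_000  # documented maximum ("100k"), as read here
BYTES_PER_MS = 16          # assumes 16-bit, 8 kHz, mono PCM


def media_messages(pcm: bytes, stream_sid: str, chunk_bytes: int = 3_200) -> list:
    """Split PCM audio into JSON media messages with 320-byte-aligned payloads."""
    if not MIN_CHUNK_BYTES <= chunk_bytes <= MAX_CHUNK_BYTES:
        raise ValueError("chunk size outside the documented window")
    if chunk_bytes % FRAME_BYTES:
        raise ValueError("chunk size must be a multiple of 320 bytes")
    # Pad the tail with silence so the final chunk is also 320-byte aligned.
    # The final chunk may still be shorter than the documented minimum.
    remainder = len(pcm) % FRAME_BYTES
    if remainder:
        pcm += b"\x00" * (FRAME_BYTES - remainder)
    messages = []
    for seq, offset in enumerate(range(0, len(pcm), chunk_bytes), start=1):
        chunk = pcm[offset:offset + chunk_bytes]
        messages.append(json.dumps({
            "event": "media",
            "sequence_number": seq,
            "stream_sid": stream_sid,
            "media": {
                "chunk": seq,
                "timestamp": str(offset // BYTES_PER_MS),  # ms from stream start
                "payload": base64.b64encode(chunk).decode("ascii"),
            },
        }))
    return messages


def mark_message(stream_sid: str, name: str, sequence_number: int) -> str:
    """Build a mark message to request a notification once audio is processed."""
    return json.dumps({"event": "mark", "sequence_number": sequence_number,
                       "stream_sid": stream_sid, "mark": {"name": name}})


def clear_message(stream_sid: str) -> str:
    """Build a clear message to drop audio that was sent but not yet played."""
    return json.dumps({"event": "clear", "stream_sid": stream_sid})


outgoing = media_messages(b"\x01\x00" * 2048, "S1")  # 4096 bytes of sample audio
```

Keeping chunk_bytes near the minimum also follows the note on the clear message, since smaller chunks leave less already-dequeued audio that a clear event cannot reach.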
