Intelligent personal assistants interact with users through a voice user interface (UI). Voice UIs were traditionally built into hardware such as smartphones and tablets, for example Apple’s “Siri” and Android’s “OK Google”, and have now been integrated into standalone speaker devices such as the Amazon Echo and Google Home.
Digital personal assistants allow hands-free interaction through simple and convenient voice control. Users can carry out a number of actions that would typically involve turning to a smartphone or tablet: finding weather updates, playing music on Spotify, hearing news headlines, or ordering an Uber or a takeaway, to name a few.
Amazon’s Alexa has been in the limelight with early adopters and rapidly growing sales. Just two years after its release, it has established a solid position in the market. Alexa can be accessed from several devices, the best known and most accessible being the Amazon Echo and Amazon Echo Dot speakers. Once set up and connected to your smartphone or tablet, Alexa provides a set of built-in voice-driven applications, referred to as ‘skills’.
Apart from Amazon’s reputation for aggressive marketing strategies, demand for Alexa is perhaps driven by the publicly available Alexa Skills Kit, which enables anyone to build a custom skill, aided by Amazon’s APIs, web services and hosting solutions. Alexa now has more than 10,000 third-party skills available in the Amazon skills store, up from only 135 in Q1 2016 (statista.com).
The most common skills are Custom Skills (or custom interaction models), which handle most types of requests, such as fetching a weather update from a web service or placing a pizza order through one. A Custom Skill is made up of a set of intents, a set of utterances, an invocation name, a cloud-based service and a configuration.
The Alexa Skills Kit offers self-service APIs, tools and documentation to support building different types of skills, depending on your requirements. The first step in creating a skill is setting up an intent schema: a set of intents representing the actions a user can perform with your skill, which form its core functionality. Next, you create a set of utterances, the words and phrases a user can say to invoke those intents, and map each utterance to the intent it triggers. Together, the intents and utterances define how a user will interact with the skill. To launch the skill on a user’s device, you need an invocation name, which the user says to start the skill and initiate a conversation. The final component is a Home Card, the skill’s visual companion, which a user accesses from their smartphone, tablet or computer. A Home Card can be a simple explanation of the skill, or it can enhance the voice interaction, for example by displaying a text or image answer on the user’s device. The Home Card sits within the Alexa app or website.
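As a rough illustration of how intents, utterances and an invocation name fit together, the sketch below writes out a small intent schema and a list of sample utterances, then checks that every utterance maps to a declared intent. The invocation name, the intent names and the helper function are hypothetical, and the layout follows the intent schema and sample utterances format as it stood when this was written, so treat it as a sketch rather than a current reference.

```python
import json

# Hypothetical invocation name and intent names, chosen for illustration only.
INVOCATION_NAME = "weather helper"

# The intent schema: the actions the skill supports.
INTENT_SCHEMA = {
    "intents": [
        {"intent": "GetWeatherIntent"},
        {"intent": "GetHeadlinesIntent"},
        {"intent": "AMAZON.HelpIntent"},  # one of Amazon's built-in intents
    ]
}

# Each sample utterance starts with the intent it maps to, then the phrase.
SAMPLE_UTTERANCES = """\
GetWeatherIntent what is the weather today
GetWeatherIntent give me a weather update
GetHeadlinesIntent what are the news headlines
"""


def utterances_by_intent(schema, utterances):
    """Group phrases under their intent, rejecting undeclared intents."""
    declared = {item["intent"] for item in schema["intents"]}
    grouped = {name: [] for name in declared}
    for line in utterances.strip().splitlines():
        intent, phrase = line.split(" ", 1)
        if intent not in declared:
            raise ValueError(f"utterance refers to undeclared intent: {intent}")
        grouped[intent].append(phrase)
    return grouped


if __name__ == "__main__":
    print(json.dumps(INTENT_SCHEMA, indent=2))
    print(utterances_by_intent(INTENT_SCHEMA, SAMPLE_UTTERANCES))
```

With this setup, a user would say something like “Alexa, ask weather helper what is the weather today”, and the phrase after the invocation name would resolve to GetWeatherIntent.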
The intents, utterances, invocation name and Home Card represent the front-end user interface of the skill: they are what users engage with through their Amazon Echo or Echo Dot speaker. For the skill to respond to the user, it also needs a cloud-based service that accepts these intents and acts upon them, and Amazon recommends building this as an AWS Lambda function.
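A minimal sketch of such a service, written as a Python Lambda handler, might look like the following. It reads the request type and intent name from the incoming JSON and returns spoken text together with a simple Home Card. The intent name GetWeatherIntent, the helper build_response and the hard-coded forecast are assumptions for illustration; a real skill would call a weather web service and handle more request types.

```python
def build_response(speech, card_title, card_text, end_session=True):
    """Wrap spoken text and a simple Home Card in a skill response."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "card": {"type": "Simple", "title": card_title, "content": card_text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls with the request from Alexa."""
    request = event["request"]

    # The user opened the skill by its invocation name without asking anything.
    if request["type"] == "LaunchRequest":
        return build_response(
            "Welcome. Ask me for the weather.",
            "Weather Helper",
            "Try: what is the weather today",
            end_session=False,
        )

    # The user's utterance was matched to one of the skill's intents.
    if request["type"] == "IntentRequest":
        intent_name = request["intent"]["name"]
        if intent_name == "GetWeatherIntent":  # hypothetical intent name
            forecast = "It is sunny today."  # placeholder for a web service call
            return build_response(forecast, "Weather", forecast)
        return build_response(
            "Sorry, I did not understand that.",
            "Weather Helper",
            "Unrecognised request",
        )

    # Other request types, such as the session ending, get an empty response here.
    return {"version": "1.0", "response": {}}
```

Keeping the response construction in one helper means each intent only has to decide what to say and what to show on the card.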
These are the fundamental components of a Custom Skill, the type that accounts for around 80% of skills on Amazon’s store. There are also a number of Smart Home skills, which enable users to interact with cloud-connected smart home devices such as smart lighting, heating and hot water controls. The Smart Home Skill API replaces intents with what are called device directives: the action requests a smart device can accept, which trigger the interaction with the device.
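To make the difference from intents concrete, here is a hedged sketch of a handler that reacts to a device directive instead of an intent. The message envelope (a directive with a header and an endpoint) is loosely modelled on the Smart Home Skill API, but the exact field names vary by API version and should be treated as assumptions. The device store, the device id and the returned summary are hypothetical, and the summary is not the response format Alexa expects.

```python
# In-memory stand-in for real devices; a real skill would call the
# device maker's cloud API instead. The device id is hypothetical.
DEVICES = {"hall-light": {"power": "OFF"}}


def handle_directive(event):
    """Apply a power directive to a device and return a simplified summary."""
    directive = event["directive"]
    name = directive["header"]["name"]  # the requested action
    endpoint_id = directive["endpoint"]["endpointId"]  # the target device

    device = DEVICES.get(endpoint_id)
    if device is None:
        return {"ok": False, "error": "unknown device"}

    if name == "TurnOn":
        device["power"] = "ON"
    elif name == "TurnOff":
        device["power"] = "OFF"
    else:
        return {"ok": False, "error": f"unsupported directive: {name}"}

    return {"ok": True, "device": endpoint_id, "power": device["power"]}
```

The point of the sketch is that the skill author no longer defines utterances: Alexa interprets the speech itself and sends a predefined directive, and the skill only translates it into a device action.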
The rise of Alexa skills is partly due to how easy they are to create. Amazon famously says that a basic skill can be created in as little as 30 minutes. Additionally, given the current buzz around Voice UI, businesses are prompted to build skills in order to be seen as innovative and in touch with new digital products.
There is undoubtedly a growing market for Voice UI devices, and for the skills and interactions associated with them. Amazon has sold over 5 million Echo devices in just two years, and Google, hot on its heels, has entered the competition with the Google Home. This uptake represents an opportunity for businesses to establish themselves in this space and ride the wave of PR and advertising around the rise of personal digital assistants.