Voice control, natural language, and digital assistants like Amazon Alexa and Google Assistant are huge right now, and thanks to the last decade of the consumerization of IT, we know we can’t ignore them. In fact, they’re probably already keeping some security people awake at night.
Sure, this conversation isn’t new, and you could say it started in 2011 when Siri came out. (Remember when IBM blocked it?) Gabe and I have addressed certain aspects (machine learning and natural language UIs). However, there was a lot of progress in 2017, and it’s time to give it more thought.
Defining the conversation
First, we have to break down the hype into different components so that we get on the same page. Here’s how I’ve been thinking about things:
- At a very basic level, we have voice recognition. This may use machine learning and other fancy stuff in the background, but for our purposes, it’s just simple transcription.
- Then there is natural language processing. Think of a command line interaction model, except certain commands can be substituted with common phrases. (Which substitutions work and which don’t? Often there’s some trial and error and you end up with some awkward commands.)
- There’s all the “artificial intelligence” stuff, which is the hardest to put in a bucket. The definition of “AI” is blurry (even though most of us know not to expect HAL 9000 any time soon), and so is the line between this and natural language processing. For now, I think of this as anything that can add more context to commands and queries. As I’ve written before, this is mostly an app business logic problem, not an IT admin problem. Still, we have a ways to go. Benedict Evans put it best on Twitter: “WHAT DO WE WANT? Natural language processing! WHEN DO WE WANT IT? Sorry, when do we want what?”
- We have to know where the runtimes for all these components are, and what they can connect to. Technical folks like us can understand that while the microphone on our device is picking up our commands, the digital assistant can either be integrated into the OS or just an app; the processing can be in a cloud service; there can be integrations with both local and cloud apps; some of the integrated data sources will be personal and others will be public; and so on. To most people, though, it’s just magic or a mystery.
- There are plenty of ways to use the prior four components I just outlined. Digital assistants like Amazon Alexa, Google Assistant, Siri, and Cortana just happen to be the most prominent examples.
- Lastly, we need to break out different devices and contexts, i.e., are you talking to something built into your personal phone, or to a kiosk like Amazon Echo, Google Home, or Apple HomePod (finally)? Are these things in your home? In a business? How about in a hotel or a car or on a TV or fridge?
Voice and digital assistants in the enterprise
Now that we’ve broken things down a bit, let’s move on to some of the biggest issues for enterprise end user computing folks.
Let’s start with use cases. Many of us use digital assistants in our personal lives, and we all have tasks that we can do faster with Siri and Alexa than we can do by pawing through our devices. The business use cases around messaging and other simple tasks are obvious. (As are the etiquette implications—nobody wants to sit next to somebody talking to their devices all day.)
Today, all the kiosks like Amazon Echo and Google Home can help us imagine business scenarios with multiple users. I’m thinking meetings and conference calls, where being able to instantly look up sales numbers, business intelligence data, and even monitoring stats for IT folks would be tremendously helpful. In the extended enterprise, I’ve written about how wearables like Google Glass help with hands-free tasks, and voice UIs are naturally an important part of this, too.
These shared business use cases lead us into two more challenging problems: user authentication and business app integration.
How do you authenticate a user over voice? Speaking your username and password out loud is an obvious non-starter. (Alexa has a passcode-to-purchase feature, but that’s meant for home use, a more limited environment.) Fortunately, the identity management industry has been addressing biometric, multi-factor, and continuous authentication full steam.
Voice-based biometric authentication is clearly just beginning to enter the mainstream, and it’s not making any promises yet. Last year both Amazon Alexa and Google Assistant gained the ability to tell different users apart, and to use that ability to control privileged information and actions. Of course, these systems are still very easy to trick. To be fair, Google recognizes this possibility, but Amazon doesn’t seem to readily acknowledge it. (I’m sure I could find something limiting their liability in their terms of service, I just haven’t had the time to read through them yet.)
We’ll just have to wait and see if any vendor starts making strong claims about their voice-based biometric authentication. In the meantime, one thing to point out is that the feared replay attack, using a recording of a user’s voice à la Sneakers, could potentially be mitigated by a voice captcha. The UI could ask the user to repeat a random phrase. There are plenty of other ways to authenticate, too. Users could respond to a notification on their phone or smartwatch, or the location of their phone could be matched to the location of the voice device. Lastly, we really are on the verge of a revolution in different types of authentication, including passwordless logins and continuous authentication, so this is all solvable. We just have to be aware of the issues and make sure the appropriate controls are in place.
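To make the voice captcha idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the word list, the function names, and the idea that a speech-to-text step has already produced a transcript of what the user said. It only covers the replay problem (a recording made earlier won’t contain a phrase that was just generated); it says nothing about whether the voice belongs to the right person, which would still need a separate biometric or second-factor check.

```python
import hmac
import secrets

# Hypothetical word list; a real deployment would want a much larger one.
WORDS = [
    "amber", "canyon", "drift", "ember", "falcon", "harbor",
    "juniper", "lantern", "meadow", "orbit", "pebble", "quartz",
]


def make_challenge(n_words=4):
    """Pick a random phrase for the user to speak back."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))


def normalize(text):
    """Lowercase and drop punctuation so transcripts compare fairly."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def verify_response(challenge, transcript):
    """Return True if the spoken transcript matches the challenge phrase."""
    expected = normalize(challenge).encode()
    spoken = normalize(transcript).encode()
    return hmac.compare_digest(expected, spoken)
```

In use, the device would call `make_challenge()`, read the phrase aloud, transcribe the reply, and pass both to `verify_response()`. A real system would also want each challenge to expire quickly and to be accepted only once.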
A lot of this only matters for ambient devices like Amazon Echo and Google Home, since if we’re talking about a mobile app, desktop app, or something built into a device, we have plenty of management and security tools for those already, including identity and access management, EMM, and mobile app reputation services.
When it comes to business app integration, we have to worry about issues like making sure the voice processing is done in a specific location, and ensuring that any recordings are deleted or stay in regional and compliance-appropriate data centers. We also have to make sure that the right permissions and access controls are in place between the voice agent runtime and our enterprise apps and data. The good news is that these are completely doable and similar to issues that have been solved a million times before.
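As a rough illustration of how familiar these controls are, the two checks above (where the audio is processed, and what the user is allowed to do) can be sketched as a simple gate in front of the enterprise app. The region names, scope strings, user names, and the `VoiceRequest` type are all hypothetical; in practice this data would come from your identity and access management system rather than hard-coded tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceRequest:
    user: str               # identity established by the voice agent
    action: str             # e.g. "read:sales_report"
    processing_region: str  # where the audio is transcribed and stored


# Hypothetical policy data, for illustration only.
ALLOWED_REGIONS = {"eu-west"}
USER_SCOPES = {
    "alice": {"read:sales_report"},
    "bob": set(),
}


def is_allowed(req, allowed_regions=ALLOWED_REGIONS, user_scopes=USER_SCOPES):
    """Allow a request only if both the region and the permission check pass."""
    if req.processing_region not in allowed_regions:
        return False
    # Unknown users get an empty scope set, so they are denied by default.
    return req.action in user_scopes.get(req.user, set())
```

The point of the sketch is that nothing here is specific to voice: it is the same deny-by-default check we would put in front of any other client.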
Adding voice and natural language user interfaces on top of existing enterprise apps is a problem that looks quite similar to adding a native mobile app client to existing enterprise apps. I wouldn’t be surprised to see the likes of Powwow, Capriza, other rapid mobile app development vendors, and MBaaS providers start working on this.
As with the iOS and Android duality, there may be some aspects of voice and digital assistants that have a degree of platform lock-in. But again, in some places companies will build for multiple platforms (e.g., apps intended for BYOD or COPE devices), and in other places they’ll be able to settle on one (e.g., we’re going to put Amazon Echos in all our conference rooms, and integrate them with our BI and CRM systems).
Overall, there is definitely a possibility for serious security issues, and frankly I’m surprised the alarm hasn’t been louder so far. On the other hand, I’ve argued that the initial BYOD wave taught us to expect the unexpected. In the meantime, even though I wasn’t thinking about voice and digital assistants when I made my 2018 outlook, they are something we should pay attention to.