The question reporters have asked me most often in the last couple of weeks, as we get ready for Apple’s Developer Conference, is whether Apple’s Siri personal assistant is behind its competitors, and whether it can catch up. The answer is more complicated than a simple yes or no, and some context for next week’s announcements is essential to evaluating them properly when the time comes.
Comparing Apples with Apples
First off, many of the comparisons being made at the moment are comparing apples with oranges (no pun intended). We’re at a point in the calendar where everyone else’s 2016 products (and in many cases announcements of products that aren’t yet available) are being set against Apple’s 2015 versions. Since Apple only makes major changes to Siri, and to its software in general, once a year, we’re still looking at last year’s versions ahead of WWDC next week, while all the other major consumer tech companies have already held their developer events this year and thus shown their hands.
Adding more to the “unfairness” of the comparison, in many cases competing products aren’t actually available yet, and won’t be for months. So evaluating Apple’s position in digital assistants (and artificial intelligence more broadly) today makes a lot less sense than it will this time next week, when we know what Siri will look like in the second half of 2016. If past patterns continue, it seems likely that at least some of the new features will be available to developers almost immediately, to participants in Apple’s iOS beta program shortly after, and to everyone with an iPhone in September.
Three components to digital assistants
Even though people talk about voice-based assistants in a unitary fashion, there are really three main components to these products, and if you really want to evaluate an individual example, you have to break it into its constituent parts. Those three parts are:
- Voice recognition – turning sounds into individual words
- Natural language processing – turning collections of words into phrases and sentences with specific meanings
- Responses – serving up answers and actions from a cloud service
An effective digital assistant needs to be good at all three of these things in order to do the overall job well. First, it has to recognize the words accurately, then it has to properly identify the meaning of the set of words the user says, and then it has to serve up a response based on the set of things it’s capable of doing.
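The three-stage pipeline above can be sketched in a few lines of Python. Everything here – the function names, the toy intent dictionary, the stub logic – is purely illustrative and bears no relation to any vendor’s actual implementation; the point is only to show how a weakness at any one stage degrades the whole.

```python
def recognize_speech(audio: bytes) -> str:
    """Stage 1: voice recognition -- turn sounds into individual words."""
    # A real system would run acoustic and language models here;
    # a fixed transcription stands in for illustration.
    return "what is the weather in cupertino"

def parse_intent(text: str) -> dict:
    """Stage 2: natural language processing -- words into a structured meaning."""
    # Toy keyword matching stands in for a real NLP model.
    if "weather" in text:
        # Naively treat the last word as the location.
        return {"intent": "get_weather", "location": text.split()[-1]}
    return {"intent": "unknown"}

def serve_response(intent: dict) -> str:
    """Stage 3: serve a response from the set of things the assistant can do."""
    # A real assistant would call first- or third-party cloud services here;
    # the assistant is only as capable as this handler table.
    handlers = {
        "get_weather": lambda i: f"Looking up the weather in {i['location']}.",
        "unknown": lambda i: "Sorry, I can't help with that yet.",
    }
    return handlers.get(intent["intent"], handlers["unknown"])(intent)

def assistant(audio: bytes) -> str:
    # The three stages chained together.
    return serve_response(parse_intent(recognize_speech(audio)))

print(assistant(b""))  # → "Looking up the weather in cupertino."
```

Note that opening stage 3 to third parties – adding entries to the handler table – is exactly the kind of thing an assistant API enables, which is why the breadth of integrations matters so much in the comparison that follows.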
Siri is competitive but not a leader today
Today, Apple’s Siri is decent but not stellar on the first two points: Google’s voice search and Amazon’s Echo both do a better job of recognizing individual words and of ascribing meaning to phrases and sentences. The gap isn’t huge, but it’s noticeable, while Microsoft’s Cortana generally performs roughly on par with Siri in my experience. On the third point, Siri has expanded the range of tasks it can perform, but it’s still limited mostly to things Apple’s own services can do, with a handful of third-party services feeding in data on particular topics. Google’s voice search can pull in a little more third-party data and has a much wider range of first-party data to draw from, while Amazon’s Alexa assistant has an open API that has resulted in connections to many third-party services, though a large number come from tiny companies you’ve never heard of. Cortana is, again, roughly on par with Siri here.
On balance, then, Siri is roughly in the same ballpark as competitors, but lags slightly behind both Google and Amazon in all three of the key areas. Though it’s not available yet, Google has also announced the next generation of its digital assistant, called simply “the Google assistant”, which will be able to respond to text as well as voice queries and engage in conversations with users. This is a capability Cortana already has, but others, including Siri, don’t. It should roll out to users over the next few months, but it’s hard to evaluate how effective it will be from keynote demos alone.
Where Apple might go next week
Returning to my first point, as of right now we’re comparing the 2016 versions of others’ products to the 2015 versions of Apple’s, so the question becomes how Apple might move Siri forward at WWDC and close the gap in these various areas. Across those three areas, the most likely changes are:
- Voice recognition – Apple has been continually improving its voice recognition, and although this is the area with the fewest concrete rumors ahead of time, I would expect it to talk up further improvements at WWDC
- Natural language processing – Apple has acquired a variety of companies with expertise in artificial intelligence recently, and among them is VocalIQ, which specializes in conversational voice interactions. I would expect significant improvements in natural language processing including multi-step conversations to be announced at WWDC, which should move Apple forward in a big way in this area
- Responses – the biggest thing holding Siri back right now is its lack of third-party integrations, and especially the inability for developers to make functionality in their apps available to Siri. Were that to change at WWDC – which seems likely, with a Siri API – it would dramatically improve the utility of Apple’s voice assistant.
I’ve so far focused mostly on the voice aspects of these digital assistants, but Apple also added other elements last year, and might continue to build on them this year. In 2015, it introduced the Proactive elements of Siri, which serve up contacts, apps, news, and other content proactively through notifications and in the Spotlight pane in iOS. The main area where I’d like to see it add more functionality this year is text interaction with Siri, which could happen either in the standard Siri interface or through iMessage, such that Siri would appear as just another contact you could exchange text messages with. Apple could even open up iMessage as a platform for bots and conversational user interfaces from third parties, which would help it keep pace with announcements from Facebook, Microsoft, and Google in this area.
The other thing that’s worth bearing in mind is that these digital assistants are only useful when they’re available. Amazon’s Echo device does very well where it’s present, but its biggest weakness is that Amazon has only sold around three million devices, and its Alexa assistant isn’t available on phones, the devices we carry everywhere with us. Google’s assistant is pervasive, available both on Android and iPhone, on the web, and elsewhere, while Siri is available on most of Apple’s devices in some form (with the exception of the Mac). Cortana is available on PCs running recent versions of Windows, but its availability on phones does little to help since there are so few Windows phones in use. If Apple extends Siri to the Mac at this year’s WWDC, another credible rumor, then it will make it even more ubiquitous in the lives of those committed to the Apple ecosystem.
Changing the narrative
What Apple faces as it heads into WWDC is a growing narrative that suggests it’s falling behind both in AI generally and in digital assistants specifically. As I’ve already said, given the quirks of the calendar, Apple has naturally been silent as others have revealed their 2016 plans, so this comparison is partly unfair. But Apple has a chance during its developer conference to demonstrate that it’s committed not just to keeping up but to establishing leadership in these areas. By Monday afternoon, we’ll be in a much better position to judge whether it has succeeded in changing the narrative.