How Steve Jobs May Have Snookered the TV Industry

It is widely believed that when Steve Jobs told his biographer, Walter Isaacson, that he had “finally cracked it,” he was referring to a new way to operate TVs. That has been interpreted as Siri for an Apple HDTV, and it has driven the TV manufacturers into a frenzy. It may be a blind alley.

Mr. Jobs’s comment (Isaacson, p. 830) is taking on the stature of Fermat’s Last Theorem. That is, everyone believes Mr. Jobs (like Fermat) had the solution, but no one can figure out what it was.

The TV industry, without vision, research or a demonstrated understanding of first principles, seems to have jumped to the conclusion that voice input is the Holy Grail Mr. Jobs was referring to. It has hastily surmised that, to get a jump on the boogeyman of Apple’s own HDTV, it should introduce voice input to its TVs. Or, in the style of Kinect, gestures. This is the conventional wisdom.

Human Ergonomics

Before jumping on that bandwagon myself, I tend to think in terms of history, the human experience of watching television, and why people have problems with modern TV systems. The problem has several parts.

Selecting Input

By far the number one problem people have with modern HDTV systems, from my reading and from chatting with other people, is selecting the input source. Many people have several devices plugged into an AV receiver or directly into the TV: a Blu-ray player, an Apple TV or Roku box, likely a cable or satellite box, and in some cases a DVR. To watch the desired show, you must know which input to use, pick up the right remote (if you're not using a Logitech Harmony), and then find the right button to cycle through (or select) the inputs until you judge, from what appears on screen, that you have the right source. Worse, that button may be cryptically labelled and hard to find. This is the process that drives non-technical people crazy, especially those who didn't take part in setting up the system and don't understand what's going on.

Items of Interest

Steve Jobs came to understand that music customers aren’t interested in the labels or even the albums they produce. Music fans are interested in songs. Apply that insight to the TV audience, and it’s clear that what people focus on is the show.

Studio   <=>   Label
Network  <=>   Album
Show     <=>   Song

What people hunt for when they’re in a TV-watching mood isn’t the studio or the network. The network can serve as a crutch, however. You know that Justified is on FX. FX is channel 248 on DIRECTV. So you back into that show by tuning to channel 248 at the appointed time. (Or you set the DVR.)

Most people have concluded that, because it’s the show people are interested in, all they need to do is announce the show verbally and the TV system will go find it. But there are nuances. Which episode? The latest? Last week’s? The rerun of last season’s finale? Specifying exactly what you want to watch demands careful articulation to a voice input system, a process that will still challenge both humans and computers.


That leads me to the final issue: thoughtful articulation. Just because you can apply a technology to a problem doesn’t mean it’s the best solution. Accurate, speaker-independent voice recognition is a fairly new technology, and how people actually interact with it has not been thoroughly researched. Yet TV makers are rushing it out.

Historically, TV operation has been “See and do.” In the ancient past, if you wanted to watch Star Trek, you’d remember that it’s on NBC and then tune to, say, channel 4. You can see that the dial is on channel 4. That concept has evolved over the years with cable, a plethora of channels and remotes, but it still depends on the idea of “see and do.”

Voice input requires one to “think and then articulate.” One must form the right thought, compatible with the abilities of the TV system, then enunciate the proper command. Or make the right gesture. My suspicion is that this can be tiresome and challenging, especially in a household with a lot of kids and yelling going on. Or background noise, like a vacuum cleaner. Or a sports bar.

At least when you physically hold the remote, you’re in charge. I think this is just one of the human-factors issues that need to be addressed. It’s a major technological shift without the backing of extensive research. And in this era of cost cutting and cut-throat TV competition, we know how much money gets spent on human-factors research.

The Way Forward

Solving the problem of selecting the right input is easy if you constrain the user to stay in the Apple HDTV realm. That is, if you can deliver everything the customer wants with no other inputs, the input problem is solved. Lots of people are hungry to cut the cord. However, issues remain. Not many people will throw away their DVD collection; they’ll want to keep their player. The last time I checked, Comcast would reduce your Internet speed if you dropped Basic Cable (“bundling”). So there still needs to be a way to select the desired input, unless the Apple HDTV has no additional HDMI inputs at all. Not likely.

Siri is designed for a small device with a small virtual keyboard, where voice input makes sense. However, a giant 60-inch screen lets you do a lot of things you can’t do on an iPhone, not to mention that the environment is different. It may make more sense to just walk up to the TV, touch the “window” that’s showing the desired input, and then swipe until you get to the show and episode you want. Or do it on your iPad or iPhone, if you have one. Carefully constrained voice input could still be an option, but not a requirement.

If Apple does an HDTV, I guarantee that it will draw thoughtfully on all of Apple’s technologies and on an understanding of human nature and ergonomics, and that it will be fun, not frustrating.

My point is that TV technology is fairly stupid, while Macs and iOS devices are fairly smart. The challenge is to select the best technologies at hand that cover all the bases and make TV watching a delight. Jumping on voice input or gesture technology alone, as a geeky replacement for the remote control, is like the ill-conceived rush into household 3D TVs. Just as 3D is now considered a feature rather than a sea change in TV viewing, voice input, I think, will be an ancillary feature, not the end-all, be-all solution that is being rushed out in a worrisome response to a cryptic comment Steve Jobs made before he passed away.

If Apple does an HDTV, it’s too important a change, and too big a challenge, to simply throw Siri at the problem and be done.