AppleInsider discusses a recent Apple patent:

The August 2006 filing with the United States Patent and Trademark Office spans some 39 pages, but bears some of its most enticing revelations as part of a discussion involving ways of associating Dashboard-like widgets with various kinds of video content.

Apple says such widgets could be displayed or become available based on the content currently displayed in the Apple TV user interface. If the content is broadcasted, such as live television, then a widget could potentially be downloaded as part of the broadcast signal from a cable head-end, or provided through a separate communication such as an Internet connection, and then displayed over the content.

The widget can also be downloaded at a different time than the broadcast signal and invoked by information contained in the broadcast signal, the company explained. Similarly, it could be triggered by some event, such as user interaction with a user interface element, or come included with the content on a storage medium such as a DVD.

Sounds good. But when I see this picture:

What I hope it means is that the information depicted inset in the image – the game score – is provided as separate data rather than, say, “printed” into the video stream. This ought to be the evolution of the TV concept: keep video and images as video and images, but keep data as data. I imagine that the ESPN web site is probably better for this specific purpose, but it’s not exactly TV/couch friendly. It’s probably pretty easy to do as a web page with JavaScript today, at least if you’re coding directly to WebKit.
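To make "data as data" concrete, here is a minimal sketch of a score widget that receives the score as a small structured payload and renders it itself. The payload format, the field names, and the team names are all my own assumptions for illustration; nothing here comes from the patent.

```python
import json

# Hypothetical payload delivered alongside the video (format is an assumption).
payload = '{"home": {"team": "Hawks", "score": 3}, "away": {"team": "Owls", "score": 2}, "inning": "Top 7"}'


def render_score_widget(raw: str) -> str:
    """Turn the score payload into the text the widget would overlay."""
    data = json.loads(raw)
    home, away = data["home"], data["away"]
    return f'{away["team"]} {away["score"]} - {home["team"]} {home["score"]} ({data["inning"]})'


print(render_score_widget(payload))
```

Because the score arrives as data, the device decides how to draw it: bigger type on a TV across the room, a compact line on a phone, or nothing at all.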

Well, it’s good news, but this is only the first baby step in the new world of video. Consider this:

Dumb video

Things like this come to your TV as a video stream, and your eyeball-brain system decodes all the various info. You’re probably thinking darn, if only I had HD I could read all that. But no, you don’t need more pixels, you need more data. It is downright dumb in this day and age that this kind of presentation comes as a sequence of frames. It’s dumb because TVs are dumb: they can only play a sequence of frames.

What this should be – no, what this will be – is that each element in that screen will come as a separate data stream. The guy on the left will come as a separate stream, the guy on the right will come as a separate stream, and so will the channel logo, the title, the time, and the stream of random information along the bottom. The user agent – be it a TV, your computer, an iPhone, etc. – will combine them according to a script (in SMIL, for example) or your user preferences to show it to you.
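As a rough sketch of what compositing in the user agent could look like, assume each stream arrives as a named layer with a stacking order and a screen region. The `Layer` type, the `compose` function, and the layer names are hypothetical; a real player would follow something like a SMIL layout instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str      # which stream this is
    z: int         # stacking order, lower is drawn first
    region: tuple  # (x, y, width, height) as fractions of the screen


def compose(layers, hidden=frozenset()):
    """Return the layers to draw, back to front, skipping ones the user hid."""
    visible = [layer for layer in layers if layer.name not in hidden]
    return sorted(visible, key=lambda layer: layer.z)


layers = [
    Layer("ticker", 2, (0.0, 0.9, 1.0, 0.1)),
    Layer("guest_left", 0, (0.0, 0.0, 0.5, 0.9)),
    Layer("guest_right", 0, (0.5, 0.0, 0.5, 0.9)),
    Layer("channel_logo", 1, (0.9, 0.0, 0.1, 0.1)),
]

# User preference: no ticker, thanks.
order = [layer.name for layer in compose(layers, hidden={"ticker"})]
print(order)
```

The point is that the producer ships the pieces plus a suggested layout, and the last decision about what ends up on screen is made on the device.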

Oh my, that sounds awesome. Except that I just kind of described the web. Kind of. The web 2.0 for video, perhaps, rather than video on the web. It’s inevitable.

Why, you ask? Well, one important theme for the early 21st century, IMHO, is the increasing complexity of end-user media devices. Added complexity brings added functionality. When we got CD players, we were able to navigate tracks. When we got MP3 players, we got ID3 tags. What’s next? Use your imagination. Audio is moving along nicely, while video is still stuck in the CD era. But we did get DVD players where we could “choose tracks” (title & scene selection) as for audio on CDs. And we have YouTube with its tagged videos.

It’s my view that for audio, the next step is breaking down the tracks into their components, i.e., doing the final mix in the player. The above example for video is the equivalent of that: bringing the final video compositing responsibility to the player.
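For the audio side, a toy sketch of "final mix in the player" might look like this, assuming the track ships as separate stems (plain lists of samples here) and the player applies per-stem gains. The stem names and the `mix` function are made up for illustration.

```python
def mix(stems, gains):
    """Sum the stems sample by sample, scaling each by its gain (default 1.0)."""
    length = max(len(samples) for samples in stems.values())
    out = [0.0] * length
    for name, samples in stems.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            out[i] += gain * sample
    return out


# Two hypothetical stems of the same song.
stems = {"vocals": [0.5, 0.5, 0.0], "drums": [0.25, 0.0, 0.25]}

# The listener mutes the vocals: instant karaoke version.
karaoke = mix(stems, {"vocals": 0.0})
print(karaoke)
```

Swap samples for video layers and gains for visibility or position, and you have the video case described above.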

Again, why? The player can then make decisions about its content. Think DRM; think personalization; think cross-selling. Think camera angles and alternate streams. Think language translation, content focus, user zooming… And if you want to watch all seven camera angles of the baseball game side by side instead of having the producer or editor decide how to cut between them for you, why not?