video streams and metadata

AppleInsider discusses a recent Apple patent:

The August 2006 filing with the United States Patent and Trademark Office spans some 39 pages, but bears some of its most enticing revelations as part of a discussion involving ways of associating Dashboard-like widgets with various kinds of video content.

Apple says such widgets could be displayed or become available based on the content currently displayed in the Apple TV user interface. If the content is broadcasted, such as live television, then a widget could potentially be downloaded as part of the broadcast signal from a cable head-end, or provided through a separate communication such as an Internet connection, and then displayed over the content.

The widget can also be downloaded at a different time than the broadcast signal and invoked by information contained in the broadcast signal, the company explained. Similarly, it could be triggered by some event, such as user interaction with a user interface element, or come included with the content on a storage medium such as a DVD.

Sounds good. But when I see this picture:

What I hope it means is that the information depicted inset in the image – the game score – is provided as separate data. Rather than, say, “printed” into the video stream. This ought to be the evolution of the TV concept: keep video and images as video and images, but keep data as data. I imagine that the ESPN web site is probably better for this specific purpose, but it’s not exactly TV/couch friendly. It’s probably pretty easy to do as a web page with JavaScript today, at least if you’re coding directly to WebKit.

Well, it’s good news, but this is only the first baby step in the new world of video. Consider this:

Dumb video

Things like this come to your TV as a video stream, and your eyeball-brain system decodes all the various info. You’re probably thinking darn, if only I had HD I could read all that. But no, you don’t need more pixels, you need more data. It is downright dumb in this day and age that this kind of presentation comes as a sequence of frames. It’s dumb because TVs are dumb: they can only play a sequence of frames.

What this should be – no, what this will be – is that each element in that screen will come as a separate data stream. The guy on the left will come as a separate stream, the guy on the right will come as a separate stream, and so will the channel logo, the title, the time, and the stream of random information along the bottom. The user agent, be it a TV, your computer, an iPhone, etc., will combine them according to a script (with SMIL, for example) or your user preferences to show it to you.
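To make that a bit more concrete, here is a minimal sketch in Python of a player doing the final compositing. Everything in it is invented for illustration: the Layer class, its fields, the compose function and the layer names are assumptions, not part of SMIL, the Apple patent, or any real player API. The payloads are just strings standing in for real streams.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One independently delivered element of the screen (hypothetical model)."""
    name: str      # e.g. "anchor_left", "ticker"
    kind: str      # "video", "image", or "data"
    z_order: int   # higher values are drawn on top
    payload: str   # stand-in for the actual stream or data


def compose(layers, hidden=frozenset()):
    """Return the layers to draw, bottom to top, honoring user preferences.

    `hidden` is the set of layer names the viewer has switched off.
    """
    visible = [layer for layer in layers if layer.name not in hidden]
    # sorted() is stable, so layers with equal z_order keep their arrival order.
    return sorted(visible, key=lambda layer: layer.z_order)


broadcast = [
    Layer("ticker", "data", 3, "random information along the bottom"),
    Layer("anchor_left", "video", 1, "stream A"),
    Layer("anchor_right", "video", 1, "stream B"),
    Layer("channel_logo", "image", 2, "logo"),
]

# A viewer who does not want the ticker simply never composites it.
for layer in compose(broadcast, hidden={"ticker"}):
    print(layer.z_order, layer.name, layer.kind)
```

The point of the sketch is only that once the elements arrive separately, hiding, reordering or restyling one of them is a decision the player can make, rather than something baked into the frames.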

Oh my, that sounds awesome. Except that I just kind of described the web. Kind of. The web 2.0 for video, perhaps, rather than video on the web. It’s inevitable.

Why, you ask? Well, one important theme for the early 21st century, IMHO, is the increasing complexity of end-user media devices. Added complexity brings added functionality. When we got CD players, we were able to navigate tracks. When we got MP3 players, we got ID3 tags. What’s next? Use your imagination. Audio is moving along nicely, while video is still stuck in the CD era. But we did get DVD players where we could “choose tracks” (title & scene selection) as for audio on CDs. And we have YouTube with its tagged videos.

It’s my view that for audio, the next step is breaking down the tracks into their components, i.e. doing the final mix in the player. The above example for video is the equivalent of that: bringing the final video compositing responsibility to the player.

Again, why? The player can then make decisions about its content. Think DRM; think personalization; think cross-selling. Think camera angles and alternate streams. Think language translation, content focus, user zooming… And if you want to watch all seven camera angles of the baseball game side by side instead of having the producer or editor decide how to cut between them for you, why not?

version control

git seems pretty awesome. I haven’t participated in a large team version control system since the 90s though; personally I have just been using rcs directly when I need file-level version control.

But on this topic in general, I have long thought that maintaining a per-object revision history should be a fundamental, automatic, highly granular, low-level feature of... well, I guess you could say anything you do with digital data. But the problem is that most vc systems deal with files rather than the concepts contained in the file. For software we don’t do revision control of functions or classes, for instance, but rather of their source text in ASCII or UTF-8. You can make all kinds of changes to the source text that don’t affect program functionality, but when operating only on the source text, a revision control system cannot tell the difference. This is especially a problem when the concepts migrate among individual files over time. Personally I want to keep track of the objects more than the files.
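As a small illustration of the difference between tracking text and tracking structure, here is a sketch using Python's standard ast module. The helper name same_structure and the three sample versions are my own inventions for the example. It is also only an approximation of "functionality": two texts with the same syntax tree do behave the same, but a harmless rename of a local variable would still show up as a structural change.

```python
import ast


def same_structure(old_source: str, new_source: str) -> bool:
    """True if both texts parse to the same syntax tree.

    Comments and layout never reach the tree, so edits to them compare equal.
    """
    return ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))


v1 = "def area(w, h):\n    return w * h\n"
v2 = "def area(w, h):\n    # width times height\n    return w  *  h\n"
v3 = "def area(w, h, scale):\n    return w * h * scale\n"

print(same_structure(v1, v2))  # only a comment and spacing changed
print(same_structure(v1, v3))  # a new argument changes the tree
```

A line-based diff reports both edits as changes to the file; the tree comparison separates the one that touched the function from the one that did not.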

I frequently find myself wishing that the changes I had made to an individual object, whether it is a function in software or layer in a Photoshop document, could be rolled back independent of the state of everything else. Usually the “undo history” is a document-centric sequence of edits, just as the revision history would be for a text file of source code.

My only experience with highly granular automatic revision control was back in about 1995 when I came across a SunOS file system bug. I was going insane finding recently edited source files zero’d out on block boundaries. I didn’t realize it was a bug until I got in touch with a Sun engineer. Until then I cranked up the regular backups, but in addition, I wrote a small program to frequently poll the stat() modification time of a list of files and, upon finding a change, make a numbered backup copy and automatically check it into rcs, thereby crowding the file system with copies of my files and minimizing the chance of actual data loss. I also thought it would be generally interesting to have the micro-level change history, but since I habitually save every few seconds, it grew to thousands of revisions per file and became overwhelming. To make automatic revision control work, there has to be some way to decide what is a reasonable change size. I would imagine that over time, the system would have to scan and group versions that are similar. For instance, if something changed ten times in ten seconds, with nothing close before or after, it might be reasonable to group or combine them in some way to unclutter the version history.
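The original program is long gone, so what follows is a rough reconstruction of the idea in Python, not the actual code. The function names, the backup naming scheme and the five-second threshold are all assumptions made for the example, and the rcs check-in step is left out. The first function is one polling step; the second is the kind of after-the-fact grouping described above, based purely on the time between saves.

```python
import os
import shutil


def backup_if_changed(path, last_mtime, backup_dir, revision):
    """Copy `path` to a numbered backup if its modification time changed.

    Returns the (mtime, revision) pair to pass to the next poll.
    """
    mtime = os.stat(path).st_mtime
    if mtime == last_mtime:
        return last_mtime, revision
    revision += 1
    os.makedirs(backup_dir, exist_ok=True)
    target = os.path.join(backup_dir, f"{os.path.basename(path)}.{revision}")
    shutil.copy2(path, target)  # the rcs check-in would happen here
    return mtime, revision


def group_bursts(timestamps, max_gap=5.0):
    """Group save times (in seconds) into bursts of closely spaced edits.

    A new group starts whenever the gap since the previous save exceeds max_gap.
    """
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


# Ten saves in ten seconds, then two isolated saves much later.
saves = list(range(10)) + [300, 900]
print([len(burst) for burst in group_bursts(saves)])
```

A real poller would call backup_if_changed in a loop with a sleep between passes. Each burst could then be collapsed to its last revision, which keeps the safety net without the thousands of entries.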

This would be a much more intelligent process if the vc system had a structural understanding of the information; multiple sequential changes to a comment could be grouped, whereas a change to a comment followed by the addition of an argument to a function definition probably should not be.
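Here is one way that kind of structure-aware grouping might look, again as a Python sketch built on the standard ast module. The names change_kind and group_revisions are hypothetical, the "comment" label also covers whitespace-only edits, and collapsing runs of code edits as well as runs of comment edits is a simplification on my part.

```python
import ast
from itertools import groupby


def change_kind(old_source, new_source):
    """Label an edit 'comment' if the syntax tree is unchanged, else 'code'."""
    same = ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))
    return "comment" if same else "code"


def group_revisions(revisions):
    """Collapse consecutive edits of the same kind into one history entry.

    `revisions` is a list of source texts, oldest first. Returns a list of
    (kind, number_of_edits) pairs.
    """
    kinds = [change_kind(a, b) for a, b in zip(revisions, revisions[1:])]
    return [(kind, len(list(run))) for kind, run in groupby(kinds)]


history = [
    "def f(x):\n    return x\n",
    "def f(x):\n    # identity\n    return x\n",
    "def f(x):\n    # identity function\n    return x\n",
    "def f(x, y):\n    # identity function\n    return x\n",
]
print(group_revisions(history))
```

The two comment edits fold into a single entry, while the added argument stays a separate entry because it changes the tree.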

So for an inside-out system, vc functionality would be a fundamental feature of the system, and information like comments, author, and other metadata would not be considered part of the code, because they’re simply non-functional. They are attached to the code, related to the code in some way; a modification to a note about the code doesn’t constitute a change to the functional nature of the software.

My (anti-) AI argument

So here’s my view on AI in brief.

  1. Natural systems are more robust, durable, adaptive and complex than anything humans build. Nothing we do, in fact, comes even remotely close.
  2. It is on this foundation that our human intelligence rests, and from it that our intelligence appeared in the evolutionary tree.
  3. Can we short-circuit that? I doubt it.

Extraordinary things exist in nature – relative to what we humans can build. Without intelligence. It is my feeling then that these features of the “natural platform” are much more important than trying to create AI. Not only would software (let’s say) which exhibited the features of natural life be more useful in practical reality than AI, it is my feeling that it is a prerequisite. The creation of complex structure through the process of growth, generational adaptivity, speciation… these are the sorts of things we need to create robust, large scale software systems that can change over time. This is the sort of platform we need before we can hope, IMHO, to create something resembling intelligence.

Unfortunately, that begins to resemble life. Am I saying we need to create life in order to achieve AI? I might be. That might be the logical problem of AI in a nutshell. At least AI of the “artificial human” variety.
