WWDC 2013

Guy English: “There’s only one CPU socket and it bets heavily on the bus and GPU performance. While this looks to software to be just another Mac, it isn’t. It’s capabilities aren’t traditional. The CPU is a front end to a couple of very capable massively parallel processors at the end of a relatively fast bus. One of those GPUs isn’t even hooked up to do graphics. I think that’s a serious tell. If you leverage your massively parallel GPU to run a computation that runs even one second and in that time you can’t update your screen, that’s a problem. Have one GPU dedicated to rendering and a second available for serious computation and you’ve got an architecture that’ll feel incredible to work with.”

At my day job I work on an SDK that allows people to embed video in their applications. The SDK is built on an awesome, portable framework developed by our Systems team that lets us create plugins that process media and push it down a pipeline. That pipeline includes plugins to receive data from the network, decode that data, time it, and render it to a portion of a display. It can do this for live and recorded video, MPEG4, H.264, and even low frame rate JPEG video (so we don’t have to decode frames on the client). But I digress. You may have noticed that I mentioned decoding. We’ve looked at decoding with hardware, but it’s actually quite expensive to push encoded frames across the bus, decode them, push them back across the bus, and finally render them, which pushes them across the bus yet again. Ick.

At one time Pelco had built its own combo card that could decode video and render to the display with a single push across the bus. That was a cool piece of hardware. At the time we could decode and display sixteen separate video streams simultaneously, at varying frame rates. That card was extremely underpowered. I guess what I’m getting at is this: How cool would it be to leverage one GPU on a Mac Pro for decoding all video, be it one stream or sixteen, and push the results across to the secondary GPU for rendering, without a transfer back across the bus to main memory? The idea of it seems very exciting.
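To make the bus arithmetic concrete, here is a toy Python sketch that counts bus crossings for the two paths described above: the hardware-decode round trip and the single push of the combo card. It is only an illustration, not our SDK or framework. Every name in it (`Frame`, `move_to`, `run_pipeline`, and the stage functions) is made up for the example, and it assumes a frame lives either in main memory ("host") or on the decode/render hardware ("device"). It says nothing about whether two GPUs in a Mac Pro can actually hand frames to each other directly.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """A video frame plus a record of where it currently lives (hypothetical model)."""
    data: str
    location: str = "host"   # "host" = main memory, "device" = decode/render hardware
    bus_crossings: int = 0

    def move_to(self, location: str) -> "Frame":
        # Count a bus transfer only when the frame actually changes sides.
        if location != self.location:
            self.location = location
            self.bus_crossings += 1
        return self


Stage = Callable[[Frame], Frame]


def run_pipeline(stages: List[Stage], frame: Frame) -> Frame:
    """Push one frame through each stage in order."""
    for stage in stages:
        frame = stage(frame)
    return frame


def upload(frame: Frame) -> Frame:
    return frame.move_to("device")


def download(frame: Frame) -> Frame:
    return frame.move_to("host")


def decode(frame: Frame) -> Frame:
    frame.data = f"decoded({frame.data})"
    return frame


def render(frame: Frame) -> Frame:
    # Rendering happens on the device, so the frame must be there first.
    frame.move_to("device")
    frame.data = f"rendered({frame.data})"
    return frame


# Hardware decode with a round trip through main memory.
round_trip = [upload, decode, download, render]
# Combo-card style: decode and render on the same side of the bus.
single_push = [upload, decode, render]

round_trip_frame = run_pipeline(round_trip, Frame("h264"))
single_push_frame = run_pipeline(single_push, Frame("h264"))

print(round_trip_frame.bus_crossings)   # 3
print(single_push_frame.bus_crossings)  # 1
```

Multiply those counts by sixteen streams at their respective frame rates and the appeal of keeping decoded frames off the bus becomes obvious.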

Now all we need to do is build our pipeline for Mac OS X (totally doable) and create a new decode/render plugin that takes advantage of the new GPUs. I’m not sure if it’s totally possible without multiple bus transfers, but it would be fun to try.