Recently I was optimizing some code to reliably play 4k video content at 60 fps through a 3D pipeline on low-end hardware. And it gave me a deja-vu of earlier situations with video. It seems that there is this endless cycle of new video formats turning up, followed by a period required for the hardware to catch up. It also reminded me that I had wanted to write a blog about some issues you run into when decoding video. So I think this is finally the time to dive into video codecs.
The endless cycle
At its basis, video playback on computer hardware is a very resource-intensive process. Worst-case you need to update all pixels in memory for every frame. So the performance depends on the number of pixels per frame (resolution), the colour-depth (bits per pixel), and the frame rate (number of frames per second).
If we want to get a bit retro here, convincing video playback on a consumer PC more or less started when hardware cards such as the Video Blaster arrived on the market. This was in 1992, before local bus was a widespread thing. The ISA bus was too slow for anything other than playing video in a really small window in the low 320×200 resolution at 256 colours.
The Creative Video Blaster circumvented this issue by having its own video output on board, and having video encoding/decoding hardware. It uses a Chips & Technologies F82C9001 chip, which supports YUV buffers in various compressed formats (2:1:1, 4:1:1 and 4:2:2), and it can also perform basic scaling. This meant that the CPU could send compressed video over the ISA bus, and it could be decoded on-the-fly on the Video Blaster board, at a relatively high resolution and colour depth. It’s difficult to find exact information on its capabilities, but it appears to be capable of PAL and NTSC resolution, and supports ‘over 2 million colours’, which would indicate 21-bit truecolour, so 7 bits per component. So I think we can say that it is more or less “broadcast quality” for the standards of the day: still in the era of Standard Definition (SD) PAL and NTSC.
The original webpage for the first Video Blaster (model CT6000) is archived here. It apparently requires an “IBM PC-AT and higher compatibles”, but the text also says it is for 386 PCs. So I suppose in theory it will work in a 286, but the software may require a 386 for best performance/compatibility.
Anyway, it should be clear that a 386 with an ISA VGA card could not play video anywhere near that well. You really needed that special hardware. To give an indication… a few years later, CD-ROM games became commonplace, and Full Motion Video (FMV) sequences became common. For example, see the game Need For Speed from 1994, which requires a fast 486 with localbus VGA:
The video quality is clearly not quite broadcast-level. The resolution is lower (320×240), and it also uses only 256 colours. The video runs at 15 fps. This was the best compromise at the time for the CPU and VGA capabilities, without any special hardware such as the Video Blaster.
From there on it was an endless cycle of the CPU and video cards slowly catching up to the current standard, and then new standards, with higher resolutions, more colours, better framerates and better compression would arrive, which again required special hardware to play back the video in realtime.
We moved from SD to HD, from interlaced video to progressive scan, from MPEG-1 to MPEG-2, MPEG-4 and beyond, and now we are at 4k resolution.
I would say that 4k at 60 fps is currently the ‘gold standard’: that is the highest commonly available content at the moment, and it currently requires either a reasonably high-end CPU and video card to play it back without any framedrops, or it requires custom hardware in the form of a Smart TV with built-in SoC and decoder, or a set-top media box with a similar SoC that is optimized for decoding 4k content.
Broadcast vs streaming
I have mentioned ‘broadcast quality’ briefly. I guess it is interesting to point out that in recent years, streaming has overtaken broadcasting. Namely, broadcast TV quality, especially in the analog era, was always far superior to digital video, especially on regular consumer PCs. But when the switch was made to HD quality broadcasting, an analog solution would require too much bandwidth, and would require very high-end and thus expensive receiver circuitry. So for HD quality, broadcasting switched to digital signals (somewhere in the late 90s to early 2000s, depending on which area). They started using MPEG-encoded data, very similar to what you’d find on a DVD, and would broadcast these compressed packets as digital data via ether, satellite or cable. The data was essentially packed digitally into existing analog video channels. The end-user would require a decoder that would decompress the signal into an actual stream of video frames.
At this point, there was little technical difference between playing video on your PC, and watching TV. The main difference was the delivery method: the broadcast solution could offer a lot of bandwidth to your doorstep, so the quality could be very high. Streams of 8 to 12 mbit of data for a single channel were no exception.
At the time, streaming video over the internet was possible, but most consumer internet connections were not remotely capable of these speeds, so video over the internet tended to be much lower quality than regular television. Also, the internet does not offer an actual ‘broadcasting’ method: video is delivered point-to-point. So if 1000 people are watching a 1 mbit video stream at the same time, the video provider will have to deliver 1000 mbit of data. This made high-quality video over the internet very costly.
But that problem was eventually solved, as on the one hand, internet bandwidth kept increasing and cost kept coming down, and on the other hand, newer video codecs would offer better compression, so less bandwidth was required for the same video quality.
This means that a few years ago, we reached the changeover-point where most broadcasters were still broadcasting at HD quality in 720p or 1080i quality, while streaming services such as YouTube or Netflix would offer 1080p or better quality. Today, various streaming services offer 4k UHD quality, while broadcasting is still mostly stuck at HD resolutions. So if you want that ‘gold standard’ of 4k 60 fps video, streaming services is where you’ll find it, rather than broadcasting services.
I really don’t want to spend too much time on the concept of interlacing, but I suppose I’ll have to at least mention it shortly.
As I already mentioned with digital HD broadcasting, bandwidth is a thing, also in the analog realm. The problem with early video is flicker. With film technology, the motion is recorded at 24 frames per second. But if it is displayed at 24 frames per second, the eye will see flickering when the frames are switched. So instead each frame is shown twice, effectively doubling the flicker frequency to 48 Hz, which is less obvious to the naked eye.
The CRT technology used for analog TV has a similar problem. You will want to refresh the screen at about 48 Hz to avoid flicker. So that would require sending an entire frame 48 times per second. If you want to have a reasonable resolution per frame, you will want about 400-500 scanlines in a frame. But the combination of 400-500 scanlines and 48 Hz would require a lot of bandwidth, and would require expensive receivers.
So instead, a trick was applied: each frame was split up in two ‘fields’. A field with the even scanlines, and a field with the odd scanlines. These could then be transmitted at the required refresh speed, which was 50 Hz for PAL and 60 Hz for NTSC. Every field would only require 200-250 scanlines, halving the required bandwidth.
Because the CRT has some afterglow after the ray has scanned a given area, the even field was still visible somewhat as the odd field was drawn. So the two fields would blend somewhat together, giving a visual quality nearly as good as a full 50/60Hz image at 400-500 lines.
Why is this relevant? Well, for a long time, broadcasting standards included interlacing. And as digital video solutions had to be compatible with analog equipment at the signal level, many early video codecs also supported interlaced modes. DVD for example is also an interlaced format, supporting either 480i for NTSC or 576i for PAL.
In fact, for HD video, the two common formats are 720p and 1080i. The interlacing works as simple form of data compression, which means that a 1920×1080 interlaced video stream can be transmitted with about the same bandwidth as a 1280×720 progressive one. 1080i became the most common format for HD broadcasts.
This did cause somewhat of a problem with PCs however. Aside from the early days of super VGA, PCs rarely made use of interlaced modes. And once display technology moved from CRT to LCD displays, interlacing actually became problematic. An LCD screen does not have the same ‘afterglow’ that a CRT has, so there’s no natural ‘deinterlacing’ effect that blends the screens together. You specifically need to perform some kind of digital filtering to deinterlace an image with acceptable quality.
While TVs also adopted LCD technology around the time of HD quality, they would always have a deinterlacer built-in, as they would commonly need to display interlaced content. For PC monitors, this was rare, so PC monitors generally did not have a deinterlacer on board. If you wanted to play back interlaced video on a PC, such as DVD video, the deinterlacing would have to be done in software, and a deinterlaced, progressive frame sent to the monitor.
This also means that streaming video platforms do not support interlacing, and when YouTube adopted HD video some years ago, they would only offer 720p and 1080p formats. With 1080p they effectively surpassed the common broadcast quality of HD, which was only 1080i.
Luckily we can finally put all this behind us now. There are no standardized interlaced broadcast formats for 4k, only progressive ones. Interlacing will soon be a thing of the past, together with all the headaches of deinterlacing the video properly.
So far, I have only mentioned broadcast and streaming digital video. For the sake of completeness I should also mention home video. Originally, in the late 70s, there was the videocassette recorder (VCR) that offered analog recording and playback for the consumer at home. This became a popular way of watching movies at home.
One of the earliest applications of digital video for consumers was an alternative for the VCR. Philips developed the CD-i, which could be fitted with a first-generation MPEG decoder module, allowing it to play CD-i digital video. This was a predecessor of the Video CD standard, which used the same MPEG standard, but was not finalized yet. CD-i machines could play both CD-i digital video and Video CD, but other Video CD players could not play the CD-i format.
This early MPEG format aimed to fit a full movie of about 80 minutes at a quality that was roughly equivalent to the common VHS format at the time, on a standard CD with about 700 MB of storage. This analog format did not deliver the full broadcast quality of PAL or NTSC. You had about 250 scanlines per frame, and the chrominance resolution was also rather limited, so effectively you had about 330 ‘pixels’ per scanline.
VideoCD aimed at a similar resolution, and the standard arrived at 352×240 for NTSC and 352×288 for PAL. It did not support any kind of interlacing, so it output progressive frames at 29.97 Hz for NTSC, and 25 Hz for PAL. So in terms of pixel resolution, it was roughly the equivalent of VHS. The framerate was only half that though, but still good enough for smooth motion (most movies were shot at 24 fps anyway).
VideoCD was an interesting pioneer technically, but it never reached the mainstream. Its successor, the DVD-Video, did however become the dominant home video format for many years. By using a disc with a much larger capacity, namely 4.7 GB, and an updated MPEG-2 video codec, the quality could now be bumped up to full broadcast quality PAL or NTSC. That is the full 720×576 resolution for PAL, at 50 Hz interlaced, or 720×480 resolution for NTSC at 60 Hz interlaced.
With the move from SD to HD, another new standard was required, as DVD was limited to SD. The Blu-ray standard won out eventually, which supports a wide range of resolutions and various codecs (which we will get into next), offering 720p and 1080i broadcast quality video playback at home. Later iterations of the standard would also support 4k. But Blu-ray was a bit late to the party. It never found the same popularity that VHS or DVD had, as people were moving towards streaming video services over the internet.
Untangling the confusion of video codec naming
In the early days of the MPEG standard (developed by the Moving Picture Experts Group), things were fairly straightforward. The MPEG-1 standard had a single video codec. The MPEG-2 standard had a single video codec. But with MPEG-4, things got more complicated. In more than one way. Firstly, the MPEG-4 standard introduced a container format that allowed you to use various codecs. This also meant that the MPEG-4 standard evolved over time, and new codecs were added. And secondly, there wasn’t a clear naming scheme for the codecs, so multiple names were used for the same codec, adding to the confusion.
A simple table of the various MPEG codecs should make things more clear:
|MPEG standard||Internal codename||Descriptive name|
|MPEG-4 Part 2|
|MPEG-4 Part 10|
|H.264||Advanced Video Coding (AVC)|
|MPEG-H Part 2|
|H.265||High Efficiency Video Coding (HEVC)|
|MPEG-I Part 3|
|H.266||Versatile Video Coding (VVC)|
What is missing? MPEG-3 was meant as a standard for HDTV, but it was never released, as in practice, the updates required were only minor, and could be rolled into an update of the MPEG-2 standard.
H.263 is also not entirely accurate. It was released in 1996. It is somewhat of a predecessor to MPEG-4, aimed mainly at low-bandwidth streaming. MPEG-4 decoders are backwards compatible with the H.263 standard, but the standard is more advanced than the original H.263 from 1996.
With MPEG-1 and MPEG-2, things were straightforward: there was one standard, one video codec, and one name. So nobody had to refer to the internal codename of the codec.
With MPEG-4, it started out like that as well. People could just refer to it as MPEG-4. But in 2004, another codec was added to the standard: the H.264/AVC codec. So now MPEG-4 could be either the legacy codec, or the new codec. The names of the standard were too confusing… “MPEG-4 Part 2” vs “MPEG-4 Part 10”. So instead people referred to the codec name. Some would call it by its codename of H.264, others would call it by the acronym of its descriptive name: AVC. So MPEG-4, H.264 and AVC were three terms that could all mean the same thing.
With H.265/HEVC, it was again not clear what the preferred name could be, so both H.265 and HEVC were used. What’s more, people would also still call it MPEG-4, even though strictly speaking it was part of the MPEG-H standard.
MPEG-I/H.266/VVC has not reached the mainstream yet, but I doubt that the naming will get any less complicated. The pattern will probably continue. And the MPEG-5 standard was also introduced in 2020 (with EVC and LCEVC codecs), which may make things even more confusing, once that hits the mainstream.
So if you don’t know that H.264 and AVC are equivalent, or H.265 and HEVC for that matter, it’s very confusing when one party uses one name to refer to the codec, and another party uses the other. Once you figured that out, it all clicks.
A special kind of confusion I have found is that it seems that it is often implied that you require special codecs for 4k video. But even MPEG-1 supports a maximum resolution of 4095×4095, and a maximum bandwidth of 100 mbit. So it is technically possible to encode 4k (3840×2160) content even in MPEG-1, at decent quality. In theory anyway. In practice, MPEG-1 has been out of use for so long that you may run into practical problems. A tool like Handbrake does not include support for MPEG-1 at all. It will let you encode 4k content in MPEG-2 however, which ironically it can store in an MPEG-4 container file. VLC actually lets you encode to MPEG-1 in 3820×2160 at 60 fps. You may find that not all video players will actually be able to play back such files, but there it is.
The confusion is probably because newer codecs require less bandwidth for the same level of quality. And if you move from HD resolution to 4k, you have 4 times as many pixels per frame, so roughly 4 times as much data to encode, resulting in roughly 4 times the bandwidth requirement for the same quality. So in practice, streaming video in 4k will generally be done with one of the latest codecs, in order to get the best balance between bandwidth usage and quality, for an optimal experience. Likewise, Blu-ray discs only have limited storage (50 GB being the most common), and were originally developed for HD. In order to fit 4k content on there, better compression is required.
But if you encode your own 4k content, you can choose any of the MPEG codecs. Depending on the hardware you want to target, it may pay off to not choose the latest codec, but the one that is best accelerated by your hardware. On some hardware, AVC may run better than HEVC.
Speaking of codecs, I have only mentioned MPEG so far, because it is the most common family of codecs. But there are various alternatives which also support 4k with acceptable performance on the right hardware. While MPEG is a widely supported standard, and the technology is quite mature and refined, there is at least one non-technical reason why other codecs may sometimes be preferred: MPEG is not free. A license is required for using MPEG. The license fee is usually paid by the manufacturer of a device. But with for example desktop computers this is not always the case. The licensing model also makes MPEG incompatible with certain open source licenses.
One common alternative suitable for 4k video is Google’s VP9 codec, released in 2013. It is similar in capabilities to HEVC. It is open and royalty-free, and it is used by YouTube, among others. As such it is widely supported by browsers and devices.
Another alternative is the Alliance of Open Media‘s Video 1 (AV1), released in 2018. It is also royalty-free, and its license is compatible with open source. This Alliance includes many large industry players, such as Apple, ARM, Intel, Samsung, NVIDIA, Huawei, Microsoft, Google and Netflix. So widespread support is more or less guaranteed. AV1 is a fairly new codec, which is more advanced than HEVC, so it delivers more compression at the same quality. The downside is that because it’s relatively new, and the compression is very advanced, it requires a quite powerful, advanced, modern CPU and GPU to play it back properly. So it is not that well-suited for older and more low-end devices.
In practice, you will have to experiment a bit with encoding for different codecs, at different resolutions, framerates and bitrates, to see which one is supported best, and under which conditions. I suppose the most important advice you should take away here is that you shouldn’t necessarily use the latest-and-greatest codecs for 4k content. There’s nothing wrong with using AVC, if that gives the best results on your hardware.
One last thing I would like to discuss is decoding video inside a (3D) rendering context. That is, you want to use the decoded video as a texture in your own rendering pipeline. In my experience, most video decoding frameworks can decode video with acceleration effectively, if you pass them a window handle, so they can display inside your application directly, and remain in control. However, if you want to capture the video frames into a graphics texture, there often is no standardized way.
The bruteforce way is to just decode each video frame into system memory, and then copy it into the texture yourself. For 1080p video you can generally get away with this appoach. However, for 4k video, each frame is 4 times as large, so copying the data takes 4 times as long. On most systems, the performance impact of this is simply too big, and the video cannot be played in realtime without dropping frames.
For Windows, there is the DirectX Video Acceleration framework (DXVA), which should allow you to use GPU-acceleration with both DirectShow and MediaFoundation. So far I have only been able to get the frames in GPU-memory in MediaFoundation. I can get access to the underlying DirectX 11 buffer, and then copy its contents to my texture (which supports my desired shader views) via the GPU. It’s not perfect, but it is close enough. 4k at 60 fps is doable in practice. It seems to be an unusal use-case, so I have not seen a whole lot in the way of documentation and example code for the exact things I like to do.
With VLC, there should be an interface to access the underlying GPU buffers in the upcoming 4.0 release. I am eagerly awaiting that release, and I will surely give this a try. MediaFoundation gives excellent performance with my current code, but access to codecs is rather limited, and it also does not support network streams very well. If VLC offers a way to keep the frames on the GPU, and I can get 4k at 60 fps working that way, it will be the best of both worlds.