Why the buzz about H.264?
It's the bitrate! H.264 is getting so much attention because it can encode video with approximately 3 times
Goals & Approach of H.264
| • | Fine-grained motion estimation. Temporal search seeks matching sub-macro blocks of variable size as small as 4x4, and finds the motion vector to 1/4 pel resolution. Searches may also identify motion vectors associated with matching sub-macro blocks of 4x8, 8x4, 8x8, 8x16, 16x8, or the full 16x16. [In future, even finer 1/8 pel resolution will be supported.] |
| • | Multiple reference frames. H.264 provides additional flexibility for a frame to point to multiple reference frames, which may be any combination of past and future frames. This capability provides opportunities for more precise inter-prediction, and also improves robustness to lost picture data. |
| • | Unrestricted motion search. Motion search allows for reference frames that may be partly outside the picture; missing data can be spatially predicted from boundary data. Users may choose to disable this feature by specifying a Restricted Motion search. |
| • | Motion vector prediction. Where sufficient temporal correlation exists, motion vectors may be accurately predicted and only their residuals transmitted explicitly in the bitstream. |
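The motion search and motion vector prediction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reference encoder's method: it searches a single 4x4 block at integer-pel positions only (no 1/4 pel interpolation, no choice among block sizes), uses the sum of absolute differences as the matching cost, and predicts the vector as the component-wise median of three neighboring vectors. The function names, the synthetic frames, and the neighbor vectors are all hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in `cur` and one in `ref`."""
    return sum(
        abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def full_search(cur, ref, cx, cy, size=4, search_range=2):
    """Exhaustive integer-pel search; returns ((mv_x, mv_y), cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, size)
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = cx + mx, cy + my
            # Restricted search: ignore candidates partly outside the picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost


def median_predict(mv_a, mv_b, mv_c):
    """Component-wise median of three neighboring motion vectors."""
    xs = sorted(v[0] for v in (mv_a, mv_b, mv_c))
    ys = sorted(v[1] for v in (mv_a, mv_b, mv_c))
    return xs[1], ys[1]


# Synthetic 8x8 frames: the current frame is the reference shifted by (1, 1).
ref_frame = [[(7 * x + 13 * y) % 31 for x in range(8)] for y in range(8)]
cur_frame = [
    [ref_frame[y + 1][x + 1] if x < 7 and y < 7 else 0 for x in range(8)]
    for y in range(8)
]

mv, cost = full_search(cur_frame, ref_frame, 2, 2)
# Hypothetical vectors of three already-coded neighboring blocks.
predicted = median_predict((1, 1), (2, 0), (1, 3))
# Only this residual would need to be written to the bitstream.
residual = (mv[0] - predicted[0], mv[1] - predicted[1])
print(mv, cost, residual)
```

When neighboring blocks move together, the residual is small or zero, which is why transmitting only the residual saves bits.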
H.264 performs intra prediction in the spatial domain (prior to the transform), and it is a key part of the approach. Even for an intra-picture, every block of data is predicted from its neighbors before being transformed and coefficients generated for inclusion in the bitstream.
| • | Coarse versus fine intra prediction. Intra prediction may be performed either on 4x4 blocks or on 16x16 macro blocks. The latter is more efficient for uniform areas of a picture. |
| • | Direction Dependent Intra Modes. By doing intra prediction in the spatial domain (rather than in the transform domain), H.264 can employ prediction that is direction dependent, and thus can focus on the most highly correlated neighbors. For Intra 4x4 coding and Intra 16x16 coding, there are 9 and 4 prediction modes, respectively. |
| • | 4x4 transform of Residual Data. For initially supported profiles, residual data transforms are always performed for 4x4 blocks of data, and coefficients transmitted on this fine-grained basis. |
| • | Variable block sizes for spatial transform*. Future profiles will allow transform of variable size blocks (4x8, 8x8, etc.) with the same level of flexibility as motion estimation blocks. This will provide more flexibility and further reduction of bitrate. |
| • | Integer transforms. Efficiency in both computation and bitrate is gained by implementing the traditional Discrete Cosine Transform (DCT) as an integer transform that requires no multiplications, except for a single normalization. It can also be inverted exactly without mismatch. |
| • | Deblocking filter. To eliminate fine structure blockiness that might be aggravated by the smaller transform blocks, a context-sensitive deblocking filter smooths out the internal edges. Its filter strength depends upon the prediction modes and relationship between the neighboring blocks. In addition to increasing signal-to-noise ratio (S/N), this technique significantly improves the subjective quality of the image for a given S/N. |
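The integer transform point above can be made concrete with a small sketch. It applies the 4x4 forward core matrix commonly given for H.264, whose entries are only ±1 and ±2 (so a real implementation needs just additions and shifts), and shows that the result can be inverted exactly. One assumption to note: the standard folds the normalization into the quantization step, whereas this sketch performs it with exact rational arithmetic purely to demonstrate that no rounding mismatch occurs. Function names are hypothetical.

```python
from fractions import Fraction

# Forward core transform matrix; rows are mutually orthogonal.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward(block):
    """Y = CF * X * CF^T, computed entirely in integers."""
    return matmul(matmul(CF, block), transpose(CF))


def inverse(coeffs):
    """Exact inverse: divide by the row norms, then X = CF^T * Y' * CF."""
    norms = [sum(v * v for v in row) for row in CF]  # squared row lengths
    scaled = [
        [Fraction(coeffs[i][j], norms[i] * norms[j]) for j in range(4)]
        for i in range(4)
    ]
    return matmul(matmul(transpose(CF), scaled), CF)


residual_block = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]
coefficients = forward(residual_block)
print(inverse(coefficients) == residual_block)
```

For a perfectly flat block, all of the energy lands in the single top-left (DC) coefficient, which is what makes uniform regions cheap to code.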
| • | Context-Adaptive Variable Length Coding (CAVLC) employs multiple variable length codeword tables to encode transform coefficients, which consume the bulk of bandwidth. Based upon a priori statistics of already processed data, the best table is selected adaptively. For non-coefficient data, a simpler scheme is used that relies upon only a single table. |
| • | Context-Adaptive Binary Arithmetic Coding (CABAC*) provides an extremely efficient encoding scheme when it is known that certain symbols are much more likely than others. Such dominant symbols may be encoded with extremely small bit/symbol ratios. The CABAC method continually updates frequency statistics of the incoming data and adaptively adjusts the algorithm in real-time. This method is an advanced option available in profiles beyond the baseline profile. |
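To see why adaptive arithmetic coding pays off when some symbols dominate, the sketch below computes the ideal code length of a binary sequence under a simple adaptive model. It is not the CABAC algorithm itself, which adds binarization, context selection, and a table-driven probability state machine; the counting estimator and the function name are assumptions made for illustration.

```python
import math


def adaptive_cost_bits(bits):
    """Ideal code length, in bits, of a 0/1 sequence under an adaptive model."""
    counts = [1, 1]  # smoothed frequency counts for symbols 0 and 1
    total = 0.0
    for b in bits:
        p = counts[b] / (counts[0] + counts[1])
        total += -math.log2(p)  # an arithmetic coder approaches this cost
        counts[b] += 1          # update statistics after coding each symbol
    return total


skewed = [0] * 95 + [1] * 5   # one symbol is dominant
balanced = [0, 1] * 50        # no symbol is dominant

print(round(adaptive_cost_bits(skewed), 1))
print(round(adaptive_cost_bits(balanced), 1))
```

The skewed sequence costs far less than one bit per symbol, while the balanced one costs about one bit per symbol, so the gain comes entirely from tracking the statistics.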
| • | Slice coding. Each picture is subdivided into one or more slices. The slice is given increased importance in H.264 as the basic spatial segment that is independent from its neighbors. Thus, errors or missing data from one slice cannot propagate to any other slice within the picture. This also increases flexibility to extend picture types (I, P, B) down to the level of "slice types." Redundant slices are permitted. |
| • | Data partitioning is supported to allow higher priority data (e.g., sequence headers) to be separated from lower priority data (e.g., B-picture transform coefficients). |
| • | Flexible macro block ordering (FMO) can be used to scatter the bits associated with adjoining macro blocks more randomly throughout the bit stream. This reduces the chance that a packet loss will affect a large region and enables error concealment by ensuring that neighboring macro blocks will be available for prediction of a missing macro block. |
| • | The Multiple Reference Frames that are used for improved motion estimation also allow for partial motion compensation for a P picture when one of its referenced frames is missing or corrupted. |
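The effect of flexible macro block ordering on error concealment can be illustrated with a toy mapping. The sketch assigns macro blocks to two slice groups in a checkerboard pattern and counts how many direct neighbors of a lost macro block survive when one whole group is lost. The `(x + y) % n` rule is an illustrative pattern chosen for this sketch, not the mapping formula from the standard, and the function names are hypothetical.

```python
def checkerboard_groups(mb_cols, mb_rows, num_groups=2):
    """Assign each macro block to a slice group in a dispersed pattern."""
    return [[(x + y) % num_groups for x in range(mb_cols)] for y in range(mb_rows)]


def available_neighbors(groups, x, y, lost_group):
    """Neighbors (left, right, up, down) that are not in the lost slice group."""
    rows, cols = len(groups), len(groups[0])
    found = []
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and groups[ny][nx] != lost_group:
            found.append((nx, ny))
    return found


# A QCIF picture (176x144) holds 11 x 9 macro blocks of 16x16 pixels.
dispersed = checkerboard_groups(11, 9, num_groups=2)
single = checkerboard_groups(11, 9, num_groups=1)  # everything in one group

# Macro block (4, 4) belongs to group 0; suppose group 0 is lost.
print(len(available_neighbors(dispersed, 4, 4, lost_group=0)))
print(len(available_neighbors(single, 4, 4, lost_group=0)))
```

With the dispersed mapping every neighbor of the lost macro block is still available for concealment; with a single group, none are.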
| • | Baseline Profile. A basic goal of H.264 was to provide a royalty-free baseline profile to encourage early application of the standard. The baseline profile consists of most of the major features described above, with the exception of: B slices and weighted prediction; CABAC encoding; field coding; and SP & SI slices. Thus, the baseline profile is appropriate for many progressive scan applications such as video conferencing and video-over-IP, but not for interlaced television or multiple stream applications. |
| • | Main Profile. Main profile contains all of the features in Baseline, except flexible macro block ordering (FMO), arbitrary slice order (ASO) and redundant slices. However, it adds field coding, B slices and weighted prediction, and CABAC entropy coding. This profile is appropriate for efficient coding of interlaced television applications where bit or packet error is not excessive, and where low latency is not a requirement. |
| • | Extended Profile. This profile contains all features from the Baseline and Main profiles, except that CABAC is not supported. In addition, the Extended profile adds SP and SI slices for stream switching, and up to 8 slice groups. This profile is appropriate for server-based streaming applications where bit-rate scalability and error resilience are very important. Mobile video services would be an example. |
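The relationships among the three profiles can be written down as set operations, which makes it easy to check which profile a given application needs. This is only a sketch of the descriptions above: the feature labels are chosen for illustration, the Baseline set lists a few representative features rather than everything in the profile, and `profiles_supporting` is a hypothetical helper.

```python
# Representative Baseline features (labels are illustrative, not normative).
BASELINE = frozenset({
    "I and P slices", "CAVLC", "multiple reference frames",
    "FMO", "ASO", "redundant slices",
})

# Main: Baseline minus the error-resilience tools, plus the efficiency tools.
MAIN = (BASELINE - {"FMO", "ASO", "redundant slices"}) | {
    "field coding", "B slices", "weighted prediction", "CABAC",
}

# Extended: everything in Baseline and Main except CABAC, plus SP and SI slices.
EXTENDED = ((BASELINE | MAIN) - {"CABAC"}) | {"SP and SI slices"}

PROFILES = {"Baseline": BASELINE, "Main": MAIN, "Extended": EXTENDED}


def profiles_supporting(required):
    """Names of the profiles whose feature set covers every required feature."""
    return [name for name, feats in PROFILES.items() if set(required) <= feats]


print(profiles_supporting({"CABAC"}))
print(profiles_supporting({"B slices", "FMO"}))
print(profiles_supporting({"field coding"}))
```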
Where will H.264 have the biggest impact?
Any video application can benefit from a reduction in bandwidth
requirements, but the highest impact will be in applications where
such a reduction relieves a hard technical constraint, or where it
makes more cost-effective use of bandwidth as a limiting resource.
In addition, other H.264 features such as error containment, error
concealment, and efficient bitstream switching are especially useful
for IP and wireless environments.
Squeeze More Services into a Broadcast Channel
Reducing bandwidth requirements by a factor of 2-3 provides cost
savings for bandwidth-constrained services such as satellite and
DVB-Terrestrial, or alternatively allows such providers to expand
services at reduced incremental cost.
Facilitate High Quality Video Streaming over IP Networks
H.264 can produce very good, TV-quality streaming at less
than 1 Mbps (standard definition). This slips under the 1 Mbps threshold
for xDSL and thus opens possibilities for new access methods for
high quality, larger format video.
High Definition Transmission and Storage
Recall that MPEG-2 consumes 15-20 Mbps for High
Definition video at suitable quality for broadcast or DVD. Use of H.264
will bring this down to about 8 Mbps, making it possible for bandwidth-strapped
satellite service providers to fit 4 HD channels per QPSK channel.
Even more significant is that this reduction enables burning one HD
movie onto a conventional DVD, thus avoiding the need for the industry
to adopt a higher density ("blue laser") DVD format.
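The DVD claim is easy to check with a little arithmetic. The sketch below converts disc capacity and video bitrate into playing time. It assumes nominal DVD capacities of 4.7 GB (single layer) and 8.5 GB (dual layer), which are not stated in the text above, and it ignores audio and multiplexing overhead, so real playing times would be somewhat shorter.

```python
def playing_time_minutes(capacity_gb, bitrate_mbps):
    """Minutes of video that fit on a disc, ignoring audio and overhead."""
    capacity_bits = capacity_gb * 1e9 * 8
    seconds = capacity_bits / (bitrate_mbps * 1e6)
    return seconds / 60


# Assumed nominal capacities: 4.7 GB single layer, 8.5 GB dual layer.
for capacity_gb in (4.7, 8.5):
    for bitrate_mbps in (8, 15, 20):
        minutes = playing_time_minutes(capacity_gb, bitrate_mbps)
        print(f"{capacity_gb} GB at {bitrate_mbps} Mbps: {minutes:.0f} min")
```

At 8 Mbps a dual-layer disc holds roughly 142 minutes, enough for a typical feature film, whereas at 15-20 Mbps the same disc holds only about 57 to 76 minutes.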
Mobile Video Applications
3G Mobile networks present an unusual array of
technical challenges that have driven many features in H.264. Applications
include video conferencing, streaming video on demand, multimedia-messaging
services, and low resolution broadcast. Some key issues, and H.264 tools
for dealing with them, include:
| • | Low bandwidth (50 – 300 kbps) is the key issue. The expected trend is for 3G deployment to start with H.263 and move up to H.264 as it matures. An industry analyst points out "… 3G networks are only likely to offer 57.6kbit/s initially. As those bit rates increase, mobiles and networks will move to the new H.264 codec, which offers twice the performance of H.263. This should result in the same picture quality being achieved at half the bit rate." |
| • | Small devices with many formats; variability of available bandwidth. For streaming applications, these two separate issues can be addressed by providing multiple streams with different formats and bandwidths, and selecting the appropriate stream at run-time. H.264's SP and SI pictures facilitate dynamic switching among multiple streams to accommodate bandwidth variability. |
| • | High bit error rates, packet losses, and latency. For video applications, retransmissions are impractical for dropped or delayed packets, so H.264 provides several means (e.g., FMO, data partitioning, etc.) to contain error impacts and facilitate error concealment. |
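The run-time stream selection mentioned above can be sketched as a simple rule: pick the highest-bitrate stream that fits within the measured bandwidth, leaving some headroom. This is only an illustration of the selection step. The SP and SI pictures that make the actual switch between streams possible are not modelled, and the function name, the stream bitrates, and the 80% headroom factor are all assumptions.

```python
def select_stream(available_kbps, stream_bitrates_kbps, headroom=0.8):
    """Pick the highest-bitrate stream that fits the available bandwidth."""
    budget = available_kbps * headroom
    fitting = [rate for rate in stream_bitrates_kbps if rate <= budget]
    # If nothing fits, fall back to the lowest-rate stream on offer.
    return max(fitting) if fitting else min(stream_bitrates_kbps)


# Hypothetical set of pre-encoded streams spanning the 50-300 kbps range.
streams = [50, 100, 200, 300]

for measured in (57.6, 150, 400):
    print(measured, "->", select_stream(measured, streams))
```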
What is the relationship to MPEG-4 and MPEG-2?
Compared to MPEG-2
H.264 employs the same general approach as MPEG 1 & 2 as well as
the H.261 and H.263 standards, but adds many incremental improvements
to improve coding efficiency by about a factor of 3.
MPEG-2 was optimized with specific focus on Standard and High Definition
digital television services, which are delivered via circuit-switched
head-end networks to dedicated satellite uplinks, cable infrastructure
or terrestrial facilities. MPEG2's ability to cope is being strained
as the range of delivery media expands to include heterogeneous mobile
networks, packet-switched IP networks, and multiple storage formats,
and as the variety of services grows to include multimedia messaging,
increased use of HDTV, and others. Thus, a second goal for H.264 was
to accommodate a wider variety of bandwidth requirements, picture formats,
and unfriendly network environments that throw high jitter, packet loss,
and bandwidth instability into the mix.
Compared to MPEG-4
During 2002, the H.264 Video Coding Experts Group
combined forces with MPEG4 experts to form the Joint Video Team (JVT),
so H.264 is being published as MPEG-4 Part 10 (Advanced Video Coding)
and will in essence become part of future releases of MPEG-4.
MPEG-4 is really a family of standards whose overall theme is object-oriented
multimedia applications. It thus has much broader scope than H.264,
which is strictly focused on more efficient and robust video coding.
The comparable part of MPEG-4 is Part 2 Visual (sometimes called "Natural
Video"). Other parts of MPEG-4 address scene composition, object
description and Java representation of behavior, animation of human
body and facial movements, audio, and systems.
Let's Compare Some Results
Numerous comparisons between H.264 performance
and other standards can be found at the end of general articles, or
within the standards group. Such comparisons are frequently based upon
Signal-to-Noise ratios or upon subjective comparisons of the video clip
or of individual frames. We include a few frames from our own comparisons,
as well as some articles presenting results independent from the Joint
Video Team.
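For readers who want to reproduce a signal-to-noise comparison, the usual objective measure for video is peak signal-to-noise ratio (PSNR) between the source frame and the decoded frame. The sketch below is a minimal version for 8-bit samples held in lists of rows; the function name and the tiny test frames are assumptions, and the comparisons cited in this article may have used different tools or settings.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = 0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for a, b in zip(row_o, row_d):
            squared_error += (a - b) ** 2
            count += 1
    if squared_error == 0:
        return math.inf  # identical frames
    mse = squared_error / count
    return 10 * math.log10(peak ** 2 / mse)


source = [[100, 100], [100, 100]]
slightly_off = [[101, 101], [101, 101]]   # every sample differs by 1
badly_off = [[110, 90], [110, 90]]        # every sample differs by 10

print(round(psnr(source, slightly_off), 2))
print(round(psnr(source, badly_off), 2))
```

Higher values mean the decoded frame is closer to the source. As noted later in the article, two frames with the same PSNR can still differ in subjective quality.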
Static Comparisons
Of course, if our hardware and networks provided
an infinite bitrate, efficient video compression would not be such an
important issue, and the advanced compression methods of H.264 would
be unnecessary. However, the bitrate is limited in the real world of
broadcasting, DVD, and mobile video, and advanced standards such as
H.264 provide greatly improved quality at any given bitrate.
Figures 1 & 2 provide some static comparisons of an individual frame
from the popular Foreman 176x144 (QCIF) clip. Later, we will tell you
how to download the short (100 frame) clips corresponding to these cases
in a form that can be played back on any popular media player and compared.
| Figure 1A shows a picture encoded in MPEG2 at 2400 kbps (kilobits per second), in essence an infinite number of bits for such a small format. MPEG2 encoding at this bitrate is essentially lossless, so the resulting quality closely matches that of the original source data. | In Figure 1B, the bitrate has been reduced to 400 kbps, and you can begin to see some fuzziness in the frame as the quantization has been made coarser to drop the bit rate; when you view the corresponding video clip, you will also see some smearing of fast panning motions. |
| When the MPEG2 bitrate is further dropped to 100 kbps in Figure 1C, things begin to fall apart. You begin to see blockiness at the macro block level as some macro blocks can only be resolved as uniform (DC) values, and any fast motion is distorted. |
The value of H.264 is most obvious at low bit rates. In Figure 2, you
can see the difference between encoding via MPEG-2 and H.264 at 100 kbps.
The tremendous improvement in quality produced by H.264 is self-evident.
| Figure 2A. MPEG-2 | Figure 2B. H.264 |
Another way to look at this is to compare the
bit rates needed by MPEG-2 and H.264 for similar quality images. In
our judgment, Figure 1B (MPEG-2 at 400 kbps) and Figure 2B (H.264 at
100 kbps) show very similar quality. While comparing quality is very
subjective, this is consistent with PixelTools evaluations of many tests,
which generally show a 3- to 4-fold decrease in bit rate relative to MPEG-2
for the same level of quality.
See for yourself
H.264 is so new that few free decoding tools are easily available, so
making evaluations can be awkward. For example, you can decode with
the reference code and then view the result on a YUV Viewer, but that
requires quite a few steps.
To make it easier for you to compare video clips from MPEG and
H.264, we have performed a little sleight of hand so that you can simply
use a standard media player such as RealPlayer or Windows Media Player
to view and compare all our results. After producing the 100 kbps H.264
stream shown in Figure 2B, we ran the result through our MPEG-2 encoder
at a very high bitrate (2400 kbps) so as not to introduce
any further distortion. So you can easily compare the H.264 clip listed
below with any of the pure MPEG-2 results at various bitrates.
To get these files, go to the PixelTools' ftp
site and enter the folder: you will see 6 files that can be opened
or downloaded:
1. foreman_H264_100kbps.26L
2. foreman_H264_100kbps.mpg
3. foreman_mpeg2_100kbps.mpg
4. foreman_mpeg2_200kbps.mpg
5. foreman_mpeg2_400kbps.mpg
6. foreman_mpeg2_2400kbps.mpg
The first file is the actual H.264 encoded stream, if you have easy
access to an H.264 decoder. If you would prefer to view the wrapped
version through a media player, use the second file instead.
The remaining 4 files are MPEG2-generated cases at very high, medium
and very low bitrates for comparison. Enjoy!
Independent Evaluations of H.264 Performance
A paper from Nokia and Tampere University of Technology focuses on
comparisons between H.264 and H.263 for very low bit rates of interest
to wireless video conference applications. The results confirm that
most of the computation time is spent in the motion estimation search
involving variable block sizes. Very low bit rates are achieved by raising
the global quantization parameter, which makes quantization coarser.
What H.264 products are available now or on the way?
As of February 2003, there is considerable standards
evaluation, prototyping, and development activity under way among many
digital systems vendors, many of whom are active participants in the
Joint Video Team. The number of released products and announcements
is far lower, however. The following links provide a snapshot of some
early participants.
PixelTools Activities with H.264
PixelTools Corporation has been providing MPEG
solutions and products to customers since 1994. Anticipating the next
step in our product line, we have been engaged in following and evaluating
H.264 progress since mid-2002. Our focus is to serve the off-line content
encoding market with a very flexible, high quality software implementation
of H.264 reference software, similar to our current MPEG2 products such
as MPEG Repair, DVD Expert, and Expert HD.
As a first step, we are providing Expert H264, which provides a Windows
interface for running the most current reference implementation of H.264.
The interface is consistent with that of our newest high performance
encoder, Expert HD, and provides the ability to encode from a variety
of popular source formats and to monitor encoding progress through the
encoding UI.
In addition, we are providing value to the specification and reference
implementation by providing the following:
| • | Global Rate Control. We are currently prototyping a client-side mechanism for global rate control. This mechanism provides soft control of overall bit rate, without interfering with the rate-distortion optimization at the macro block level. |
| • | Performance Optimization. In Expert HD, our latest software encoder for MPEG-2, we have applied a variety of optimization techniques to retain our high flexibility and quality, while reducing execution times by a factor of 3 to 10. Some of these techniques are platform independent, while others take advantage of platform-specific capabilities such as Intel SSE2. We are currently working closely with leading H.264 experts to employ fast motion estimation algorithms and performance optimization techniques to produce a fast, high flexibility and high quality software implementation of H.264 reference code behavior, with a target of summer 2003. |
| • | Shrink-wrapped Package with GUI for Content Producers. We are extending our MPEG-Repair and DVD-Expert products to add H.264 encoding to the current MPEG 1 & 2 capabilities. Higher flexibility UIs will be extended to provide user control of all options supported by the Baseline, Main and Extended profiles. |
| • | Optimization for Error Robustness. Many H.264 encoding features must be applied at the video layer to optimize the tradeoffs between bitrate and error reduction. PixelTools is developing UI support and quantitative guidance for optimally employing these features under different system environments. |