Videos for the Web with HTML5 - an Introduction

HTML5: the HTML5 Video Tag, Browser Support, Converting Videos into an HTML5 Supported Format.

HTML5 versus Flash

Until recently, Flash Video was the predominant format for videos on the web. With the emergence of HTML5, however, a new open standard is taking its place. It began as an initiative of the browser makers Mozilla (Firefox) and Opera. On the 4th of June 2004 the Mozilla Foundation, Opera and Apple (Safari) founded the WHATWG (Web Hypertext Application Technology Working Group), which from then on was responsible for developing the new HTML5 standard. Some time later Google joined the effort by contributing the new royalty-free video codec VP8. Google Chrome and the new Internet Explorer 10 now support HTML5 video as well. HTML5 video will be the format of the future: YouTube, among others, has switched to supporting HTML5 video, and many mobile devices such as smartphones support HTML5 natively. On Apple's iPhone and iPad, which do not run Flash at all, HTML5 video is the only option.

The HTML5 technology coming from the open source sector can offer various advantages:

That Adobe Flash is still widespread is due to the inertia of many website operators who have already invested in Adobe Flash and are reluctant to make the switch. Some of them cite the continued use of Internet Explorer 8 as a reason not to venture the change. However, upgrading Internet Explorer is hardly more difficult than applying the regular Adobe Flash Player updates. For this reason I will supply two workarounds that let the older IE8 play the videos as well, combining HTML5 with Adobe Flash or Java respectively.

Indeed, replacing Adobe Flash is not only a matter of videos but of a whole range of technologies:

Meanwhile, WebRTC has made everything you need for real-time communication available: MediaStream (access to webcam and microphone), RTCPeerConnection (streaming of audio and video data, with facilities for encryption and bandwidth management) and RTCDataChannel (peer-to-peer communication of generic data). With WebRTC you can implement things like VoIP telephony or video conferences using nothing but JavaScript and your browser.

Video Formats Supported by HTML5

Before going into detail, we need a short survey of the encodings and container formats used with HTML5. Storing multiple audio tracks, and possibly subtitles in different languages, together with one video track requires a container format that holds all these different data types. The MPEG-4 container format, for instance, can even hold whole applications including pictures (application/mp4). The encoding of each audio or video track, on the other hand, determines how efficiently the data is stored and thus the quality of the video at a given bitrate.

The H.264 video codec used with MPEG-4 is covered by patents until around the year 2028, which hangs like the sword of Damocles over open source developers. They can be asked to pay patent license fees at any time. For a product offered free of charge, the only way to deal with this would be to ban H.264 from it entirely, at least as long as the obligation to pay license fees also extends to free software and cannot be settled as a percentage of the product price.

We should therefore prefer the VP8 video codec, which is used together with the Vorbis audio codec in a Matroska-based container as video/webm. Google took over On2 Technologies, the developer of VP8 (On2 had earlier released its VP3 codec, on which the Xiph.Org Foundation based the Theora codec for its Ogg container), and made VP8 publicly available free of charge. VP8 would surely have become the standard for HTML5 were it not for Apple's Safari browser and Apple's many mobile devices such as the iPod, iPhone and iPad, which have hardware acceleration for H.264. I will show you how to supply videos in both formats and let the browser select the right one.

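To sum up: even though container formats and differently encoded tracks could be combined almost arbitrarily, the following three combinations are used with HTML5:
WebM (video/webm): VP8 video with Vorbis audio
MPEG-4 (video/mp4): H.264 video with AAC audio
Ogg (video/ogg): Theora video (derived from On2's VP3.2) with Vorbis audio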

The following file extensions are in use: .webm (video/webm); .mp4, .m4a, .m4r, .m4b (video/mp4, audio/mp4, application/mp4; MPEG-4 is used by Apple also for audio, ring tones and audio books); .ogg, .oga and .ogv (video/ogg, audio/ogg). While video/webm can match the quality and storage efficiency of video/mp4, the VP3.2 codec used in Ogg is outdated and cannot deliver the same quality at the same bitrate.
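To keep the file extensions and the type attributes of the <source> elements consistent, a small lookup table can help. Here is a minimal Python sketch; the mapping only restates the list above, and the helper name mime_type is hypothetical. Mapping .ogg to video/ogg is an assumption that follows the <video> example in this article, which serves an Ogg video under that extension.

```python
from pathlib import PurePosixPath

# Extension -> MIME type, restating the list of file endings in the text.
MIME_TYPES = {
    ".webm": "video/webm",
    ".mp4": "video/mp4",
    ".m4a": "audio/mp4",   # MPEG-4 audio
    ".m4r": "audio/mp4",   # MPEG-4 ring tone
    ".m4b": "audio/mp4",   # MPEG-4 audio book
    ".ogv": "video/ogg",
    ".oga": "audio/ogg",
    ".ogg": "video/ogg",   # assumption: used for an Ogg video, as in the <video> example
}

def mime_type(filename: str) -> str:
    """Return the MIME type for an HTML5 media file (hypothetical helper)."""
    suffix = PurePosixPath(filename).suffix.lower()
    try:
        return MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"no HTML5 media type known for {filename!r}") from None

print(mime_type("data/Reiher-424x240-200kb.webm"))  # video/webm
print(mime_type("book.M4B"))                        # audio/mp4
```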

Browser support:
Safari: MPEG-4 only
Firefox: WebM; MPEG-4 not before version 20
Internet Explorer 9: H.264 o.k.; VP8 can be installed as a plugin
Opera: WebM
Konqueror: MPEG-4

Data Rates for Videos on the Web

Having clarified the data format, the question remains which bitrates to offer. Let us have a look at the speed of different internet connections:

At 32 kbps or 64 kbps you can only transmit a mono audio stream, whereas 150 kbps up to 600 kbps make video transmission possible. Common DVD quality can be achieved with 700 kbit/s (2 hours at this rate take up about 630 MB; filling 4.5 GB in 2 hours would correspond to about 5 Mbit/s). HDTV needs approx. 7 Mbit/s.
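The relation between bitrate, duration and file size is simple arithmetic, which makes it easy to check figures like the ones above. A small Python sketch; the function names are hypothetical, 1 MB is taken as 10^6 bytes and 1 kbit as 1000 bits:

```python
def size_megabytes(bitrate_kbps: float, seconds: float) -> float:
    """Storage needed for a stream of bitrate_kbps kbit/s lasting `seconds` (1 MB = 10**6 bytes)."""
    return bitrate_kbps * 1000 * seconds / 8 / 1e6

def bitrate_kbps(size_bytes: float, seconds: float) -> float:
    """Average bitrate in kbit/s that fills size_bytes within `seconds`."""
    return size_bytes * 8 / seconds / 1000

two_hours = 2 * 60 * 60
print(size_megabytes(700, two_hours))  # 630.0
print(bitrate_kbps(4.5e9, two_hours))  # 5000.0
```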

Figure: saw tooth pattern produced by the congestion control of TCP.

Keep in mind that TCP cannot exploit the whole available bandwidth because of its congestion control. The additive increase/multiplicative decrease (AIMD) technique creates a so-called saw tooth pattern, as can be seen in the picture at the right: every time a packet gets lost, the load on the network is reduced in order to avoid congestion (the sending window is cut, typically by half), and then it grows again step by step. The area below the saw tooth shows the bandwidth actually used. If the window oscillates between half and full capacity, this is about three quarters; if it collapses further, e.g. after a timeout, the yield sinks towards one half.
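How much of the link the saw tooth actually uses can be estimated with a toy model. The Python sketch below is not real TCP: it ignores slow start, timeouts, queueing and the round-trip time itself, and simply grows a window by one packet per round trip until it exceeds an assumed link capacity, then multiplies it by a decrease factor. The function name and its parameters are hypothetical.

```python
def aimd_utilization(capacity: int, rounds: int, decrease: float = 0.5) -> float:
    """Fraction of `capacity` used by an idealized AIMD sender.

    Each round trip the window grows by one packet (additive increase).
    When it exceeds the link capacity, a loss is assumed and the window is
    multiplied by `decrease` (multiplicative decrease), but it never drops
    below one packet.
    """
    window = 1.0
    delivered = 0.0
    for _ in range(rounds):
        delivered += min(window, capacity)  # the link cannot carry more than its capacity
        if window > capacity:               # overload -> packet loss
            window = max(1.0, window * decrease)
        else:
            window += 1.0
    return delivered / (rounds * capacity)

print(round(aimd_utilization(100, 100_000), 2))                # window halved on loss
print(round(aimd_utilization(100, 100_000, decrease=0.0), 2))  # window reset to one packet
```

With the window halved on each loss, the average comes out at about three quarters of the capacity; if the window falls back to a single packet instead, it drops to about one half.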

HTML5 video uses HTTP and therefore TCP (HTTP is a TCP-based protocol). In principle, UDP would be preferable for transferring multimedia data: with UDP, lost packets that would arrive after their playback time are not retransmitted and thus do not hold up the traffic that comes after them. With TCP, by contrast, a lost packet blocks everything behind it until it has been retransmitted, which is why playback can get interrupted. Ideally, the receiver buffer would be enlarged on a playback interruption so that playback could resume right where it stopped. Unfortunately most browsers do not implement it like this, so the part of the video falling into the interruption is missing from the playback. TCP typically waits several round trip times (RTTs) before it retransmits a packet. As an RTT may last as long as a second in GPRS networks, several seconds of preloading are required for HTML5 videos.
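The required preload can be estimated as the length of a retransmission stall times the video bitrate. A minimal sketch, assuming the stall lasts a whole number of round trips; how many RTTs TCP actually waits is an assumption here, not a fixed property of TCP, and preload_bytes is a hypothetical name:

```python
def preload_bytes(bitrate_kbps: float, rtt_seconds: float, stalled_rtts: int) -> int:
    """Bytes to buffer so playback survives a stall of `stalled_rtts` round trips."""
    stall_seconds = rtt_seconds * stalled_rtts
    return round(bitrate_kbps * 1000 / 8 * stall_seconds)

# GPRS-like RTT of one second, assuming a stall of three round trips,
# for the 200 kbit/s heron video:
print(preload_bytes(200, 1.0, 3))  # 75000
```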

On the other hand, using HTTP/TCP also has some advantages: HTTP traffic on TCP port 80 passes through virtually every firewall, existing web proxies can cache the video, and you do not need a dedicated media server because the ordinary web server delivers the video.

In practice, the bitrate needed at a given resolution and frame rate also depends on how well the video can be compressed. Slow movements, for instance, compress better because (at least for P-frames) only the differences to the previous picture need to be encoded. If we demand a fixed data rate during conversion in order to stream the video over the internet later on, fast movements are compensated by a coarser quantization, i.e. by lower quality.

Resolution   Bitrate (H.264, 16:9)
416x234      145 kbit/s
640x360      365 kbit/s
768x432      730 kbit/s - 1100 kbit/s
960x540      2000 kbit/s
1280x720     3000 kbit/s - 4500 kbit/s
1920x1080    6000 kbit/s - 7800 kbit/s

A table like the one above (H.264, 16:9 aspect ratio) is handy when converting videos with ffmpeg or one of its frontends (VLC, Firefogg): at too high a resolution ffmpeg cannot hold the requested bitrate, because coarser quantization can only compensate up to a certain extent.
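Such a table can also be used the other way round, to pick the largest resolution that fits a given connection. A Python sketch under stated assumptions: the first row is read as 416x234 (the 16:9 width for a height of 234), the lower bitrate of each row is used, 64 kbit/s are reserved for audio, and only half of the nominal connection speed is budgeted as a cautious safety margin. The margin and the name pick_resolution are assumptions, not rules from ffmpeg or any player.

```python
# (width, height, lower video bitrate in kbit/s) from the H.264 16:9 table
LADDER = [
    (416, 234, 145),
    (640, 360, 365),
    (768, 432, 730),
    (960, 540, 2000),
    (1280, 720, 3000),
    (1920, 1080, 6000),
]

def pick_resolution(connection_kbps, audio_kbps=64, usable_fraction=0.5):
    """Largest ladder entry whose video bitrate fits the budget, or None."""
    budget = connection_kbps * usable_fraction - audio_kbps
    best = None
    for width, height, video_kbps in LADDER:  # ladder is sorted by bitrate
        if video_kbps <= budget:
            best = (width, height, video_kbps)
    return best

print(pick_resolution(1000))   # (640, 360, 365)
print(pick_resolution(16000))  # (1920, 1080, 6000)
print(pick_resolution(256))    # None
```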

Preferably choose a video resolution whose width and height are divisible by 2, 4, 8, 16 or even 32, because pictures can be scaled with the least loss at these resolutions (codecs such as H.264 and VP8 also process pictures in macroblocks of 16x16 pixels). For web videos, 25 fps is generally enough for fluid motion; a higher frame rate mostly wastes bandwidth. Here, too, the source frame rate matters: a rate that converts evenly to 25 fps (such as 50 fps) may yield better results than one that does not (such as 30 fps).
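Both rules of thumb are easy to try out. The sketch below (hypothetical helper names) scales a picture to a target width while rounding the height to a multiple of 8, and shows which source frames survive when a frame rate is reduced by simply dropping frames. The frame selection uses a plain floor rule as an assumption; ffmpeg's own -r handling may round differently.

```python
def scale_to_width(src_w, src_h, target_w, multiple=8):
    """Scale to target_w keeping the aspect ratio; round the height to a multiple of `multiple`."""
    height = src_h * target_w / src_w
    return target_w, max(multiple, round(height / multiple) * multiple)

def kept_source_frames(src_fps, dst_fps, n_src):
    """Indices of source frames kept when converting src_fps to dst_fps by dropping frames.

    Output frame k is shown at time k/dst_fps; we take the latest source
    frame that started at or before that time.
    """
    n_dst = n_src * dst_fps // src_fps
    return [k * src_fps // dst_fps for k in range(n_dst)]

print(scale_to_width(848, 480, 424))    # (424, 240)
print(scale_to_width(1920, 1080, 640))  # (640, 360)
print(kept_source_frames(30, 15, 12))   # [0, 2, 4, 6, 8, 10]
print(kept_source_frames(30, 25, 12))   # [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
print(kept_source_frames(50, 25, 12))   # [0, 2, 4, 6, 8, 10]
```

Converting 30 fps to 25 fps drops every sixth frame (here frames 5 and 11), which leaves irregular gaps in the motion, whereas 50 fps to 25 fps, like 30 fps to 15 fps, keeps every second frame at a regular spacing.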

Encoding Videos with ffmpeg for HTML5

ffmpeg supports almost all current video and audio formats, and its library libavcodec is used by many frontends and players such as VLC, MPlayer, Xine and Firefogg. ffmpeg even reimplemented VP8 decoding from scratch, reportedly faster than Google's own libvpx; for encoding VP8, however, ffmpeg still relies on libvpx.

Let us make sure that ffmpeg supports the video and audio codecs we want (egrep is a Unix command, but the ffmpeg command lines themselves work on any OS, including Windows):

> ffmpeg -codecs | egrep "flac|vorbis|aac|h264|vp8|vp3"
 DEA D aac            Advanced Audio Coding
 D A D aac_latm       AAC LATM (Advanced Audio Codec LATM syntax)
 DEA D flac           FLAC (Free Lossless Audio Codec)
 D V D h264           H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
  EA   libfaac        libfaac AAC (Advanced Audio Codec)
 DEA D vorbis         Vorbis
 D VSD vp3            On2 VP3
 D V D vp8            On2 VP8
> ffmpeg -formats | egrep "webm|mp4|ogg"
  E mp4             MP4 format
 DE ogg             Ogg
  E webm            WebM file format

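Seems to fit. Note, however, that h264 and vp8 are only marked with D (decoding) here: the encoders are listed under the names libx264 and libvpx, which this pattern does not match, so a pattern like "x264|vpx" shows whether they are available.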

Let us now fetch a source video from our camera and inspect its coding ('Reiher' is the German word for heron):

> ffmpeg -i Reiher.MOV
...
Guessed Channel Layout for Input Stream #0.1 : mono
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'Reiher.MOV':
  Metadata:
    creation_time   : 2008-07-13 16:50:08
  Duration: 00:00:16.50, start: 0.000000, bitrate: 12786 kb/s
    Stream #0:0(eng): Video: mjpeg (jpeg / 0x6765706A), yuvj420p, 848x480, 12720 kb/s, 30 fps, 30 tbr, 30 tbn, 30 tbc
    Metadata:
      creation_time   : 2008-07-13 16:50:08
    Stream #0:1(eng): Audio: pcm_u8 (raw / 0x20776172), 8000 Hz, mono, u8, 64 kb/s
    Metadata:
      creation_time   : 2008-07-13 16:50:08

To encode at 600 kbit/s, it is as simple as this:

ffmpeg -i Reiher.MOV -vb 536k -ab 64k -r 25 Reiher-848x480-600kb.mp4
ffmpeg -i Reiher.MOV -vb 472k -ab 128k -r 25 -strict -2 -ac 2 Reiher-848x480-600kb.webm
ffmpeg -i Reiher.MOV -vb 600k -an -r 25 Reiher-848x480-600kb-nosound.webm
ffmpeg -i Reiher.MOV -vb 536k -ab 64k -r 25 Reiher-848x480-600kb.ogg

ffmpeg detects the input format and chooses the codecs for the output format from the file name suffix. With ffmpeg version 0.11.1 (check with ffmpeg -version; you can compile a newer version from git) this works well for mp4 and ogg; for webm, however, it seems that we have to leave out the audio track (-an: no audio). With the -strict -2 parameter (which allows ffmpeg's experimental encoders) and -ac 2 for two audio channels, we can add sound to the webm file as well. If you hear the waves while the heron is flying, the whole scene gains a lot. As the example shows, -ab sets the audio bitrate and -vb the video bitrate of the output video, while -r 25 sets the frame rate to 25 frames per second; audio and video bitrate add up to the 600 kbit/s target (536k + 64k, 472k + 128k). You could also combine all four calls into one by leaving out "ffmpeg -i Reiher.MOV" at the beginning of the succeeding three calls.
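Since audio and video bitrate have to add up to the target, it can be handy to let a script do the subtraction. A minimal Python sketch that only assembles the argument list and does not run ffmpeg; the function name ffmpeg_args is hypothetical, and the -strict -2 -ac 2 options for WebM are taken from the commands above for ffmpeg 0.11:

```python
def ffmpeg_args(source, target, total_kbps, audio_kbps=64, fps=25, size=None):
    """Build an ffmpeg command line whose audio + video bitrates add up to total_kbps."""
    video_kbps = total_kbps - audio_kbps
    if video_kbps <= 0:
        raise ValueError("the audio bitrate leaves nothing for the video")
    args = ["ffmpeg", "-i", source,
            "-vb", f"{video_kbps}k", "-ab", f"{audio_kbps}k", "-r", str(fps)]
    if size is not None:
        args += ["-s", size]
    if target.endswith(".webm"):
        # as in the text: needed for a Vorbis audio track in WebM with ffmpeg 0.11
        args += ["-strict", "-2", "-ac", "2"]
    args.append(target)
    return args

print(" ".join(ffmpeg_args("Reiher.MOV", "Reiher-848x480-600kb.mp4", 600)))
# ffmpeg -i Reiher.MOV -vb 536k -ab 64k -r 25 Reiher-848x480-600kb.mp4
print(" ".join(ffmpeg_args("Reiher.MOV", "Reiher-424x240-200kb.webm", 200, size="424x240")))
# ffmpeg -i Reiher.MOV -vb 136k -ab 64k -r 25 -s 424x240 -strict -2 -ac 2 Reiher-424x240-200kb.webm
```

The list can be handed to subprocess.run() as it is, which avoids quoting problems with file names containing spaces.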

To get down to 200 kbit/s, we need to reduce the resolution with -s 424x240:

> ffmpeg -i Reiher.MOV -vb 136k -ab 64k -strict -2 -ac 2 -r 25 -s 424x240 Reiher-424x240-200kb.webm
Guessed Channel Layout for Input Stream #0.1 : mono
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'Reiher.MOV':
  Metadata:
    creation_time   : 2008-07-13 16:50:08
  Duration: 00:00:16.50, start: 0.000000, bitrate: 12786 kb/s
    Stream #0:0(eng): Video: mjpeg (jpeg / 0x6765706A), yuvj420p, 848x480, 12720 kb/s, 30 fps, 30 tbr, 30 tbn, 30 tbc
    Metadata:
      creation_time   : 2008-07-13 16:50:08
    Stream #0:1(eng): Audio: pcm_u8 (raw / 0x20776172), 8000 Hz, mono, u8, 64 kb/s
    Metadata:
      creation_time   : 2008-07-13 16:50:08
File 'Reiher-424x240-200kb.webm' already exists. Overwrite ? [y/N] y
w:848 h:480 pixfmt:yuvj420p tb:1/30 sar:0/1 sws_param:flags=2
[buffersink @ 0xfc2a60] No opaque field provided
[scale @ 0xfc2e80] w:848 h:480 fmt:yuvj420p sar:0/1 -> w:424 h:240 fmt:yuv420p sar:0/1 flags:0x4
[aformat @ 0x101cb40] auto-inserting filter 'auto-inserted resampler 0' between the filter 'src' and the filter 'aformat'
[aresample @ 0x102e2c0] chl:mono fmt:u8 r:8000Hz -> chl:stereo fmt:s16 r:8000Hz
[libvpx @ 0xfe0060] v0.9.7-p1
Output #0, webm, to 'Reiher-424x240-200kb.webm':
  Metadata:
    creation_time   : 2008-07-13 16:50:08
    encoder         : Lavf54.6.100
    Stream #0:0(eng): Video: vp8, yuv420p, 424x240, q=-1--1, 136 kb/s, 1k tbn, 25 tbc
    Metadata:
      creation_time   : 2008-07-13 16:50:08
    Stream #0:1(eng): Audio: vorbis, 8000 Hz, stereo, s16, 64 kb/s
    Metadata:
      creation_time   : 2008-07-13 16:50:08
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg -> libvpx)
  Stream #0:1 -> #0:1 (pcm_u8 -> vorbis)
Press [q] to stop, [?] for help
Encoder did not produce proper pts, making some up.
0:15.44 bitrate= 180.6kbits/s dup=0 drop=80
frame=  414 fps= 38 q=0.0 Lsize=     374kB time=00:00:16.56 bitrate= 184.8kbits/s dup=0 drop=81
video:282kB audio:84kB global headers:3kB muxing overhead 1.165478%

Finally, if we convert the video at the source bitrate of 12720 kbit/s, i.e. practically without loss of quality, we can see that at the same resolution the quality can still be improved a lot by reducing the quantization losses.

What else can we do with ffmpeg? We could, for instance, extract the audio track with the -vn option (no video), although web videos such as those on YouTube usually come with rather low audio quality.

Much more important for us is the possibility to extract single frames as images. We need an image to show while the video is loading, or if the browser does not support HTML5 video at all. Note that well-chosen single images can also give an exciting impression (provided their quality is good enough).

> mkdir images; cd images
> ffmpeg -i ../Reiher.MOV -ss 0 -t 17s -r 15 frame%3d.jpg

-ss sets the starting point and -t 17s the duration of the clip from which we extract images (by the way, the same parameters can be used to cut videos). With -r 15 we get every second frame of a 30 fps video; %3d means that each number is always written with three digits (the images need leading zeroes so that the file browser shows them in the correct order; otherwise simply use %d). Now you can pick single images as you like, or simply 'play' them frame by frame in the file browser of your choice, for instance Dolphin on KDE.
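Why the leading zeroes matter can be seen by sorting file names as plain strings, which is what many tools do when they compare names character by character. A small Python sketch; frame_names is a hypothetical helper, and numbering from 1 is an assumption about the image sequence:

```python
def frame_names(count, digits=3):
    """Names as produced by a zero-padded pattern such as frame%3d.jpg: frame001.jpg, ..."""
    return [f"frame{n:0{digits}d}.jpg" for n in range(1, count + 1)]

padded = frame_names(12)
unpadded = [f"frame{n}.jpg" for n in range(1, 13)]  # what the pattern frame%d.jpg gives

print(sorted(padded) == padded)  # True: sorting by name keeps the playback order
print(sorted(unpadded)[:4])      # ['frame1.jpg', 'frame10.jpg', 'frame11.jpg', 'frame12.jpg']
```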

Last but not least, an important hint for everyone who wants to use ffmpeg in a batch file or shell script: ffmpeg reads from stdin. You can either prevent this by redirecting its input from <NUL (Windows) or </dev/null (Unix), or, whenever stdin is needed for some other purpose, save the original stdin (#0) in another file descriptor such as #9 and hand that one to ffmpeg; see the following code sample:

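{
  cat movies | while read num movie; do
    …
    {
      echo; echo "*************** movie: $bild -> #$no .mp4 ***************" >&2
      ffmpeg -i "$movie" -vb 836k -ab 64k -r 25 -s 960x540 "$filename$no$vidspec.mp4"
    } 0<&9   # ffmpeg reads the saved original stdin (fd 9), not the movie list
  done
} 9<&0       # save the script's original stdin as fd 9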

Without this redirection, ffmpeg would skip converting some videos, because it always tries to read from stdin and thereby swallows parts of the movie list.
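If you drive the conversions from a Python script rather than a shell loop, the same precaution applies. A minimal sketch, assuming Python 3.7 or later: subprocess.DEVNULL plays the role of </dev/null or <NUL. The helper name run_without_stdin is hypothetical, and a small Python child process stands in for ffmpeg so that the example runs anywhere.

```python
import subprocess
import sys

def run_without_stdin(cmd):
    """Run cmd with stdin connected to the null device (like </dev/null or <NUL),
    so the child cannot swallow input that belongs to the calling program."""
    return subprocess.run(cmd, stdin=subprocess.DEVNULL,
                          capture_output=True, text=True, check=True)

# Stand-in for ffmpeg: a child that reads everything on its stdin
# and reports how many characters it got.
greedy = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
print(run_without_stdin(greedy).stdout.strip())  # 0
```

For a real conversion you would pass the ffmpeg argument list instead; newer ffmpeg versions also offer the -nostdin option for the same purpose.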

Comfortable Video Conversion with GUI Programs

For those who do not want to deal with the command line, there are several GUI frontends for ffmpeg, some of them with exciting additional features.

Everyone who does not want to bother with installing and using complicated software will be excited about the free html5backgroundvideos online converter. It has never been easier to convert short video clips into the .ogg, .webm and .mp4 formats. Converting videos of up to two minutes is free, and the audio track can optionally be removed.

The classic video player VLC, which has been ported to many platforms, uses the libavcodec library of the ffmpeg project and offers predefined profiles for video conversion. It can even make a screencast, recording everything you do on the screen into a video.

The Firefox extension Firefogg not only performs local video conversions but also lets you implement convenient video uploads for your users with JavaScript, a concept used by Wikipedia. Title, author and recording date can be edited.

Finally, there is the minimalistic Miro Video Converter, a frontend for ffmpeg which, by the way, displays the ffmpeg command lines it uses for conversion.

Background: Video Compression, Pre-Editing in order to Clip Videos

If you want to cut a video with Kino or any other open source or proprietary software, it is recommended to create only I-frames during video conversion, because they are the only frames at which cutting points can be set.

Now what are I-, P- and B-frames? I-frames encode a whole image, P-frames just the difference to a previous image, and B-frames even interpolate between a previous and a succeeding image. Frames are therefore not transmitted in the order in which they are presented (IbbbPbbbPbbbIbbbPbbb…) but in the order in which they can be decoded, with each I- or P-frame sent before the B-frames that depend on it (IPbbbPbbbIbbbPbbb…). As stated before, you can only cut at I-frames.
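The reordering rule can be written down in a few lines. The Python sketch below (hypothetical function name) takes frame types in display order and returns them in decode order, sending each I- or P-frame before the B-frames displayed ahead of it. Trailing B-frames without a later reference frame are simply appended, a simplification that real encoders would not produce.

```python
def decode_order(display):
    """Reorder frame types ('I', 'P', 'b') from display order to decode order.

    A B-frame needs the next I- or P-frame as its future reference, so each
    reference frame is emitted before the B-frames that precede it on screen.
    """
    out, pending_b = [], []
    for frame in display:
        if frame in ("I", "P"):
            out.append(frame)
            out.extend(pending_b)
            pending_b.clear()
        else:
            pending_b.append(frame)
    out.extend(pending_b)  # simplification: leftover B-frames go last
    return "".join(out)

print(decode_order("IbbbPbbbPbbbI"))  # IPbbbPbbbIbbb
```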

MPEG codecs compress each picture in a way similar to JPEG: the picture is divided into small squares (e.g. 8x8 pixels), whose content is expressed as a sum of coarse and fine cosine oscillations (the discrete cosine transform). The fine, i.e. high-frequency, oscillations matter less for the picture content because they only encode details, and can therefore be filtered out by a coarser, i.e. higher, quantization.
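The idea of transform and quantization can be tried out on a single row of 8 pixels. This Python sketch is a simplification: real JPEG and MPEG coders use a two-dimensional transform on square blocks and carefully tuned quantization tables, while here the quantization step just grows linearly with the frequency index, a made-up rule for illustration. The function names are hypothetical.

```python
import math

def dct(samples):
    """Orthonormal DCT-II: express the samples as cosine oscillations of rising frequency."""
    n = len(samples)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(samples))
            for k in range(n)]

def idct(coeffs):
    """Inverse transform (orthonormal DCT-III): rebuild the samples from the oscillations."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]

def quantize(coeffs, step):
    """Made-up rule: the step grows with frequency k, so fine detail is rounded away first."""
    return [round(c / (step * (k + 1))) for k, c in enumerate(coeffs)]

def dequantize(levels, step):
    return [q * step * (k + 1) for k, q in enumerate(levels)]

row = [52, 55, 61, 66, 70, 61, 64, 73]  # brightness values of one 8-pixel row
levels = quantize(dct(row), step=4)
restored = [round(v) for v in idct(dequantize(levels, step=4))]
print(levels)    # the high-frequency levels at the end round to 0
print(restored)  # close to, but not exactly, the original row
```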

Let us conclude with two new ffmpeg options: -g 18 creates a group of pictures (GOP) of length 18 (an I-frame plus the following P- and B-frames), which may compress better than the default GOP length of 12. For the special case that you only want I-frames in the output (for video cutting), the special parameter -intra has been devised.
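The effect of the GOP length on the possible cut points can be sketched as follows, assuming a fixed GOP (real encoders may insert extra I-frames, e.g. at scene cuts) and 25 fps; the helper names are hypothetical:

```python
def keyframes(n_frames, gop):
    """I-frame positions if the encoder starts a new GOP every `gop` frames."""
    return list(range(0, n_frames, gop))

def cut_point(frame, gop):
    """Latest I-frame at or before `frame`: where a cut wanted at `frame` can actually start."""
    return frame - frame % gop

fps = 25
print(keyframes(60, 18))         # [0, 18, 36, 54]
print(cut_point(50, 18) / fps)   # 1.44: a cut wanted at 2.0 s moves back to 1.44 s
print(cut_point(50, 12) / fps)   # 1.92
print(cut_point(50, 1) / fps)    # 2.0: with I-frames only (-g 1) every frame is a cut point
```

A shorter GOP gives finer cut points at the price of more I-frames, which are the most expensive frames to store.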

The HTML5 Video Tag: Making It Work

Once you have converted your videos, embedding them in HTML5 is as simple as this:

<video controls preload=metadata width=848 height=480 poster='data/frame049.jpg'>
  <source src='data/Reiher-424x240-200kb.mp4' type='video/mp4'>
  <source src='data/Reiher-424x240-200kb.webm' type='video/webm'>
  <source src='data/Reiher-424x240-200kb.ogg' type='video/ogg'>
  <img src='...'> - Please update your browser; it is too old to play HTML5 videos.
</video>

This is the video tag that has been embedded in the same form under Enjoy the Results. Width, height and poster are optional. We used width and height to scale our 424x240 video up to a bigger format automatically. Poster displays an arbitrary image while the video is being loaded and until the user presses the play button; otherwise the first frame is shown until the video starts playing. The controls attribute is needed to show a play/pause button, a volume control, a full-screen button and a control to rewind and fast-forward.
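If you serve many videos, the markup can be generated instead of written by hand. A minimal Python sketch; video_tag and SOURCE_TYPES are hypothetical names, attribute values are quoted with double quotes, and only the three container types used in this article are known to it. The browser plays the first source whose type it supports, so the order of the list is the order of preference.

```python
import os
from html import escape

# Container suffix -> type attribute, as in the <video> example above.
SOURCE_TYPES = {".mp4": "video/mp4", ".webm": "video/webm", ".ogg": "video/ogg"}

def video_tag(sources, poster=None, width=None, height=None, preload="metadata",
              fallback="Please update your browser; it is too old to play HTML5 videos."):
    """Build a <video> element with one <source> per file, in the given order."""
    attrs = ["controls", f'preload="{preload}"']
    if width is not None:
        attrs.append(f'width="{width}"')
    if height is not None:
        attrs.append(f'height="{height}"')
    if poster is not None:
        attrs.append(f'poster="{escape(poster)}"')
    lines = [f"<video {' '.join(attrs)}>"]
    for src in sources:
        suffix = os.path.splitext(src)[1].lower()
        lines.append(f'  <source src="{escape(src)}" type="{SOURCE_TYPES[suffix]}">')
    lines.append("  " + escape(fallback))
    lines.append("</video>")
    return "\n".join(lines)

print(video_tag(
    ["data/Reiher-424x240-200kb.mp4", "data/Reiher-424x240-200kb.webm",
     "data/Reiher-424x240-200kb.ogg"],
    poster="data/frame049.jpg", width=848, height=480,
))
```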

The following table sums up all the other attribute values for the video tag:

preload=none      do not preload anything
preload=metadata  preload only the metadata of the video, such as its duration and dimensions; saves bandwidth but still loads what is needed to present the video
preload=auto      start transmitting the whole video as soon as the page is loaded; very useful if there is just one video on your site
autoplay          start playing as soon as enough data has been loaded; use with caution: this can annoy users who open the page in a background tab, because the video starts playing with sound while they are still looking at another tab
loop              repeat the video in a loop; best used together with autoplay

Media Queries with CSS3

With HTML5 it is possible to offer the same video in multiple resolutions and let the browser pick the right one for you. Unfortunately this does not work with data rates or bitrates directly, so you have to select them implicitly via the resolution. This is especially interesting for mobile devices, which have screen resolutions as exotic as 800x480 (Galaxy Spica II), 640x320, 640x200, 416x352 or 208x176.

<video controls>
  <source src='data/Reiher-848x480-600kb.webm' type='video/webm' media='screen and (min-width: 848px)'>
  <source src='data/Reiher-424x240-200kb.webm' type='video/webm' media='screen'>
  no HTML5 support.
</video>

The example shows how an implicit bitrate selection may work: a device with an 800x480 screen, being narrower than 848 pixels, does not get the 848x480 video at 600 kbps, as this is already too much for a mobile device; instead it gets the 424x240 video. Admittedly, with a video as short as our flying heron we could safely offer a 600 kbps video even to such a device.
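To see which <source> a browser would take, the selection can be simulated. The Python sketch below is a hypothetical mini-evaluator that understands only the media type screen (assumed to always match) and (min-width: Npx), which is all the example above uses; real browsers implement full media queries. Note that min-width refers to the viewport width in CSS pixels, which on high-density phone screens is smaller than the physical resolution.

```python
import re

SOURCES = [
    ("data/Reiher-848x480-600kb.webm", "screen and (min-width: 848px)"),
    ("data/Reiher-424x240-200kb.webm", "screen"),
]

def matches(media, viewport_width):
    """True if the (only supported) min-width condition holds, or if there is none."""
    m = re.search(r"\(min-width:\s*(\d+)px\)", media)
    return m is None or viewport_width >= int(m.group(1))

def choose_source(viewport_width):
    """Like the browser: take the first <source> whose media query matches."""
    for src, media in SOURCES:
        if matches(media, viewport_width):
            return src
    return None

print(choose_source(1280))  # data/Reiher-848x480-600kb.webm
print(choose_source(800))   # data/Reiher-424x240-200kb.webm
```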

Further CSS3 media features are min/max-width, min/max-height, orientation:portrait/landscape and min/max-aspect-ratio: 4/3 (read all of them at

Enjoy the Results

Compare for yourself which conversion at which bitrate gives the best results with your internet connection. We have deliberately not used CSS3 media queries or workarounds for older browsers, so that you get the pure HTML5 video experience.

Heron 424x240, 200kbit/s

Heron 424x240, 200kbit/s, scaled to 848x480

Heron 848x480, 600kbit/s

Heron 848x480, converted losslessly

Image Gallery

heron flies over Lake Ossiach

heron approaching the shore

heron is approaching a trunk at the shore

heron is landing on a trunk at the shore

heron retracting his wings sitting on a trunk at the shore

heron sitting on a trunk at the shore