Enabling Real-Time Communication on the Web Platform

Mozilla’s manifesto describes the internet as an integral part of modern life and a key component in communication. However, communication on the web has far to go before it’s as rich as face-to-face communication. Real-time video communication on the web should be easy, rich, and readily available to developers in a way that proprietary formats can’t be.

That’s why a new project called WebRTC (Web Real-Time Communication) is spinning up at Mozilla. WebRTC will allow developers to use the web platform to include video and audio conferencing in their websites and applications, on both mobile and desktop. In its first phase, WebRTC will make webcam feeds a primary object in the browser, allowing sites to create rich interactions such as video calling and conferencing. In later phases, WebRTC will allow interactions like co-browsing, in which users can share their screen with a friend.

Privacy and Security

Privacy and security are major concerns in enabling open video communication on the web. A face and voice are two of the most identifiable kinds of shareable data, and keeping users in absolute control of who has access to them is vital. As the IETF states in its WebRTC draft document, it is essential that users can control access to their webcam, cancel communication at any time, and communicate without being eavesdropped upon.

Some of the challenges we’ll face lie in giving users the most accurate information possible about the site and caller requesting access to their webcam. Most requests for webcam access will simply come from a trusted site itself, but a malicious site could try to gain access by embedding its call request within a trusted site. In this paper, Eric Rescorla outlines how potential “ad-hoc” calling attacks could come from ads in iframes embedded within trusted sites. Many other potential attacks need to be dealt with. For instance, because WebRTC would be controlled by a web server rather than by conventional real-time systems, web browsers might expose JavaScript APIs that allow a server to place a call. If access to such an API were unrestricted, sites could “bug” a user’s computer and capture video camera activity (Rescorla).
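To make the embedded-request problem concrete, here is a minimal sketch of how a browser might decide which origin to name in its permission prompt. Every name in it (`CameraRequest`, `PromptInfo`, `buildPrompt`) is hypothetical and invented for illustration; none of this is a Firefox or WebRTC API. It only shows the idea that the prompt should name the frame that actually asked, not just the site in the URL bar.

```typescript
// Illustrative only: these names are assumptions, not Firefox or WebRTC APIs.
interface CameraRequest {
  topLevelOrigin: string;   // origin shown in the URL bar
  requestingOrigin: string; // origin of the frame that asked for the camera
}

interface PromptInfo {
  displayOrigin: string; // the origin the prompt should name
  embedded: boolean;     // true when the request came from third-party content
  message: string;       // text the browser would show to the user
}

// Always attribute the request to the frame that made it, and say so
// explicitly when that frame is embedded in a different site.
function buildPrompt(req: CameraRequest): PromptInfo {
  const embedded = req.requestingOrigin !== req.topLevelOrigin;
  const message = embedded
    ? `${req.requestingOrigin} (embedded in ${req.topLevelOrigin}) wants to use your camera and microphone.`
    : `${req.requestingOrigin} wants to use your camera and microphone.`;
  return { displayOrigin: req.requestingOrigin, embedded, message };
}
```

With this sketch, a request from an ad frame at `https://ads.example` embedded in `https://trusted.example` produces a prompt that names the ad’s origin, so the user is not misled into thinking the trusted site is the one asking.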

Even a trusted site could be compromised, either during a call or after it. And, since the sites themselves would control and display the UI of the call itself, Firefox needs to give the user both constant indication that they are in a call and the ability to disconnect at any time.

User Interface

However, guarding against threats only goes so far towards keeping users in control of their webcam communication. Clear messaging, useful tools, and sensible defaults need to be in place for video conferencing to safely take root in the browser.

The first phase of enabling WebRTC will allow the most basic use case: giving a site access to a user’s webcam and microphone. The browser already serves as a mediator for other user data, such as location and access to cookies. Firefox usually asks for permissions using a door hanger notification. Door hangers stem from the URL bar to show that the site is asking for a permission, and they extend past the content area to show that Firefox is the mediator of the permission request. Using a door hanger notification for WebRTC is consistent within Firefox, and it visually conveys that the site has requested access and that Firefox is asking the user for that permission.

Usually, these door hangers simply ask the user for a permission, and in a click the user can give it. However, webcam access requires a secondary stage: showing a preview of the webcam feed. This approach has three benefits:

  1. It gives users the ability to make sure their webcam and microphone work correctly.
  2. If users casually or accidentally accepted the webcam permission, nothing makes them more aware of what they’re about to transmit than seeing their own grubby mug.
  3. It gives users the ability to fix their hair, put on a shirt, or remove incriminating items from the background before beginning a call.

In some ways, it’s unfortunate to ask users to pass through two dialogs to share their webcam feed rather than one. After all, in most cases the site itself will provide all necessary UI, perhaps even a video preview before a call is initiated, so the second step could be redundant. However, we cannot predict what purpose a site may be requesting a webcam feed for, nor what UI will be in place for the user on that page. Even with all our efforts against security threats, any request for webcam access must be treated as potentially malicious.
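The two-step flow described above can be sketched as a small state machine. This is only a model under stated assumptions: the stage and action names (`asking`, `previewing`, `confirm`, and so on) are invented for illustration and do not describe how Firefox implements its door hangers. The point it shows is that nothing is transmitted until the user has passed through the preview.

```typescript
// Hypothetical stages of one permission request; names are illustrative.
type Stage = "idle" | "asking" | "previewing" | "sharing" | "denied";
type Action = "request" | "allow" | "confirm" | "deny" | "stop";

// Advance the flow by one action. Unknown combinations leave the stage alone,
// so a site cannot skip the preview by sending "confirm" early.
function nextStage(stage: Stage, action: Action): Stage {
  if (stage === "idle" && action === "request") return "asking";
  if (stage === "asking" && action === "allow") return "previewing";
  if (stage === "previewing" && action === "confirm") return "sharing";
  if ((stage === "asking" || stage === "previewing") && action === "deny") return "denied";
  if (stage === "sharing" && action === "stop") return "idle";
  return stage; // any other combination is ignored
}

// The feed only leaves the machine once the user has confirmed the preview.
function isTransmitting(stage: Stage): boolean {
  return stage === "sharing";
}

// Replay a list of actions from the initial stage.
function replay(actions: Action[]): Stage {
  return actions.reduce(nextStage, "idle" as Stage);
}
```

For example, replaying `request` then `allow` ends in `previewing`, where `isTransmitting` is still false; only a further `confirm` moves the flow to `sharing`.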

Once a user has given a site access to their webcam and is likely engaging in face-to-face communication, that interaction should be given a heightened level of priority within the browser. For a user to lose that tab or forget they are broadcasting could range from mildly embarrassing to, well, use your imagination. If a user is actively sharing their webcam feed, they should be able to jump to the tab where data’s being shared or simply cut their webcam feed from anywhere within Firefox. This will require at the very least a toolbar-level Firefox control that appears once a user’s actively sharing.
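As a rough sketch of what such a toolbar-level control would need behind it, the browser could keep a single registry of tabs that are currently sharing. The `SharingSession` and `SharingRegistry` names below are hypothetical, and the `stop` callback stands in for whatever would really shut off the camera. The sketch only illustrates the two abilities described above: finding the sharing tab and cutting the feed from anywhere.

```typescript
// Hypothetical record of one tab that is sharing a webcam feed.
interface SharingSession {
  tabId: number;
  origin: string;
  stop: () => void; // placeholder for whatever actually ends the capture
}

class SharingRegistry {
  private sessions: SharingSession[] = [];

  // Register a tab as sharing; an existing session for that tab is stopped first.
  start(session: SharingSession): void {
    this.stop(session.tabId);
    this.sessions.push(session);
  }

  // Drives whether the toolbar control is visible at all.
  isSharing(): boolean {
    return this.sessions.length > 0;
  }

  // Tabs the control could offer to jump to.
  sharingTabIds(): number[] {
    return this.sessions.map((s) => s.tabId);
  }

  // Cut the feed for one tab; returns true if that tab was sharing.
  stop(tabId: number): boolean {
    const found = this.sessions.filter((s) => s.tabId === tabId);
    this.sessions = this.sessions.filter((s) => s.tabId !== tabId);
    found.forEach((s) => s.stop());
    return found.length > 0;
  }

  // Cut every feed at once; returns how many sessions were ended.
  stopAll(): number {
    const ended = this.sessions;
    this.sessions = [];
    ended.forEach((s) => s.stop());
    return ended.length;
  }
}
```

Keeping the registry outside any single tab is the design choice that matters here: the control keeps working even if the user has lost track of the tab that is broadcasting.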

Designing and implementing a new API is always a complex process.  If you’re interested in reading more or contributing to this project, here are some resources:


Chime in

  1. Barryvan says:

    Intuitively, I would assume that if I left a tab unattended for some period of time, sharing would be automatically paused. A tab could be considered unattended if the user swaps to another tab, moves to another application entirely, the screensaver is activated, or the monitor is turned off. At a basic level, I would assume that “paused” sharing would actually consist of Firefox sending an alternative feed — for example, a message saying, “This feed is temporarily unavailable”, perhaps overlaid on the last captured frame.

    Some time after the feed is paused, the webcam and microphone could be turned off, to conserve battery power.

    I’m interested in how this entire system would be implemented from a UI perspective — I agree that having two dialogs for user confirmation is undesirable. If the requirement that the user’s recorded image be visible on-screen at some minimum size is enforced, then a simple click-to-begin model (akin to most web video) would perhaps be the simplest. However, this would only work for video, not for audio and other forms of interaction.

  2. Potch says:

    Could a site that is ‘installed’ as an app have fewer hoops to jump through to access these features? The user has already declared an element of trust via the installation process.

  3. thana says:

    Just show the webcam feed directly in the doorhanger, then you can eliminate the second step. It also improves the UX for the case where the user changes his mind after seeing the webcam feed, because in this way he doesn’t even grant the permission in the first place, instead of dismissing the second dialog, finding the permissions icon, clicking it and revoking the permission afterwards.

  4. Caspy7 says:

    @thana has an interesting suggestion of incorporating the cam preview into the permission request, though something intuitive in me raises a red flag that this could be problematic. Given that it will need to interface with hardware there could be a delay. Or simply enough, given that a website can ask for permission when they want, a user may be annoyed that a site can show them what they look like at a whim or users may perceive that the site has access to the camera. (Imagine that a site displays a large message saying “I see you!” and then the user’s cam light comes on and they immediately see their video. Their response is likely to be visceral. Rather than reading the permission text, they think “I’m being spied on!” and seek to escape and close out of the site ASAP.)

    I’m guessing there will be an audio only version of this doorhanger?
    Curious, will we also incorporate controls to mute audio & mic? With HTML5 video we implement overlaying the play and volume controls so it would seem to be a common enough feature that we would include it.

    The other thought that came to mind is text chat. If you create an audio/video connection with someone via WebRTC, would the site you’re on have to implement a separate means to communicate with text or is there a way to take advantage of the current connection?
    It sounds like you guys are already thinking ahead with extendability (with the parallel browsing stuff) so with secure XMPP already in existence, perhaps you can incorporate that or at least keep it in mind.

  5. davidillsley says:

    I’m in rough agreement with @thana. ‘multiple dialogs’ brings to mind the (imo) disaster that is addon installation. A single doorhanger which has 2 stages and resizes after an initial click is the approach I’d be attempting.

  6. Caspy7 says:

    I think your suggestion of having two stages is much closer to Jennifer’s proposal (rather than @thana’s one stage approach). Sounds like you’re primarily opting for using a resizing doorhanger over a modal dialog.
    She referred to them together as dialogs – the first one clearly is not – so I don’t know that she meant to imply that the second one would be either. I don’t know, but I think she could have well said ‘prompts’.

  7. shaver says:

    Now you’ve got me thinking about the difference between *identifying* things (face and voice, for sure) and *correlatable* things. I hope you’re happy with yourself.

  8. Mook says:


    I don’t think that inactive tabs should always automatically disable sharing (though it probably is frequently useful). If I’m on a voice-only chat, for example, switching to a different tab with a muted game sounds reasonable; I can use the game to fill up unused attention and stop playing for a bit if something more engaging came up.

    This does mean having an indicator that persists across tabs (and possibly windows) might be useful, though…

  9. Caspy7 says:

    @Mook makes a good point about automatically disabling an active connection. One could consider the same circumstance for a casual video chat as well. Less likely, but still feasible.

    However, dropping the drawn FPS significantly for inactive tabs would make good sense (assuming we’re not already doing this on all HTML5 video).
    (Also, if we’re not doing this on HTML5 video, shouldn’t we?)

  10. Leif Focke says:

    Audio conferencing is good but video conferencing is still the best.

Comments are now closed for this article.