Captioning YouTube Video and Providing Accessible Controls

A relatively pain-free method for getting accessible, captioned video into your web pages, brought to you by the friendly people at The Ohio State University Web Accessibility Center.

With a little bit of work, some free online tools, and the code and utilities available from this web page, you can provide your students, staff, and other users inside and outside the university with web video that is usable by everyone, including people with disabilities. On this page, we cover some effective methods for captioning and embedding YouTube video in your web pages. We also describe and link to a tool for converting YouTube captions into formats suitable for use in other video players.

Introduction: YouTube, Captioning, Access

YouTube may not be the appropriate host for materials over which the author wants or needs to maintain strict control because of intellectual property or privacy concerns. Also, YouTube videos are limited to 10 minutes, so if you have longer material, you will need to split it into segments of 10 minutes or less. But if you want your video shared freely with the world, it is hard to conceive of a better venue than YouTube.

If you decide to use YouTube for your video, there are a couple of things you can do to ensure that it is accessible to people with disabilities, who may have a difficult time operating the YouTube player controls or who are deaf or hard-of-hearing. For people with motor disabilities or who rely on screen readers to access web content, you can embed the video in your web pages in a manner that ensures the playback controls are usable. For the deaf and hard-of-hearing, you can provide captions.

Of course, captioning benefits people beyond your deaf and hard-of-hearing users.
YouTube supports subtitles (synchronized transcripts in a language other than that of the original audio) using the same mechanism as captions. Subtitles can facilitate communication with people across the globe and help students trying to learn a foreign language. Captions can also help users in noisy venues (the gym? a café?) or in places where quiet is enforced (libraries and computer labs, for instance). And they can help both users with cognitive disabilities and non-native speakers by presenting content in multiple “modes,” both aural and textual.

Finally, it is worth pointing out that The Ohio State University has a Web Accessibility Policy that requires all video that can be accessed by the general public to have synchronized captions. (Regarding non-public media, the policy states that video with a known, limited, and secured viewership, such as video for courses requiring registration, internal OSU video demonstrations, or staff educational materials, must be captioned in a timely manner, on request. Of course, we would prefer all video to be captioned.) So, by captioning and providing accessible controls, you are not only doing a service to your viewers, you are also conforming to university policy.

How to Caption YouTube Videos

Probably the most time-consuming part of captioning is getting an accurate transcript of the audio. The other parts of the process, editing the transcript and synchronizing it with the video, can also be difficult and labor intensive, though, as we will see, synchronizing captions has gotten significantly easier since YouTube introduced automatic caption timing in the fall of 2009. After you have captioned one or two videos, a natural workflow and pacing develops, and you will gain proficiency and figure out which techniques work best for your particular situation. You may also decide that parts of the process should be farmed out to staff, student employees, or paid services.
In addition to covering the basics of the process, we also try to outline below what we think are good practices, suggest software, and mention a few helpful services. In outline,
the steps in captioning YouTube video are:

1. Get a transcript
2. “Chunk” the transcript
3. Synchronize the chunked transcript with the video

Getting the Transcript

To produce good captions for your video, you will need an accurate transcription of its spoken audio. There are many ways to get a good transcript.

One way is to use a transcription service. One service we have heard good things about is Casting Words. A high-quality transcription with a six-day turnaround will cost $1.50 per minute, or $90 an hour.

Another way is to do the transcription yourself. This laborious process can be helped along with a couple of software programs. One program that may help is Express Scribe, which is free and works in Windows and Mac. Express Scribe offers you a lot of control over audio playback: you can use simple keystrokes to pause, play, and rewind in short increments. The Express Scribe player can be minimized and “pinned” so that it is always visible and its keyboard shortcuts are always available.

Express Scribe requires audio in either WAV or MP3 format, so if you want to use it, you will need to extract the audio from your video. You can use the free online service Media Convert to do that, or one of various video and audio programs, such as QuickTime Pro ($30). (If your video camera produces MOV files as output, QuickTime Pro is certainly worth having.) VLC Media Player is a free program for Mac, Windows, and Linux that can play back and convert to and from a broad variety of formats, including exporting WAV.

A speech recognition program, which turns speech into text, may help with transcription as well. If you are using a recent version of Windows (Windows Vista and later), you have available a very good speech recognition program called Windows Speech Recognition.
Earlier versions of Windows have speech recognition programs, but they are a bit clunky and not very accurate. The excellent speech recognition in
Windows 7 may help significantly with transcription. It is equivalent in quality to commercial speech recognition programs, such as Dragon NaturallySpeaking (Windows) and MacSpeech Dictate (Mac). All speech recognition programs, including Windows Speech Recognition, need to be trained to recognize your voice. Training typically takes only a half hour or so. Once you have trained the speech recognition, transcription is a process of listening and using a headset mic to echo back the spoken audio.

In our work transcribing with speech recognition, we have found that producing an accurate transcript typically takes between three and four times as long as the original audio. So, a half hour of video will take you roughly an hour and a half to two hours to transcribe accurately, once you get proficient at using speech recognition with Express Scribe. This may seem like a long time. But try typing out a transcript manually,
and you will see 3:1 is not that bad. If your audio contains a lot of specialized vocabulary or names, especially non-Western names, speech recognition will go more slowly, and you will need to train the software to transcribe the unusual words accurately.

Whatever tools you use to get your transcription, take your time and produce a highly accurate one. Accurate transcription and good synchronization are the cornerstones of quality captioning.

“Chunking” the Transcript

Chunking a transcript involves breaking it into lengths appropriate for display in one pop-on caption. A chunk can be one or two lines of transcript. With more than that, the captions display poorly and readability suffers. For the same reason, keep each line shorter than 42 characters.

There are also conventions for identifying speakers and “sound effects,” such as background sounds or indications of how something is being spoken. For example, a sound effect might be something like “leaves rustle outside” or “music plays,” and a speaker cue might be “Bob [shouting].” Another common cue is to surround sung lyrics with music notes (♪).

The Media Access Group at WGBH has a Captioning FAQ that provides some conventions for how to caption, though it is geared more toward closed captioning for video and TV. The Described and Captioned Media Program (DCMP) has excellent materials on caption style, chunking, line division, and other conventions in their Captioning Key pages. In our example below, we follow YouTube's recommendations for preparing a transcript file, which appear to blend a number of conventions.

The example below shows good and poor chunking and how you can introduce a speaker and insert a “sound effect.” Note that line breaks should occur at logical places, so that each line is as semantically complete as possible and easy to read.
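Before doing the careful, by-hand chunking discussed here, it may help to get a rough first pass mechanically. The JavaScript below is only a sketch under stated assumptions: the function names chunkTranscript and findLongLines are hypothetical (they are not part of YouTube or of any tool mentioned on this page), and the defaults simply encode the guidance above of lines shorter than 42 characters and at most two lines per chunk.

```javascript
// Hypothetical helpers for illustration; not part of any tool named on this page.

// Greedily wraps a transcript into lines shorter than `limit` characters
// and groups consecutive lines into chunks of at most `linesPerChunk` lines.
function chunkTranscript(text, limit = 42, linesPerChunk = 2) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const lines = [];
  let current = "";
  for (const word of words) {
    const candidate = current ? current + " " + word : word;
    if (candidate.length < limit || current === "") {
      // Either the word still fits, or it is a single over-long word
      // that has to start a line by itself.
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current) lines.push(current);

  const chunks = [];
  for (let i = 0; i < lines.length; i += linesPerChunk) {
    chunks.push(lines.slice(i, i + linesPerChunk));
  }
  return chunks;
}

// Returns every line of an already chunked transcript that is too long.
function findLongLines(chunks, limit = 42) {
  return chunks.flat().filter((line) => line.length >= limit);
}

// Usage: print each chunk, with a blank line between chunks.
const sampleChunks = chunkTranscript(
  "The greatest trick the devil ever pulled was convincing the world he didn't exist."
);
console.log(sampleChunks.map((chunk) => chunk.join("\n")).join("\n\n"));
```

A wrap like this only enforces length. It knows nothing about phrasing, speakers, or sound effects, so the logical line breaks still need a human pass.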
Also try to get chunks to “feel” complete. For example, in the table below, we chose to break the chunks from the song so that they match the singer's phrasing.

Example of Good and Poor Transcript Chunking

Good:

>> VERBAL KINT: The greatest trick the devil ever pulled was convincing the world he didn't exist.

Take your stinking paws off me, you damn dirty ape. [grunting sounds]

>> RICK ASTLEY [crooning]: ♪ Never gonna give you up, Never gonna let you down... ♪
♪ Never gonna run around and desert you. ♪

Poor:

>> VERBAL KINT: The greatest trick the devil ever pulled was convincing the world he didn't exist.

Take your stinking paws off me, you damn dirty ape. [grunting sounds]

>> RICK ASTLEY [crooning]: ♪ Never gonna give you up, ♪
♪ Never gonna let you down, Never gonna run around and desert you. ♪

A Note on Audio Description

DCMP also has very good information on audio description (AD), which they call video description, in their Description Key pages. An audio description is an audio-only track that runs synchronously with the main video audio and describes visual content, so that people who cannot see the video have the necessary context for understanding what is going on. In addition to the DCMP materials on audio description,
Joe Clark has developed a set of standard techniques in audio description. Some things that might be described to enhance and clarify comprehension are:

- Opening titles, on-screen-only text, and (to a reasonable limit) closing credits
- Mise-en-scène, scene changes, and costume or character appearance
- Actions and gestures
- Observable emotional states

In general, try to speak descriptions during pauses in the primary audio track, but speak over the primary track when that is required for understanding the video. The narrator's voice should be easy to distinguish from the primary audio.

If you need audio description for your videos, you will need to record the description as a separate track and merge it with your video in your video editing software. This is possible even with free and low-budget software, such as Windows Live Movie Maker (Windows only), Apple iMovie (Mac only), and Apple QuickTime Pro (Windows and Mac).

Two Ways to Synchronize YouTube Captions

In the sections below we discuss two ways to synchronize your chunked transcript with your YouTube video to create a timed caption track. One method is to let YouTube's Automatic Timing facility attempt the synchronization for you. This may not always work: the audio in your video may be of low quality or, for whatever reason, YouTube simply may not be able to produce adequately synchronized timings. Therefore we give another solution, using an online service from Accessify called Easy YouTube Caption Creator.

Synchronizing the Automatic Way: Using YouTube's Automatic Timing

In fall 2009, Google began incorporating into YouTube the Automatic Speech Recognition engine that helps power the transcription service in Google Voice. The first phase introduced Automatic Timing, which automatically synchronizes your transcript with your YouTube video.
In the first quarter of 2010, YouTube rolled out Automatic Transcription, which generates a transcript of the video and thereby automates the entire captioning process. In our experience, Automatic Transcription is not capable in most circumstances of producing an accurate transcript. In cases where the voices in your audio are well recorded and the speakers speak very clearly, YouTube will likely produce a transcript that can be manually corrected, and you may save yourself some effort compared to producing a transcript from scratch. For the majority of cases, however, the machine-generated transcription will not be very good, and you are better off using only YouTube's Automatic Timing facility.

Here are the steps to use YouTube's Automatic Timing to synchronize your chunked transcript:

1. Sign in to YouTube and select “My Videos” under your account name.
2. Locate the video you want to caption, and click its Captions button.
3. Click the “Add New Captions or Transcript” button. A new page loads.
4. Click the “Browse” button. Locate your chunked transcript, and click OK.
5. Under “Type”, select the “Transcript File” radio button and click “Upload File.”
6. Once the caption track has uploaded and finished processing (synchronizing), make sure the checkbox next to it is selected and save your changes.

That is all there is to it.

Here are some examples that demonstrate both Automatic Timing and Automatic Transcription. The titles in the video play list below describe how the captions were made:

You will notice that Automatic Timing very accurately synchronizes Paul Schindler's voice with the transcript. In our experiment with mediocre audio and multiple speakers, alignment is mostly on target, except for a few instances in which Emily's voice is not matched properly. The Automatic Transcription of Paul Schindler is pretty good. Though not perfect,
it produces a result that might be edited and corrected. This is in stark contrast to our mediocre audio example, which produces an unusable caption track.

Synchronizing the Manual Way: Using Accessify's Easy YouTube Caption Creator

If you cannot use YouTube's Automatic Timing to synchronize your chunked transcript, you will need to do it manually. Manual synchronization involves playing back the video and marking the time at which each transcript chunk occurs within it.

There are a number of tools that can help with this. NCAM's MAGpie, MovCaptioner, YouTubeCC, and CaptionTube are worth looking into. MAGpie is a tried and true, stand-alone Java application that you install on your Windows machine (the program does not currently support the Mac). MovCaptioner ($25) is made for Mac only and is one of the best products for that platform, outputting caption files in many formats, including SubRip. Both MAGpie and MovCaptioner allow you either to input caption lines as they play or to import a chunked transcript. MovCaptioner has the advantage of playing back the video in short snippets (one to 11 seconds) to facilitate transcribing.

YouTubeCC and CaptionTube are online applications. Both use a model for timing in which you type in each caption chunk individually, similar to modes available in both MovCaptioner and MAGpie. We find this approach cumbersome, but it may work well for you.

The service we recommend is Accessify's Easy YouTube Caption Creator. Like MAGpie and MovCaptioner, it allows you to import the transcript, already chunked and fully prepared. You play back the video inside the web application and set timings using a keystroke: simple and straightforward. Here are the steps: Put in the URL of your YouTube video. Paste in your chunked transcript. Start playback of your video. Press the “a” key when you hear the text of each caption chunk being spoken.
When the video has run to the end, copy the timed-text output into a file on your computer.

Note that Easy YouTube Caption Creator treats each single line of the transcript as a chunk. So, if you have multiple-line chunks, join each of them into a single line before pasting them into Caption Creator. Pressing the “a” key sets a time for each caption chunk. When you have worked your way through the entire video, you can copy the timed-text output that Caption Creator has made for you. Paste it into a text file on your computer and name the file with the .sbv extension: my_caption_file.sbv, for example.

Finally, you must upload your timed caption file to YouTube. YouTube makes it simple to upload your caption file and associate it with your video. Sign in to YouTube and go to “My Videos.” Locate the video you want to add captions to and click its “Edit” button. Click the “Captions and Subtitles” link. On the “Captions and Subtitles” page, click the “Browse...” button, locate the caption file on your computer, and click the “Open” button. Back on the YouTube page,
select a “Track Language” (English, for example) and click the “Upload” button. YouTube associates your caption file with the video. When you reload your video in YouTube, you will see that it has captions.

You can upload more than one caption track. You can use this feature to upload tracks in another language, in which case you have created a “subtitle.” If you upload more than one track, note that the end user will need to be able to access the Flash controls in order to change the caption/subtitle track. Thus, in terms of accessibility, it may make sense to have more than one copy of the video on YouTube and associate just a single subtitle track with each instance.

Convert YouTube .sbv to SubRip, W3C Timed Text, and QT Text

Having captions in YouTube is wonderful. But it would be even better if we could use the timed captions from YouTube for other services or for hosting our own video. The problem, of course, is that not all video players or services accept Subviewer-formatted captions.

We have written a YouTube caption converter that will convert the YouTube Subviewer format to SubRip, W3C Timed Text Markup Language (DFXP), and QT Text. W3C Timed Text is used in a number of players, including Adobe's video component for Flash and the popular JW Player. QT Text is the format used to caption QuickTime MOV files. Now you can download your timed captions from YouTube, convert them, and repurpose them for use elsewhere.

Accessible Controls for the YouTube Embedded Video Player

The YouTube video player is used everywhere. YouTube is a boon for discovering and broadcasting video on the web and is widely used in education. The player itself is implemented in Flash, which allows for high-quality video and sound and attractive interface controls. As discussed above, the Flash-based YouTube player allows for captioning and subtitling of video, which is a great benefit for many reasons.
One problem with Flash, however, is that in many browsers it cannot be operated with the keyboard alone. Another problem is that screen reader programs for the visually impaired cannot always accurately discern the function of controls implemented in Flash, and some screen readers cannot access Flash controls at all.

For example, no browser running in MS Windows except Internet Explorer can move focus into a Flash movie using the keyboard alone: the user must hover over the movie with the mouse and click. Tabbing into the movie is not possible. Once inside the Flash movie controls for the YouTube video, users of all browsers other than IE are “trapped”, tabbing through the player controls perpetually, unable to reach any other part of the web page.

The problem is similarly difficult for screen reader users. Some experienced users of screen readers will know that they must turn off their screen reader's regular page browsing mode and go into a “pass-through” mode to be able to read the buttons of the YouTube player, but even then the only usable buttons in the Flash-based YouTube player are Play and Mute. And if you happen to be using VoiceOver, the screen reader in Mac OS X, Flash controls are inaccessible.

Thus, for keyboard- or screen reader-reliant users, YouTube can present a difficult situation. (For more information on this problem, see our write-up on Flash accessibility in JW Player Controls.) Suffice it to say that HTML-based controls are preferable for accessibility. All browsers can access HTML-based controls, and there is no need for mode-switching.

This is where our Accessible Controls for the YouTube Embedded Video Player can come in handy: The Controls provide HTML links to start, pause, stop, jump backward and forward, adjust and mute the volume, and loop a video.
The Controls provide HTML headings that help a screen reader user determine where the controls are, what the name of the currently playing movie is, and how much time has elapsed in playback. The Controls set up the video so that captions, if there are any, display by default, and so that high-quality video, if you have it, also displays by default. Finally, the Controls facilitate adding a play list and displaying it in an accessible manner.

Get the Code for the Accessible Controls

You can download ytp.zip, which contains the JavaScript code that generates the player controls, a sample page that demonstrates usage, and a sample style sheet that you can modify for use in your own pages. Because of the nature of Flash embedding, to see the demo in action you will need to view the sample page from a web server.

Configuring the Accessible Controls

Embedding a YouTube video in a web page is a simple matter of copying the “embed” code from YouTube. However, what you get on your web page has problems in terms of accessibility, as outlined above. Also, the embed code itself is fairly ugly and hard to edit if you want to change any of the parameters. And if you want to wrap the embed so that it includes HTML controls to help with accessibility, or add a play list, you will be dropping in even more hard-to-maintain code.

By contrast, using the Accessible Controls is simple and maintainable. Including accessible YouTube video in your web pages requires adding a couple of lines inside the head and inserting one or more div elements with a class of “ytplayerbox” within the body of your web page. JavaScript takes care of rendering the player and controls to suit your needs.

Code Example: Including the JavaScript and Style Sheet

The following code goes in the head of your document.
    <!-- Stylesheet and JavaScripts for Accessible Controls for the YouTube Embedded Video Player -->
    <link rel="stylesheet" type="text/css" href="css/ytp.css" />
    <script type="text/javascript" src="…"></script>
    <script type="text/javascript" src="js/ytp.js"></script>

Above we reference the style sheet and the JavaScript for the controls. We also need to pull in SWFObject, which is used for dynamic, browser-independent Flash movie embedding.

The Accessible Controls for the YouTube Embedded Video Player make the process of embedding controls and a play list simple and easy to maintain. The code below demonstrates how you would add a video with accessible controls and a play list totalling five videos.

Code Example: Player Configuration

    <!-- This is where the player, buttons, and (optional) play list get rendered -->
    <div class="ytplayerbox">
      <!-- specify 'normal' for YouTube VGA aspect ratio (480 x 360) and 'wide' for YouTube HD (640 x 360) -->
      <span class="ytplayeraspect: normal"></span>
      <!-- list video titles and identifiers here, play list rendered only if more than one movie -->
      <span class="ytmovieurl: XtFlYB56TZk">Interview with my daughter, Eva</span>
      <span class="ytmovieurl: QRS8MkLhQmM">YouTube Captions and Subtitles</span>
      <span class="ytmovieurl: _Tp6hgAEUiQ">Easy YouTube caption Creator</span>
      <span class="ytmovieurl: yvFbP82cYcs">Creating captions with CaptionTube</span>
      <span class="ytmovieurl: meCIER_s7Ng">Closed Captions</span>
    </div>

As the code shows, the Accessible Controls get inserted wherever you put a div with class="ytplayerbox". The aspect ratio of the player can be set to “normal”, which renders the video at 480 by 360 pixels (the old, standard YouTube VGA-like ratio), or to “wide,” which renders the video at 640 by 360 pixels (the YouTube “HD”, “letterbox” ratio). That is done by using a special class on a span element, “ytplayeraspect: [wide or normal]”. You then tell the Accessible Controls how many movies you want.
If you specify one, no play list area is rendered. More than one will generate a play list for you. Specify each movie's title and YouTube identifier. The contents of the span become the title for the video (which, obviously, should be the title from YouTube, or something close to it). Put the identifier in the class, “ytmovieurl: [YouTube identifier for a video]”. And that's it! Below are some examples showing the Accessible Controls in action.

Player Example 1: Player with Five Videos in Play List

Player Example 2: Wide Screen Player with Three Videos

You can have many instances of the player on the same web page. Below is an example that shows the player controls set to display video at the YouTube wide, “HD” aspect ratio of 640 by 360 pixels.

Player Example 3: Player with One Video

Notice that if we have only one video, the play list does not display.
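
To make the configuration convention described above more concrete, here is a minimal JavaScript sketch of how the “key: value” class names could be interpreted. It is an illustration only and makes no claim about how ytp.js actually works: the function names parseClassSetting and readPlayerConfig are hypothetical, and plain objects stand in for the span elements so that the sketch runs without a browser.

```javascript
// Hypothetical sketch; the real ytp.js may work quite differently.

// Extracts the value from a class string such as "ytmovieurl: XtFlYB56TZk".
// Returns null when the class does not start with the given key.
function parseClassSetting(className, key) {
  const prefix = key + ":";
  if (!className.startsWith(prefix)) return null;
  return className.slice(prefix.length).trim();
}

// Builds a player description from objects shaped like { className, text }.
// In a browser these values would come from each span's className and
// textContent inside a div with class "ytplayerbox".
function readPlayerConfig(spans) {
  // "normal" (480 x 360) is assumed here as the default aspect.
  const config = { aspect: "normal", width: 480, height: 360, movies: [] };
  for (const span of spans) {
    const aspect = parseClassSetting(span.className, "ytplayeraspect");
    if (aspect === "wide") {
      config.aspect = "wide";
      config.width = 640; // wide is 640 x 360
    }
    const id = parseClassSetting(span.className, "ytmovieurl");
    if (id) config.movies.push({ id: id, title: span.text.trim() });
  }
  // A play list is rendered only when there is more than one movie.
  config.showPlaylist = config.movies.length > 1;
  return config;
}

// Usage with two of the movies from the configuration example.
const exampleConfig = readPlayerConfig([
  { className: "ytplayeraspect: wide", text: "" },
  { className: "ytmovieurl: XtFlYB56TZk", text: "Interview with my daughter, Eva" },
  { className: "ytmovieurl: QRS8MkLhQmM", text: "YouTube Captions and Subtitles" },
]);
console.log(exampleConfig.width, exampleConfig.height, exampleConfig.showPlaylist);
```

Keeping the parsing separate from the rendering, as sketched here, would make it easy to check a page's configuration without loading Flash at all.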