Captioning YouTube Video and Providing Accessible Controls

A relatively painless approach to having accessible, captioned video in your web pages, brought to you by the friendly people at The Ohio State University Web Accessibility Center.

With a little bit of work, some free online tools, and the code and utilities available from this page, you can provide your students, staff, and other users inside and outside the university with web video that is usable by everyone, including people with disabilities. On this page, we cover some effective methods for captioning and embedding YouTube video in your web pages. We also describe and link to a tool for converting YouTube captions into formats suitable for use in other video players.

Introduction: YouTube, Captioning, Access

YouTube may not be the appropriate host for some materials over which the author wants or needs to maintain strict control due to intellectual property or privacy concerns. Also, YouTube videos are limited to 10 minutes, so longer material must be split into segments of 10 minutes or less. But if you want your video shared freely with the world, it is hard to conceive of a better venue than YouTube.

If you decide to use YouTube for your video, there are a couple of things you can do to ensure that the video is available to people with disabilities, who may have a difficult time operating the YouTube video player controls or who are deaf or hard-of-hearing. For people with motor disabilities or who rely on screen readers to access web content, you can embed the video in your web pages in a manner that guarantees the playback controls are usable. For the deaf and hard-of-hearing, you can provide captions.

Of course, captioning benefits people beyond your deaf and hard-of-hearing users.
YouTube supports subtitles (synchronized transcripts in languages other than that of the original audio) using the same mechanism as captions. Subtitles can facilitate communication with people across the globe, as well as help students trying to learn a foreign language. Captions can also help users in noisy venues (the gym, a café) or places where there is enforced quiet (libraries and computer labs,
for instance). And they can help both users with cognitive disabilities and non-native speakers by presenting content in multiple “modes,” both aural and textual.

Finally, it is worth pointing out that The Ohio State University has a Web Accessibility Policy that requires all video that can be accessed by the general public to have synchronized captions. (Regarding non-public media, the policy states that video with a known, limited,
and secured viewership, such as video for courses requiring registration, internal OSU video demonstrations, or staff educational materials, must be captioned in a timely manner, on request. Of course, we would prefer all video to be captioned.) So, not only are you doing a service to your viewers by captioning and delivering accessible controls, you are conforming to university policy.

How to Caption YouTube Videos

Probably the most time-consuming part of captioning is getting an accurate transcript of the audio. The other parts of the process, editing the transcript and synchronizing it with the video, can also be difficult and labor intensive, though, as we will see, synchronizing the captions has gotten significantly easier since YouTube introduced automatic caption timing in the fall of 2009.

Though it can be time-consuming, after you have captioned one or two videos, a natural workflow and pacing develops, and you will gain proficiency and figure out which techniques work best for your particular situation. You may also decide that parts of your process should be farmed out to staff, student employees, or pay-for services. In addition to covering the basics of the process, below we also try to outline what we think are good practices to follow, provide suggestions on software, and mention a few helpful services.

In outline, the steps in captioning YouTube video are:

1. Get a transcript
2. “Chunk” the transcript
3. Synchronize the chunked transcript with the video

Getting the Transcript

To produce good captions for your video, you will need an accurate transcription of the spoken audio of your video. There are many ways to get a good transcript. One way is to use a transcription service. One transcription service we have heard good things about is Casting Words. A high-quality transcription with a six-day turnaround will cost $1.50 per minute, or $90 an hour.
Another way is to do the transcription yourself. This laborious process can be helped along with a couple of software programs. One program that may help is Express Scribe. Express Scribe is free and works on Windows and Mac. Express Scribe offers you a lot of control over audio playback. With it, you can use simple keystrokes to pause, play, and rewind in short increments. The Express Scribe player can be minimized and “pinned” so that it is always visible and its keyboard shortcuts are always available.

Express Scribe requires that you have audio in either WAV or MP3 format. So if you want to use it, you will need to extract the audio from your video. You can use the free online service Media Convert to do that, or various video and audio programs, such as QuickTime Pro ($30), can extract audio. (If your video camera produces MOV files as output, QuickTime Pro is certainly worth having.) VLC Media Player is a free program for Mac, Windows, and Linux that can play back and convert to and from a wide variety of formats. It includes the ability to export WAV and other formats.

A speech recognition program may help with transcription, as well. Speech recognition programs turn speech into text. If you are using a recent version of Windows (Windows Vista and later), you have available a very good speech recognition program, called Windows Speech Recognition. Earlier versions of Windows have speech recognition programs, but they are a bit clunky and not very accurate. The excellent speech recognition in
Windows 7 may help significantly with transcription. It is equivalent in quality to commercial speech recognition programs, such as Dragon Naturally Speaking (Windows) and MacSpeech Dictate (Mac). All speech recognition programs, including Windows Speech Recognition, need to be trained to recognize your voice. Training typically takes only a half-hour or so.

Once you have trained the speech recognition, transcription is a process of listening and using a headset mic to echo back the spoken audio. In our work transcribing using speech recognition, we have found that producing an accurate transcript typically takes between three and four times as long as the original audio. So, a half hour of video will take you roughly an hour and a half to two hours to accurately transcribe, once you get proficient using speech recognition with Express Scribe. This may seem like a long time. But try typing out a transcript manually, and you will see 3:1 is not that bad. If you have lots of specialized vocabulary or names, especially non-Western names, in your audio, speech recognition will go more slowly, and you will need to train the software to accurately transcribe the unusual words.

Whatever tools you use to get your transcription, take your time and produce a highly accurate transcription. Accurate transcription and good synchronization are the cornerstones of quality captioning.

“Chunking” the Transcript

Chunking a transcript involves breaking it into lengths appropriate to be displayed in one pop-on caption. A chunk can be one or two lines of transcript. With more than that, the captions start to display poorly and readability suffers. You will want to keep the lines less than 42 characters long for purposes of readability. Also, there are conventions for identifying speakers and “sound effects,” such as background sounds, and for indicating how something is being spoken.
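The two length guidelines above (at most two lines per chunk, each line under 42 characters) are easy to check mechanically before uploading a transcript. The following is only a minimal sketch: `checkChunk` and its constants are hypothetical names invented for illustration, and it assumes a chunk is passed in as a string with its lines separated by newline characters.

```javascript
// Hypothetical helper, not part of any tool mentioned on this page.
// Flags a caption chunk that breaks the length guidelines described above.
const MAX_LINES = 2;        // a chunk should be one or two lines
const MAX_LINE_LENGTH = 41; // "less than 42 characters" per line

function checkChunk(chunk) {
  const problems = [];
  const lines = chunk.split("\n");
  if (lines.length > MAX_LINES) {
    problems.push("too many lines: " + lines.length);
  }
  lines.forEach(function (line, i) {
    if (line.length > MAX_LINE_LENGTH) {
      problems.push("line " + (i + 1) + " is " + line.length + " characters");
    }
  });
  // An empty array means the chunk passes both checks.
  return problems;
}
```

A check like this only catches length problems. Whether a line break falls at a logical place still has to be judged by a person.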
A sound effect might be captioned as “leaves rustle outside” or “music plays,” and a speaker cue might read “Bob [shouting].” Another common cue is to surround music with music notes. The Media Access Group at WGBH has a Captioning FAQ that provides some conventions for how to caption, though it is geared more toward closed captioning for video and TV. The Described and Captioned Media Program (DCMP) has excellent materials on caption style, chunking, line division, and other conventions in their Captioning Key pages. In our example below, we follow YouTube's recommendations for preparing a transcript file, which appear to blend a number of conventions.

The example below shows good and bad chunking and how you can introduce a speaker and insert a “sound effect.” Note that line breaks should occur at logical places, so that each line is as semantically complete as possible to make for easy reading. Also try to get chunks to “feel” complete. For example, in the table below, we break the chunks in the song so that they match the singer's phrasing.

Example of Good and Poor Transcript Chunking

Good:
>> VERBAL KINT: The greatest trick the devil ever pulled was convincing the world he didn't exist.
Take your stinking paws off me, you damn dirty ape. [grunting sounds]
>> RICK ASTLEY [crooning]: ♪ Never gonna give you up, Never gonna let you down... ♪
♪ Never gonna run around and desert you. ♪

Poor:
>> VERBAL KINT: The greatest trick the devil ever pulled was convincing the world he didn't exist.
Take your stinking paws off me, you damn dirty ape. [grunting sounds]
>> RICK ASTLEY [crooning]: ♪ Never gonna give you up, ♪
♪ Never gonna let you down, Never gonna run around and desert you. ♪

A Note on Audio Description

DCMP also has very good information on audio description (AD), which they call video description, in their Description Key pages.
An audio description is an audio-only track that runs synchronously with the main video audio and describes visual content, so that people who cannot see the video have the necessary context for understanding what is going on. In addition to the DCMP materials on audio description, Joe Clark has developed a set of standard techniques in audio description. Some things that might be described to enhance and clarify comprehension are:

- Opening titles, on-screen only text, and (to a reasonable limit) closing credits
- Mise-en-scène, scene changes, and costume or character appearance
- Actions and gestures
- Observable emotional states

In general, try to speak descriptions when there is a pause in the primary audio track, but speak over the primary track when required to add to the understanding of the video. The narrator's voice should be easily distinguished from the primary audio.

If you need audio description for your videos, you will need to record the audio as a separate track and merge it with your video in video editing software. This is possible even with free and low-cost software, such as Windows Live Movie Maker (Windows only), Apple iMovie (Mac only), and Apple QuickTime Pro (Windows and Mac).

Two Ways to Synchronize YouTube Captions

In the sections below we discuss two ways to synchronize your chunked transcript with your YouTube video to create a timed caption track. One way is to let YouTube's Automatic Timing facility attempt to perform the synchronization. This may not always work. The audio in your video may be low quality or, for whatever reason, YouTube simply may not be able to produce adequately synchronized timings. Therefore we give another way, using an online service from Accessify called Easy YouTube Caption Creator.
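Whichever route you take, the result is the same kind of artifact: each transcript chunk paired with a start time and an end time. As a rough illustration of what a timed caption track contains, the sketch below writes such pairs out in the style of the .sbv caption files discussed later on this page. The function names and the shape of the `cues` array are assumptions made for this example, and the layout (a `start,end` timestamp line, the caption text, then a blank line) reflects our understanding of the .sbv format rather than a formal specification.

```javascript
// Format a time in seconds as H:MM:SS.mmm, the timestamp style used in .sbv files.
function formatSbvTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return h + ":" + pad(m, 2) + ":" + pad(s, 2) + "." + pad(ms, 3);
}

// cues: [{ start, end, text }] with times in seconds (a hypothetical shape).
// Returns the text of a timed caption track, one blank line between cues.
function buildSbv(cues) {
  return cues
    .map(function (cue) {
      return formatSbvTime(cue.start) + "," + formatSbvTime(cue.end) + "\n" + cue.text;
    })
    .join("\n\n") + "\n";
}
```

Automatic Timing and the manual tools described below both do the hard part, which is working out the `start` and `end` values. Writing them out is the easy step.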
Synchronizing the Automatic Way: Using YouTube's Automatic Timing

In the fall of 2009, Google began incorporating into YouTube the Automatic Speech Recognition engine that helps power the transcription service in Google Voice. The first phase of this introduced Automatic Timing, which provides the ability to automatically synchronize your transcript with your YouTube video. In the first quarter of 2010, YouTube rolled out Automatic Transcription, making it possible to generate a transcript of the video,
and thereby automating the entire captioning process.

In our experience, the Automatic Transcription facility is not capable in most circumstances of producing an accurate transcript. In cases where the voices in your audio are well recorded and the speakers speak very clearly, YouTube will likely produce a transcript that can be manually corrected, and you may save yourself some effort compared to producing a transcript from scratch. For the majority of cases, however, the machine-generated transcription will not be very good, and you are better off using only YouTube's Automatic Timing facility.

Here are the steps to use YouTube's Automatic Timing to synchronize your chunked transcript:

1. Sign in to YouTube and select “My Videos” under your account name.
2. Locate the video you want to caption, and click its Captions button.
3. Click the “Add New Captions or Transcript” button. A new page loads.
4. Click the “Browse” button. Locate your chunked transcript, and click OK.
5. Under “Type”, select the “Transcript File” radio button and click “Upload File.”
6. Once the Caption Track has uploaded and finished processing (synchronizing), make sure the checkbox next to it is selected and save your changes.

That is all there is to it.

Here are some examples that demonstrate the use of both Automatic Timing and Automatic Transcription. The titles in the video playlist below describe how the captions were made:

You will notice that Automatic Timing very accurately synchronizes Paul Schindler's voice with the transcript. In our experiment, with mediocre audio and multiple speakers, alignment is mostly on target, except for a few instances in which Emily's voice is not matched properly.

The Automatic Transcription of Paul Schindler is particularly good. Though not perfect, it produces a result that might be edited and corrected. This is in stark contrast to our mediocre audio example, which produces an unusable caption track.
Synchronizing the Manual Way: Using Accessify's Easy YouTube Caption Creator

If you cannot use YouTube's Automatic Timing to synchronize your chunked transcript, you will need to do it manually. The manual synchronization process involves playing back the video and marking the times at which each transcript chunk occurs in the video. There are a number of tools that can help with this. NCAM's MAGpie, MovCaptioner, YouTubeCC, and CaptionTube are worth looking into.

MAGpie is a tried and true, stand-alone Java application that you install on your Windows machine (the program does not currently support the Mac). MovCaptioner ($25) is made for Mac only and is one of the best products for that platform, outputting caption files in many formats, including SubRip. Both MAGpie and MovCaptioner allow you to either input caption lines as they play or import a chunked transcript. MovCaptioner has the advantage of playing back the video in short snippets (one to 11 seconds) to facilitate transcribing.

YouTubeCC and CaptionTube are online applications. Both use a model for timing in which you type each caption chunk in individually, similar to modes available in both MovCaptioner and MAGpie. We find this procedure cumbersome, but it may work well for you.

The service we recommend is Accessify's Easy YouTube Caption Creator. Like MAGpie and MovCaptioner, it allows you to import the transcript, already chunked and fully prepared. You play back the video in the web application and set timings using a keystroke, which is simple and straightforward. Here are the steps:

1. Paste in the URL of your YouTube video.
2. Paste in your chunked transcript.
3. Start playback of the video.
4. Press the “a” key when you hear the text of the caption chunk being spoken.
5. When the video has run to the end, copy the timed-text output into a file on your computer.

Note that Easy YouTube Caption Creator treats each single line of the transcript as a chunk.
So, if you have multiple-line chunks, join them into a single line before pasting them into Caption Creator. Pressing the “a” key sets a time for each caption chunk. When you have worked your way through the entire video, you can copy the timed-text output that Caption Creator has generated for you. Paste it into a text file on your computer and name the file with the .sbv extension: my_caption_file.sbv, for example.

Finally, you must upload your timed caption file to YouTube. YouTube makes it simple to upload your caption file and associate it with your video.

1. Sign in to YouTube and go to “My Videos.”
2. Locate the video you want to add captions to and click its “Edit” button.
3. Click the “Captions and Subtitles” link.
4. On the “Captions and Subtitles” page, click the “Browse...” button, locate the caption file on your computer, and click the “Open” button.
5. Back on the YouTube page, select a “Track Language,” English, for example, and click the “Upload” button.

YouTube associates your caption file with the video. When you reload your video in YouTube, you will see that it has captions.

You can upload more than one caption track. You can use this feature to upload tracks in another language, in which case you have created a “subtitle.” If you upload more than one track, note that the end user will need to be able to access the Flash controls in order to change the caption/subtitle track. Thus, in terms of accessibility, it may make sense to put more than one copy of the video on YouTube and associate just a single subtitle track with each instance.

Convert YouTube .sbv to SubRip, W3C Timed Text,
and QT Text

Having captions in YouTube is wonderful. But it would be even better if we could use the timed captions in YouTube for other services or for hosting our own video. The problem, of course, is that not all video players or services accept Subviewer-formatted captions. We have written a YouTube caption converter that will convert YouTube Subviewer format to SubRip, W3C Timed Text Markup Language (DFXP), and QT Text. W3C Timed Text is used in a number of players, including Adobe's video component for Flash and the popular JW Player. And QT Text is the format used to caption QuickTime MOV files. Now you can download your timed captions from YouTube, convert them, and re-purpose them for use elsewhere.

Accessible Controls for the YouTube Embedded Video Player

The YouTube video player is used everywhere. YouTube is a boon for discovering and broadcasting video on the web and is widely used in education. The player itself is implemented in Flash, which allows for high-quality video and sound and attractive interface controls. As discussed above,
the Flash-based YouTube player allows for captioning and subtitling of video, which is a great benefit for many reasons.

One problem with Flash, however, is that in many browsers it is not accessible to the keyboard alone. Another problem is that screen reader programs for the visually impaired cannot always accurately discern the function of controls implemented in Flash, and some screen readers cannot access Flash controls at all.

For example, all browsers running in MS Windows except Internet Explorer cannot get focus to a Flash movie using the keyboard alone: a user must hover over the movie with the mouse and click. Tabbing into the movie is not possible. Once inside the Flash movie controls for the YouTube video, all browsers other than IE are “trapped”, tabbing through the player controls perpetually, unable to access any other parts of the web page.

The problem is similarly difficult for screen reader users. Some experienced users of screen readers will know that they must turn off their screen reader's regular page browsing mode and go into a “pass-through” mode to be able to read the buttons of the YouTube player, but even then the only usable buttons in the Flash-based YouTube player are Play and Mute. And if you happen to be using VoiceOver, the screen reader in Mac OS X, Flash controls are inaccessible.

Thus, for keyboard- or screen reader-reliant users, YouTube can present a difficult situation. (For more information on this problem, see our write-up on Flash accessibility in JW Player Controls.)

Suffice it to say that HTML-based controls are preferable for accessibility. All browsers can access HTML-based controls and there is no need for mode-switching. This is where our Accessible Controls for the YouTube Embedded Video Player can come in handy:

The Controls provide HTML links to start, pause, stop, jump backward and forward, adjust and mute the volume, and loop a video.
The Controls provide HTML headings that help a screen reader user determine where the controls are, what the name of the currently playing movie is, and how much time has elapsed in playback.
The Controls set up the video so that if there are captions they display by default, and, if you have high-quality video, that will also display by default.
Finally, the Controls facilitate adding a play list and displaying it in an accessible manner.

Get the Code for the Accessible Controls

You can download ytp.zip, which contains the JavaScript code that generates the player controls, a sample page that demonstrates usage, and a sample stylesheet that you can modify for use on your own pages. Because of the nature of Flash embedding, to see the demo in action you will need to view the sample page from a web server.

Configuring the Accessible Controls

Embedding a YouTube video in a web page is a simple matter of copying the “embed” code from YouTube. However, what you get on your web page has problems in terms of accessibility, as outlined above. Also, the embed code itself is quite ugly and hard to edit if you want to change any of the parameters. And if you want to wrap the embed so that it includes HTML controls to help with accessibility, or add a play list, you will be dropping in even more hard-to-maintain code.

By contrast, using the Accessible Controls is simple and maintainable. Including accessible YouTube video in your web pages requires adding a couple of lines to the head and inserting one or more div elements with a class of “ytplayerbox” inside the body of your web page. JavaScript takes care of rendering the player and controls for you.

Code Example: Including the JavaScript and Style Sheet

The following code goes inside the head of your document.
    <!-- Stylesheet and JavaScripts for Accessible Controls for the YouTube Embedded Video Player -->
    <link rel="stylesheet" type="text/css" href="css/ytp.css">
    <script type="text/javascript" src="[URL of the SWFObject script]"></script>
    <script type="text/javascript" src="js/ytp.js"></script>

Above we reference the style sheet and the JavaScript for the controls. We also need to pull in SWFObject, which is used for dynamic, browser-independent Flash movie embedding.

The Accessible Controls for the YouTube Embedded Video Player make the process of embedding controls and a play list simple and easy to maintain. The code below demonstrates how you would add a video with accessible controls and a play list totalling five videos.

Code Example: Player Configuration

    <!-- This is where the player, buttons, and (optional) play list get rendered -->
    <div class="ytplayerbox">
      <!-- specify 'normal' for YouTube VGA aspect ratio (480 x 360) and 'wide' for YouTube HD (640 x 360) -->
      <span class="ytplayeraspect: normal"></span>
      <!-- list video titles and identifiers here, play list rendered only if more than one movie -->
      <span class="ytmovieurl: XtFlYB56TZk">Interview with my daughter, Eva</span>
      <span class="ytmovieurl: QRS8MkLhQmM">YouTube Captions and Subtitles</span>
      <span class="ytmovieurl: _Tp6hgAEUiQ">Easy YouTube caption Creator</span>
      <span class="ytmovieurl: yvFbP82cYcs">Creating captions with CaptionTube</span>
      <span class="ytmovieurl: meCIER_s7Ng">Closed Captions</span>
    </div>

As the code shows, the Accessible Controls get inserted wherever you put a div with class="ytplayerbox". The aspect ratio of the player can be set to “normal”, which renders the video at 480 by 360 pixels (the old, standard YouTube VGA-like ratio), or to “wide,” which renders the video at 640 by 360 pixels (the YouTube “HD”, “letterbox” ratio). That is done by using a special class on a span element, “ytplayeraspect: [wide or normal]”. You then tell the Accessible Controls how many movies you want.
If you specify one, there is no play list area rendered. More than one will generate a play list for you. Specify each movie's title and YouTube identification code. The contents of the span become the title for the video (which, obviously, should be the title from YouTube, or something close to it). Put the identifier in the class, “ytmovieurl: [YouTube identifier for a video]”. And that's it!

Below are some examples showing the Accessible Controls in action.

Player Example 1: Player with Five Videos in Play List

Player Example 2: Wide Screen Player with Three Videos

You can have many instances of the player on the same web page. Below is an example that shows the player controls set to display video at the YouTube wide-angle, “HD” aspect ratio of 640 by 360 pixels.

Player Example 3: Player with One Video

Notice that if we have only one video, the play list does not display.
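To make the configuration convention more concrete, here is a rough sketch of how a script could read the “key: value” classes described above and decide whether a play list is needed. This is not the code shipped in ytp.zip. The function names, the plain-object stand-ins for span elements, and the choice of “normal” as the fallback aspect are all assumptions made for illustration.

```javascript
// Extract the value from a class string of the form "key: value".
// Returns null when the class does not use the given key.
function readClassValue(className, key) {
  const match = new RegExp("^\\s*" + key + ":\\s*(\\S+)\\s*$").exec(className);
  return match ? match[1] : null;
}

// spans: [{ className, text }], a hypothetical stand-in for the span
// elements found inside one div with class "ytplayerbox".
function readPlayerConfig(spans) {
  const config = { aspect: "normal", movies: [] }; // "normal" default is an assumption
  spans.forEach(function (span) {
    const aspect = readClassValue(span.className, "ytplayeraspect");
    const id = readClassValue(span.className, "ytmovieurl");
    if (aspect) {
      config.aspect = aspect;
    }
    if (id) {
      // The span's contents become the movie title.
      config.movies.push({ id: id, title: span.text.trim() });
    }
  });
  // A play list is only rendered when there is more than one movie.
  config.showPlayList = config.movies.length > 1;
  return config;
}
```

In a browser, the `spans` array could be built from the real elements, for example by mapping over the span children of each “ytplayerbox” div and reading their `className` and `textContent` properties.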