This document explains as much about the YouTube CC Downloader as possible, from how it works to how to get the most out of it.
Go to: YouTube CC Downloader.
Development History
Things to complete:
The optional choice to download all the closed captions in handy .rar or .zip form.
Perhaps show the title and the video thumbnail.
Updating the database properly (hit counting, title, user).
Because this tool is meant to make YouTube CC tracks a bit easier to access, what better way to achieve that than by providing a nice, easy-to-use API?
Because there can be multiple CC tracks per YouTube video, there are two main steps to accessing them using the API. This is the same as communicating with YouTube directly, except I hope this API provides a much more user-friendly experience. The first step is to see what CC tracks are available for the video. The second step is to download whichever track you want.
The first step:
You need to see what CC tracks are available for the YouTube video. You may do this by accessing the following URL:
http://www.nitrxgen.net/youtube_cc/AuX7nPBqDts.csv
You can replace AuX7nPBqDts with any YouTube video ID, provided that the video exists, is not set to private, and actually has CC track data. Additionally, you can replace .csv with .tsv, .xml, or .json to get the available track data in different formats.
The most important thing about this first step is deciding which CC track you want, and using the associated track ID for the second step. I'm relying on the assumption that you know where to find the track ID once you've requested this URL; if you do not, then I don't know what you're doing here.
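As a rough illustration of the first step, here is a minimal Python sketch that builds the track-list URL and fetches it. The function names (track_list_url, fetch_track_list) are made up for this example and are not part of the tool. The UTF-8 decoding is also an assumption, and the layout of the returned data is not described here, so the response is returned as raw text for you to inspect.

```python
from urllib.request import urlopen

BASE_URL = "http://www.nitrxgen.net/youtube_cc"
# Formats listed in this document for the track list.
LIST_FORMATS = {"csv", "tsv", "xml", "json"}


def track_list_url(video_id: str, fmt: str = "csv") -> str:
    """Build the URL that lists the CC tracks available for a video."""
    if fmt not in LIST_FORMATS:
        raise ValueError(f"unsupported track list format: {fmt!r}")
    return f"{BASE_URL}/{video_id}.{fmt}"


def fetch_track_list(video_id: str, fmt: str = "csv", timeout: float = 10.0) -> str:
    """Fetch the track list as raw text (hypothetical helper).

    Assumes the response is UTF-8 text; undecodable bytes are replaced
    rather than raising an error.
    """
    with urlopen(track_list_url(video_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URL; call fetch_track_list() to make the request.
    print(track_list_url("AuX7nPBqDts", "json"))
```

Look through the returned text for the track ID of the track you want; you will need it for the second step.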
The second step:
Using the data retrieved from the first step (primarily the track ID), you can download the CC track directly using the following URL:
http://www.nitrxgen.net/youtube_cc/AuX7nPBqDts/9.csv
You'll notice only a minor addition with this URL: the video ID AuX7nPBqDts is a directory, and 9.csv appears to be a file. The file's name, 9, is actually the track ID you're requesting. Once again, you can replace .csv with .sbv, .srt, .tsv, .txt, .vtt, or .dfxp to retrieve the data in different file formats.
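The second step can be sketched the same way. Again, track_url and download_track are hypothetical names chosen for this example, the track ID is passed through exactly as the first step reported it, and the UTF-8 decoding is an assumption rather than something the tool guarantees.

```python
from urllib.request import urlopen

BASE_URL = "http://www.nitrxgen.net/youtube_cc"
# Formats listed in this document for an individual track.
TRACK_FORMATS = {"csv", "sbv", "srt", "tsv", "txt", "vtt", "dfxp"}


def track_url(video_id: str, track_id: str, fmt: str = "csv") -> str:
    """Build the download URL for one CC track of a video."""
    if fmt not in TRACK_FORMATS:
        raise ValueError(f"unsupported track format: {fmt!r}")
    # The video ID acts as a directory and the track ID as the file name.
    return f"{BASE_URL}/{video_id}/{track_id}.{fmt}"


def download_track(video_id: str, track_id: str, fmt: str = "srt",
                   timeout: float = 10.0) -> str:
    """Download one track as raw text (hypothetical helper).

    Assumes the response is UTF-8 text; undecodable bytes are replaced.
    """
    with urlopen(track_url(video_id, track_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URL; call download_track() to make the request.
    print(track_url("AuX7nPBqDts", "9", "srt"))
```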
Things to note:
I must point out that some of these different file format specifications allow for different features such as text
formatting, on-screen text positioning, and extra attributes such as spoken and non-spoken event data (like attributing
the name of the person speaking in particular, screaming, doors slamming, noises off-camera, background music
information, etc.). Unfortunately these extra things are not available from the data YouTube provides.
I should also point out that some CC tracks may not have correct timing information, or may have a start and finish time of 0.000 seconds for every line, or something similar. This is not an error with this tool; it means the author of the CC track did not take proper care to ensure the track has the correct times. There is nothing I can do about these, as this is what YouTube itself provides. Try enabling the CC track in the actual YouTube video and see if it actually works; for me, they do not.
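If you want to spot such tracks automatically, one possible check is sketched below. It assumes you downloaded the track as .srt and that the file follows the usual SubRip timing layout (HH:MM:SS,mmm --> HH:MM:SS,mmm); the function name has_zero_timing is made up for this example.

```python
import re

# Matches a SubRip timing line such as "00:00:01,000 --> 00:00:02,500".
SRT_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    """Convert the four captured timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def has_zero_timing(srt_text: str) -> bool:
    """Return True if every cue starts and ends at 0.000 seconds."""
    cues = SRT_TIMING.findall(srt_text)
    if not cues:
        return False  # no timing lines found, so there is nothing to judge
    return all(
        _seconds(*cue[:4]) == 0 and _seconds(*cue[4:]) == 0 for cue in cues
    )


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:00,000\nHello\n"
    print(has_zero_timing(sample))
```

This only catches the all-zero case; other kinds of wrong timing still need to be checked by eye.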
Problems on YouTube:
Their language code BH (for Bihara) does not actually have Bihara in the track list of the API; it only returns Bh for both the original language name and the translated language name. The Chinese languages in the user-friendly YouTube list do not match those found in the track list of the API either. When attempting to add captions to a video using the Moldovan language, it allows you to enter or upload caption data, but upon returning to the language list to add more translations, you'll find Romanian (Moldova) instead, and it thinks captions for Moldovan haven't been published. Because of this mismatch in identification, when trying to edit Romanian (Moldova) captions, you get a YouTube error and subsequently you can't remove those captions. In general, a couple of YouTube languages are actually the country names and not the languages (Nauru should be Nauruan, Quechua should be Quechuan but is a collection of languages). The Haitian language is not visible to add (for me, at least) even though I've seen Haitian captions added to videos.
Serbo-Croatian becomes Serbian (Latin), which already exists as itself, but at least you can delete that one. Tagalog becomes Filipino, which is unnecessary. Twi becomes Akan and then cannot be edited or deleted. I've seen Scots listed on YouTube videos, but only Scottish Gaelic appears, which uses a different language code.
The information in this section is up to date as of 26/01/2017.
I did originally intend to provide automatically generated captions when I launched this tool. Sadly, it turned out to be a bit more complicated than I imagined and I simply didn't bother working on it. In reality, it's a whole different tool in itself due to the complexity. I do still plan to provide automatically generated closed captions at some point, but I can't say when. One day I'll wake up and work on it. That's all I can say.
If captions fail to load, there may be a number of causes. Check that the video exists. If it's your own video, make sure the privacy settings are not set to private. If it's not your video, there may simply be no closed captions available for it. It is also possible that regional restrictions apply; videos that cannot be shown outside of the U.S., for example, may also block access to the accompanying closed captions, but I've not yet tested this. If it continues to fail and you are sure there should be closed captions to view, then please let me know, and be sure to include the video URL you're entering so that I can re-create the issue.
Statistics are currently disabled due to the rather large sample size collected over the years. A new way to collect statistics will have to be developed. Sorry in advance!