Today we are going to look at how to transcribe a foreign-language video. But how does this work when we live in a globalized world where people come from many different places? What exactly is foreign?
Well, QBL Media’s platform provides a client with the ability to transcribe and translate content to/from a
number of languages, as discussed here.
What we are looking at today is how to transcribe a video in a language you do not know. Mostly, we will make changes to a template and implement these changes in our items. Here, we have Hindi captions on a speech by Narendra Modi, the current Indian Prime Minister. The assumption is that we do not know Hindi.
Note that no edits have been made to the underlying text, neither after the transcription nor after the translation, hence the accuracy is low. As always, we recommend using the AI to do the heavy lifting, and using a human subject-matter expert to clean up the results. Please use a subject-matter expert and do not cheap out on translators. This tool is built to increase efficiency, not to short-change staff/contractors.
It is important to understand what we are doing. We want to transcribe the speech, but we do NOT know Hindi, hence cannot translate this content ourselves. So we will pretend to translate our template, but actually simply set it up to expect Hindi content. Then we add the Hindi content and, because the original template was in English, we can bring the content back (translate back from Hindi -> English). Effectively this is a four-stage process: English template -> Hindi template -> Hindi item and, lastly, because the original template was in English, Hindi item -> English item.
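The chain described above can be written down as a small Python sketch. This is only a mental model to make the order of the steps explicit, not the platform's API: the `Stage` class, the `CHAIN` list and the `describe` helper are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stop in the workflow (hypothetical model, not a platform object)."""
    kind: str      # "template" or "item"
    language: str  # language the stage expects or holds


# The chain described above, in order.
CHAIN = [
    Stage("template", "English"),  # the template we start from
    Stage("template", "Hindi"),    # "translated" template, now expecting Hindi
    Stage("item", "Hindi"),        # item holding the transcribed Hindi video
    Stage("item", "English"),      # item translated back into English
]


def describe(chain):
    """Return each step between neighbouring stages as a readable string."""
    steps = []
    for src, dst in zip(chain, chain[1:]):
        steps.append(f"{src.language} {src.kind} -> {dst.language} {dst.kind}")
    return steps


for step in describe(CHAIN):
    print(step)
```

Only the last step, Hindi item to English item, touches the actual speech content; the first step merely prepares the template to receive Hindi.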
It may be useful to also look at this post to understand the strategy we are using. We will be modifying our templates to work with multiple languages; it is just that in this case we do not know the language we are trying to use. The video content of the original can be seen below - the asset is publicly available content sourced from Rajya Sabha TV.
Steps - Template
Please direct your browser to www.qbl-media.com and log into the application. Once logged in, open up a new template, or just use our old friend, myTemplate. Note, we are in a collection named Sydney - your collection is likely to be called MyTeam. Please have a look at account settings to make any changes.
Open myTemplate, and check the Language Options to see something similar to the below image. In collection Sydney we have access to several languages including Hebrew. We will only be working with English (our primary language) and Hindi (one of our secondary languages) in this post.
Now, we are not looking to overlay any audio, so
we can remove the audio component. However, we will
translate it for illustration purposes, showing how to use the translation function on templates.
To translate this template into Hindi, click the Action -> Translate function. Now, do not select the video component, but only select the audio component. We do not need the audio component, but are translating it anyway to show what the Hindi item will look like.
This should give us a new Hindi template, which looks like the below. Note, the video component is NOT translated, while the audio component IS translated. We will not be using the audio component in this post, but any item of type hi_myTemplate will show the video component in English, and the audio component in Hindi.
Note that in the Language Options above, this template shows Hindi as its primary language, and English as a secondary language. That is, items of this template are expected to be in Hindi, and can be translated to English as required.
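To make the effect of a selective template translation concrete, here is a minimal Python sketch of the idea. It is not the platform's code or API: the `Template` class, the `translate_template` function and the `LANGUAGE_CODES` table are hypothetical, and the naming rule (language code plus underscore plus the original name) is an assumption inferred from the name hi_myTemplate.

```python
from dataclasses import dataclass, field

# Assumed mapping from language name to the prefix used in template names.
LANGUAGE_CODES = {"English": "en", "Hindi": "hi", "Hebrew": "he"}


@dataclass
class Template:
    """Hypothetical stand-in for a template; not the platform's data model."""
    name: str
    primary: str     # language that items of this template are expected to be in
    secondary: list  # languages that items can be translated to
    components: dict = field(default_factory=dict)  # component name -> language


def translate_template(template, target, selected):
    """Return a new template aimed at `target`, translating only `selected` components."""
    if target not in template.secondary:
        raise ValueError(f"{target} is not a secondary language of {template.name}")
    # Selected components switch to the target language; the rest stay as they were.
    components = {
        comp: (target if comp in selected else lang)
        for comp, lang in template.components.items()
    }
    # The old primary language becomes a secondary language of the new template.
    secondary = [template.primary] + [s for s in template.secondary if s != target]
    new_name = f"{LANGUAGE_CODES[target]}_{template.name}"
    return Template(new_name, target, secondary, components)


my_template = Template(
    "myTemplate", "English", ["Hindi"],
    {"video": "English", "audio": "English"},
)
hi_template = translate_template(my_template, "Hindi", selected={"audio"})
print(hi_template.name, hi_template.primary, hi_template.components)
```

In this model the video component keeps its English setting while the audio component switches to Hindi, and the primary and secondary languages swap, which mirrors what the Language Options show for the translated template.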
Steps - Item
Having set up our Hindi template following the steps above, we can now use it to reach our goal of transcribing the Hindi video. Add a new item called modi_hindi and upload the content as required. The below image shows what our new item looks like prior to uploading the video.
Again, we do not need the audio component. The only reason we translated it is to show how we can selectively translate specific components depending on our workflow. We will only be using the video component.
Upload the video, and then transcribe to get the captions as required. The result looks like the below.
We can now see the captions have been populated. Note, the general workflow would include using a Hindi subject-matter expert to clean up the captions.
In this post, we have NOT cleaned up the Hindi transcription. This is a BAD example because, as a political speech, the majority of it uses humour, in-jokes, etc. to reach the audience - none of which is a strong point for an AI. AI works best with thing-type content. The more subtle the context, the less likely an AI is to do the work properly. Please use subject-matter experts with content of this nature.
Below, the translated version (into English) is shown, and watching it will show the errors. The error compounds because (i) our transcript is not great, and (ii) our translation was based on that same dodgy transcript.
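One rough way to picture why the error compounds: if we assume, purely for illustration, that each stage gets some fraction of the content right and that the stages err independently, then the fraction that survives both stages is the product of the two. The Python sketch below uses made-up placeholder figures; they are not measurements of this platform or of this video, and `end_to_end_accuracy` is a hypothetical helper.

```python
def end_to_end_accuracy(stage_accuracies):
    """Multiply per-stage accuracies, assuming the stages fail independently."""
    result = 1.0
    for acc in stage_accuracies:
        if not 0.0 <= acc <= 1.0:
            raise ValueError("accuracy must be between 0 and 1")
        result *= acc
    return result


# Placeholder numbers chosen only to illustrate the effect.
transcription = 0.8  # fraction of the speech transcribed correctly (assumed)
translation = 0.8    # fraction of a given transcript translated correctly (assumed)

combined = end_to_end_accuracy([transcription, translation])
print(f"{combined:.2f}")
```

Under these assumptions the translated captions can never be better than the transcript they were built on, which is why cleaning up the transcription first pays off twice.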
In this blog post we have covered how to transcribe a video in a foreign language. We specifically used this example to show the limitations of AI. As always, use the AI to do the heavy lifting, but get a
subject matter expert to think about your content and finesse the results to the needs of
The platform is currently in closed beta, while we work out bugs in the code. If you are interested in trying out our technology, please drop us an email at firstname.lastname@example.org.