How To Transcribe A Foreign Language Video

> How To Transcribe A Foreign Language Video [January 22, 2019]

Introduction

Today we are going to look at how to transcribe a foreign language video. But how does this work when we live in a globalized world where people are from many different places - exactly what is foreign?

Well, QBL Media’s platform provides a client with the ability to transcribe and translate content to/from a number of languages, as discussed here.

What we are looking at today is how to transcribe a video in a language you do not know. Mostly, we will make changes to a template, and then use the changed template for our items. Here, we have Hindi captions on a speech by Narendra Modi, the current Indian Prime Minister. The assumption is that we do not know Hindi.

Note, no edits have been made to the underlying text, neither after the transcription nor after the translation, hence the accuracy is low. As always, we recommend using the AI to do the heavy lifting, and using a human subject-matter expert to clean up the results.

No really, use a subject-matter expert and do not cheap out on translators. This tool is built to increase efficiency, not to short-change staff/contractors.

Narendra Modi speech with English captions

Context

It is important to understand what we are doing. We want to transcribe the speech, but we do NOT know Hindi, hence cannot translate this content. So we will pretend to translate our template, but actually simply set it up to expect Hindi content.

Then we add the Hindi content, and because the original template was in English, we can bring the content back (translate from Hindi -> English). Effectively this is a four-stage process with three transforms: English template -> Hindi template -> Hindi item -> English item.
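The chain above can be sketched in a few lines of Python. This is only an illustrative model, not the platform's API: the `Template` and `Item` classes, the function names and the `en`/`hi` language codes are all assumptions made for this example. Only the names `myTemplate`, `hi_myTemplate` and `modi_hindi` come from this post.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Template:
    name: str
    primary: str                # language that items of this template are expected in
    secondary: tuple[str, ...]  # languages that items can be translated into


@dataclass(frozen=True)
class Item:
    name: str
    template: Template
    language: str


def translate_template(template: Template, target: str) -> Template:
    """English template -> Hindi template (hypothetical helper)."""
    if target not in template.secondary:
        raise ValueError(f"{target} is not a secondary language of {template.name}")
    # Assumption: the new template lists the target and the old primary as
    # secondary languages, mirroring the Language Options described in this post.
    secondary = (target, template.primary) + tuple(
        lang for lang in template.secondary if lang != target
    )
    return Template(f"{target}_{template.name}", primary=target, secondary=secondary)


def new_item(template: Template, name: str) -> Item:
    """Hindi template -> Hindi item: items start in the template's primary language."""
    return Item(name=name, template=template, language=template.primary)


def translate_item(item: Item, target: str) -> Item:
    """Hindi item -> English item, allowed only for a listed secondary language."""
    if target not in item.template.secondary:
        raise ValueError(f"{target} is not available for {item.template.name}")
    return replace(item, language=target)


en_template = Template("myTemplate", primary="en", secondary=("hi",))
hi_template = translate_template(en_template, "hi")
hi_item = new_item(hi_template, "modi_hindi")
en_item = translate_item(hi_item, "en")
print(hi_template.name, hi_item.language, en_item.language)
```

The point of the model is that we never translate anything into Hindi ourselves: the template only declares which language to expect, and the one real translation happens at the last step, from Hindi back to English.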

It may be useful to also look at this post to understand the strategy we are using. We will be modifying our templates to work with multiple languages; it's just that in this case we do not know the language we are trying to use. The video content of the original can be seen below - the asset is publicly available content sourced from Rajya Sabha TV.

Narendra Modi original speech

Steps - Template

  1. Please direct your browser to www.qbl-media.com and log into the application. Once logged in, open up a new template, or just use our old friend, myTemplate. Note, we are in a collection named Sydney - your collection is likely to be called MyTeam. Please have a look at account settings to make any changes.

  2. Open up myTemplate, and check the Language Options to see something similar to the below image. In collection Sydney we have access to several languages including English, French, Hindi, Chinese, Portuguese and Hebrew. We will only be working with English (our primary language) and Hindi (one of our secondary languages) in this post.

    Template showing Language Options - English/Hindi

  3. Now, we are not looking to overlay any audio, so we can remove the audio component. However, we will translate it for illustration purposes, showing how to use the translation function on templates.

  4. First, to translate this template into Hindi, click the Action -> Translate function. Now, do not select the video component, but only select the audio component. We do not need the audio component, but are translating it anyway to show what the Hindi item will look like.

    Translate the audio, but not the video

  5. This should give us a new Hindi template, which looks like the below. Note, the video component is NOT translated, while the audio component IS translated. We will not be using the audio component in this post, but any item of type hi_myTemplate will show the video component in English, and the audio component in Hindi.

    Hindi Template, with Audio component translated

  6. Note that in the Language Options above, this template shows Hindi as its primary language, and Hindi and English as secondary languages. That is, items of this template are expected to be in Hindi, and can be translated to English as required.
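The selective translation in steps 4 and 5 can be pictured as a per-component language map. The sketch below is a hypothetical illustration only: the dictionary layout and the `translate_components` function are assumptions made for this example and are not part of the platform.

```python
def translate_components(
    components: dict[str, str], target: str, selected: set[str]
) -> dict[str, str]:
    """Return a copy in which only the selected components switch to `target`."""
    unknown = selected - set(components)
    if unknown:
        raise KeyError(f"unknown components: {sorted(unknown)}")
    return {
        name: (target if name in selected else language)
        for name, language in components.items()
    }


# myTemplate starts with both components in English.
my_template = {"video": "en", "audio": "en"}

# Select only the audio component, as in step 4.
hi_my_template = translate_components(my_template, "hi", {"audio"})
print(hi_my_template)
```

Under these assumptions the video component stays in English while the audio component moves to Hindi, which matches what step 5 describes for items of type hi_myTemplate.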

Steps - Item

  1. We set up our Hindi template following the steps above, so now we can use it to reach our goal of transcribing the Hindi video. Add a new item called modi_hindi and upload the content as required. The below image shows what our new item looks like prior to uploading the video.

    Add a new item of type: hi_myTemplate

    Item, with video in English and audio in Hindi

  2. Again, we do not need the audio component. The only reason we translated it is to show how we can selectively translate specific components depending on our workflow. We will only be using the video component.

  3. Upload the video, and then transcribe it to get the captions as required. The result looks like the below.

    Narendra Modi speech with Hindi captions

  4. We can now see the captions have been populated. Note, general workflow would include using a Hindi subject matter expert to clean up the captions.

    Hindi captions with font, colour and size options

  5. In this post, we have NOT cleaned up the Hindi transcription. This is a BAD example because, as a political speech, the majority of it uses humour, in-jokes and the like to reach the audience, none of which is a strong point for an AI. AI works best with concrete, literal content. The more subtle the context, the less likely an AI is to do the work properly. Please use subject-matter experts with content of this nature.

  6. Below, the translated version (into English) is shown, and watching it will reveal the errors. The errors compound, because (i) our transcript is not great, and (ii) our translation is based on that same dodgy transcript.

    Narendra Modi speech with English captions
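To see why the errors in step 6 compound, here is a rough back-of-the-envelope sketch. It assumes each stage independently preserves some fraction of the content, which is a simplification, and the accuracy figures are made-up illustrative numbers, not measurements from this speech or from the platform.

```python
import math


def end_to_end_accuracy(*stage_accuracies: float) -> float:
    """Multiply per-stage accuracies, treating the stages as independent."""
    for accuracy in stage_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("each accuracy must be between 0 and 1")
    return math.prod(stage_accuracies)


# Illustrative numbers only: a raw transcript followed by machine translation...
raw = end_to_end_accuracy(0.80, 0.85)

# ...versus the same translation run on an expert-cleaned transcript.
reviewed = end_to_end_accuracy(0.98, 0.85)

print(f"raw transcript: {raw:.2f}, reviewed transcript: {reviewed:.2f}")
```

Whatever the real figures are, the translation can only be as good as the transcript it starts from, which is why a subject-matter expert should clean up the transcript before it is translated.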

Conclusion

In this blog post we have covered how to transcribe a video in a foreign language. We specifically used this example to show the limitations of AI. As always, use the AI to do the heavy lifting, but get a subject-matter expert to think about your content and finesse the results to the needs of your audience.

The platform is currently in closed beta, while we work out bugs in the code. If you are interested in trying out our technology, please drop us an email at support@q6a.com.au.