2015-08-10

A semi-automated method for generating video subtitle timing

Abstract

I work as a volunteer translating free mathematics material for everyone. My workflow has three main tasks: 1. translating the script in a srt file, 2. dubbing the video, 3. generating the subtitles. I found that generating the subtitle timing is time consuming, so I want to reduce that time. When I generate a subtitle, I already have the translated script and the audio of the video, so I try to use these data to semi-automate the subtitle timing generation. This time I use YouTube's transcript function to generate the subtitle timing, which reduces the time spent on this task. Since YouTube's transcript function requires plain text input, I implemented a script that converts a srt file to a text file. YouTube's transcript function not only generates the timing but also edits the lines (it inserts some newlines), so I also implemented a script that concatenates subtitle lines. In one experiment, fully manual work took 4.5 hours to generate the subtitle timing for a 13-minute video. With this method, it took around 3 hours for a video of similar length. These scripts are published under the new BSD license, so anyone can use them freely.

A semi-automated method for generating video subtitle timing

I work as a volunteer translating free mathematics material for everyone. These videos explain questions such as ``Why does dividing by a fraction mean turning the fraction upside down and multiplying?'' or ``In the first place, what is the meaning of division by a fraction? I know what dividing means, but what does dividing by 2/3 mean?'' I also make subtitles for these videos, but generating the subtitle timing took a lot of time. For example, I once spent four and a half hours generating the subtitle timing for a 13-minute video.

But when I generate the subtitle timing, I already have the translated script and the voice recording. I try to use these data as much as possible to generate the subtitle timing. This is the SubdayResearch theme this time.

One of my friends wrote software that analyzes an mp3 file with an FFT and extracts the rhythm from it. He had an input device for the game ``Dance Dance Revolution'' but not the game software, so he wrote his own game to use the device. I first thought I needed to analyze the video file to generate the timing, so I discussed it with him. However, he suggested that I first search for existing software, since I might find free software that does this. In my case, I need analysis of Japanese speech.

I searched for subtitle generation software and found several options, including YouTube's own functionality. One program I found generates subtitles in many languages. According to its documentation, it first generates subtitles in the video's language with YouTube's automatic subtitle generation function, then uses Google Translate to generate subtitles in the other languages.

As an experiment, I tried YouTube's automatic subtitle generation function, but the result for my voice was not accurate enough. The timing seems precise, but the text quality is poor. However, I only need the timing information, since I already have the translated script. If I could map just the timing of the automatically generated subtitle onto the manually translated script, it would work. So we had the following ideas:
  • Can we find the corresponding strings between the automatically generated subtitle and the manually translated script, allowing for some amount of error in the strings?
  • If we have corresponding points, can we minimize the distance to compensate for the errors? This could be an optimization problem with respect to the string distance.
However, when I checked some automatically generated subtitles, the strings had too many errors, and this idea seemed difficult to use.
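
To make the first idea concrete, here is a minimal sketch of such an error-tolerant match. It is only an illustration of the idea we discussed and then set aside, not something I used. It relies on `SequenceMatcher` from Python's standard `difflib` module; the function name `best_match` and the sample cues are hypothetical, and the sample is in English for readability even though my real data is Japanese.

```python
from difflib import SequenceMatcher


def best_match(script_line, auto_cues):
    """Return (score, (start, text)) for the cue most similar to script_line.

    auto_cues is a list of (start_seconds, text) pairs taken from an
    automatically generated subtitle. The score is a similarity ratio
    between 0.0 (nothing in common) and 1.0 (identical strings).
    """
    scored = [
        (SequenceMatcher(None, script_line.lower(), text.lower()).ratio(),
         (start, text))
        for start, text in auto_cues
    ]
    return max(scored, key=lambda pair: pair[0])


# Hypothetical noisy automatic transcript with start times in seconds.
auto_cues = [
    (0.0, "to day we look at the multiply cat ion table"),
    (4.2, "first we count buy twos"),
    (9.8, "then we count by threes"),
]

score, (start, text) = best_match("First, we count by twos.", auto_cues)
print(start, text)
```

With mild recognition errors like these, the right cue still scores highest. When the automatic text is as noisy as what I saw for my voice, the scores of right and wrong cues get too close to trust.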

I assume voice analysis is a difficult task, so I try to avoid it. I would like to solve my problem with as little effort as possible, though I will put in some effort if it is really needed.

I continued the discussion a few times during our lunch breaks (SubdayResearch discussions usually happen at lunch or at a party), and we realized that my real problem is not subtitle generation but subtitle timing generation. So I searched again with ``subtitle timing generation'' instead of ``subtitle generation.'' Then I found that YouTube has a transcript function. This function generates a subtitle from the video and the text of its content. Currently YouTube supports 10 languages for timing generation.

The input of the transcript function is plain text plus a little extra. I manually generated a text file from a srt file and tried this function. The result is precise enough to use. However, the text is sometimes split, maybe because the function tries to fit the text within some length. I need to undo these splits for further processing. At the end, I fine-tune the result with Amara or Camtasia (both are software that can adjust subtitle timing manually).

In the end, I need two simple filter scripts for my workflow.
  • A filter that converts a srt file to text format
  • A filter that removes the subtitle newlines in a srt file
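
The two filters can be sketched roughly as follows. This is not the published code, only a minimal illustration under some assumptions: each srt cue is an index line, a timing line, and one or more text lines, cues are separated by blank lines, and the inserted newlines fall inside a cue. The names `parse_srt`, `srt_to_text`, `join_cue_lines`, and the `sep` parameter are hypothetical.

```python
import re

# Timing line of a srt cue, e.g. "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def parse_srt(srt_text):
    """Split srt text into (index, timing, text_lines) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        # Skip blocks that do not look like a cue.
        if len(lines) >= 2 and TIMING.match(lines[1]):
            cues.append((lines[0], lines[1], lines[2:]))
    return cues


def srt_to_text(srt_text, sep=" "):
    """Filter 1: drop indices and timings, keep one line of text per cue."""
    return "\n".join(sep.join(text) for _, _, text in parse_srt(srt_text)) + "\n"


def join_cue_lines(srt_text, sep=" "):
    """Filter 2: rewrite the srt so each cue's text sits on a single line."""
    cues = [
        f"{index}\n{timing}\n{sep.join(text)}\n"
        for index, timing, text in parse_srt(srt_text)
    ]
    return "\n".join(cues)


SAMPLE = """1
00:00:01,000 --> 00:00:04,000
First we count
by twos.

2
00:00:04,500 --> 00:00:07,000
Then by threes.
"""

print(srt_to_text(SAMPLE))
print(join_cue_lines(SAMPLE))
```

For Japanese text, passing `sep=""` is probably the better choice, since words are not separated by spaces.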

Implementation

These filters are published at the following URL:

The license is the new BSD license, so everyone can use them freely.

Experimental result and conclusion


I made two videos about the multiplication table, each around 13 minutes long. I generated the subtitles fully manually for the first video, which took 4 hours 27 minutes. I used this method for the second video, which took 2 hours 37 minutes.

I think a saving of 1 hour 50 minutes is a good time reduction for a video of this length.
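
As a quick check of the arithmetic, the saving can be computed from the two measured times above. This small sketch uses only Python's standard `datetime` module; the variable names are just for illustration.

```python
from datetime import timedelta

manual = timedelta(hours=4, minutes=27)    # first video, fully manual
assisted = timedelta(hours=2, minutes=37)  # second video, with this method

saved = manual - assisted
percent = round(100 * saved / manual)

print(saved)    # 1:50:00
print(percent)  # 41
```

Since this compares only two videos of similar length, the figure is a single data point and not a general measurement.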

Future work


I would like to try this method for further video creations.

I am also looking for other (simple/easy) methods to reduce video creation time.

Acknowledgments

Thanks to Dietger, Daniel, and Jörg for discussions and ideas.

