Below is a broad outline of how vidDistill goes from a YouTube video and its captions to a shortened video. The code is available on GitHub.

  • Retrieve
    vidDistill first retrieves the video and captions from YouTube based on the URL the user enters. Each caption is annotated with the time in the video it corresponds to. If manually provided captions are available, vidDistill uses them; otherwise, it tries to fall back on automatically generated captions. If no captions of any kind are available, vidDistill will not work.
  • Punctuate
    Not all captions are punctuated properly. Captions automatically generated by YouTube have no punctuation at all. vidDistill strings all the captions together and tries to restore punctuation as well as possible.
  • Summarize
    Once vidDistill has the punctuated text, it uses a text summarization algorithm to identify the most important sentences in the video's full transcript. The algorithm compresses the text by the amount the user specifies.
  • Synthesize
    vidDistill then matches the condensed text to the corresponding times in the transcript. This tells it which parts of the video to jump between in order to deliver a "shortened" video that captures the main idea of the original.
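
The Summarize and Synthesize steps above can be illustrated with a small sketch. This is not vidDistill's actual code: the caption format (start seconds, end seconds, text), the function names, and the word-frequency scoring are all assumptions made for illustration, standing in for whatever summarization algorithm the project really uses. The sketch also assumes the captions are already punctuated, so the Punctuate step is not shown.

```python
import re
from collections import Counter

# Hypothetical caption format: (start_seconds, end_seconds, text).

def stitch(captions):
    """Join caption text, recording the character span each caption occupies."""
    parts, spans, pos = [], [], 0
    for start, end, text in captions:
        text = text.strip()
        spans.append((pos, pos + len(text), start, end))
        parts.append(text)
        pos += len(text) + 1  # +1 for the joining space
    return " ".join(parts), spans

def split_sentences(text):
    """Return (char_start, char_end, sentence) for each sentence."""
    out = []
    for m in re.finditer(r"[^.!?]+[.!?]*", text):
        raw = m.group()
        sent = raw.strip()
        if sent:
            start = m.start() + (len(raw) - len(raw.lstrip()))
            out.append((start, start + len(sent), sent))
    return out

def merge(segments):
    """Merge overlapping or touching (start, end) time segments."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def distill(captions, ratio=0.5):
    """Pick the top `ratio` of sentences and return the video segments to keep."""
    text, spans = stitch(captions)
    sentences = split_sentences(text)
    if not sentences:
        return []
    # Stand-in summarizer: score a sentence by the average frequency of its words.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    keep = max(1, round(ratio * len(sentences)))
    top = sorted(sentences, key=lambda s: score(s[2]), reverse=True)[:keep]

    # Map each kept sentence back to the captions whose text overlaps it.
    segments = []
    for s_start, s_end, _ in top:
        hits = [(t0, t1) for c0, c1, t0, t1 in spans if c0 < s_end and c1 > s_start]
        if hits:
            segments.append((min(h[0] for h in hits), max(h[1] for h in hits)))
    return merge(segments)
```

Calling distill(captions, ratio=0.5) returns a sorted list of (start, end) times in seconds. Playing only those ranges back to back gives the "shortened" video. Segments that touch or overlap are merged so the player does not jump needlessly between adjacent sentences.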