Curious PM Project

Project Flow Diagram

Project Flow Diagram

Overview

This project focuses on improving the quality of audio within videos by extracting the audio, converting it into text (transcription), correcting grammar, and eliminating filler words using Azure OpenAI. After cleaning the transcript, it is converted back into audio and synchronised with the original video. The most challenging part of the project was remapping the newly generated audio back onto the original video without causing any synchronisation issues, ensuring there was no delay or mismatch between the video frames and the new audio.

Problem Statement

When processing video content to improve its audio quality, there are several key challenges:

Solution Process

  1. Audio Extraction: The first step involved isolating the audio from the video. This audio track was then prepared for further processing.
  2. Speech-to-Text Conversion: The audio was converted into a text transcript using the Deepgram API. This transcript formed the basis for the next steps in the process.
  3. Text Cleaning and Grammar Correction: Azure OpenAI was used to clean the transcript by correcting grammatical errors and removing common filler words like “uh,” “um,” and “hmm.” This resulted in a more professional and clear transcript, which could then be converted back into speech.
  4. Text-to-Speech Conversion: The cleaned transcript was converted back into an audio format using text-to-speech technology (Deepgram API). This generated the new, corrected audio track, which was now shorter due to the removal of filler words and pauses.

Tackling the Most Challenging Part: Audio-Video Synchronisation

The Core Challenge:

After correcting the transcript and removing filler words, the new audio became shorter than the original, creating a significant issue in ensuring that the new audio would align perfectly with the video. This required careful attention to maintain synchronisation between the speech and video frames, especially to avoid any lip-sync mismatches.

Solution Approach:

Outcome:

This method ensured that the new audio fit seamlessly with the original video without introducing any delays or synchronisation issues. By breaking down the audio into sentence chunks, it allowed for precise alignment and perfect timing, resulting in a natural and polished video output.

Conclusion

This project demonstrates a complete solution for enhancing the audio content of videos while addressing the critical challenge of maintaining synchronisation between the modified audio and the video. By leveraging Azure OpenAI to clean up the transcript and applying advanced techniques to remap the new audio to the original video, the project improves the clarity and professionalism of spoken content in videos. This solution is particularly useful for content creators and educators looking to enhance the quality of their video content without sacrificing synchronisation between the audio and the visual elements.