I have a video of people talking, along with a transcript. I chunked the words into sentences so that I could display one sentence at a time on the screen, like normal subtitles in a movie. To do so, I created a CSV with one row per frame, where each row contains the full sentence being spoken during that frame. This way I can loop over all frames and draw the sentence text on every frame that falls within that sentence. I do this in OpenCV.
Sample transcript CSV:

    frame  sentence
    0      hello
    1      hello
    2      how are you
    3      how are you
    4      how are you
    5      how are you
    6      how are you
    7      how are you
    8      fine
    ...
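A per-frame table of this shape could be generated from sentence spans along the following lines. This is only a sketch: the `(start_frame, end_frame, sentence)` span format and the helper names `expand_to_frames` and `rows_to_csv` are assumptions made for illustration, not the code that produced the file above.

```python
import csv
import io


def expand_to_frames(spans):
    """Expand (start_frame, end_frame, sentence) spans into one row per frame.

    end_frame is inclusive. The span format is an assumption for this sketch.
    """
    rows = []
    for start, end, sentence in spans:
        for frame in range(start, end + 1):
            rows.append({"frame": frame, "sentence": sentence})
    return rows


def rows_to_csv(rows):
    """Serialize the per-frame rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["frame", "sentence"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical spans matching the sample rows shown above.
spans = [(0, 1, "hello"), (2, 7, "how are you"), (8, 8, "fine")]
rows = expand_to_frames(spans)
print(rows_to_csv(rows))
```

Because every frame gets its own row, the row index can be used directly as the frame index when drawing.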
The CSV has exactly one row per frame of the video. To draw the subtitles, I do this:

    import cv2
    import pandas as pd

    df = pd.read_csv('data.csv')
    video = cv2.VideoCapture('vid.mp4')
    # CAP_PROP_FRAME_COUNT is returned as a float, so cast it before using range()
    num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    assert len(df) == num_frames

    for i in range(num_frames):
        ret, frame = video.read()
        if not ret:
            break  # stop if a frame could not be read
        # draw only the sentence for frame i, not the whole column
        cv2.putText(frame, str(df.sentence.iloc[i]), (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3, cv2.LINE_AA, True)
        # additional standard cv2 code below...
This works, but the resulting video has no audio. I understand that OpenCV does not handle audio at all, but are there any workarounds? This approach works well in my pipeline, so I’d like to write these frames to a new video and keep the original audio, while using as few additional libraries as possible.
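One possible direction, sketched here under assumptions, is to leave the OpenCV step unchanged and copy the audio track from the original file into the subtitled video afterwards using the `ffmpeg` command-line tool through `subprocess`. This assumes `ffmpeg` is installed and on the PATH (it is an external program, not a Python library), that the original file has an audio stream, and that the frame rate of the written video matches the original. The file names and the helper names `build_mux_command` and `mux_audio` are hypothetical.

```python
import shutil
import subprocess


def build_mux_command(silent_video, original_video, output_path):
    """Build an ffmpeg command that takes video from the first input
    and audio from the second, copying both streams without re-encoding."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", silent_video,      # input 0: frames written by OpenCV (no audio)
        "-i", original_video,    # input 1: original file that still has audio
        "-map", "0:v:0",         # first video stream of input 0
        "-map", "1:a:0",         # first audio stream of input 1
        "-c", "copy",            # copy streams as-is
        "-shortest",             # stop at the end of the shorter stream
        output_path,
    ]


def mux_audio(silent_video, original_video, output_path):
    """Run the mux command; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(
        build_mux_command(silent_video, original_video, output_path),
        check=True,
    )


# Hypothetical file names; printing the command shows what would be run.
cmd = build_mux_command("subtitled.mp4", "vid.mp4", "subtitled_with_audio.mp4")
print(" ".join(cmd))
```

Calling `mux_audio("subtitled.mp4", "vid.mp4", "subtitled_with_audio.mp4")` after the OpenCV loop has finished writing would then produce the combined file. If the audio codec of the original is not supported by the output container, stream copying may fail and the audio would need re-encoding instead.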