# ffmpeg: Capture & encode a 4K stream in realtime using VAAPI
I am a wannabe hobbyist filmmaker. Often, I only noticed problems with the raw footage on my desktop computer at home, long after I could have corrected them with a simple retake. Wouldn't it be awesome to record your footage directly to your laptop? Make the footage available to others in your team really quickly, maybe by storing it on a server?
I want to do this on my XPS 13 since that is my most portable option. I should note that my laptop is decent (Intel i7-8550U, 1.8 GHz) but doesn't have a discrete graphics card. In fact, I cannot even play back the 4K HEVC (H.265) footage recorded on my Fujifilm X-T3 on my laptop without tweaking VLC. And even on my hefty AMD Ryzen Threadripper 2950X 16-Core, I cannot encode it in realtime on the CPU alone.
Therefore, encoding in realtime should probably be impossible. Or not? I had to try. Oh, and on Linux. NixOS, to be specific.

Note that this article is not full of deep insights but mostly contains things I tried that eventually worked. This kind of content has helped me tremendously in the past, so let's dump it into the Interwebs.
## The easy way: OBS
The first thing that I tried was OBS Studio. Here is what somewhat worked for me:
- I set the output of my camera to 4K, 16:9, 25 fps (full HD does not work).
- I added a `Video Capture (V4L2)` source. I needed to set the video format to `YV12`, which is apparently an alias for `NV12`.
- I added an `Audio Input Capture (PulseAudio)` since the video source does not include the audio.
It somewhat worked. But it randomly stopped working. It dropped frames -- that got better when I disabled the preview. Anyway, if you are reading this some time after I published it, give OBS a try; it might be the simplest option!
Therefore, I decided to use CLI tools to understand what was going on and to ultimately have more control. I dabbled a bit with `gstreamer` but ultimately chose `ffmpeg` because I found more and better documentation for it and it seemed to work better.
## Digging deeper: Required Software
If you want to follow along, I installed these packages in NixOS by adding them to my package list:
```
# Capturing and encoding
ffmpeg-full # My version was 4.2.2

# Inspect video devices etc.
v4l-utils
usbutils

# Hardware acceleration for encoding/decoding
vaapiIntel
libva
libva-utils
```
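To verify that VAAPI encoding is actually available before going further, `vainfo` from `libva-utils` lists the supported profiles and entrypoints. The exact output depends on your hardware and driver; roughly, you want to see an HEVC encode entrypoint:

```
vainfo
# Among the output, look for a line similar to:
#   VAProfileHEVCMain : VAEntrypointEncSlice
```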
## Underlying video device
`ffmpeg` and other CLI tools require us to specify the video device name -- at least if you have multiple video inputs, e.g. the built-in webcam and the Elgato Cam Link 4K. You can either use the `/dev/videoX` device name that you saw in OBS Studio or find the device names with:
```
v4l2-ctl --list-devices
Cam Link 4K: Cam Link 4K (usb-0000:39:00.0-1.2):
	/dev/video1
	/dev/video2
```
You can already use the device names (the first one for the Cam Link) that you found here. The trouble is that these device names might depend on the order in which the video devices are detected. To get a stable device name that works across reboots in your shell scripts, you can look into the `/dev/v4l/by-id` directory. For my device, it was `/dev/v4l/by-id/usb-Elgato_Cam_Link_4K_0004550EAA000-video-index0`.

We'll refer to whatever device you have chosen as `$V4L_DEVICE` later in the text.
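In the snippets below, I'll assume you have exported that path into your shell:

```
# Adjust the by-id path to match your own device
export V4L_DEVICE=/dev/v4l/by-id/usb-Elgato_Cam_Link_4K_0004550EAA000-video-index0
```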
## Playing the video feed
To check whether ffmpeg can handle the stream, you can play it with `ffplay`:
```
ffplay -f v4l2 -i $V4L_DEVICE
```
That is super simple, isn't it? It even detects the video format correctly. If it doesn't for you, try specifying one with `-input_format`, e.g. `-input_format nv12` before the `-i`.
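Spelled out in full, assuming your device delivers `nv12`, that would be:

```
ffplay -f v4l2 -input_format nv12 -i $V4L_DEVICE
```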
## Capture the video feed
What does `ffmpeg` do by default? Let's see by exchanging `ffplay` with `ffmpeg` and specifying an output file:
```
ffmpeg -f v4l2 -i $V4L_DEVICE capture.mkv
```
Hey, that works! But only barely.
- I cannot encode the video in realtime, so the video is buffered. The raw video feed at 29.97 fps clocks in at about 3 Gb/s (see the quick calculation after this list), a bit less at 25 fps. If we can't keep up, ffmpeg buffers the rest in main memory. So this is only suitable for quite short videos ;)
- Also, the output is ugly; we need to specify something nicer.
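The 3 Gb/s figure is easy to verify: `NV12` stores 12 bits (1.5 bytes) per pixel, so at 3840x2160 and 29.97 fps we get:

```
echo '3840 * 2160 * 1.5 * 29.97 * 8 / 10^9' | bc -l
# => ~2.98, i.e. roughly 3 Gbit/s of raw video
```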
## Hardware-encoding to the rescue
Since my laptop does not have a discrete graphics card, I am using the VAAPI (Video Acceleration API) standard, which is supported by newer Intel CPUs.
```
ffmpeg -vaapi_device /dev/dri/renderD128 \
    -f v4l2 -input_format nv12 -i $V4L_DEVICE \
    -vf 'hwupload' \
    -c:v hevc_vaapi -b:v 100M -maxrate:v 120M capture.mkv
```
Woooh! That's a lot to understand.
Let's start with the basics: a basic ffmpeg command consists of some global flags (`-vaapi_device /dev/dri/renderD128` in this case), a list of inputs (`-i ...`), and a list of outputs (e.g. file names). Local options affect the next input or output, so decoding options are specified before an input, and encoding options for an output are specified before the output file name. `-f v4l2 -input_format nv12` apply to the input. `-c:v hevc_vaapi -b:v 100M -maxrate:v 120M` all apply to the output `capture.mkv`.
Between our input and the output, there is a `-vf` video filter command which can be used to process the raw video. You could scale the video, transform the color space, things like that. In our case, we need to make the video stream available to the hardware encoder with `hwupload`. That seems to be a bit of a specialty of the VAAPI support. In my short experimentation with the NVIDIA support (`nvenc`) on another machine, I didn't need anything similar.
`-c:v hevc_vaapi` is used to choose the video codec (`-c` is for codec, `:v` is for video) from a long list of supported ones. The codecs that use VAAPI have a `_vaapi` suffix by convention.
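You can list the VAAPI encoders your ffmpeg build supports; the exact list depends on how it was compiled:

```
ffmpeg -hide_banner -encoders | grep vaapi
# e.g. h264_vaapi, hevc_vaapi, mjpeg_vaapi, vp8_vaapi, vp9_vaapi
```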
Without specifying the wanted and maximum bit rate, I got a stream with low quality. Therefore, I specified `-b:v 100M -maxrate:v 120M` to aim at quite high quality. The highest quality setting on the X-T3 for 4K 25 fps is 400 Mb/s, so it might even be worth experimenting with higher settings. Note that there is a bug that requires you to specify a maxrate that is not much higher than the wanted bit rate.
Using VAAPI, I could encode 25 fps in realtime. Unfortunately, 29.97 fps is already too much. That also means that my plan to encode a smaller, low-quality stream at the same time failed.
The FFmpeg VAAPI wiki page gives context and many useful examples if you want to know more.
## Audio source
The audio feed is - unfortunately - separate from the video feed.
First, let me save you a day of fiddling: don't use ALSA directly; use PulseAudio if your system runs it. Using ALSA directly from ffmpeg resulted in a lot of clapping to debug audio sync issues, which drove my girlfriend insane.
Use `pactl list sources` to output tons of information about your audio sources. Choose the right device and pick the name. For me, it was `alsa_input.usb-Elgato_Systems_Cam_Link-03.iec958-stereo`.
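If that output is overwhelming, the short listing shows just one line per source (the index, sample spec, and state below are examples; yours will differ):

```
pactl list short sources
# 1  alsa_input.usb-Elgato_Systems_Cam_Link-03.iec958-stereo  module-alsa-card.c  s16le 2ch 48000Hz  RUNNING
```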
## Capture the audio feed
This is straightforward, we just have to specify that we want to use PulseAudio (with `-f pulse`) and give the device name as input.
```
ffmpeg -f pulse -i alsa_input.usb-Elgato_Systems_Cam_Link-03.iec958-stereo capture.mp3
```
We skip compression to save some CPU when we do everything together later on:
```
ffmpeg -f pulse -i alsa_input.usb-Elgato_Systems_Cam_Link-03.iec958-stereo \
    -c:a copy capture.wav
```
`-c:a copy` sets the audio codec to `copy`, which skips any audio encoding. In this case, it is redundant since the output format doesn't support any special codec anyways. It becomes relevant when we combine it with video.
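To double-check what actually ended up in the file, `ffprobe` prints the stream details; expect an uncompressed PCM stream (the exact sample rate depends on your PulseAudio configuration):

```
ffprobe -hide_banner capture.wav
# ... Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, ...
```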
## Putting it together... doesn't work
This mostly combines our video/audio examples and adds a `-thread_queue_size 2048` to allow some generous buffering (`$PULSE_DEVICE` holds the PulseAudio source name from above):
```
ffmpeg -vaapi_device /dev/dri/renderD128 \
    -i $V4L_DEVICE \
    -thread_queue_size 2048 -f pulse -i $PULSE_DEVICE \
    -vf 'format=nv12,hwupload' \
    -c:v hevc_vaapi -b:v 100M -maxrate:v 120M \
    -acodec copy \
    output-combined.mkv
```
This should be the final section, right? I'd hope so, but for me this wasn't true at all.

This might work for you. For me, it also worked once or so. Unfortunately, most of the time it got stuck after two frames. After some random experimentation, I noticed that unplugging the device usually allowed me to record once. I think it is related to syncing the audio/video streams: the video source seems to request key frames but doesn't get the expected data afterwards. Or something else.
## Resetting the USB device without unplugging
That's not nice, but can we at least automate it? After some browsing, I found this recipe on Ask Ubuntu:
```
sudo sh -c "echo 0 > /sys/bus/usb/devices/1-4.6/authorized"
sudo sh -c "echo 1 > /sys/bus/usb/devices/1-4.6/authorized"
```
I created two shell scripts:
- `find-elgato4k-sys-dir.sh` to find the right sys directory and
- `reset.sh` to issue these commands and wait for the device to come up again.
A sketch of both follows below.
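The post doesn't include the scripts themselves, so here is a minimal sketch of what they could look like, assuming the Cam Link reports its product name as `Cam Link 4K` in sysfs and reappears under the stable `by-id` path from earlier:

```
#!/bin/sh
# find-elgato4k-sys-dir.sh -- print the sysfs directory of the Cam Link 4K.
# Every USB device exposes its product name in .../product; grep for ours.
for dir in /sys/bus/usb/devices/*; do
    if [ -f "$dir/product" ] && grep -q "Cam Link 4K" "$dir/product"; then
        echo "$dir"
        exit 0
    fi
done
echo "Cam Link 4K not found" >&2
exit 1
```

```
#!/bin/sh
# reset.sh -- de-authorize and re-authorize the Cam Link to simulate a replug.
set -e
SYS_DIR=$(./find-elgato4k-sys-dir.sh)
sudo sh -c "echo 0 > $SYS_DIR/authorized"
sleep 1
sudo sh -c "echo 1 > $SYS_DIR/authorized"
# Wait for the video device node to come back.
while [ ! -e /dev/v4l/by-id/usb-Elgato_Cam_Link_4K_0004550EAA000-video-index0 ]; do
    sleep 1
done
```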
## Conclusion
After calling `reset.sh`, the combined command works reliably for me. And I have a newfound respect for the codecs in my camera.
If you benefitted from this article or, even better, improved upon it, please let me know, e.g. on Twitter.