I am a wannabe hobbyist filmmaker. Often, I only noticed problems with the raw footage on my desktop computer at home - long after a simple retake could have fixed them. Wouldn't it be awesome to record your footage directly to your laptop? And to make the footage available to others in your team really quickly, maybe by storing it on a server?

I want to do this on my XPS 13 since that is my most portable option. I should note that my laptop is decent (Intel i7-8550U, 1.8 GHz) but doesn't have a discrete graphics card. In fact, I cannot even play back the 4K HEVC (H.265) footage recorded on my Fuji X-T3 on my laptop without tweaking VLC. And even on my hefty AMD Ryzen Threadripper 2950X 16-core, I cannot encode in real time on the CPU alone.

Therefore, encoding in real time should probably be impossible. Or not? I had to try. Oh, and on Linux. NixOS to be specific.

Note that this article is not full of deep insights but mostly contains things I tried that eventually worked. This kind of content has helped me tremendously in the past, so let's dump it into the Interwebs.

The easy way: OBS

The first thing I tried was OBS Studio. Here is what somewhat worked for me:

  • I set the output of my camera to 4K, 16:9, 25 fps (Full HD does not work).
  • I added a Video Capture (V4L2) source. I needed to set the video format to YV12, which is apparently an alias for NV12.
  • I added an Audio Input Capture (PulseAudio) source since the video source does not include the audio.

It somewhat worked. But it also randomly stopped working, and it dropped frames -- that got better when I disabled the preview. Anyway, if you are reading this some time after I published it, give it a try; it might be the simplest option!

Therefore, I decided to use CLI tools to understand what was going on and to ultimately have more control. I dabbled a bit with GStreamer but settled on ffmpeg because I found more and better documentation for it and it seemed to work better.

Digging deeper: Required Software

If you want to follow along: on NixOS, I installed these packages by adding them to my package list:

# Capturing and encoding
ffmpeg-full # My version was 4.2.2

# Inspect video devices etc.
v4l-utils
usbutils

# Hardware acceleration for encoding/decoding
vaapiIntel
libva
libva-utils
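
In a NixOS configuration that roughly translates to something like this (a minimal sketch, assuming you manage packages via environment.systemPackages in configuration.nix):

environment.systemPackages = with pkgs; [
  ffmpeg-full  # capturing and encoding
  v4l-utils    # v4l2-ctl and friends
  usbutils     # lsusb
  vaapiIntel   # VAAPI driver for Intel iGPUs
  libva
  libva-utils  # vainfo
];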

Underlying video device

ffmpeg and other CLI tools require us to specify the video device name -- at least if you have multiple video inputs, e.g. the built-in webcam and the Elgato Cam Link 4K. You can either use the /dev/videoX device name that you saw in OBS Studio or find the device names with:

v4l2-ctl --list-devices
Cam Link 4K: Cam Link 4K (usb-0000:39:00.0-1.2):
	/dev/video1
	/dev/video2

You can already use the device names (the first one for the Cam Link) that you found here. The trouble is that these device names might depend on the order in which the video devices are detected. To get a stable device name that works across reboots in your shell scripts, look into the /dev/v4l/by-id directory. For my device, it was /dev/v4l/by-id/usb-Elgato_Cam_Link_4K_0004550EAA000-video-index0.

We'll refer to whatever device you have chosen as $V4L_DEVICE in the rest of the text.
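
In shell terms that just means something like this (the by-id path is the one from my device; yours will differ):

export V4L_DEVICE=/dev/v4l/by-id/usb-Elgato_Cam_Link_4K_0004550EAA000-video-index0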

Playing the video feed

To check whether ffmpeg can play the stream, you can use:

ffplay -f v4l2 -i $V4L_DEVICE

That is super simple, isn't it? It even detects the video format correctly. If it doesn't for you, try specifying one with -input_format, e.g. -input_format nv12 before the -i.
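
Spelled out, that looks like the following -- nv12 is what my Cam Link delivers; v4l2-ctl -d $V4L_DEVICE --list-formats-ext shows which formats your device supports:

ffplay -f v4l2 -input_format nv12 -i $V4L_DEVICE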

Capture the video feed

What does ffmpeg do by default? Let's see by exchanging ffplay with ffmpeg and specifying an output file:

ffmpeg -f v4l2 -i $V4L_DEVICE capture.mkv

Hey, that works! But only barely.

  • I cannot encode the video in real time, so the unencoded frames pile up. The raw video feed at 29.97 fps clocks in at about 3 Gbit/s (3840 × 2160 pixels × 1.5 bytes per pixel in NV12 × 29.97 frames per second), a bit less at 25 fps. If we can't keep up, ffmpeg buffers the rest in main memory. So this is only suitable for quite short videos ;)
  • Also, the default encoder settings produce ugly output; we need to specify something nicer.

Hardware-encoding to the rescue

Since my laptop does not have a discrete graphics card, I am using the VAAPI API, which is supported by the integrated GPU in newer Intel CPUs.

ffmpeg -vaapi_device /dev/dri/renderD128 \
  -f v4l2 -input_format nv12 -i $V4L_DEVICE \
  -vf 'hwupload' \
  -c:v hevc_vaapi -b:v 100M -maxrate:v 120M capture.mkv

Woooh! That's a lot to understand.

Let's start with the basics: a basic ffmpeg command consists of some global flags (-vaapi_device /dev/dri/renderD128 in this case), a list of inputs (-i ...), and a list of outputs (e.g. file names). Local options affect the next input or output, so decoding options are specified before an input, and encoding options for an output are specified before the output file name. -f v4l2 -input_format nv12 apply to the input. -c:v hevc_vaapi -b:v 100M -maxrate:v 120M all apply to the output capture.mkv.
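
To make that ordering concrete, here is a tiny self-contained example using ffmpeg's built-in test source (nothing to do with our capture setup): -hide_banner is a global option, -f lavfi configures the input, and -t 5 plus -c:v libx264 configure the output file.

ffmpeg -hide_banner \
  -f lavfi -i testsrc=size=1280x720:rate=25 \
  -t 5 -c:v libx264 example.mp4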

Between our input and the output, there is a -vf video filter command which can be used to process the raw video. You could scale the video, transform the color space, things like that. In our case, we need to make the video stream available to the hardware encoder with hwupload. That seems to be a bit of a specialty of the VAAPI support; in my short experimentation with the NVIDIA support (nvenc) on another machine, I didn't need anything similar.

-c:v hevc_vaapi chooses the video codec (-c is for codec, :v is for video) from a long list of supported ones. The codecs that use VAAPI have a _vaapi suffix by convention.
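
If you want to see which VAAPI encoders your ffmpeg build ships, something like this should list them:

ffmpeg -hide_banner -encoders | grep vaapi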

Without specifying the target and maximum bit rate, I got a stream with low quality. Therefore, I specified -b:v 100M -maxrate:v 120M to aim at quite high quality. The highest quality setting on the X-T3 for 4K 25 fps is 400 Mbit/s, so it might even be worth experimenting with higher settings. Note that there is a bug that requires you to specify a maxrate that is not much higher than the wanted bit rate.

Using VAAPI, I could encode 25 fps in real time. Unfortunately, 29.97 fps is already too much. That also means that my plan to encode a smaller, low-quality stream at the same time failed.

The FFmpeg VAAPI wiki page gives context and many useful examples if you want to know more.

Audio source

The audio feed is - unfortunately - separate from the video feed.

First, let me save you a day of fiddling. Don't use ALSA directly; use PulseAudio if that's what runs on your system. Using ALSA directly from ffmpeg resulted in a lot of clapping to debug audio sync issues, which slowly drove my girlfriend insane.

Use pactl list sources to output tons of information about your audio sources.

Choose the right device and pick its name. For me, it was alsa_input.usb-Elgato_Systems_Cam_Link-03.iec958-stereo. We'll refer to it as $PULSE_DEVICE later on.
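
If the full output is too noisy, pactl list short sources prints one line per source, which is easier to scan. Analogous to the video device, you can stash the name in a variable (this is my device name; yours will differ):

pactl list short sources
export PULSE_DEVICE=alsa_input.usb-Elgato_Systems_Cam_Link-03.iec958-stereo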

Capture the audio feed

This is straightforward: we just specify that we want to use PulseAudio (with -f pulse) and pass the device name as the input.

ffmpeg -f pulse -i alsa_input.usb-Elgato_Systems_Cam_Link-03.iec958-stereo capture.mp3

We skip compression to save some CPU when we do everything together later on:

ffmpeg -f pulse -i alsa_input.usb-Elgato_Systems_Cam_Link-03.iec958-stereo \
    -c:a copy capture.wav

-c:a copy sets the audio codec to copy, which skips any audio encoding. In this case, it is redundant since the output format doesn't support any special codec anyway. It becomes relevant when we combine it with video.

Putting it together... doesn't work

This mostly combines our video and audio examples and adds -thread_queue_size 2048 to allow some generous buffering:

ffmpeg -vaapi_device /dev/dri/renderD128 \
    -f v4l2 -i $V4L_DEVICE \
    -thread_queue_size 2048 -f pulse -i $PULSE_DEVICE \
    -vf 'format=nv12,hwupload' \
    -c:v hevc_vaapi -b:v 100M -maxrate:v 120M \
    -c:a copy \
    output-combined.mkv

This should be the final section, right? I'd hoped so, but for me it wasn't true at all.

This might work for you. For me, it also worked once or so. Unfortunately, most of the time it gets stuck after two frames. After some random experimentation, I noticed that unplugging and replugging the device allowed me to record once, most of the time. I suspect it has to do with syncing the audio and video streams: the video source seems to request key frames but doesn't get the expected data afterwards. Or it is something else entirely.

Resetting the USB device without unplugging

That's not nice, but can we at least automate it? After some browsing, I found this recipe on Ask Ubuntu:

sudo sh -c "echo 0 > /sys/bus/usb/devices/1-4.6/authorized"
sudo sh -c "echo 1 > /sys/bus/usb/devices/1-4.6/authorized"

I created two shell scripts: reset.sh with the two commands above (adapted to my device's USB path) and a second one with the combined capture command.
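
For reference, reset.sh could look roughly like this -- a minimal sketch assuming your Cam Link sits at USB path 1-4.6 like in the recipe above (find yours with lsusb -t or by poking around in /sys/bus/usb/devices):

#!/usr/bin/env bash
# Deauthorize and reauthorize the USB device, which behaves like
# unplugging and replugging it.
USB_PATH=1-4.6  # adjust to your device
echo 0 | sudo tee /sys/bus/usb/devices/$USB_PATH/authorized > /dev/null
sleep 1
echo 1 | sudo tee /sys/bus/usb/devices/$USB_PATH/authorized > /dev/null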

Conclusion

After calling reset.sh, the combined command works reliably for me. And I have a newfound respect for the codecs in my camera.

If you benefitted from this article or, even better, improved upon it, please let me know, e.g. on Twitter.