Multiple GPUs / HW device selection / FFStream decode debugging / nvidia eGPU / VDPAU

(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

@Ig0r

And just one last thing.  You included the following information from the mpv player output:

Using hardware decoding (vdpau).
VO: [opengl] 2704x1520 vdpau[yuv420p]
AO: [pulse] 48000Hz stereo 2ch float

Note that it reports that it is hardware decoding using vdpau, but the VO reports "opengl" and "yuv420p" not "yuvj420p" so it is possible that the use of the graphics board is really via OpenGL.

I have tried to find a reference on the internet that supports my conclusion that the GTX970 simply does not support yuvj420p but could not.  GG agrees that that is as far as it goes and believes we have found the only source of the problem and the problem is solved (the patch was not even necessary).


   
(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

About MPV/mplayer using your GTX970 board and vdpau.  GG had a chance to look at the source code and, although he did not get into great detail, it looks like the code does not go through FFmpeg but rather uses OpenGL/shaders via an API.  Therefore the eGPU does not get a direct chance to reject the yuvj420p format.   This was probably a major task -- the MPV github is very active with more than just 1 developer (ongoing changes are being made almost daily, and wm4@nowhere is a funny guy whose comments are very entertaining).


   
 Ig0r
(@ig0r)
Eminent Member
Joined: 5 years ago
Posts: 27
Topic starter  

@sam & @phyllissmith, I admire your philosophy! Nevertheless, keep me posted on any news on how to sponsor a beer/pizza. I guess the best way to pay back the community is to come up with an article on reddit and describe my fight + (hopefully) solution on such an eGPU setup for video editing on linux (since in the end I have spent quite some time on this now, people may benefit from some learning).

@phyllissmith, thanks for providing so much info on your testing! I'll go through point for point now.


#1: Yes, I can confirm the same output, showing me the yuvj420p stream info including the "non-standard" frame rate. A quick note: I guess many action cameras nowadays use such a frame rate, no clue when such a high frame rate became "standard".


#2: While I can execute va_openDriver(), so it's kind of waiting for a command, I cannot get either the "Successfully created a VDPAU device" OR "Unsupported sw format: yuvj420p" anywhere yet. I guess the va_openDriver() works since there is the intel "GPU" running in parallel.


#3: This is related to #2, so no I do not see any successful VDPAU device creation after changing to "verbose" logging. The error messages rather stay exactly the same:

root@ig0r-ThinkPad-X230:/home/ig0r/cinelerra_singleuser_1/cinelerra5/cinelerra-5.1/bin# echo $DISPLAY
:0.0
root@ig0r-ThinkPad-X230:/home/ig0r/cinelerra_singleuser_1/cinelerra5/cinelerra-5.1/bin# export CIN_HW_DEV=vdpau
root@ig0r-ThinkPad-X230:/home/ig0r/cinelerra_singleuser_1/cinelerra5/cinelerra-5.1/bin# echo $CIN_HW_DEV
vdpau
root@ig0r-ThinkPad-X230:/home/ig0r/cinelerra_singleuser_1/cinelerra5/cinelerra-5.1/bin# export CIN_HW_DEVICE=:0.0
root@ig0r-ThinkPad-X230:/home/ig0r/cinelerra_singleuser_1/cinelerra5/cinelerra-5.1/bin# echo $CIN_HW_DEVICE
:0.0
root@ig0r-ThinkPad-X230:/home/ig0r/cinelerra_singleuser_1/cinelerra5/cinelerra-5.1/bin# ./cin
Cinelerra Infinity - built: Apr 13 2020 11:28:12
git://git.cinelerra-gg.org/goodguy/cinelerra.git
(c) 2006-2019 Heroine Virtual Ltd. by Adam Williams
2007-2020 mods for Cinelerra-GG by W.P.Morrow aka goodguy
Cinelerra is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. There is absolutely no warranty for Cinelerra.

FFStream::decode: avcodec_send_packet failed.
file:/home/ig0r/testvideo.MP4
err: Invalid data found when processing input
FFStream::decode: failed
HW device init failed, using SW decode.
file:/home/ig0r/testvideo.MP4
err: Invalid data found when processing input
FFStream::decode: avcodec_send_packet failed.
file:/home/ig0r/testvideo.MP4
err: Invalid data found when processing input
FFStream::decode: failed
HW device init failed, using SW decode.
file:/home/ig0r/testvideo.MP4
err: Invalid data found when processing input
FFStream::decode: avcodec_send_packet failed.
file:/home/ig0r/testvideo.MP4
err: Invalid data found when processing input
FFStream::decode: failed
HW device init failed, using SW decode.
file:/home/ig0r/testvideo.MP4
err: Invalid data found when processing input
FFStream::decode: Retry limit


#4 & subsequent post: Yes, my vainfo does not include the VAProfileJPEGBaseline : VAEntrypointVLD, but interestingly I'm also able to open and process the testvideo.MP4 when I disconnect my eGPU and set everything to vaapi (so it at least uses hardware acceleration of the onboard intel "gpu"). There is a screenshot attached, you also see that the CPU loading is rather low (compared to SW decode), which is good!


Related to your last two posts: Oh boy, what a bummer. I was about to mention the superb performance of mpv using this test video. Your explanation makes a lot of sense, thank you very much for digging into all this!

mpv player was keeping my hopes up high, but it really does look like the GTX970 does not support this yuvj420p out of the box. Could you do me a favor and provide me a test video which uses a format which is supported by this GPU, please? (Best via the already shared google drive).

If this does work, I would need to change the standard format of GoPro (haha, yeah right..) or use ffmpeg to do some kind of conversion (that's really the worst case scenario for me, since it requires a lot of time and disk space and especially proper settings of the ffmpeg conversion).

Additionally, if this test is successful we can go ahead and close this thread.


Thanks so much for all the work you have done here!
Cheers!


   
(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

I uploaded tutorial.mp4 to the shared drive and it is also at:   

https://streamable.com/0c9mrv

Just about any .mp4 file will work. There is a Settings choice of "Transcode" in Cinelerra to convert a file to something besides your yuvj420p format.  This is for convenience sake as doing this from the command line just using ffmpeg is really faster.

There is no problem using the non-standard frame rate; I just noted it out of curiosity and was amazed that they did not at least just use 60 instead of 59.

The "verbose" loglevel was put in the wrong file because that is what I suggested.  That is the source file but not the actual running file in the /bin subdirectory.  The correct spot is in {cinelerra_path}/bin/ffmpeg/decode.opts (sorry about that).  It is worth doing as you load tutorial.mp4.  You should see:

 

[AVHWDeviceContext @ 0x7fff1c0ba540] Successfully created a VDPAU device (NVIDIA VDPAU Driver Shared Library 440.64 Fri Feb 21 00:41:34 UTC 2020) on X11 display :0
[h264 @ 0x7fff1c5b2a00] Reinit context to 1440x1088, pix_fmt: vdpau
[h264 @ 0x7fff74038cc0] Reinit context to 1440x1088, pix_fmt: vdpau  ...

(and no switching to SW decode)
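
For anyone repeating this check, the two console messages quoted in this thread can be searched for automatically. The following is only a minimal sketch, assuming the console output has been saved as text; the function name hw_decode_status is made up for illustration and is not part of Cinelerra.

```python
# Message strings copied from the console output quoted in this thread.
HW_OK = "Successfully created a VDPAU device"
SW_FALLBACK = "HW device init failed, using SW decode."


def hw_decode_status(log_text: str) -> str:
    """Classify saved console output as 'hw', 'sw-fallback', or 'unknown'.

    Hypothetical helper: it only looks for the two messages above and
    knows nothing else about how Cinelerra logs.
    """
    if SW_FALLBACK in log_text:
        # At least one stream fell back to software decoding.
        return "sw-fallback"
    if HW_OK in log_text:
        # Device was created and no fallback message was seen.
        return "hw"
    # Neither message present, e.g. verbose logging was not enabled.
    return "unknown"
```

A result of "unknown" most likely means the verbose loglevel is not active in the decode.opts file that is actually being used.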



   
 Ig0r
(@ig0r)
Eminent Member
Joined: 5 years ago
Posts: 27
Topic starter  

@phyllissmith, wow THANKS SO MUCH for solving this puzzle!! 

Indeed, when using your test video (which is .MP4 but yuv420p instead of yuvj420p) the verbose output says

[AVHWDeviceContext @ 0x7f8f5c050640] Successfully created a VDPAU device (NVIDIA VDPAU Driver Shared Library 435.21 Sun Aug 25 08:06:02 CDT 2019) on X11 display :0.0
[h264 @ 0x7f8f5c588480] Reinit context to 1440x1088, pix_fmt: vdpau
[h264 @ 0x7f8f7c6e8f40] Reinit context to 1440x1088, pix_fmt: yuv420p

Cinelerra is fully supporting the VDPAU HW device, so the CPU utilization is fairly low at ~30%, compared to 100% when software decoding.

However, when shutting down the eGPU and switching to vaapi with the intel onboard "gpu", cpu utilization is even lower!! At ~15%!! At least that's the case for the provided testvideo (yuv420p), and unfortunately this is not true for the GoPro footage (which is yuvj420p or even h.265). I suppose this is due to the extra GPU traffic on the express card slot device?

I need to point out here how amazingly good cinelerra performs compared to Olive, KDENLIVE and OpenShot (let's not talk about DaVinci resolve, since they don't even "support" mp4 files in linux (but in windows!?)). All of them utilize the CPU at least 50-60% while just previewing the freshly loaded yuv420p testpiece on my machine, compared to 15% when previewing in cinelerra - amazing guys, chapeau!

Anyhow, what a disappointment, that apparently the GoPro Hero 7 Black (when used in "compatibility mode" = h.264 + HEVC) saves the videos in yuvj420p, which (of course) can not be changed and apparently is not supported by my nvidia GTX970 to actually do HW decoding (although I could not really find any details on that either, yet..). 

I was aware of the fact that the GTX970 will not support h.265 (check the picture attached from the nvidia page)(btw, the GTX960 does seem to support h.265), but I was not prepared to run into this kind of issue when using h.264 in the first place. One letter in the used color space made all the difference, and my GPU practically useless, unless I do some extra conversion via ffmpeg (did I get it right?). If anyone has additional input here, I'd be glad to hear it.

Well, it was indeed a big learning for me (and hopefully others) and my solution might as well be to get a newer Nvidia card as my eGPU (it seems I'll go for something like the 1050, since it is fully supporting h.265 which I'll set in the GoPro as well). In the end I just wanted to edit my GoPro footage - didn't expect it to be that difficult.

In any case, once I'll have the new setup running I'll stick to cinelerra, since the tests above already showed by far superior GPU utilization compared to other free video editors out there. 


I'd like to thank you, @phyllissmith, again for all your patience and amazingly fast and competent support. 

All the best! 


   
(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

@Ig0r

Well, I know you spent hours and days, probably over weeks,  to get this mystery resolved and I applaud your tenacity and willingness to not give up early.  It was a tremendous amount of work on your part.

Yes the following appears to be correct -- "One letter in the used color space made all the difference, and my GPU practically useless, unless I do some extra conversion via ffmpeg (did I get it right?)."

It is rather strange that the GoPro uses Jpeg (assuming that is the j in yuvj420p) which is usually for single photos rather than Mpeg used for motion images.  But jpeg saves space at the loss of some quality, I guess.
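
A note on the "j": in FFmpeg's pixel format naming, yuvj420p denotes JPEG-style full-range levels (0-255) rather than JPEG compression, so the stream itself is still ordinary H.264 or HEVC. FFmpeg marks the yuvj formats as deprecated in favor of yuv420p plus a separate color range flag. The sketch below shows the usual 8-bit arithmetic for squeezing full-range values into the limited "tv" range (16-235 for luma, 16-240 for chroma). The function names are made up for illustration, and an actual scaler may round slightly differently.

```python
def full_to_limited_luma(y_full: int) -> int:
    """Map an 8-bit full-range luma value (0-255) to limited range (16-235)."""
    return round(16 + y_full * 219 / 255)


def full_to_limited_chroma(c_full: int) -> int:
    """Map an 8-bit full-range chroma value (0-255) to limited range (16-240).

    Chroma is centered on 128, so the scaling is applied around that midpoint.
    """
    return round(128 + (c_full - 128) * 224 / 255)
```

So the difference between yuvj420p and yuv420p is only in how the same 8-bit samples are interpreted, which may be why a single letter decides whether a decoder path accepts the file.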

We all learned here and it will be a help to others for sure in the future.  It was an interesting problem.


   
 Ig0r
(@ig0r)
Eminent Member
Joined: 5 years ago
Posts: 27
Topic starter  

@phyllissmith

Thanks for the positive spirit - can use it quite a bit right now.

May I ask one final thing here? 


I recorded some test footage with the GoPro in h.265 (HEVC) and noted that it also uses the yuvj420p color space:

Stream #0:0(eng): Video: hevc (Main) (hvc1 / 0x31637668), yuvj420p(pc, bt709), 1920x1440 [SAR 1:1 DAR 4:3], 59969 kb/s, 59.94 fps, 59.94 tbr, 60k tbn, 59.94 tbc (default)
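
To check many clips for this format without reading each stream line by eye, the line can be parsed. This is a rough sketch that assumes the line has the same shape as the ffmpeg output quoted in this thread; parse_video_stream is a hypothetical helper, not an FFmpeg API.

```python
import re

# Matches e.g. "Video: hevc (Main) (hvc1 / 0x31637668), yuvj420p(pc, bt709), ..."
STREAM_RE = re.compile(r"Video:\s*(\w+).*?,\s*(yuvj?\d{3}p\w*)\(([^)]*)\)")


def parse_video_stream(line: str):
    """Extract codec, pixel format and range from one ffmpeg stream line.

    Returns None if the line does not look like the examples in this thread.
    """
    match = STREAM_RE.search(line)
    if match is None:
        return None
    codec, pix_fmt, details = match.groups()
    tags = [tag.strip() for tag in details.split(",")]
    return {
        "codec": codec,
        "pix_fmt": pix_fmt,
        # "pc" means full range; the yuvj formats imply full range as well.
        "full_range": tags[0] == "pc" or pix_fmt.startswith("yuvj"),
    }
```

For the line above this should report hevc, yuvj420p and full range, which is the combination that caused trouble with h.264 on the GTX970 in this thread.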

While it is clear that the GTX970 does not support this, a GTX1050 should. BUT, as we saw in this post, while the GTX970 does officially support h.264, it just practically fails at the yuvj420p color space (even though it is h.264, which is still puzzling me).

My fear is that, when I change the GPU to e.g. GTX1050, it might as well just fail at this "weird" color space, which is again yuvj420p, while it should officially support HEVC (aka h.265).

Do any of you have an nvidia gpu from the 10xx generation (or higher) in order to test the hardware acceleration? I uploaded a testvideo "h265_test.mp4" so maybe one of you is able to test this within cinelerra. Looking at some other forums on the net it seems that the yuvj color space has already given the community quite some trouble. Requesting anything from nvidia is my very last resort.

BTW: I changed from NTSC to PAL in the GoPro settings too, without any effect on the used color code.


BTW: I also converted the existing yuvj420p testvideo to a different color space using ffmpeg (also scaling a bit, since this is basically copy paste from a different forum):

ffmpeg -i "h264_test.MP4" -vf "scale=w=-2:h=1920:sws_flags=spline+accurate_rnd:in_range=tv:out_range=tv" -c:v libx264 -colorspace bt709 -color_trc bt709 -color_primaries bt709 -color_range tv -pix_fmt yuv420p -an -f mp4 "h264_test_yuv420p.mp4"

Result:

Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 3416x1920 [SAR 4095:4096 DAR 116571:65536], 56594 kb/s, SAR 8112:8113 DAR 169:95, 59.94 fps, 59.94 tbr, 60k tbn, 119.88 tbc (default)

While the quality and playback suffer quite a bit (for sure not optimum settings), it does show that the color space is now changed to yuv420p, and as expected cinelerra successfully initialized the VDPAU hw device when this converted file is loaded. I saw somewhere that the bt709 flag might have caused some issues, but this is clearly not the case here.
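
One possible variant of the conversion, offered as an untested sketch: since the source is reported as yuvj420p(pc, ...), it may be more accurate to declare the input as full range and let the scale filter compress it to tv range, and to drop the resize. The helper name build_transcode_cmd and the crf default are assumptions for illustration. The ffmpeg options themselves (scale with in_range/out_range, -pix_fmt, -color_range, -crf, -c:a copy) are standard ones.

```python
import shlex


def build_transcode_cmd(src: str, dst: str, crf: int = 18) -> list:
    """Build an ffmpeg argument list that rewrites full-range video as yuv420p.

    Hypothetical helper; the quality setting (crf) is a guess, not a tuned value.
    """
    return [
        "ffmpeg", "-i", src,
        # Declare full-range input and convert the levels to limited (tv) range.
        "-vf", "scale=in_range=full:out_range=tv",
        "-c:v", "libx264", "-crf", str(crf),
        "-pix_fmt", "yuv420p",
        "-color_range", "tv",
        # Leave the audio stream untouched.
        "-c:a", "copy",
        dst,
    ]


# Print the command for inspection; run it manually or via subprocess.run(cmd).
print(shlex.join(build_transcode_cmd("h264_test.MP4", "h264_test_yuv420p.mp4")))
```

Keeping the original resolution avoids the odd SAR/DAR values seen in the result above, at the cost of a larger file than the scaled version.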


Another important clarification: The screenshot attached to my post from 14.04.2020 8:57 is done using VAAPI, BUT with the eGPU still connected!! The verbose output (debugged later on) actually points out that there is something weird going on:

[h264 @ 0x7f4be0751bc0] Failed to query surface attributes: 18 (invalid parameter).
[h264 @ 0x7f4be0751bc0] Failed setup for format vaapi_vld: hwaccel initialisation returned error.
[h264 @ 0x7f4be0751bc0] Invalid return from get_format(): vaapi_vld not in possible list.
[h264 @ 0x7f4be0751bc0] decode_slice_header error
[h264 @ 0x7f4be0751bc0] no frame!

Therefore the visible CPU utilization from this screenshot is way higher than described later that day (at 10pm). So if I want to make full use of the (stunning!) vaapi performance, I'm not even allowed to have the eGPU connected at all 🙁 (of course the hw device is set to vaapi, but maybe the environment variables are messed up since I point to the display 0 which is actually the vdpau stuff..whatever..).


Practically speaking, in terms of cinelerra to be able to support this color space too, it seems it would need a similar workaround as discovered in the mpv-player source code. This would just save anybody working with GoPro footage quite some trouble and boost performance (since yuvj seems to be in place as early as the Hero 3, and it will not change in the near future I guess). Especially if in the h.265 test it turns out that the 10xx cards are of no use either, even though they should.

Of course this is related to a lot of work and it seems that nobody really ran into this issue before, so priority is low I guess.

Thanks again for all the support, and in case anybody can test the h265_test.mp4 with a newer nvidia card (using pascal architecture, check the attached screenshot) I would appreciate it a lot, since it would spare me quite some money, time and nerves. 

All the best! Cheers!


   
(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

@Ig0r

The probability of the GTX1050 working with yuvj420p ranges from extremely low to zero - actually most likely zero.  This is based on gg also looking at Nvidia docs on the internet.  I think your fear is well founded as in: "My fear is that, when I change the GPU to e.g. GTX1050, it might as well just fail at this "weird" color space, which is again yuvj420p, while it should officially support HEVC (aka h.265)."  Also, I did see a message "yuvj420p is deprecated" which suggests it is not well received.

I do not think there are that many Cinelerra users that have the GTX1050 board. (Oh great, now gg has an excuse to upgrade to the GTX10xx series but that is way off in the future.)  If anyone does have this board, even if not running Cinelerra, it is possible that running "vdpauinfo" may provide a clue if the output shows something different than:

Video surface:

name   width  height  types
-------------------------------------------
420    16384  16384   NV12 YV12
422    16384  16384   UYVY YUYV
444    16384  16384   Y8U8V8A8 V8U8Y8A8
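
To compare this table between two cards, the section can be turned into a small dictionary. This is a sketch that assumes the layout shown above; parse_video_surfaces is a hypothetical helper. Note that the table lists chroma subsampling and surface layouts only, so it may not say anything directly about full-range (yuvj) input.

```python
def parse_video_surfaces(text: str) -> dict:
    """Parse the "Video surface:" section of saved vdpauinfo output.

    Returns a mapping such as {"420": {"max_width": ..., "types": [...]}}.
    """
    surfaces = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Video surface"):
            in_section = True
            continue
        if not in_section:
            continue
        parts = stripped.split()
        # Data rows look like: 420 16384 16384 NV12 YV12
        if len(parts) >= 3 and all(p.isdigit() for p in parts[:3]):
            surfaces[parts[0]] = {
                "max_width": int(parts[1]),
                "max_height": int(parts[2]),
                "types": parts[3:],
            }
        elif stripped.endswith(":"):
            # A new section heading ends the table.
            break
    return surfaces
```

Comparing the dictionaries from a GTX970 and a 10xx card would show quickly whether the newer card exposes any additional surface types.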


   
 Ig0r
(@ig0r)
Eminent Member
Joined: 5 years ago
Posts: 27
Topic starter  

@phyllissmith

I agree, and I won't try to upgrade the graphics card in order to solve this (yet).

To sum up our findings, can we conclude that this comes down to nvidia's lack of driver support for this specific pixel format (yuvj420p color space)? Or is it rather related to the ffmpeg interface?

This is important for anybody (including me) who wants to follow up on this. If we agree on the most probable source, I can contact e.g. nvidia (for example to ask why h.264 is officially supported but gets its a** kicked by a weird color space). Also cinelerra would benefit from a solution that fixes the problem at its origin, instead of building a workaround like it is done in mpv-player.

Thanks! BR


   
(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

@Ig0r

The nvidia board GTX970 apparently works for yuvj420p because your data shows that it works with the mpv-player.   So contacting nvidia would not help --  the problem is probably in the software before it gets to the board.


   
 Ig0r
(@ig0r)
Eminent Member
Joined: 5 years ago
Posts: 27
Topic starter  

@phyllissmith

wow, lol ok I can't post my answer here as a regular "reply", since wordpress thinks it's potentially dangerous 😀 😀 What's happening...

Please check the attached textfile with the text I would have posted here instead.

Unfortunately, the highlighting has been lost. Especially in the last code section please look for 

FFMPEG::scan: codec open failed

Sorry for the inconvenience. Cheers!


   
(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

@Ig0r

A "fix" in ffmpeg worked here on our GTX970.  Vdpauinfo on our computer and yours supports H264_CONSTRAINED_BASELINE so that patch should work for you too.  GG checked it into GIT so you can check it out and recompile Cinelerra like you did the other day.  Do not use the patch provided the other day or the environment variable.  Just use vdpau as the hw_device in Settings.

 

The manual tells you how to update your GIT repository to get the modifications but if you have problems, just ask.

P.S. this patch came from mpv so they deserve the credit even though it was luck on my part that I found it.

P.P.S The cpu usage on our Fedora GTX970 Cinelerra run using testvideo.MP4 went from 250-370% down to 75%


   
(@cinadmin)
Eminent Member Admin
Joined: 6 years ago
Posts: 32
 
Posted by: @ig0r

wow, lol ok I can't post my answer here as a regular "reply", since wordpress thinks it's potentially dangerous 😀 😀 What's happening...

Unfortunately this forum and its spam protection sometimes work very idiosyncratically. If new users write a lot of posts in a short time, the spam protection assumes that there might be spam and restricts certain functions for a short period of time. Unfortunately I have no influence on this. Don't be irritated as long as you can continue writing. Over time, all functions will be unlocked and the longer you are there, the less the spam protection intervenes.


   
(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

@Ig0r

About your attached txt file error message as shown below, I believe this is simply a message related to building the Index file for the media.  The first time you load it an Index file is created.  If you load it a second time, it does not have to rebuild the Index so you will not see this error message (but I should add this to the manual and verify in the code with gg).  You get the same error on your original file as well as your converted file, the very first time you build the Index (or do a rebuild index).

   FFMPEG::scan: codec open failed

Also, yesterday we discovered that the "Retry limit" came from loading the audio, but it did not seem to create a detectable problem. Our "guess" was that, because it only seems to occur in one spot, it might have been from turning the camera off and then back on.  The retry limit is set to 1000 and it went (in only 1 place) to 1088.  This probably also explains the reason for the error:

   audio0 pad 4 0 (4)

   
(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

@Ig0r

Neither video plays "choppy" for me. Forgot to mention this --  if you set "Play every frame" in Settings, Preferences, Playback A, and watch while playing, I consistently get 45 FPS, so it plays smoothly.   And I have plenty of unused CPU power.  It took 14 seconds to play this 11 second video in Cinelerra but only 12 seconds to play from the command line using ffmpeg.

But one problem at a time!


   