Multiple GPUs / HW device selection / FFStream decode debugging / NVIDIA eGPU / VDPAU

(@phyllissmith)
Reputable Member Admin
Joined: 6 years ago
Posts: 242
 

@Ig0r

Thanks for the feedback. Running prof2 is more of a programmer-level task and it is not easy to explain how to proceed. Meanwhile, another performance improvement has been implemented here which affects redrawing the video/audio on the timeline, but I do not think it will help with your video. The testing I have been doing here uses Big Buck Bunny, which is 3840x2160 at 60fps, and "Play every frame" is almost always near 60. That is with either the X11 (software) or the X11-OpenGL video driver and no hardware acceleration, but I have 16 CPUs on this laptop.

I think that upgrading to LTS 20 is probably a good idea BUT we have not created another partition for installing that here in order to create a new build on April 30th so it has not been tested.  We will do that this coming week so that it is available but there will be very little specific testing.  When we do that, I will check out how to run "prof2" on it and see what is taking so much time.  Meanwhile, another user has provided some profile information that we can look at.


   
(@phyllissmith)
 

@Ig0r

Do not worry about getting back to this in any time frame.  You are busy and I have plenty of other things to do! 

About the prof2 error: if/when you have time, just skip the step that did not work, that is:

cp: target '/usr/local/bin/' is not a directory

and try to run the next steps, and let me know if there is another problem. We tested prof2 on Ubuntu 18 a few months ago and at that time we did not have to do the "cp" step, so it may not even be needed.


   
Ig0r
(@ig0r)
Eminent Member
Joined: 5 years ago
Posts: 27
Topic starter

@phyllissmith

WOW, this high resolution @ 60FPS sounds like quite heavy work for the hardware. But as I said, using the VAAPI approach I do get superb performance (see the screenshot attached) -> an almost steady 60FPS too. Preview playback via the eGPU, however, lags far behind (only ~40FPS, with stuttering), even though we clarified in the meantime that VDPAU is initialized successfully, while rendering on the eGPU is awesome! The slow preview playback on the eGPU is what we are now trying to debug further here.

And yeah, I have an old (but gold!) x230 with an Intel i5-3320M (4) @ 3.300GHz, but you know the whole story is about getting the most out of old hardware and letting newer hardware (the GTX970 eGPU) do the heavy lifting.

I tried to follow along, but I failed on the next to last step:

ig0r@ig0r-ThinkPad-X230:~/Downloads/cin-ub18-debug/prof2$ ./prof -o /tmp/prof_list.txt /home/ig0r/Downloads/cin-ub18-debug/cin
cant find symbol 'main'

I'm not sure if this has something to do with the "cp" step we just skipped. The two required packages are definitely installed. When you have time, just let me know how to proceed; I'd like to support you on this before I switch to 20.04 LTS (since I think they implemented quite a few new features there, especially around GPU support, among other things the "Launch using dedicated graphics card" GNOME Shell implementation https://ubuntu.com/blog/whats-new-in-ubuntu-desktop-20-04-lts).

Enjoy your weekend! All the best.


   
(@phyllissmith)
 

@Ig0r

I keep losing focus. So getting back to the following:

The slow preview playback on the eGPU is what we are now trying to debug further here. 

At least once I did something to get the FPS down from the expected 60 to 29 but now I can not repeat that -- I was hoping that that would provide a hint as to why the eGPU does not run as well. I have to go back over this forum to refresh my brain.

 

Meanwhile, GG downloaded Ubuntu 20 and created a Cinelerra-GG static build there for later if we get this debugged.  It is at:

https://cinelerra-gg.org/download/testing/cinelerra-5.1-ub20-x86_64.static.txz


   
(@phyllissmith)
 

@Ig0r

Today we ran some more tests on our GTX970 computer using Big Buck Bunny 3840x2160 which I had converted from mkv to mp4.  I uploaded that exact test file to your video_debug shared drive (which should be deleted after you download it as I do not know about license stuff).

With "vdpau" it consistently plays at 60fps and 88% cpu usage.

With "vaapi" it consistently plays at 60fps and 334% cpu usage (it is probably emulating).

With "none" it consistently plays at 58fps and 380% cpu usage.

The conclusion we have come to here might be as you stated earlier:

"It could be that we now reached the inherent bottleneck of the setup is the by far less optimum communication between CPU and eGPU vs the inherent optimization between the intel cpu and the onboard "gpu""

But GG can not explain why rendering - which you said runs at full GPU - should be any faster than playing using your eGPU since there is the same sort of traffic going to and from across the expresscard bus/slot.

Unfortunately, we have no further debugging help that we can think of from here without the same setup. The one suggestion he had is to run the command line "top", or equivalent, with updates every 5 seconds and watch whether the CPU usage goes really low and then suddenly jumps really high, back and forth. That might indicate it is doing something time-consuming in software. I do not know if running the profiler, prof2, would lead to any discovery.
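One way to make the "top" suggestion concrete: start it with a 5-second refresh (`top -d 5`), note the CPU percentage of the cin process for a while, and check how far consecutive readings swing. A minimal Python sketch follows; the sample readings and the 50-point threshold are hypothetical and do not come from this thread.

```python
def count_swings(samples, threshold=50.0):
    """Count consecutive-sample jumps in CPU usage larger than `threshold`.

    `samples` is a list of CPU percentages noted from `top` at a fixed
    interval (for example every 5 seconds). The threshold is an
    arbitrary assumption, not a value taken from Cinelerra or top.
    """
    swings = 0
    for previous, current in zip(samples, samples[1:]):
        if abs(current - previous) > threshold:
            swings += 1
    return swings


# Hypothetical readings: steady load versus low/high flip-flopping.
steady = [85.0, 88.0, 90.0, 87.0, 86.0]
thrashing = [12.0, 95.0, 10.0, 97.0, 15.0]

print(count_swings(steady))     # 0
print(count_swings(thrashing))  # 4
```

A high swing count during playback would match the "goes low, then jumps high" pattern described above; a count near zero would suggest a steady load instead.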


   
Ig0r
(@ig0r)

@phyllissmith

It's been a while - sorry for that. In the meantime I upgraded to Ubuntu 20.04 LTS, installed all the NVIDIA drivers and did some testing. Some info up front:

  • The command for using the eGPU in mpv changed a bit. It now actually complains about the pixel format of the GoPro footage:

mpv --vo=vdpau -hwdec=vdpau video.mp4

gives

[vo/vdpau] Warning: this compatibility VO is low quality and may have issues with OSD, scaling, screenshots and more.
[vo/vdpau] vo=gpu is the preferred choice in any case and includes VDPAU support via hwdec=vdpau or vdpau-copy.
[ffmpeg] AVHWFramesContext: Unsupported sw format: yuvj420p
Failed to allocate hw frames.

A workaround in the command solves the issue, with almost zero CPU usage and the eGPU taking the full load:

mpv --hwdec-image-format=yuv420p --vo=vdpau -hwdec=vdpau video.mp4

This at least shows that handling the yuvj420p is not a piece of cake, but somehow it is possible to fully utilize the GPU for optimum performance.

  • Another proof that the GPU can handle this format very well is a transcoding test using ffmpeg which is blazing fast (~3x normal playback speed) and almost has 0% CPU usage:

ffmpeg -y -hwaccel cuvid -c:v h264_cuvid -vsync 0 -i video.mp4 -c:v h264_nvenc -b:v 60M -maxrate:v 61M -bufsize:v 80M -profile:v main -rc:v vbr_hq -rc-lookahead:v 32 -spatial_aq:v 1 -aq-strength:v 15 -coder:v cabac -f mp4 h264_test_dec.MP4

This also gives the video in yuv420p, which might be useful in case we find out that it's really only the yuvj420p pixel format giving us a hard time (which I don't think is the case; read below).

  • Crosschecking with VLC also works as expected, VDPAU is being used as hardware decoder.

In terms of testing I think we're good to go - the eGPU is fully operational having full hardware decoding support.
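To check which clips carry the yuvj420p format that mpv complained about, ffprobe can report the pixel format of the first video stream. The Python sketch below wraps that call. The helper names are made up for illustration, and it assumes ffprobe is on the PATH. In FFmpeg the yuvj* names are the deprecated full-range ("JPEG") variants of the corresponding yuv* formats.

```python
import json
import subprocess


def ffprobe_command(path):
    """Build an ffprobe call that prints the first video stream's pix_fmt as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=pix_fmt",
        "-of", "json",
        path,
    ]


def parse_pix_fmt(ffprobe_json):
    """Extract pix_fmt from ffprobe's JSON output, or None if no video stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return streams[0].get("pix_fmt") if streams else None


def is_full_range_jpeg_format(pix_fmt):
    """True for FFmpeg's deprecated full-range yuvj* pixel format names."""
    return pix_fmt is not None and pix_fmt.startswith("yuvj")


def probe_pix_fmt(path):
    """Run ffprobe on `path` (requires ffprobe on PATH) and return its pix_fmt."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return parse_pix_fmt(result.stdout)
```

Calling `probe_pix_fmt("video.mp4")` on the GoPro clip would presumably report the same yuvj420p that mpv printed, and `is_full_range_jpeg_format` flags such files before they are loaded into an editor.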


Switching back to Cinelerra (I used the static build from 30.04.2020, thanks for that!), things get messy again.

  • First, I checked whether everything works with the Intel vaapi stuff. And yes, indeed the performance is as good as in the past (I'm happy to point this out again: this is still the only video editing software I found with this great hardware playback support!).

Trying vaapi with your Big Buck Bunny test video gives a fairly good 48fps at moderate CPU usage. There is a libva message about /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so popping up which I never saw before. As a matter of fact, if I remember correctly, I already did the Big Buck Bunny test on the old Ubuntu and that gave me a permanently smooth 60fps, so there is still something fishy with the current vaapi drivers, but it is already very usable:

ig0r@ig0r-ThinkPad-X230:~/Program_data/cinelerra_static$ ./cin
Cinelerra Infinity - built: Apr 30 2020 09:07:59
git://git.cinelerra-gg.org/goodguy/cinelerra.git
(c) 2006-2019 Heroine Virtual Ltd. by Adam Williams
2007-2020 mods for Cinelerra-GG by W.P.Morrow aka goodguy
Cinelerra is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. There is absolutely no warranty for Cinelerra.

[h264 @ 0x7f3cec2d9c00] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f3cec2d8d40] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f3cec277700] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f3cec3fb780] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f3cec3fa580] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f3c817cd840] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f3c84011380] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f3cec275400] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f3c8403b840] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVHWDeviceContext @ 0x7f3c8403d800] Trying to use DRM render node for device 0.
[AVHWDeviceContext @ 0x7f3c8403d800] libva: VA-API version 1.7.0
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Found init function __vaDriverInit_1_7
[AVHWDeviceContext @ 0x7f3c8403d800] libva: /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so init failed
[AVHWDeviceContext @ 0x7f3c8403d800] libva: va_openDriver() returns 1
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Found init function __vaDriverInit_1_6
[AVHWDeviceContext @ 0x7f3c8403d800] libva: va_openDriver() returns 0
[AVHWDeviceContext @ 0x7f3c8403d800] Initialised VAAPI connection: version 1.7
[AVHWDeviceContext @ 0x7f3c8403d800] VAAPI driver: Intel i965 driver for Intel(R) Ivybridge Mobile - 2.4.0.
[AVHWDeviceContext @ 0x7f3c8403d800] Driver not found in known nonstandard list, using standard behaviour.
[h264 @ 0x7f3c859e1600] Reinit context to 3840x2160, pix_fmt: vaapi_vld

 

  • I did another test using my old short test video, and damn it, the performance is great! A permanent 60fps in both viewer and compositor.
  • So let's check the VDPAU stuff, and this is where sh** hits the fan again. Trying your BBB video gives me a lousy 20fps after "Successfully created a VDPAU device...". On top of that, the CPU usage is quite high (~70%) AND the GPU load is high (~95%, with ~60% video engine utilization):

ig0r@ig0r-ThinkPad-X230:~/Program_data/cinelerra_static$ ./cin
Cinelerra Infinity - built: Apr 30 2020 09:07:59
git://git.cinelerra-gg.org/goodguy/cinelerra.git
(c) 2006-2019 Heroine Virtual Ltd. by Adam Williams
2007-2020 mods for Cinelerra-GG by W.P.Morrow aka goodguy
Cinelerra is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. There is absolutely no warranty for Cinelerra.

[h264 @ 0x7f68f0170540] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f68f0177700] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f68f016e240] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f68f02d1ac0] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f68f02d0bc0] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f68dd7cd300] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f68dd7e5b00] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVHWDeviceContext @ 0x7f68dd7d0900] Successfully created a VDPAU device (NVIDIA VDPAU Driver Shared Library 435.21 Sun Aug 25 08:06:02 CDT 2019) on X11 display :1
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[h264 @ 0x7f68d4010e00] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f68f02e1000] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f68d4039e80] Reinit context to 3840x2160, pix_fmt: yuvj420p
[swscaler @ 0x7f68deaf9000] deprecated pixel format used, make sure you did set range correctly
[AVHWDeviceContext @ 0x7f68d4016700] Successfully created a VDPAU device (NVIDIA VDPAU Driver Shared Library 435.21 Sun Aug 25 08:06:02 CDT 2019) on X11 display :1
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[h264 @ 0x7f68d403ca40] Reinit context to 3840x2160, pix_fmt: vdpau
[h264 @ 0x7f68d403ca40] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68deb221c0] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68d4158780] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f69389901c0] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f69389cdc80] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f6938075980] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[AVIOContext @ 0x7f68d4228bc0] Statistics: 9614994 bytes read, 3 seeks
[AVIOContext @ 0x7f68d4017fc0] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f68b800ec40] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f68b802d640] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVHWDeviceContext @ 0x7f68b85293c0] Successfully created a VDPAU device (NVIDIA VDPAU Driver Shared Library 435.21 Sun Aug 25 08:06:02 CDT 2019) on X11 display :1
[h264 @ 0x7f68b86d5e80] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68b9cea540] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ddcdf500] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68ba6a6840] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68dd857a00] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68ba6a7980] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69f340] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69f340] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69

  • A quick test on my old short test video gives similar results.
  • I also did some tests with videos in yuv420p pixel format, with similar performance. So the pixel format is no longer to blame, and the devices are initialized correctly.
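The VDPAU console log above interleaves "Reinit context ... pix_fmt: vdpau" lines with many swscaler warnings. swscaler is FFmpeg's CPU-side scaling/conversion library, so the warnings might indicate that frames are still being converted in software after the hardware decode. That is only a guess from the log. A small Python sketch to tally such lines, using a few lines copied from the log as input:

```python
import re
from collections import Counter

REINIT_RE = re.compile(r"Reinit context to \d+x\d+, pix_fmt: (\w+)")
SWSCALE_WARNING = "deprecated pixel format used"


def summarize_log(lines):
    """Tally decoder re-inits per pixel format and swscaler warnings."""
    counts = Counter()
    for line in lines:
        match = REINIT_RE.search(line)
        if match:
            counts["reinit:" + match.group(1)] += 1
        elif SWSCALE_WARNING in line:
            counts["swscaler_warning"] += 1
    return counts


# A few lines copied from the VDPAU run above.
excerpt = [
    "[h264 @ 0x7f68f0170540] Reinit context to 3840x2160, pix_fmt: yuvj420p",
    "[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau",
    "[swscaler @ 0x7f68deaf9000] deprecated pixel format used, make sure you did set range correctly",
    "[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau",
]

for key, value in sorted(summarize_log(excerpt).items()):
    print(key, value)
# reinit:vdpau 2
# reinit:yuvj420p 1
# swscaler_warning 1
```

Feeding it a full console capture (for example from a hypothetical `cin.log` created by redirecting the output of ./cin) would allow comparing the counts between the vaapi and vdpau runs.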

Conclusion:

For now I use Cinelerra via the vaapi drivers only, since the eGPU doesn't give me any advantage. So yes, you were right, I was too euphoric when stating that rendering is faster on the eGPU. It only seemed so from looking at the GPU usage. That is misleading, since now the GPU is also very heavily used without any performance advantage.
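On the iHD message in the VAAPI log above: the lines show libva trying iHD_drv_video.so first, getting "init failed", and then loading i965_drv_video.so, which reports itself as the Ivybridge driver. That looks like a fallback rather than an error, though that reading is an assumption. libva also honors the LIBVA_DRIVER_NAME environment variable (for example LIBVA_DRIVER_NAME=i965), which should skip the failed attempt. A Python sketch that extracts the driver attempts from such a log:

```python
import re

# Matches libva's "Trying to open <path>" and "<path> init failed" lines.
TRY_RE = re.compile(r"libva: Trying to open (\S+)")
FAIL_RE = re.compile(r"libva: (\S+) init failed")


def libva_driver_attempts(log_text):
    """Return (tried, failed) lists of driver paths found in a libva log."""
    tried = TRY_RE.findall(log_text)
    failed = FAIL_RE.findall(log_text)
    return tried, failed


def loaded_driver(log_text):
    """Return the last driver that was tried and did not report 'init failed'."""
    tried, failed = libva_driver_attempts(log_text)
    survivors = [path for path in tried if path not in failed]
    return survivors[-1] if survivors else None


# Lines copied from the VAAPI log above.
sample = """\
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
[AVHWDeviceContext @ 0x7f3c8403d800] libva: /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so init failed
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
[AVHWDeviceContext @ 0x7f3c8403d800] libva: va_openDriver() returns 0
"""

print(loaded_driver(sample))  # /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
```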

Question:

Are you guys still motivated to debug this further? It's hard, as you said, since everything seems to work fine on your setup, which even has the same GPU. Still, it's such a shame that the hardware is capable of everything, but somehow there is a missing link to fully utilize it in Cinelerra. I stick to it: with this, the software would take another huge leap forward compared to other software out there. I hope to have some more time on my hands soon so I can help however I can.

Please let me know how and if we proceed - thanks so much for your time! All the best,
Christopher


   
(@phyllissmith)
 

@ig0r

We would definitely like to have this working to full gpu usage, but right now only have 1 idea that we might be able to try.  That would be for us to install Ubuntu 20 on the same computer that had Fedora which we were using for testing, to see if there is some difference there that may affect how Cinelerra runs with a GPU (still no eGPU, but it might tell us something).

 

I will have to read your last note in more detail to see if I can come up with any further debugging techniques that might lead to a solution.  Right now we have no ideas that you could try on your side.  It is quite puzzling.

This post was modified 5 years ago by PhyllisSmith

   
(@phyllissmith)
 

@ig0r

We installed Ubuntu 20.04 and nvidia 390 drivers and with default settings using vdpau, Big Buck Bunny got 60fps when playing.   BUT, believe it or not, when you switch to Settings->Preferences, Playback A, Video driver of X11-OpenGL, we only got 13 fps.

 

So please check that you have "X11" set for the Video driver and test BBB again. I.e. we generated a problem that matches yours, with the exception of an eGPU, and totally solved it to get 60fps using X11. Let us know the results -- I hope this works for you too. (It must be thrashing, and that would explain why MPV, etc. do not exhibit the same slowdown -- they most likely use only software and not OpenGL in this situation.)

This post was modified 5 years ago by PhyllisSmith

   
Ig0r
(@ig0r)

@phyllissmith

Unfortunately that's not it. I tried this setting several times, and while the performance in OpenGL is indeed even worse (~6fps), the default X11 option gives me only ~24fps using the eGPU - check the screenshot attached. You'll also see some performance stats from the NVIDIA settings and CPU usage from nmon.

OK it seems we're stuck now, but great that you still have motivation! Mine flared up again too! 🙂 

Let me know if you have further ideas - I'll continue with some testing on my own here.


   
(@phyllissmith)
 

@ig0r

Actually you gave GG an idea so thanks for testing X11-OpenGL along with X11 as the video driver.  There must be something that MPV is doing that sets up the hardware usage to alleviate potential GPU usage thrashing.  There are probably some graphics experts out there that might know the answer but we are mostly deficient in this area.  GG might be looking at the MPV code to see if there is some setup that can solve the issue -- but that is a lot of code so narrowing it down may be difficult.  It does point to the eGPU though since we do not have the problem with our GTX970 when using the X11 driver.

 

Meanwhile, I will search the web to see if I can find a clue too.  So we are not done yet!


   
Ig0r
(@ig0r)

@phyllissmith

Long time no read - hope you're doing fine? 

A quick update from my side: I made a small upgrade on my side and got myself a used GTX1050 since I had to learn the hard fact that the GTX970 does not support h.265, period, whereas the GTX960 does 😀 

In any case, I have now two cards available for testing (wow, it was literally plug&play, same drivers, awesome!), but would like to sell the GTX970 ASAP. I would wait for your OK in case you would like to run some more tests on this particular card.

While I'm again successful in playing my (now finally 4k h.265) GoPro footage in mpv using

mpv --hwdec-image-format=yuv420p --vo=vdpau -hwdec=vdpau 4k_test_h265.MP4

and also transcoding from h.265 to h.264 with pure eGPU power (so my lenovo x230 can also play them smoothly when the eGPU is disconnected) via

ffmpeg -y -hwaccel cuvid -c:v hevc_cuvid -vsync 0 -i h265_test.MP4 -c:v h264_nvenc -b:v 60M -maxrate:v 61M -bufsize:v 80M -profile:v main -rc:v vbr_hq -rc-lookahead:v 32 -spatial_aq:v 1 -aq-strength:v 15 -coder:v cabac -f mp4 h264_test_dec.MP4

I again run into the old issue in Cinelerra: they play back with heavy stuttering.

The weather turns grey here, so I'll have some time for testing this week I guess. Looking forward hearing from you soon!
Cheers


   
(@phyllissmith)
 

@ig0r

Awesome new card! We have gotten side-tracked on other more pressing issues but I would say that you do not need to keep the 970 card as any testing (via eGPU) will get the same results with the 1050.  At least I think that that is what you are saying.


   
Ig0r
(@ig0r)

@phyllissmith

It's me again, Hi! 

Since the NVIDIA drivers gave my Ubuntu 20.04 LTS a hard time lately, I ended up re-installing everything. Since everything now works flawlessly again, I gave the newest Cinelerra single-user Ubuntu 20 static build a shot to see how things are going.

Step by step I selected a clean test file to narrow the issue down. In my shared Google Drive from the past you'll find a "h264_test_yuv420p.mp4" file. It excludes any potential yuv issues, while still being my GoPro Hero 7 Black h.264 2.7k 60FPS test piece from the early days.

It quickly came down to the old story: MPV (when started properly) shows almost no CPU and moderate GPU load while playing all 60FPS as a piece of cake - perfect!

Setting up a fresh Cinelerra with VDPAU and verbose config, the performance was again lousy, with no complaints on the console whatsoever; it only reports that the NVIDIA device using vdpau with the most recent driver is initialized properly (screenshot attached). Compared to the mpv NVIDIA stats, Cinelerra used ~20% more GPU (97%) while the "Video Engine" utilization was similar at ~35%. The PCIe bandwidth utilization was also higher by ~10% but, as seen, still far from 100% (lower right corner in the screenshot).

I was hoping to come back to this topic again, as I learnt to really appreciate this great video editor (I did some projects based on the superb vaapi performance now, great job!) but want to bring speed and performance to the full potential, having the hardware sitting right next to me.

Let me know how I can follow up with further debugging - as you said in the past, many things to be learnt from mpv player?

All the best from Austria


   
(@phylsmith2004)
Reputable Member
Joined: 6 years ago
Posts: 364
 

@Ig0r

We have spent the morning on this to once again review what is known.  GG still had the MPV code on his computer from last time and so he went through the code with me looking over his shoulder.   I no longer think that the problem is due to the External GPU but rather the fact that Cinelerra uses ffmpeg to run the vdpau opengl code.  In looking at MPV, it goes directly to the hardware without using ffmpeg so we now think that it is this extra layer that makes Cinelerra slower than MPV.  Also, in looking at how ffmpeg versus MPV calls OpenGL, it appears that MPV has specific calls for different OpenGL strategies (for example EGL, XGL, etc.), whereas ffmpeg uses just the one OpenGL strategy, making it more generic and possibly slowing things down.  But that is not to disparage ffmpeg and we may be off base here.  BTW in looking at the MPV code today, GG says that the OpenGL code was very expertly done!

 

One other thing worth mentioning is that the mechanism of decoding (i.e. playing) in Cinelerra is very different from MPV's.  MPV does not have to decompress the input data and can pass it straight to the hardware.  But Cinelerra MUST decompress the data; after all, the whole point of using an NLE is to be able to edit it, so of course it has to be decompressed before sending to the GPU.
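To get a feel for what moving decompressed frames around costs, here is a back-of-the-envelope Python sketch. It assumes uncompressed 8-bit 4:2:0 frames (1.5 bytes per pixel) and that every frame crosses the link once; the 500 MB/s link figure is a placeholder equal to the nominal one-direction rate of a single PCIe 2.0 lane, and the actual link speed of the ExpressCard setup is not stated in this thread.

```python
def frame_bytes(width, height, bytes_per_pixel=1.5):
    """Size of one uncompressed frame; 1.5 bytes/pixel is 8-bit 4:2:0."""
    return int(width * height * bytes_per_pixel)


def stream_rate_mb_s(width, height, fps, bytes_per_pixel=1.5):
    """Uncompressed data rate in MB/s (1 MB = 10**6 bytes)."""
    return frame_bytes(width, height, bytes_per_pixel) * fps / 1e6


def max_fps_for_link(width, height, link_mb_s, bytes_per_pixel=1.5):
    """Upper bound on fps if every decoded frame crosses the link once."""
    return link_mb_s * 1e6 / frame_bytes(width, height, bytes_per_pixel)


# 500 MB/s is the nominal one-direction rate of a single PCIe 2.0 lane;
# whether the ExpressCard link in this thread matches it is an assumption.
ASSUMED_LINK_MB_S = 500.0

print(round(stream_rate_mb_s(3840, 2160, 60), 1))                 # 746.5
print(round(max_fps_for_link(3840, 2160, ASSUMED_LINK_MB_S), 1))  # 40.2
```

This does not show that the link is the bottleneck: the PCIe utilization reported earlier in the thread was well below 100%, and the real transfer format may differ. It only illustrates why shipping decoded 4K frames is far heavier than shipping the compressed stream.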

 

And as far as our computer using an Internal GPU running at a full 60FPS with vdpau, that may have more to do with the computer hardware desktop as opposed to a Thinkpad laptop (?) and even the different operating system (we use Fedora, not ubuntu).

 

In summary, I think only a "real" OpenGL expert could definitively take this any further or come up with a truly valid explanation.

 


   
Ig0r
(@ig0r)

@phylsmith2004

Thanks so much for your elaborations on this. Of course, all your arguments make sense, especially the fact that any NLE must work a lot harder than a simple video player.

Still it would be interesting to discuss

  • why, compared to vdpau, vaapi is doing fine in this regard (as you know, vaapi via the internal Intel graphics works just amazingly within Cinelerra, utilizing the notebook's internal GPU hardware acceleration features). Is it that vaapi does not use the ffmpeg lib? Otherwise I would expect the same bottleneck there.
  • why the perfectly fine working transcoding using ffmpeg and cuvid is still able to perform blazingly fast compared to the internal GPU
    • VDPAU: using ffmpeg -y -hwaccel cuvid -c:v h264_cuvid -vsync 0 -resize 1920x1080 -i input_HQ.mp4 -c:v h264_nvenc -rc:v vbr_hq -cq:v 19 -b:v 2500k -maxrate:v 6600k -profile:v high -f mp4 input_LQ.mp4 for example transforms a high-quality (i.e. freshly rendered) video into a lower-quality one (e.g. for a smartphone, where this quality is more than sufficient)
    • VAAPI: The kind of equivalent thing is done using ffmpeg -i input_HQ.mp4 -vcodec libx264 -crf 25 -vf scale=1920:1080 input_LQ.mp4 to utilize the internal GPU (i.e. when I'm on the go without the eGPU around).
    • Both do just fine and generate approximately the same file size and bitrate in my example, but VDPAU at ~5x speed compared to VAAPI ~0.3x speed (compared to 1x = realtime playback speed). Long story short: There are ways to trigger ffmpeg properly to utilize the eGPU fully.
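To put the two reported speed factors side by side, a small Python calculation follows; the 10-minute clip length is hypothetical. One caveat on the labels: `-hwaccel cuvid` and `h264_nvenc` use NVIDIA's dedicated decode/encode engines rather than VDPAU, and `libx264` is a CPU encoder, so the second command as written most likely runs on the CPU and not on the Intel GPU. A VAAPI encode would use an encoder such as `h264_vaapi` instead. The comparison may therefore be hardware encode versus software encode.

```python
def transcode_minutes(clip_minutes, speed_factor):
    """Wall-clock minutes to transcode, where 1.0 means realtime speed."""
    return clip_minutes / speed_factor


def speedup(fast_factor, slow_factor):
    """How many times faster the fast path is than the slow one."""
    return fast_factor / slow_factor


# Speed factors as reported in the post; the 10-minute clip is hypothetical.
NVENC_SPEED = 5.0
X264_SPEED = 0.3
clip = 10.0

print(transcode_minutes(clip, NVENC_SPEED))           # 2.0
print(round(transcode_minutes(clip, X264_SPEED), 1))  # 33.3
print(round(speedup(NVENC_SPEED, X264_SPEED), 1))     # 16.7
```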

But I already see that we're running out of options here, as you said maybe "a real expert" could help. 

On my side I could maybe try a simple "ffmpeg OpenGL video player" on Ubuntu, to see if it's really the ffmpeg OpenGL interface? Or do you have another idea how to narrow it down further?

In any case, all the best and stay healthy!


   