Saturday, March 31, 2012

PowerDirector 9 performance problem on AMD Phenom 1055T (update)

This is a follow-up to my post from last year about PowerDirector performance.

I am unhappy to report that Cyberlink has not been helpful in solving this problem. They closed my ticket CS001052755 without any explanation, so I opened a new one. All they would say is that "The performance issue is related with different graphics card drivers and platforms.", without any specifics. Given that I am running the same graphics card with the same drivers on both of my machines, I would have expected more details. If the problem lies with the nVidia drivers, then I would like to know that so I can escalate the issue to them. I'm still in the dark as to what causes the software to run faster on a 2007 system than on a 2011 system.

To recap, I have two systems that are very similar except for their CPU.
They both run Windows 7 Ultimate x64 SP1.
They both have an XFX 9800GT video card with 512MB of RAM.
They now both have 16GB of RAM.
The main relevant difference is the CPU/motherboard.

One system (designated "Intel") uses a 5-year-old Core 2 Quad Q6600 at 2.4 GHz. The 16 GB of RAM is DDR2-800. The video card is in a PCI-E 1.0 x16 slot. It is running with a SATA2 SSD and a 4TB RAID 0 array.

The other (designated "AMD") uses a 1-year-old AMD Phenom II X6 1055T at 2.8 GHz. The 16 GB of RAM is DDR3-1333. The video card is in a PCI-E 2.0 x16 slot. It is running with dual SATA3 SSDs and a pair of 2TB drives.

Both are running the same set of nVidia drivers, version 296.10.

I would expect the AMD system to always be faster, due to the faster CPU. All its other components are equal or better. Unfortunately, this is not so.

I created a shorter HD video clip from a JVC GZ-E10 camcorder for the purpose of testing the performance. It is 189MB in size. The test consists of just putting this clip on video track 1 and encoding it. I am only testing the H.264 encoding performance. All times below are in seconds.
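Video: Benchmark2
Target file format: H264
Target resolution: 1920x1080
Target bit rate: 24 Mbps

Column               |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |  10 |  11 |  12
Fast video rendering | Yes | Yes |  No | Yes | Yes |  No | Yes | Yes |  No | Yes | Yes |  No
Hardware encoding    |   - | Yes |   - |   - | Yes |   - |   - | Yes |   - |   - | Yes |   -
SVRT                 | Yes |   - |   - | Yes |   - |   - | Yes |   - |   - | Yes |   - |   -
CUDA                 | Yes | Yes | Yes |  No |  No |  No | Yes | Yes | Yes |  No |  No |  No
Hardware decoding    | Yes | Yes | Yes | Yes | Yes | Yes |  No |  No |  No |  No |  No |  No
Line 1               |   4 | 145 | 145 |   4 | 145 | 145 |   4 |  40 | 123 |   4 |  40 | 124
Line 2               |   8 |  64 | 207 |   8 |  64 | 208 |   8 |  71 | 217 |   8 |  71 | 215
Line 3               |   - |  52 |   - |   - |  52 |   - |   - |  40 |   - |   - |  40 |   -
Line 4               |   - |  52 |   - |   - |  52 |   - |   - |  40 |   - |   - |  40 |   -
Line 5               |   - |  54 |   - |   - |  54 |   - |   - |  41 |   - |   - |  41 |   -
Line 6               |   - |  31 |   - |   - |  31 |   - |   - |  38 |   - |   - |  38 |   -

("-" marks a cell that is blank in the table.)

Line 1: AMD – PD9 64-bit 3305
Line 2: Intel – PD9 64-bit 3305
Line 3: AMD – PD9 64-bit 3305 – 1x9800GT – Catalyst installed
Line 4: AMD – PD9 64-bit 3305 – 2x9800GT – Catalyst installed
Line 5: AMD – PD9 64-bit 3305 – 1x9800GT in x8 slot – Catalyst installed
Line 6: AMD – PD9 64-bit 3305 – 1x560 Ti – Catalyst installed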
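
For anyone who wants to check the comparison between the two baseline rows (AMD and Intel, PD9 64-bit 3305), here is a minimal Python sketch. The times are copied from the table; the names `amd`, `intel`, `slower_columns` and `ratios` are only illustrative and are not part of PowerDirector or any tool used in the test.

```python
# Times in seconds for the 12 setting combinations (columns 1 to 12),
# copied from the AMD and Intel baseline rows of the table.
amd = [4, 145, 145, 4, 145, 145, 4, 40, 123, 4, 40, 124]
intel = [8, 64, 207, 8, 64, 208, 8, 71, 217, 8, 71, 215]


def slower_columns(a, b):
    """Return the 1-based column numbers where the time in a exceeds the time in b."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x > y]


def ratios(a, b):
    """Return a/b per column, rounded to two decimals."""
    return [round(x / y, 2) for x, y in zip(a, b)]


amd_slower = slower_columns(amd, intel)
print("Columns where AMD is slower than Intel:", amd_slower)
for col, r in enumerate(ratios(amd, intel), start=1):
    print(f"column {col:2d}: AMD/Intel = {r}")
```

A ratio below 1 means the AMD system finished first in that column; a ratio above 1 flags the combinations where the older Intel system won.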

Here are my observations and takeaways from all this data:
  • Enabling or disabling CUDA appears to have no effect on the rendering time.
  • When SVRT is available, the rendering takes only half as long on the AMD system as on the Intel system: 4 seconds vs 8 seconds. This doesn't quite scale linearly with larger files, but the AMD still keeps a large advantage. This is as expected.
  • When the "Fast video rendering" option is disabled, the rendering is again much faster on AMD. The AMD system clocks in at 123 to 145 seconds, while Intel takes 207 to 217. This is as expected.
  • When using the "Fast video rendering" and "Hardware encoding" options together, performance is much worse on the AMD: it takes 145 seconds vs only 64 seconds on the Intel system! This is NOT expected. This led me to experiment with another one of the program's global preferences: the "Hardware decoding" option. When I turned that preference OFF, the rendering time on AMD miraculously dropped from 145 seconds to 40 seconds. But on Intel, it increased from 64 to 71 seconds.
  • On the AMD system, when the hardware decoding preference is ON, the rendering time is identical whether the hardware encoder option is set to ON or OFF: 145 seconds. This tells me that the hardware encoder is not actually being used at all.
  • My conclusion is that there is something very wrong with the way hardware decoding affects the performance of encodings in PowerDirector 9 on the AMD system. It appears that hardware decoding must be turned off on the AMD.
  • April 7 update: after I reinstalled the ATI Catalyst drivers for the motherboard, the performance changed and became closer to what was expected. I added a 3rd line for this.
  • With Catalyst installed, there is now evidence that the hardware encoder is used, even when hardware decoding is enabled. For example, line 3 column 2 is 52 seconds vs 145s for line 1 column 3 (full software encoder). However, disabling hardware decoding is still faster: line 3 column 8 gives a time of 40s. My guess is that the GPU is spread too thin doing both decoding and encoding. With the combination of the AMD 1055T CPU and 9800GT GPU, it's better to decode with the CPU and encode with the GPU, i.e. line 3 column 8.
  • I experimented with installing both XFX 9800GT cards in the AMD system. The cards are nearly identical except for the revision. One takes a PCI-E power connector and the other doesn't. I put the results in line 4. The results are identical to line 3. In other words, PowerDirector didn't benefit at all from having 2 GPUs in the system. Note that the motherboard is non-SLI. I don't know whether SLI support would have made a difference for PowerDirector. So far, no one has ever provided any evidence that PowerDirector can use multiple GPUs.
  • I also experimented with a single 9800GT GPU in the second PCI-E slot, i.e. the x8 slot. The results were slower, but not significantly so. Compare line 3, which used the x16 slot, with line 5, which used the x8 slot. Times went from 52 to 54s when the GPU was doing both decoding and encoding in hardware, and from 40 to 41s when the GPU was doing encoding only.
  • Finally, I purchased an Asus 560 Ti GPU video card tonight. This is the largest card I can fit in the case. It's within 1mm of touching the drives. The card is spec'ed as 9", but that's only the length of the PCB; it is really closer to 10" when counting the length of the fan. The results are listed on the 6th and last line.
  • With the 560 Ti, the best results finally come from using the GPU for both encoding and decoding. Line 6 column 2 is 31s, which is the best time in the whole table. When hardware decoding is disabled, the time increases to 38s in column 8. Comparing best times, the 560 Ti takes 31s vs 40s for the 9800GT, i.e. approximately 77.5% of the time. I am not sure that a 22.5% decrease in rendering time is worth the $240 I paid.
  • I also bought a better CPU cooler, a Hyperx 212. I will try to overclock the 1055T CPU a little bit.
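
The percentages quoted in the bullets above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the times from the table; `percent_of` and `percent_slower` are hypothetical helper names chosen for this example.

```python
def percent_of(new_time, old_time):
    """Return new_time as a percentage of old_time."""
    return 100.0 * new_time / old_time


def percent_slower(new_time, old_time):
    """Return how much longer new_time is than old_time, in percent."""
    return 100.0 * (new_time - old_time) / old_time


# Best time with each GPU on the AMD system, in seconds.
best_9800gt = 40  # hardware encoding, hardware decoding off
best_560ti = 31   # hardware encoding and hardware decoding both on

share = percent_of(best_560ti, best_9800gt)
saving = 100.0 - share
print(f"560 Ti takes {share:.1f}% of the 9800GT time, a {saving:.1f}% decrease")

# 9800GT in the x8 slot compared with the x16 slot.
x8_decode_and_encode = percent_slower(54, 52)
x8_encode_only = percent_slower(41, 40)
print(f"x8 slot, GPU decoding and encoding: {x8_decode_and_encode:.1f}% slower")
print(f"x8 slot, GPU encoding only: {x8_encode_only:.1f}% slower")
```

Note that the 560 Ti comparison uses the best time for each card, which means the two figures come from different hardware decoding settings.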

3 comments:

  1. Q6600 easily overclocks to 3GHZ, unless old rev.

    MoRegistrations

    Replies
    1. I no longer have them. I sold both my Q6600 and the motherboards and DDR2 for more than I originally bought them for a couple months ago. People had been looking for 775 upgrade CPUs. I'm running some AMD Phenom II X6 and FX, both OC'ed. Sad to see, the X6 is noticeably faster.

  2. How do I Convert vs2000cd vs2 files back ups to vs 2480 vs1 to be loaded and retrieved.. I copy and paste and followed all the direction you gave but still not able to retrieve to files and songs that I'm seeing on my hard drive.. Thanks
