
Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Posted: Tue Jun 16, 2020 7:46 pm
by jwatte
The question is in the last paragraph, but may need some background, so here we go!

I'm running Virtual Audio Cable with 3x256 buffers, and it's doing well. (I haven't tried 3x128 -- I don't need to go to that extreme for my desktop audio.)
With the Virtual Inputs in Potato, however, if I use a buffer size of less than 2048, the virtual audio inputs glitch out.
My computer (running Windows 10) is fairly high spec, and generally I can run audio programs with small buffers.

My assumption is that VAC installs a driver that uses the sound miniport driver model to deliver data written to the "input" (or empty buffers, on a timer) to the output.
My further assumption is that the "virtual inputs" in Potato instead are application-level DirectShow endpoints.

My current best theory for this behavior is that the DirectShow graph simply can't deal with smaller buffers in a timely manner, whereas VAC, using the kernel-level infrastructure, can.

If that theory is correct, then what is the fundamental limitation here? Is it scheduling of the sound processing in Potato? Is it scheduling in the applications that would play "into" these virtual channels? Is it in the DirectShow graph infrastructure itself? Is it something else?

Re: Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Posted: Wed Jun 17, 2020 5:32 pm
by Vincent Burel
With our virtual audio cable, the latency limitation is determined by the two applications connected to the input and the output of the cable.

For example, if the player application connected to the input uses 256-sample buffers and the application connected to the output captures 512-sample buffers, the needed internal latency will be around (2 x 512 + 256) samples; let's simplify to 3 x 512 to be sure of stream continuity (if the sampling rate is the same everywhere). This is explained in detail in this documentation: https://www.vb-audio.com/Cable/VBCABLE_ ... ttings.pdf
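
A rough sketch of that estimate in Python (an illustration of the rule above for the 256/512-sample example, not the actual cable implementation):

# Estimate of the cable's required internal latency, in samples,
# from the producer (input side) and consumer (output side) buffer sizes.
def cable_internal_latency(producer_buf, consumer_buf):
    estimate = 2 * consumer_buf + producer_buf    # 2 x 512 + 256 = 1280
    safe = 3 * max(producer_buf, consumer_buf)    # 3 x 512 = 1536
    return max(estimate, safe)

print(cable_internal_latency(256, 512))           # -> 1536 samples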

When our virtual audio cables are used as virtual I/O by Voicemeeter, the required maximum latency depends on three applications: the two possibly connected applications plus Voicemeeter itself.

The other problem is that the buffer size used by a connected application depends on the audio interface and API mode it uses (not only the buffering it asks for). In some cases the Windows audio system can use various sizes, working correctly for an hour and being choppy for another... that's why we recommend by default a high internal latency (4096 or 7168 samples), expected to work in 99% of cases.

Re: Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Posted: Wed Jun 17, 2020 6:26 pm
by jwatte
The definition of "work" may vary, though!
I'm OK with 30 milliseconds of latency. I'm OK with 100 milliseconds through the system. I'm not OK with 500 milliseconds of latency.
Lip sync must be maintained, and a visual beep in a terminal window should match up reasonably to the sound heard.

I'm not so worried about MME; all modern applications I want to use are WASAPI-aware.

The question still remains, though: What is the mechanism used for the virtual inputs to Potato? Do they use the same technology as the VB-Audio Virtual Cables, or are they user-level graph nodes?

Re: Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Posted: Wed Jun 17, 2020 7:57 pm
by Vincent Burel
Voicemeeter with default parameters cannot generate more than 100 ms of latency (usually it's around 30 to 50 ms).
Your problem is most likely caused by a device or application...

Re: Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Posted: Thu Jun 18, 2020 3:12 pm
by HOWIEC
jwatte wrote:The definition of "work" may vary, though!
I'm OK with 30 milliseconds of latency. I'm OK with 100 milliseconds through the system. I'm not OK with 500 milliseconds of latency.
Lip sync must be maintained, and a visual beep in a terminal window should match up reasonably to the sound heard.
3x512 (or 1536) samples @ 48kHz = 32ms of latency/buffer.
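
For reference, the same conversion applied to the other buffer sizes mentioned in this thread, assuming 48 kHz everywhere (a quick Python sketch, nothing Voicemeeter-specific):

def samples_to_ms(samples, rate_hz=48000):
    # Latency contributed by a buffer of the given size at the given sample rate.
    return samples / rate_hz * 1000.0

for n in (1536, 2048, 4096, 7168):
    print(n, "samples ->", round(samples_to_ms(n), 1), "ms")
# 1536 -> 32.0 ms, 2048 -> 42.7 ms, 4096 -> 85.3 ms, 7168 -> 149.3 ms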