Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

jwatte
Posts: 2
Joined: Mon Jun 15, 2020 8:58 pm

Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Post by jwatte »

The question is in the last paragraph, but it may need some background, so here we go!

I'm running Virtual Audio Cable with 3x256 buffers, and it's doing well. (I haven't tried 3x128 -- I don't need to go to that extreme for my desktop audio.)
In Virtual Inputs in Potato, however, if I have a buffer size less than 2048, the virtual audio inputs glitch out.
My computer (running Windows 10) is fairly high spec, and generally I can run audio programs with small buffers.

My assumption is that VAC installs a driver that uses the sound miniport driver to deliver data written to the "input" (or silence, on a timer) to the output.
My further assumption is that the "virtual inputs" in Potato are instead application-level DirectShow endpoints.

My current best theory for this behavior is that the DirectShow graph simply can't deal with smaller buffers in a timely manner, whereas VAC, using the kernel-level infrastructure, can.

If that theory is correct, then what is the fundamental limitation here? Is it scheduling of the sound processing in Potato? Is it scheduling in the applications that would play "into" these virtual channels? Is it in the DirectShow graph infrastructure itself? Is it something else?
Vincent Burel
Site Admin
Posts: 2008
Joined: Sun Jan 17, 2010 12:01 pm

Re: Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Post by Vincent Burel »

With our virtual audio cable, the latency limitation is given by both applications connected to the input and the output of the cable.

For example, if the player application connected to the input uses a 256-sample buffer and the application connected to the output captures 512-sample buffers, the needed internal latency will be around (2 x 512 + 256) samples. Let's simplify to 3 x 512 to be sure of stream continuity (if the sampling rate is the same everywhere). This is explained in detail in this documentation: https://www.vb-audio.com/Cable/VBCABLE_ ... ttings.pdf
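To make that arithmetic concrete, here is a minimal sketch (the buffer sizes and sample rate are just the illustrative values from the example above, not fixed constants):

```cpp
#include <cstdio>

// Sketch of the internal latency estimate described above:
// the cable must hold roughly 2x the capture-side buffer plus
// the playback-side buffer to keep the stream continuous.
int main() {
    const int inputBuffer  = 256;   // samples written by the player application
    const int outputBuffer = 512;   // samples captured by the recording application
    const int sampleRate   = 48000; // Hz, assumed equal on both sides

    int needed    = 2 * outputBuffer + inputBuffer;    // 1280 samples
    int roundedUp = 3 * outputBuffer;                  // 1536, the "3 x 512" simplification
    double ms     = 1000.0 * roundedUp / sampleRate;   // ~32 ms at 48 kHz

    std::printf("needed: %d samples, rounded up: %d samples (%.1f ms)\n",
                needed, roundedUp, ms);
    return 0;
}
```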

When our virtual audio cables are used as virtual I/O by Voicemeeter, the required maximum latency depends on 3 applications: the 2 possibly connected applications + Voicemeeter itself.

The other problem is that the buffering size used by a connected application depends on the audio interface and API mode it uses (not only the buffering it asks for). In some cases the Windows audio system can use various sizes and make it work correctly for an hour, then choppy for another hour... that's why we recommend by default a high internal latency (4096 or 7168 samples), expected to work in 99% of cases.
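For example, a shared-mode WASAPI application can see the periods the Windows audio engine will actually use, regardless of what it asks for. A minimal sketch using the standard IMMDeviceEnumerator / IAudioClient calls (error handling is reduced to early exits; link with ole32.lib):

```cpp
#include <cstdio>
#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>

// Sketch: query the buffer periods the Windows audio engine actually uses
// for the default render endpoint. These engine-chosen periods, not the
// size an application requests, drive the buffering a shared-mode WASAPI
// stream really gets.
int main() {
    if (FAILED(CoInitializeEx(nullptr, COINIT_MULTITHREADED))) return 1;

    IMMDeviceEnumerator* enumerator = nullptr;
    if (FAILED(CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr,
                                CLSCTX_ALL, __uuidof(IMMDeviceEnumerator),
                                (void**)&enumerator))) return 1;

    IMMDevice* device = nullptr;
    if (FAILED(enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device)))
        return 1;

    IAudioClient* client = nullptr;
    if (FAILED(device->Activate(__uuidof(IAudioClient), CLSCTX_ALL,
                                nullptr, (void**)&client))) return 1;

    REFERENCE_TIME defaultPeriod = 0, minimumPeriod = 0; // 100-ns units
    if (SUCCEEDED(client->GetDevicePeriod(&defaultPeriod, &minimumPeriod))) {
        std::printf("default period: %.1f ms, minimum period: %.1f ms\n",
                    defaultPeriod / 10000.0, minimumPeriod / 10000.0);
    }

    client->Release();
    device->Release();
    enumerator->Release();
    CoUninitialize();
    return 0;
}
```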
jwatte
Posts: 2
Joined: Mon Jun 15, 2020 8:58 pm

Re: Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Post by jwatte »

The definition of "work" may vary, though!
I'm OK with 30 milliseconds of latency. I'm OK with 100 milliseconds through the system. I'm not OK with 500 milliseconds of latency.
Lip sync must be maintained, and a visual beep in a terminal window should match up reasonably to the sound heard.

I'm not so worried about MME; all the modern applications I want to use are WASAPI-aware.

The question still remains, though: What is the mechanism used for the virtual inputs to Potato? Do they use the same technology as the VB-Audio Virtual Cables, or are they user-level graph nodes?
Vincent Burel
Site Admin
Posts: 2008
Joined: Sun Jan 17, 2010 12:01 pm

Re: Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Post by Vincent Burel »

Voicemeeter with default parameters cannot generate more than 100 ms of latency (usually it's around 30 to 50 ms).
Your problem is surely caused by a device or application...
HOWIEC
Posts: 12
Joined: Mon Apr 30, 2018 5:44 am

Re: Why does Potato Virtual Input need more buffering than Virtual Audio Cable?

Post by HOWIEC »

jwatte wrote: The definition of "work" may vary, though!
I'm OK with 30 milliseconds of latency. I'm OK with 100 milliseconds through the system. I'm not OK with 500 milliseconds of latency.
Lip sync must be maintained, and a visual beep in a terminal window should match up reasonably to the sound heard.
3 x 512 (or 1536) samples @ 48 kHz = 32 ms of latency/buffering.
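In general form (a quick sketch; plug in whatever buffer count, size, and sample rate your setup uses):

```cpp
#include <cstdio>

// Sketch: convert a buffer configuration to latency in milliseconds,
// e.g. 3 buffers of 512 samples at 48 kHz -> 32 ms.
double bufferLatencyMs(int numBuffers, int bufferSamples, int sampleRate) {
    return 1000.0 * numBuffers * bufferSamples / sampleRate;
}

int main() {
    std::printf("%.1f ms\n", bufferLatencyMs(3, 512, 48000)); // prints 32.0
    return 0;
}
```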