TL;DR you're right, and there is nothing you can do to fix this. It's a sound card driver problem, not a Voicemeeter problem.
You will notice it almost exclusively in Voicemeeter, and also in properly configured recording-studio type apps, because they behave differently from most apps: they are not just sending audio out (like a music/video player does), but have multiple audio interfaces co-existing and co-operating in both the input and output directions.
I will go into some more detail here, because this topic comes up a lot. This is also for future search engine users. There are some minor technical inaccuracies here; they are intentional and serve the purpose of keeping it simple (well... simple-ish!). Any big nerds like me, with more time than me, are welcome to embellish and correct further details, but this should do for now:
You got it. The 1056 buffer size and the 'S' next to the sound card name are the telltale signs here. They tell us that the device is operating in 'shared mode' (S) and malfunctioning (wrong buffer size). This is a very common driver problem. Searching these forums for '1056' will find many examples for you, and searching for 'shared mode' will find more. You will see my name a lot, because I tried to do this a long time ago and have been helping people with it ever since ;)
Realtek sound cards are the most common, so you should pay attention to threads about those. Even though yours is not Realtek, the problem is the same.
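To put the 1056 figure in perspective, here is a rough sketch that converts a buffer size into the time it covers. It assumes the buffer is counted in sample frames and that the device runs at 44.1 or 48 kHz; both are my assumptions for illustration, and `buffer_latency_ms` is just a made-up helper name:

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Return the time span covered by one buffer, in milliseconds."""
    return frames / sample_rate_hz * 1000.0


# Assumed sample rates; check what your device is really running at.
for rate in (44_100, 48_000):
    ms = buffer_latency_ms(1056, rate)
    print(f"1056 frames at {rate} Hz covers {ms:.1f} ms per buffer")
```

At 48 kHz that works out to about 22 ms per buffer, which is a lot more than the small buffers you would normally ask for in a low-delay setup.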
The problem starts with the fact that the device is not actually capable of handling so many channels. Instead, it is compressing the n-channel audio (e.g. 5.1 = 6 channels) into a digital signal representing 6 channels, but that signal is transmitted over the TOSLink/IEC958 connection. In reality, TOSLink is stereo-only: it only has enough bandwidth (i.e. bits per second) for two channels. It's just too slow for anything more.
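The bandwidth argument can be sketched with some simple arithmetic. The numbers below assume uncompressed PCM at 48 kHz and 16 bits per sample, which are values I picked for illustration (real links and formats vary), and `pcm_bitrate_bps` is a hypothetical helper:

```python
def pcm_bitrate_bps(channels: int,
                    sample_rate_hz: int = 48_000,
                    bits_per_sample: int = 16) -> int:
    """Raw bit rate of uncompressed PCM audio, ignoring any framing overhead."""
    return channels * sample_rate_hz * bits_per_sample


stereo = pcm_bitrate_bps(2)      # what a stereo link is sized for
surround = pcm_bitrate_bps(6)    # 5.1 = 6 channels, uncompressed
ratio = surround / stereo

print(f"stereo PCM:   {stereo / 1e6:.3f} Mbit/s")
print(f"5.1 PCM:      {surround / 1e6:.3f} Mbit/s")
print(f"5.1 needs {ratio:.0f}x the bandwidth of stereo")
```

Under those assumptions, six uncompressed channels need three times what the stereo link was sized for, so they have to be squeezed down before they will fit.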
But it's digital stereo, and since a digital signal is just a stream of 1's and 0's that can be interpreted any way we like, the receiver can receive a stream of compressed data and, instead of treating it as audio-formatted data (as we usually think of it), treat it as compressed data (like a zip file), decompress it (like unzipping), and end up with n-channel audio (such as 5.1). Really, we could even send GIFs of kittens, or an exe file, or anything digital over TOSLink audio cables if we wanted to, as long as the receiver knew how to handle such a data format. In the case of multi-channel audio, it is like an MP3 (and yes, it is lossy like an MP3!). We take a big audio (wav) file, compress it down so it is smaller in size, then decompress it back to something like the original. This is how Dolby/DTS/etc all work.
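Here is a toy sketch of the "same bits, two interpretations" idea. It uses zlib as a stand-in for the real codec purely for illustration: zlib is lossless, whereas Dolby/DTS-style compression is lossy, and the payload and function names are made up:

```python
import struct
import zlib

# Stand-in for six channels' worth of audio data (not real audio).
payload = b"front-left front-right centre lfe rear-left rear-right " * 4

# The sender compresses it; this byte stream is what travels down the cable.
stream = zlib.compress(payload)


def as_pcm16(data: bytes) -> list:
    """Read the bytes as little-endian signed 16-bit samples.

    A trailing odd byte, if any, is dropped.
    """
    usable = len(data) - (len(data) % 2)
    return list(struct.unpack(f"<{usable // 2}h", data[:usable]))


def as_compressed(data: bytes) -> bytes:
    """Read the very same bytes as a compressed container and unpack it."""
    return zlib.decompress(data)


naive = as_pcm16(stream)         # a receiver that assumes plain audio gets noise
decoded = as_compressed(stream)  # a receiver that knows the format gets the data

print(f"{len(stream)} bytes on the wire, {len(naive)} 'samples' of noise if read as PCM")
print("payload recovered intact:", decoded == payload)
```

The cable does not care which reading is the right one; it is entirely up to the receiver to know what format it has been sent.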
The real trouble comes down to how the sound cards implement this functionality. They present the OS and apps with a virtual multi-channel device, such as a 5.1 or 7.1 output, and then the software (audio player, voicemeeter, etc) sends out multiple channels (let's say 6, for 5.1) of audio, then the sound card uses SOFTWARE (in caps because this is the problem!) to take those 6 channels, compress them into a data stream capable of transmission over stereo TOSLink, and then sends that signal out of the fibre optic (or coax) output.
Because the system has to use software to do the compression, that software becomes part of the required signal chain for that device. As a result, the device cannot run in exclusive mode, where the app (audio player, Voicemeeter, etc) writes directly to the sound card. Instead, the app writes to the software that does the compression, which in turn writes the compressed audio data to the sound card.
That layer, which sits in between the app and the actual hardware, is what's broken. It refuses to run as instructed (because it can't!) and refuses to set the buffer size commanded of it (because it can't!). VM tries to overcome this, the sound card driver fights it, and the result is a horrible audio stream, or one that simply does not work at all.
The reason you see stereo when using MME is that MME does not support this software-driven multi-channel compression approach at all, and locks you down to the actual capabilities of the hardware: stereo.
The only ways I am aware of to get actual surround audio out of the PC are to use discrete multichannel cards (all the coloured audio jacks on the back of your PC/pro audio devices), or HDMI (which actually does have enough bandwidth for all that audio, since it was designed for carrying multichannel audio from the start, as well as the much more demanding video signal), or possibly DisplayPort (never tried it, to be honest, but it should work!).
Although in theory it is entirely possible for a sound card to support compressed multichannel audio over IEC958/TOSLink in hardware, I've never seen anything that does. They all use the software approach. You will be able to use the software-compressed approach for apps that do not need accurate timing and can tolerate some delay, such as audio/video players. But for the accurate timing and low-delay interaction with other audio interfaces that apps like VM imply, you will have to use a more appropriate audio interface, such as HDMI.