THETA V Wireless Audio Live-stream, low latency

Hello everyone!

As far as I dug into it, you could livestream video AND audio via RTSP or WebRTC.

I have not tested the RTSP and WebRTC solutions myself, because of some errors and problems that I will describe in a later post.

Brief description of my project:
My colleague and I are working on telepresence with the Ricoh THETA V and real-time control of a UGV. He handles the UGV part, so I cannot answer questions about it. My part is the real-time video and audio streaming from the camera, projected inside a VR headset (HTC Vive).
My research so far has led me to solutions such as this drone project with its amazing web VR viewer by @Jake_Kenin, this amazing project for Raspberry Pi and this project using Python (both by @Hugues), and this Unity project updated by @KEI.

All of the above, except the first project by @Hugues (which also has no audio), use the THETA Web API's camera._getLivePreview command, which delivers JPEG frames, so if I am not mistaken no audio is included in the stream.
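To make the MotionJPEG point concrete, here is a minimal sketch of consuming the live preview in Python. It assumes Web API v2.1 (where the command is named camera.getLivePreview; v2.0 uses camera._getLivePreview) and the camera's default access-point address 192.168.1.1; the naive SOI/EOI frame scan is my own and has not been tested against a real camera.

```python
def iter_jpeg_frames(chunks):
    """Yield complete JPEG frames from an iterator of byte chunks.

    Frames are cut naively at the JPEG SOI (0xFFD8) and EOI (0xFFD9)
    markers, which is enough for a MotionJPEG preview stream.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        while True:
            start = buf.find(b"\xff\xd8")
            end = buf.find(b"\xff\xd9", start + 2)
            if start == -1 or end == -1:
                break
            yield buf[start:end + 2]
            buf = buf[end + 2:]

def live_preview(url="http://192.168.1.1/osc/commands/execute"):
    import requests  # pip install requests; assumed available

    # The response is a multipart stream of JPEG frames, one per preview frame.
    resp = requests.post(url, json={"name": "camera.getLivePreview"}, stream=True)
    for frame in iter_jpeg_frames(resp.iter_content(chunk_size=8192)):
        print(f"got frame, {len(frame)} bytes")  # hand this to your decoder instead
```

Note there is no audio anywhere in this stream: each part is just a still JPEG, which is exactly why camera._getLivePreview alone cannot answer the audio question.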

So, considering all the APIs (CameraAPI, WebAPI, USB API) Ricoh is providing, my final question is:
How do I get audio streaming side by side with the camera._getLivePreview method, or any method with low latency?

P.S.: Sorry for the long post, but consider this a guide for all the research I have done so far.

The camera will only stream single channel audio from the microphones on the camera, not spatial audio.

If you want spatial audio, you must use a separate microphone array and use that as the audio input for your stream.

Connect the camera to a laptop at the camera's location, and use that laptop to stream the video to another Windows laptop connected to the HTC Vive headset.

There is no example of streaming spatial audio that I know of.

If you are building a drone that cannot hold a laptop, or you don't want a laptop on the vehicle, you can use a Jetson Nano instead and stream from it to the computer that the headset is plugged into.

I have not tried this, but the Janus platform that Hugues is using has a plug-in for audio.

https://janus.conf.meetecho.com/audiobridgetest.html

You could replicate the Hugues project (which now uses RTSP, not MotionJPEG) and focus your efforts on testing the audio portion by placing a cheap microphone on the Raspberry Pi that is on the drone or UGV.
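For that audio test, a rough sketch of what the Pi side could do: capture the cheap mic with ALSA's arecord and push raw PCM over UDP to the laptop driving the headset. The destination address, port, and ALSA device name are placeholders, and this is untested with real hardware; a real setup would use the Janus audiobridge or RTP rather than bare UDP.

```python
import socket
import subprocess

DEST = ("192.168.1.50", 5004)  # placeholder address of the receiving laptop
CHUNK = 1024                   # bytes of PCM per UDP datagram

def stream_mic(device="plughw:1,0"):
    """Pipe raw PCM from a USB mic (via arecord) to DEST over UDP."""
    # arecord ships with ALSA; 16 kHz mono 16-bit is plenty for a voice test
    proc = subprocess.Popen(
        ["arecord", "-D", device, "-f", "S16_LE", "-r", "16000", "-c", "1", "-t", "raw"],
        stdout=subprocess.PIPE,
    )
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        data = proc.stdout.read(CHUNK)
        if not data:  # arecord exited
            break
        sock.sendto(data, DEST)
```

Bare UDP gives you the lowest possible latency for a quick experiment, at the cost of no packet-loss handling at all, which is fine for verifying that audio reaches the headset machine before investing in the Janus plumbing.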