The camera will only stream single-channel (mono) audio from its built-in microphones, not spatial audio.
If you want spatial audio, you must use a separate microphone array and use that as the audio input for your stream.
Connect the camera to a laptop at the camera's location. Use that laptop to stream the video to another Windows laptop that is connected to the HTC Vive headset.
There is no example of streaming spatial audio that I know of.
If you are building a drone that cannot carry a laptop, or you don’t want a laptop on the vehicle, you can use a Jetson Nano to stream to the computer that the headset is plugged into.
I have not tried this, but the Janus platform that Hugues is using has a plug-in for audio.
https://janus.conf.meetecho.com/audiobridgetest.html
You could replicate the Hugues project (which now uses RTSP, not MotionJPEG) and focus your efforts on testing the audio portion by placing a cheap microphone on the Raspberry Pi that is on the drone or UGV.
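As a rough starting point for the audio test, here is a minimal sketch of how the Raspberry Pi could send microphone audio over RTP with ffmpeg. I have not run this against Janus. The ALSA device name (hw:1,0), the receiver address, and the port are placeholders I made up for illustration, and it assumes the receiving side is already set up to accept Opus audio over RTP on that port. The helper name build_audio_rtp_command is also hypothetical.

```python
import shlex


def build_audio_rtp_command(alsa_device="hw:1,0", host="192.168.1.10",
                            port=5002, bitrate_kbps=32):
    """Build an ffmpeg argument list that sends mono Opus audio over RTP.

    All default values are placeholders (assumptions), not values taken
    from the Hugues project or from a real Janus configuration.
    """
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return [
        "ffmpeg",
        "-f", "alsa",                 # capture from an ALSA device (Linux)
        "-i", alsa_device,            # e.g. a cheap USB microphone
        "-ac", "1",                   # downmix to a single channel
        "-c:a", "libopus",            # encode as Opus
        "-b:a", f"{bitrate_kbps}k",   # audio bitrate
        "-f", "rtp",                  # RTP output
        f"rtp://{host}:{port}",
    ]


if __name__ == "__main__":
    # Print the command so it can be checked before running it on the Pi.
    print(shlex.join(build_audio_rtp_command()))
```

Running `arecord -l` on the Pi lists the capture devices, which should tell you the right value for the device name. Once the printed command looks right, you could pass the list to subprocess.run to start the stream.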