Hi guys, may I ask two simple questions about the audio of the Ricoh Theta V?

I carried out an experiment on audio-visual interaction using the Ricoh Theta V, and I have two questions.

1, Do the 4 built-in microphones in the Theta V form a first-order Ambisonic microphone? Does anyone know their model, specifications, etc.?

2, The video file I get from the camera is an mp4 with mono audio. After I process it with the Ricoh Theta software on Windows, it is converted to another mp4 file named “***_er.mp4”, which also has mono audio. After I run that file through Ricoh Theta Movie Converter, it is converted to a mov file, and this time the audio has 4 channels. My question is: what is that mono audio? Is it one of the 4 channel signals, for example W? Or is it a mix of all 4 channels? If so, how are the 4 channels mixed?

update
I found that although “**_er.mp4” is reported as a mono audio file, it plays back as spatial audio in the Ricoh Theta software on Windows, so this mp4 file must contain spatial audio information. But when I check the details of the file via right-click, it is shown as mono, and if I play it in another media player, PotPlayer for example, it is also mono. So what am I hearing when I play this mp4 file in PotPlayer? A mixture of all 4 channels? If so, how are they mixed?

Just an update on what I found.

1, These 4 built-in microphones should be MEMS microphones, which are very common in smartphones. The model may be the AKU2001 (Akustica) or the NJD3002 (New Japan Radio (NJR)). What I actually want to know is their frequency response and accuracy.

2, I have confirmed that the mono signal is the W channel signal.
I compared the mono signal in the mp4 file with all 4 channel signals in the mov file and found that it is exactly the same as one of them, the W channel, except that its level is 3 dB lower.
I remember someone here also mentioning that the W channel is indeed 3 dB lower than the actual sound level.

So I assume that the original video file from the Ricoh Theta V contains 2 kinds of audio information: the mono audio signal and the 4-channel B-format Ambisonics audio signals. When this file is played by the Ricoh Theta software, the Ambisonics signals are read (which is why we hear spatial audio in the Ricoh Theta software). When it is played in a PC media player, PotPlayer for example, only the mono signal is read, maybe because the player doesn’t support spatial audio playback. This would also explain why right-clicking the mp4 file shows it as mono audio.
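
The comparison described above (the mono track against each of the 4 channels from the mov file) can be sketched roughly as follows. This is only a minimal illustration, not the procedure actually used: it assumes the tracks have already been decoded to time-aligned lists of float samples, the function names are hypothetical, and the signals are synthetic stand-ins rather than real Theta V audio.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def correlation(a, b):
    """Normalized correlation at zero lag (1.0 means identical up to gain)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

def match_channel(mono, channels):
    """Find the channel most similar to `mono` and the level offset in dB."""
    best = max(channels, key=lambda name: abs(correlation(mono, channels[name])))
    corr = correlation(mono, channels[best])
    offset_db = 20 * math.log10(rms(mono) / rms(channels[best]))
    return best, corr, offset_db

# Synthetic stand-in signals (assumption: not real recordings).
n = 1000
channels = {
    "W": [math.sin(2 * math.pi * 5 * i / n) for i in range(n)],
    "Y": [math.sin(2 * math.pi * 7 * i / n) for i in range(n)],
    "Z": [math.cos(2 * math.pi * 11 * i / n) for i in range(n)],
    "X": [math.sin(2 * math.pi * 13 * i / n + 0.5) for i in range(n)],
}
# Simulate a mono track that is W attenuated by about 3 dB.
mono = [v / math.sqrt(2) for v in channels["W"]]

name, corr, offset_db = match_channel(mono, channels)
print(f"best match: {name}, correlation {corr:.3f}, offset {offset_db:.2f} dB")
```

With real files, a correlation close to 1 for exactly one channel together with a constant offset near -3 dB would be consistent with the observation above.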

I am not familiar with the different Ambisonics standards.

I think it is Ambisonic B-format, based on this.


The RICOH THETA V features multiple built-in omni-directional microphones which compose directionality from recorded audio sources, creating the four WXYZ signals.

Source: RICOH

At the RICOH link above, there is also some information about HRTF, which might be related.


I think the thread here might have the information you are looking for.

This is a relevant piece of information from @Angelo_Farina

Yes, the new firmware 1.20.1 and the new Movie Converter app create a .MOV file containing a correct 4-channels spatial audio soundtrack in Ambix format (ACN/SN3D), with correct channel ordering (WYZX) and correct gains.
The video posted in Youtube here above shows that everything is OK, now!
The issue is fixed, albeit the firmware is still saving an MP4 file which does not show explicitly the 4-channels spatial audio, hence the passage through Movie Converter is still required for extracting the hidden 4-channels soundtrack.
One would expect that the file is directly saved with spatial audio soundtrack inside the camera when the user selects the camera-stitching mode.
On the other hand, if in-camera stitching is not used, the spatial audio extraction should occur inside the main Theta S program on the computer, without the need for a secondary action using the Movie Converter app.
But for now the workflow is operational, albeit still requiring the usage of the Movie Converter program both for in-camera-stitched videos and for unstitched videos (which need to be stitched BEFORE using the Movie Converter app).
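
For readers unfamiliar with the two conventions mentioned in this thread, here is a minimal sketch of how first-order B-format in the traditional FuMa convention (channel order W, X, Y, Z, with W attenuated by 3 dB) relates to AmbiX (ACN order W, Y, Z, X, with SN3D normalization). At first order, only the W gain and the channel order differ. The function names are hypothetical, and whether the 3 dB offset seen on the Theta V mono track comes from this convention is an assumption, not something confirmed here.

```python
import math

SQRT2 = math.sqrt(2.0)  # about +3.01 dB

def fuma_to_ambix(w, x, y, z):
    """Convert first-order FuMa (W, X, Y, Z) to AmbiX (W, Y, Z, X).

    FuMa stores W scaled by 1/sqrt(2); SN3D stores it at unity gain,
    so W is boosted by sqrt(2). X, Y, Z keep their gains and are reordered.
    """
    return [s * SQRT2 for s in w], list(y), list(z), list(x)

def ambix_to_fuma(w, y, z, x):
    """Inverse conversion: AmbiX (W, Y, Z, X) back to FuMa (W, X, Y, Z)."""
    return [s / SQRT2 for s in w], list(x), list(y), list(z)

# One-sample example with made-up values.
ambix = fuma_to_ambix([0.5], [0.1], [0.2], [0.3])
print(ambix)                  # channels in W, Y, Z, X order
print(ambix_to_fuma(*ambix))  # back to W, X, Y, Z order
```

This only covers first order; higher orders need per-channel gain tables.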


This article might also be of interest.

I can just add that the MP4 file created by the Ricoh Theta V contains explicitly only one audio stream, mono, carrying the W signal. The other three Ambisonics channels (YZX) are hidden in a very tricky way inside the MP4 container. I attempted to extract them in many ways and with different software, but everything failed. They are in some way “encrypted”.
It is not a fault of the player: for example the latest VLC Mediaplayer fully understands Ambisonics audio streams up to third order (16 channels).
The fact is that Ricoh does not want people to access these spatial audio channels. The only way of “decoding” them from the encrypted MP4 file is to pass through the Movie Converter app, which extracts the hidden channels YZX and creates a new MOV container with a 4-channels Ambix stream.
Unfortunately the output is 16 bits, whilst the original recording was 24 bits.
This is one of the reasons why I attempted to extract all 4 channels myself from the original MP4 file.
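
To put the 16-bit versus 24-bit difference in numbers, here is a small sketch using the textbook formula for the signal-to-noise ratio of an ideal quantizer driven by a full-scale sine wave (roughly 6.02 dB per bit plus 1.76 dB). This is a theoretical upper bound only; the real noise floor of the microphones and converters in the camera is not stated in this thread and may well be the limiting factor.

```python
import math

def ideal_dynamic_range_db(bits):
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine wave."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(f"{bits}-bit: {ideal_dynamic_range_db(bits):.1f} dB")
```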


Thank you for sharing.

So I suppose that when we play the video via the Ricoh Theta software, the 4-channel output should be 24 bits.

Many thanks.
