So, what are the barriers to making this kind of thing work?
I can give you at least three things. One is that nonverbal expressions, like nodding your head or leaning toward somebody, don’t work very well yet. VR headsets still don’t capture them. Headsets are actually terrible.
A second one is 3D, spatial audio, which is what we’re working on at High Fidelity. You have to be able to hear everybody. You can’t sit and have a productive group conversation unless you hear people’s voices coming from the place around the table where they are, because that’s what enables everybody to talk at the same time, like in a cocktail party.
And then another one is having a lot of people in the same place. There aren’t yet any technologies that allow more than, say, 100 people to be in the same place at the same time. And many, many human experiences, like a big freshman class, a music concert, or a political debate, require more than 100 people in the same place. Facebook’s product Horizon Worlds, which is the closest thing they’ve got to the metaverse right now, can’t have more than 20 people in a space. That’s just not enough.
So you have to be able to have a lot of people in the same place. You have to have visually expressive avatars. And you have to have spatial audio. And then beyond that, you need the right kind of bottom-up systems for governance and moderation, because the systems that we have today for things like Facebook or Reddit are not applicable to embodied environments in digital space.
What’s the alternative to the headset as the hardware that enables this?
Your phone. The mobile device, with its forward-looking camera, detecting you and turning you into an avatar and putting you into the world. You don’t need to put the headset on.
When you look at VR headset use, you’re mixing up two different things that are both really cool. One is visual and sensory immersion in space, your ability to have a wider field of view and to look behind you and stuff. That’s awesome.
The other one, though, is being able to communicate with people near you, for example by nodding your head. That can be done using a forward-looking camera or a webcam on a desktop computer. You don’t need to put a headset on for that. I can track your face and animate your avatar with it. And in fact, if you’re not wearing a VR headset, I can see your whole face with the camera. So the optical tracking and AI stuff that you can use to detect people’s faces works better if you don’t have a VR headset.
With VR headsets, we’re more than five years away, in my opinion. They still make us nauseous 25 or 30 percent of the time. And there’s actually no R&D solution to that yet. The problem has to do with the difference between your vestibular sense of motion and what your eyes see. If you make those two disagree, a significant percentage of people get sick, and they always will.
But I think the more nuanced thing is that the VR headset is very divisive. If you put a bunch of randomly chosen people in a room and ask them who’s comfortable basically putting on a blindfold in front of other people, you are going to get a biased outcome, where big white men, for example, are going to be comfortable putting a VR headset on because they would also be comfortable blindfolding themselves in front of other people. But that’s not true for everybody.