
At Rant, we've been building iOS apps since the very beginning – for phones, tablets and watches. Earlier this year, we were very excited when Apple announced the Apple Vision Pro – their latest product and potentially the vanguard of a whole new market sector. It's an AR and VR headset running on Apple's newest platform, visionOS, and early reports suggest it offers a much more refined experience than other AR headsets, with impressive eye tracking and video passthrough. Slightly more divisively, it looks like a pair of very expensive, oversized ski goggles with an external battery pack – perhaps not entirely ready for unobtrusive daily wear.

Developing for a new platform

visionOS is built atop Apple’s existing SwiftUI and UIKit frameworks, which we use every day to build native iOS apps for our partners. These transferable skills and common technologies make development for this new platform a lot less painful: within a few days of the visionOS SDK becoming available, our iOS team had built multiple prototype SwiftUI and UIKit applications for the headset.
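To give a flavour of how familiar this feels, here's a minimal sketch along the lines of those prototypes, assuming a simple window-plus-immersive-space layout (the type names are illustrative):

```swift
import SwiftUI

// Minimal visionOS sketch: a conventional SwiftUI window plus an immersive
// space the user can step into. Type names here are illustrative.
@main
struct PrototypeApp: App {
    var body: some Scene {
        // A familiar 2D window, just like on iPhone or iPad.
        WindowGroup {
            ContentView()
        }

        // A separate scene for fully immersive 3D content.
        ImmersiveSpace(id: "immersive") {
            ImmersiveView()
        }
    }
}

struct ContentView: View {
    @Environment(\.openImmersiveSpace) private var openImmersiveSpace

    var body: some View {
        Button("Enter immersive space") {
            Task { _ = await openImmersiveSpace(id: "immersive") }
        }
    }
}

struct ImmersiveView: View {
    var body: some View {
        // RealityKit content (e.g. via RealityView) would be composed here.
        Text("Immersive content goes here")
    }
}
```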

Of course, other cheaper VR headsets have struggled to make much impact in the market, and the same might end up being true for Apple's platform. The Vision Pro is an expensive ($3,500 at launch), low-volume product that won't even be available outside the US until mid-to-late 2024, making it more of a developer kit for early adopters. However, if Apple continues to develop the platform and releases subsequent lower-cost versions of the headset aimed at consumers, its software advantage could make the platform more attractive to the general public and kick-start a mixed reality revolution.

Augmenting the real world

Recently, we’ve been looking into building more immersive applications and experiences for the Vision Pro, using 3D engines like Unreal Engine and Unity. Combined with Apple’s ARKit framework, these engines allow for the creation of experiences which can augment the real world with any kind of content you can imagine.
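As a rough illustration of the ARKit side of that combination, the sketch below anchors a virtual object to a real surface on an iPhone or iPad using Apple's own RealityKit renderer; Unreal and Unity layer their own content pipelines over the same kind of tracking. The class and asset choices here are illustrative.

```swift
import ARKit
import RealityKit
import UIKit

// Rough iPhone/iPad sketch: ARKit tracks the world, RealityKit renders a
// virtual object anchored to a real surface. Class name is illustrative.
final class AugmentedViewController: UIViewController {
    private let arView = ARView(frame: .zero)

    override func viewDidLoad() {
        super.viewDidLoad()
        arView.frame = view.bounds
        arView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        view.addSubview(arView)

        // Track the device's position and look for horizontal surfaces.
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal]
        arView.session.run(configuration)

        // Pin a simple virtual box to the first horizontal plane ARKit finds.
        let anchor = AnchorEntity(plane: .horizontal)
        let box = ModelEntity(
            mesh: .generateBox(size: 0.1),
            materials: [SimpleMaterial(color: .systemTeal, isMetallic: false)]
        )
        anchor.addChild(box)
        arView.scene.addAnchor(anchor)
    }
}
```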

However, building that immersive world in the first place is difficult: 3D content is very time-consuming and resource-intensive to create. While there is a wealth of pre-built 3D models available, creating custom animations for them is altogether more difficult. Traditionally, animating a 3D model means keyframe animation, where a professional animator tweaks and repositions the model at important steps, or keyframes, in the animation. This is a costly, laborious process.
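To make that concrete, a keyframe is simply an authored pose at a point in time, and the engine fills in every frame in between. A toy sketch, reduced to a single joint position with linear interpolation (real tools interpolate whole skeletons with easing curves):

```swift
import simd

// Toy illustration of keyframing: an animator authors a few key poses and the
// engine interpolates every in-between frame. Here, one joint position.
struct Keyframe {
    let time: Float            // seconds
    let position: SIMD3<Float> // joint position at that time
}

// Linearly interpolate between the keyframes surrounding `time`.
func sample(_ keyframes: [Keyframe], at time: Float) -> SIMD3<Float> {
    guard let first = keyframes.first, let last = keyframes.last else { return .zero }
    guard time > first.time else { return first.position }
    guard time < last.time else { return last.position }
    let next = keyframes.firstIndex(where: { $0.time >= time })!
    let a = keyframes[next - 1], b = keyframes[next]
    let t = (time - a.time) / (b.time - a.time)
    return simd_mix(a.position, b.position, SIMD3<Float>(repeating: t))
}

// Two hand-authored keyframes; everything in between is computed.
let wave = [Keyframe(time: 0, position: [0.0, 1.0, 0.0]),
            Keyframe(time: 1, position: [0.3, 1.4, 0.0])]
let midPose = sample(wave, at: 0.5) // [0.15, 1.2, 0.0]
```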

Motion capture

Another option is to use motion capture, where an actor is filmed in a studio, often wearing a suit dotted with sensors and markers that identify body parts, so that real-world movements can be translated into motion data for animations. While faster than animating by hand, motion capture is expensive, requiring capture suits, cameras and studio space, which puts it out of reach for projects without very large budgets. Bringing the cost of motion capture down would suddenly make this exciting technology much more widely available, with tantalising possibilities for all sorts of education, training and entertainment applications. And there are some new capture tools and techniques that promise to do just that.

Both Unreal Engine and Unity support facial animation capture using an iPhone. In Unreal Engine this is handled by the Live Link Face app, and in Unity by the Unity Face Capture app. Both apps use the iPhone's front-facing TrueDepth camera, which captures depth data, to generate facial animations for use in their respective engines.
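Under the hood, both apps build on ARKit's face tracking, which turns each camera frame into a set of blend shape coefficients. A minimal sketch of that underlying API (the wrapper class here is illustrative):

```swift
import ARKit

// Sketch of the ARKit face tracking these apps build on: each frame yields an
// ARFaceAnchor with blend shape coefficients that can drive a facial rig.
// The wrapper class name is illustrative.
final class FaceCaptureSession: NSObject, ARSessionDelegate {
    private let session = ARSession()

    func start() {
        guard ARFaceTrackingConfiguration.isSupported else { return }
        session.delegate = self
        session.run(ARFaceTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let faceAnchor as ARFaceAnchor in anchors {
            // Coefficients in 0...1, one per tracked facial feature.
            let jawOpen = faceAnchor.blendShapes[.jawOpen]?.floatValue ?? 0
            let smile = faceAnchor.blendShapes[.mouthSmileLeft]?.floatValue ?? 0
            // Record or stream these values to animate a character's face.
            print("jawOpen: \(jawOpen), smileLeft: \(smile)")
        }
    }
}
```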

Full body motion capture is more difficult – would it be possible to replicate it in a home or small studio setting, and get quality results? There are two main options for home mocap – using a VR headset with controllers, or using a single iPhone for recording. With the goal of reducing the cost of creating quality content as much as possible, we have run some experimental capture sessions using an iPhone.
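For a sense of what a single phone can capture at the platform level, ARKit itself includes basic body tracking; the dedicated apps discussed below apply their own, more sophisticated processing. A rough sketch of the built-in API (again, the wrapper class is illustrative):

```swift
import ARKit

// Sketch of ARKit's built-in single-camera body tracking: each frame yields an
// ARBodyAnchor whose skeleton exposes per-joint transforms. Dedicated mocap
// apps do their own processing, but the raw idea is similar.
final class BodyCaptureSession: NSObject, ARSessionDelegate {
    private let session = ARSession()

    func start() {
        guard ARBodyTrackingConfiguration.isSupported else { return }
        session.delegate = self
        session.run(ARBodyTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let bodyAnchor as ARBodyAnchor in anchors {
            // 4x4 transform of a joint, relative to the body anchor.
            if let leftHand = bodyAnchor.skeleton.modelTransform(for: .leftHand) {
                print("left hand transform: \(leftHand)")
            }
        }
    }
}
```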

There are a number of solutions that allow for capturing motion data from a single iPhone. The first we looked into was PoseAI, which creates a live stream of data from an iPhone into Unreal Engine that can be recorded in-engine. This solution worked well, but we also wanted to explore recording animation away from a computer running Unreal, allowing capture to happen anywhere.

Move AI

Another option we investigated was Move One, from Move AI. This app lets users record video of an actor's movement, upload it to be processed by Move AI, and receive the extracted motion data back within minutes, usable in any application.

Once the data comes back from the Move One app, it can be imported into Unreal Engine. The quality of the animation was excellent, though it benefits from some tweaking and clean-up: hand tracking was good, but sometimes missed individual finger movements.

Demonstrating a Move One animation applied to a MetaHuman

As the video shows, there is also occasionally some glitchy movement at the start of the clip. That said, it's a very good starting point. The next step was transferring that animation from the default Move AI 3D model onto a different model of our choice.

Unreal Engine offers the ability to create realistic human 3D models using its MetaHuman tool. Using the Quixel Bridge tool, we imported a MetaHuman character into our Unreal project and added it to the 3D world.

Comparing the 3D model provided by Move AI with the MetaHuman character reveals some major differences. Both models have an underlying "skeleton": a virtual bone structure that is moved around to animate the 3D model. The MetaHuman skeleton consists of far more bones, with many more points of articulation, than the Move AI model.

Animations are designed for a specific skeleton and cannot be naively transferred to a different one. Thankfully, Unreal Engine provides a way to translate animations between skeletons using a feature called Rig Retargeting. First, you create a "rig" for the first skeleton, identifying chains of bones, for example from the left shoulder to the left hand. Each chain is given a name, such as "left_arm".

Once all the bone chains are named, the same process is repeated on the second skeleton, giving its chains the same names as before. Then a "rig retargeter" is created. This uses the two rigs to translate animations between skeletons: by matching up the bone chains by name, Unreal can identify the equivalent parts of each skeleton and work out how a movement designed for one should move the other. It's vital that the skeletons are posed the same way inside the rig retargeter, otherwise animations won't translate across correctly. Finally, the retargeter can re-export an animation designed for the first model as if it were designed for the second. Using this technique, it's possible to re-export a Move AI animation for an Unreal MetaHuman.
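Conceptually, the retargeter is doing something like the simplified sketch below. This is not Unreal's actual API, just an illustration of the idea: chains are matched by their shared names, and motion recorded for the source chain is reapplied to the corresponding target bones.

```swift
import simd

// Simplified, hypothetical sketch of the retargeting idea (not Unreal's API):
// bone chains are matched by the names given in each rig, and rotations
// recorded for the source chain are reapplied to the matching target bones.
struct BoneChain {
    let name: String    // e.g. "left_arm"
    let bones: [String] // e.g. ["upperarm_l", "lowerarm_l", "hand_l"]
}

struct Rig {
    let chains: [BoneChain]
}

// One frame of animation: a rotation per bone, keyed by bone name.
typealias Pose = [String: simd_quatf]

func retarget(frame sourcePose: Pose, from source: Rig, to target: Rig) -> Pose {
    var targetPose: Pose = [:]
    for sourceChain in source.chains {
        // Find the chain on the target rig with the same name.
        guard let targetChain = target.chains.first(where: { $0.name == sourceChain.name }) else {
            continue
        }
        // Walk both chains together, copying rotations bone-for-bone. A real
        // retargeter also compensates for different bone counts, lengths and
        // rest poses, which is why matching poses in the retargeter matters.
        for (sourceBone, targetBone) in zip(sourceChain.bones, targetChain.bones) {
            if let rotation = sourcePose[sourceBone] {
                targetPose[targetBone] = rotation
            }
        }
    }
    return targetPose
}
```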

Creating a rig for a MetaHuman skeleton
Rig retargeting to translate animations to a MetaHuman skeleton

3D modelling in motion

We can record motion clips using just an iPhone and transfer the data directly into Unreal Engine for use on any humanoid 3D model. This lowers the barrier to entry for creating custom 3D content: all that's needed is a single iPhone and a computer capable of running Unreal Engine. The results aren't as clean and polished as professionally produced animations, whether recorded with a traditional motion capture setup or animated by hand, but they're good for prototyping, or as a basis for further editing.

In conclusion

It's very early days both for mixed reality headsets and for these low-cost capture techniques. We're very excited to see how they develop, and how we can use them to build compelling AR products for our partners. We can't wait to get those ski goggles on and give them a try.

Read more about the Apple Vision Pro here

Take a look at the AR development project by Rant

To discuss your project, contact us today