Running a MetaHuman face in the browser

Six ways to put a talking, lip-synced face on the web, from pure client-side three.js to streaming a cinematic Unreal MetaHuman off a cloud GPU, plus seven experiments (07–13) that push the in-browser route toward cinematic fidelity. Each is a real, runnable demo with a shared performance HUD so you can try them and measure them on your own machine.

The short version. You can't run a full cinematic MetaHuman natively in a browser: the asset and its shaders are too heavy and are tied to Unreal's renderer. But you have two honest options: (A) stay in the browser, either by exporting the head to glTF and driving its 52 ARKit blendshapes in three.js yourself (demos 01–04) or by using an off-the-shelf avatar library (demo 05); or (B) render the real MetaHuman in Unreal on a server and stream the video (demo 06). Demos 01–05 run entirely in your browser; 06 needs a GPU backend.
In-browser pipelines — run right now, no server
01

three.js · ARKit blendshapes

The portable core. A glTF head driven by the 52 standard ARKit blendshapes — manual sliders, expression presets, and the embedded face-capture clip. This is the foundation everything else builds on.

local · client-side
Open demo →
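
For a sense of what "driving blendshapes" means in code, here is a minimal sketch, not the demo's actual source. It assumes the glTF head loads as a three.js mesh whose morph targets are named after the ARKit shapes (for example `jawOpen`), and it relies only on the mesh's `morphTargetDictionary` and `morphTargetInfluences` fields. The `MorphMesh` type and the `applyBlendshapes` helper are hypothetical names introduced for illustration.

```typescript
// Structural stand-in for the two three.js Mesh fields this sketch touches.
interface MorphMesh {
  morphTargetDictionary?: Record<string, number>;
  morphTargetInfluences?: number[];
}

// Hypothetical helper: copy named ARKit weights onto the mesh's morph targets.
// Returns the names the mesh does not expose, so the caller can log them.
function applyBlendshapes(mesh: MorphMesh, weights: Record<string, number>): string[] {
  const names = Object.keys(weights);
  const dict = mesh.morphTargetDictionary;
  const influences = mesh.morphTargetInfluences;
  if (!dict || !influences) return names; // mesh has no morph targets at all

  const missing: string[] = [];
  for (let i = 0; i < names.length; i++) {
    const index = dict[names[i]];
    if (index === undefined) {
      missing.push(names[i]);
      continue;
    }
    // Clamp to [0, 1], the range ARKit coefficients are defined over.
    influences[index] = Math.min(1, Math.max(0, weights[names[i]]));
  }
  return missing;
}
```

A slider, a preset, and a face-capture frame can all reduce to the same call: a name-to-weight map applied once per frame.
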
02

Audio-driven lipsync

Mouth shapes derived from a live audio signal (mic, file, or TTS) via the Web Audio API — the browser stand-in for NVIDIA Audio2Face. Speak and the face moves.

local · mic / audio
Open demo →
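
The simplest form of audio-driven lipsync is loudness to jaw opening. The sketch below is an assumption about how such a mapping could look, not the demo's implementation: it takes one block of time-domain samples (the kind `AnalyserNode.getFloatTimeDomainData` fills in the Web Audio API), computes the RMS level, and eases a `jawOpen` weight toward it. The gate, gain, attack, and release values are illustrative defaults.

```typescript
// Root-mean-square level of one block of time-domain samples in [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Hypothetical mapping from level to a jawOpen weight:
// noise gate, gain, clamp to 1, then open fast (attack) and close slowly (release).
function nextJawOpen(
  previous: number,
  level: number,
  gate = 0.02,
  gain = 4,
  attack = 0.6,
  release = 0.2
): number {
  const target = level < gate ? 0 : Math.min(1, (level - gate) * gain);
  const rate = target > previous ? attack : release;
  return previous + (target - previous) * rate;
}
```

Called once per animation frame, `nextJawOpen(jaw, rmsLevel(buffer))` gives a mouth that follows speech energy. It knows nothing about phonemes, which is the gap that viseme-based approaches fill.
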
03

TTS → viseme lipsync

Type text, the browser speaks it, and visemes drive the mouth in sync — the Azure / Ready Player Me pattern, here using the built-in Web Speech API so it needs no API key.

local · Web Speech
Open demo →
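
A viseme track can be thought of as a list of timed cues, each mapped to a handful of blendshape weights. The sketch below assumes that shape; it is not taken from the demo. The viseme labels are borrowed from the Oculus-style set (`sil`, `PP`, `aa`, `O`), the ARKit names are real, and the weight values in `VISEME_TO_ARKIT` are made-up placeholders you would tune by eye.

```typescript
type Weights = Record<string, number>;

// One timed cue on the viseme track; times are in seconds from utterance start.
interface VisemeCue {
  viseme: string;
  start: number;
  end: number;
}

// Hypothetical viseme-to-ARKit table; the weights are illustrative, not calibrated.
const VISEME_TO_ARKIT: Record<string, Weights> = {
  sil: {},
  PP: { mouthClose: 0.8, mouthPressLeft: 0.4, mouthPressRight: 0.4 },
  aa: { jawOpen: 0.7 },
  O: { jawOpen: 0.4, mouthFunnel: 0.7 },
};

// Return the blendshape weights for the cue active at time t (empty if none).
function weightsAt(cues: VisemeCue[], t: number): Weights {
  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i];
    if (t >= cue.start && t < cue.end) return VISEME_TO_ARKIT[cue.viseme] || {};
  }
  return {};
}
```

In practice you would blend between neighbouring cues rather than snap, but the lookup is the core of the pattern.
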
04

Conversational loop

The full thing, and the closest to your use case: speak → speech-to-text → a dialogue tree → reply → text-to-speech → lipsync. A face that talks back, entirely in the browser.

local · Chrome STT
Open demo →
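
The dialogue-tree step in the middle of that loop can be very small. Here is one way to sketch it, assuming keyword matching on the recognised transcript; the node shape, the `advance` function, and the fallback line are all hypothetical, and the demo may structure its tree differently.

```typescript
interface DialogueEdge {
  keywords: string[]; // lowercase keywords that trigger this edge
  to: string;         // id of the node to move to
}

interface DialogueNode {
  reply: string;      // what the face says on entering this node
  edges: DialogueEdge[];
}

type DialogueTree = Record<string, DialogueNode>;

const FALLBACK_REPLY = "Sorry, I didn't catch that.";

// Take the speech-to-text transcript and pick the next node and its reply.
// If nothing matches, stay on the current node and return the fallback line.
function advance(
  tree: DialogueTree,
  current: string,
  transcript: string
): { node: string; reply: string } {
  const heard = transcript.toLowerCase();
  const edges = tree[current] ? tree[current].edges : [];
  for (let i = 0; i < edges.length; i++) {
    const target = tree[edges[i].to];
    if (!target) continue; // dangling edge: ignore it
    for (let k = 0; k < edges[i].keywords.length; k++) {
      if (heard.indexOf(edges[i].keywords[k]) !== -1) {
        return { node: edges[i].to, reply: target.reply };
      }
    }
  }
  return { node: current, reply: FALLBACK_REPLY };
}
```

The returned `reply` is what gets handed to text-to-speech, which in turn feeds the lipsync stage.
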
05

TalkingHead.js + Ready Player Me

The off-the-shelf route: a polished open-source avatar library and a Ready Player Me character with built-in lipsync, moods, and gestures. What you ship when you don't want to build the pipeline.

network · CDN + RPM
Open demo →
06

Unreal Pixel Streaming

The only path to true MetaHuman fidelity on the web: Unreal renders the real thing on a cloud GPU and streams the video over WebRTC. Explainer, provider comparison, cost estimator, and a live connection tester.

server · cloud GPU
Open page →
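
The arithmetic behind a streaming cost estimate is simple enough to sketch. The version below is an assumption about the shape of such an estimator, not the page's own calculator: every field in `StreamingLoad` is an input you supply, and no real provider pricing is implied.

```typescript
// Inputs to the estimate; all values come from you, none from a provider.
interface StreamingLoad {
  peakConcurrentSessions: number; // viewers streaming at the same time
  sessionsPerGpu: number;         // how many streams one GPU instance can serve
  hoursOnlinePerMonth: number;    // hours each instance is kept running
  hourlyRateUsd: number;          // price of one GPU instance per hour
}

// Instances needed at peak, and what keeping them online costs per month.
function estimateMonthlyCost(load: StreamingLoad): { gpus: number; monthlyUsd: number } {
  const gpus = Math.ceil(load.peakConcurrentSessions / load.sessionsPerGpu);
  return {
    gpus,
    monthlyUsd: gpus * load.hoursOnlinePerMonth * load.hourlyRateUsd,
  };
}
```

The point the formula makes is that cost scales with concurrent viewers, which is the structural difference from demos 01–05, where each visitor brings their own GPU.
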
Fidelity & realism — pushing three.js toward cinematic
07

Post-processing + HDRI lighting

The biggest perceived-quality lever: image-based lighting plus a post stack (ambient occlusion, bloom, depth of field, antialiasing). Toggle each pass and watch the jump from "raw WebGL" to "rendered."

local · three.js post
Open demo →
08

Realistic skin + wet eyes

Shading turns a plastic head into believable skin: fake subsurface scattering, sheen, detail normals, and clearcoat catchlights in the eyes. A/B against the flat material. Skin SSS is the real MetaHuman moat.

local · PBR / SSS
Open demo →
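
One common way to fake subsurface scattering is wrap lighting: let the diffuse term reach past the terminator so shadow edges on skin soften instead of cutting to black. Whether the demo uses this exact trick is an assumption; the sketch just shows the formula as plain arithmetic, with `wrapDiffuse` a hypothetical name and `wrap` assumed to lie in [0, 1].

```typescript
// Wrapped diffuse term.
// nDotL is the cosine between the surface normal and the light direction (-1..1).
// wrap = 0 gives ordinary Lambert; larger values push light further around the head.
function wrapDiffuse(nDotL: number, wrap: number): number {
  return Math.max(0, (nDotL + wrap) / (1 + wrap));
}
```

In a real material this runs per pixel in the shader, usually with a reddish tint applied to the extra light that the wrap lets in.
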
09

Micro-motion — the “alive” layer

The behavioral half of realism, pure JS: blinks, saccades, breathing, idle sway, cursor gaze, and co-articulated speech. Toggle "dead" vs "alive" — this sells presence more than polygons do.

local · behavior
Open demo →
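
Most of this layer is a few small functions of time. The sketch below shows three of them under stated assumptions: breathing as a slow sine, a blink as a linear close-and-reopen envelope, and a randomised gap between blinks. Function names and default values (amplitude, period, blink duration, 2 to 6 second gaps) are illustrative, not taken from the demo.

```typescript
// Breathing: a slow sine offset, e.g. added to the head's vertical position.
function breathingOffset(tSeconds: number, amplitude = 0.003, periodSeconds = 4): number {
  return amplitude * Math.sin((2 * Math.PI * tSeconds) / periodSeconds);
}

// Blink weight in [0, 1]: rises linearly to fully closed at the midpoint,
// then falls back to open. Outside the blink window it is 0.
function blinkWeight(sinceBlinkStart: number, duration = 0.15): number {
  if (sinceBlinkStart < 0 || sinceBlinkStart >= duration) return 0;
  const phase = sinceBlinkStart / duration;
  return 1 - Math.abs(2 * phase - 1);
}

// Randomised gap until the next blink so the rhythm never looks mechanical.
// The random source is injectable to keep the function testable.
function nextBlinkDelay(random: () => number = Math.random, min = 2, max = 6): number {
  return min + random() * (max - min);
}
```

Feeding `blinkWeight` into both `eyeBlinkLeft` and `eyeBlinkRight` each frame is enough to take a head from "dead" to noticeably less so.
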
10

WebGPU + TSL

The next-gen upgrade path: three.js's WebGPU renderer with a node-based (TSL) material on the head. Compute, better materials, more headroom: this is the direction in which the gap with native renderers keeps closing.

WebGPU · node materials
Open demo →
11

Gaussian splatting

The photoreal, non-Unreal path: render a real captured scene as 3D Gaussian splats. Captured reality, not mesh+shader. Animatable head splats are the frontier for a photoreal talking face without an engine.

network · photoreal
Open demo →
12

Hi-fi head — everything stacked

The synthesis: lighting, post, skin, eyes, and micro-motion on one head, each independently toggleable with the perf HUD. A/B "raw WebGL" vs "full cinematic" and measure the fps cost of every layer on your hardware.

local · all layers
Open demo →
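
When you compare layers, frame time is easier to reason about than fps, because milliseconds add up and frames per second do not. A minimal sketch of that bookkeeping, assuming you collect `requestAnimationFrame` timestamps with a layer off and then on (the helper names are hypothetical and this is not the HUD's code):

```typescript
// Average frame time in ms from consecutive requestAnimationFrame timestamps.
function averageFrameMs(timestamps: number[]): number {
  if (timestamps.length < 2) return 0;
  const span = timestamps[timestamps.length - 1] - timestamps[0];
  return span / (timestamps.length - 1);
}

interface LayerCost {
  costMs: number; // extra milliseconds per frame with the layer enabled
  fpsOff: number;
  fpsOn: number;
}

// Cost of one layer, given average frame times with it disabled and enabled.
function layerCost(frameMsOff: number, frameMsOn: number): LayerCost {
  return {
    costMs: frameMsOn - frameMsOff,
    fpsOff: 1000 / frameMsOff,
    fpsOn: 1000 / frameMsOn,
  };
}
```

For example, going from 10 ms to 12.5 ms per frame is a 2.5 ms layer that takes you from 100 fps to 80 fps. One caveat: browsers normally cap rendering at the display's refresh rate, so a cheap layer may show no fps change at all until the frame budget is exceeded.
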
13

Head gallery — real heads

Swap between five real heads — two photoreal (Avaturn, Avatar SDK), Ready Player Me, the high-detail Lee Perry-Smith scan, and the neutral scan — driven by one rig. Expressions, audible lipsync, and micro-motion on every face. Drop in your own MetaHuman GLB.

local · photoreal heads
Open demo →
Evaluate
📊 Comparison matrix + live benchmark
📄 README