# MetaHuman in the browser — pipeline demos

Six runnable ways to put a **talking, lip-synced face on the web**, from pure client-side
three.js to streaming a cinematic Unreal MetaHuman off a cloud GPU. Each pipeline is a real
demo with a shared performance HUD, so you can try them and benchmark them on your own machine.

> **The honest summary:** you can't run a full cinematic MetaHuman *natively* in a browser —
> the asset and its shaders are too heavy and tied to Unreal's renderer. Your real options are
> **(A)** export the head to glTF and drive its 52 ARKit blendshapes yourself in three.js
> (demos 01–04), or use an off-the-shelf avatar library (demo 05); or **(B)** render the real
> MetaHuman in Unreal on a server and stream the video over WebRTC (demo 06). Demos 01–05 run
> entirely in the browser; 06 needs a GPU backend.

## Run it

Pure static site, **no build step**. Either:

- Open via the local server: **http://localhost/org/jonasjohansson/metahuman-browser-demos/**
- Or serve the folder yourself: `python3 -m http.server 8000` then open `http://localhost:8000/`

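⚠️ Must be served over **http://** (not `file://`): ES modules and the GLB fetch need a real
origin, and the microphone needs a secure context, which `http://localhost` satisfies but plain
`http://` on another host does not. Demos **04** (speech-to-text) and **05** work best in **Chrome**.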

Start at `index.html` (the hub) or `compare.html` (matrix + live benchmark).

## The six pipelines

| # | Demo | Runs | Lipsync source | Needs |
|---|------|------|----------------|-------|
| 01 | **three.js · ARKit blendshapes** | browser | manual / animation clip | nothing |
| 02 | **Audio-driven lipsync** | browser | live audio amplitude/spectrum (Web Audio) | mic or audio file |
| 03 | **TTS → viseme lipsync** | browser | Web Speech API + viseme map | nothing |
| 04 | **Conversational loop** | browser | STT → dialogue tree → TTS → visemes | Chrome (mic) |
| 05 | **TalkingHead.js + Ready Player Me** | browser | library lipsync / Web Speech | network (CDN + RPM) |
| 06 | **Unreal Pixel Streaming** | cloud GPU | the real MetaHuman, streamed | GPU server |

**01** is the foundation — the same 52 ARKit blendshapes a MetaHuman exposes, driven directly.
**02** is the browser stand-in for NVIDIA Audio2Face. **03** is the Azure/RPM TTS-viseme pattern.
**04** is the closest to a conversational MetaHuman: *speak, it answers, the mouth moves*.
**05** is the ship-it-today route. **06** is the only path to true cinematic fidelity on the web.
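
To make the idea behind demo 02 concrete, here is a minimal sketch of amplitude-driven lipsync: one audio frame is reduced to an RMS level, mapped to a `jawOpen` weight, and smoothed. The function names and the `floor`, `ceiling`, and `factor` values are illustrative assumptions, not the demo's actual code or tuning.

```javascript
// Root-mean-square level of one audio frame (samples in [-1, 1]).
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Map an RMS level to a 0..1 jawOpen weight.
// `floor` and `ceiling` are assumed tuning values, not measured ones.
function levelToJawOpen(level, floor = 0.02, ceiling = 0.3) {
  const t = (level - floor) / (ceiling - floor);
  return Math.min(1, Math.max(0, t));
}

// Exponential smoothing so the jaw does not jitter from frame to frame.
function smooth(previous, target, factor = 0.5) {
  return previous + (target - previous) * factor;
}

// In a browser the samples would come from a Web Audio AnalyserNode:
//   const buf = new Float32Array(analyser.fftSize);
//   analyser.getFloatTimeDomainData(buf);
// Here a synthetic sine wave stands in for the microphone.
const frame = Float32Array.from({ length: 512 }, (_, i) => 0.2 * Math.sin(i * 0.1));

let jawOpen = 0;
jawOpen = smooth(jawOpen, levelToJawOpen(rms(frame)));
console.log('jawOpen weight:', jawOpen.toFixed(3));
```

Amplitude alone only opens and closes the jaw; shaping the lips per sound needs the spectrum or a viseme source, which is where demos 03 and 04 pick up.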

## The fidelity track (07–13): pushing three.js toward cinematic

"Is Unreal the only way to look good?" No. These demos show how far in-browser three.js can go,
and where the real ceiling is. Each isolates one technique so you can A/B and measure it.

| # | Demo | What it shows |
|---|------|---------------|
| 07 | **Post-processing + HDRI lighting** | Image-based lighting + GTAO, bloom, depth of field, SMAA. Toggle each pass — the biggest perceived-quality jump for the least effort. |
| 08 | **Realistic skin + wet eyes** | Tuned PBR skin: fake subsurface scattering, sheen, detail normals, clearcoat eye catchlights. Skin SSS is the real MetaHuman moat. |
| 09 | **Micro-motion** | The behavioral half: blinks, saccades, breathing, idle sway, cursor gaze, co-articulated speech. Sells "alive" more than polygons. |
| 10 | **WebGPU + TSL** | three.js's WebGPU renderer + node materials — the next-gen path that keeps closing the gap. |
| 11 | **Gaussian splatting** | The photoreal, non-Unreal path: render captured reality as 3D splats. Animatable head splats are the frontier. |
| 12 | **Hi-fi head (everything stacked)** | All layers on one head, each toggleable, with the perf HUD. A/B raw vs cinematic and measure each layer's fps cost. |
| 13 | **Head gallery (real heads)** | Swap between four real ARKit-rigged heads — two photoreal (Avaturn, Avatar SDK), Ready Player Me, and the neutral scan — all driven by one rig. Expressions, lipsync, and micro-motion on every face. |
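
As a rough illustration of the micro-motion layer in demo 09, the sketch below computes a blink pulse, an irregular blink schedule, and a breathing sine as pure functions of time. The durations, intervals, and function names are assumed values chosen for readability, not the demo's implementation.

```javascript
// Blink weight (0 = open, 1 = closed) for a blink that starts at `start` seconds.
// A triangular close/open pulse; the 0.2 s default duration is an assumption.
function blinkWeight(t, start, duration = 0.2) {
  const phase = (t - start) / duration;
  if (phase <= 0 || phase >= 1) return 0;
  return phase < 0.5 ? phase * 2 : (1 - phase) * 2;
}

// Schedule the next blink 2 to 6 seconds from now (assumed range).
// Irregular intervals tend to read as more alive than a fixed period.
function nextBlinkTime(now, random = Math.random) {
  return now + 2 + random() * 4;
}

// Slow sine for breathing, returned as -1..1 for the caller to scale
// into a small chest or head offset.
function breathing(t, breathsPerMinute = 14) {
  return Math.sin((2 * Math.PI * breathsPerMinute * t) / 60);
}

// Sample the first second of a blink that starts at t = 0.5 s.
for (let t = 0; t <= 1.0001; t += 0.1) {
  console.log(t.toFixed(1), blinkWeight(t, 0.5).toFixed(2), breathing(t).toFixed(2));
}
```

In a render loop the blink weight would be written to `eyeBlinkLeft` and `eyeBlinkRight` each frame, with `nextBlinkTime` called once the current blink finishes.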

### More real heads

Demo 13 ships **five** heads in `assets/` + `assets/heads/`:

- **Avaturn** and **Avatar SDK** — photoreal, generated from a photo (full ARKit + visemes, lip-sync).
- **Ready Player Me** — the stylized avatar (full ARKit + visemes).
- **Lee Perry-Smith** — the high-detail scan from three.js's classic skin demo. Gorgeous, but a
  *static* scan with no blendshapes, so it's a shading showcase: it won't emote, but it speaks aloud
  and you can rotate it. Its color/normal/specular maps are applied on a skin material in code.
- **Face scan** — the neutral facecap.glb.

A shared rig (`lib/rig.js`) drives the rigged ones from the same weights, bridging the two ARKit
naming conventions (`eyeBlinkLeft` vs `eyeBlink_L`) and the single-mesh vs multi-mesh difference.
**"Speak a line" is audible** — it uses the Web Speech API for the voice and drives the visemes in
parallel (same in demos 09 and 12). To add your own head (an exported MetaHuman, a Ready Player Me /
Avaturn / Avatar SDK export), drop the GLB in `assets/` and add one line to `lib/heads.js`.
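
The naming bridge can be pictured with a small lookup like the one below. It is a hedged sketch, not the code in `lib/rig.js`: `nameCandidates` and `resolveIndex` are hypothetical helpers, and the only convention assumed is the `Left`/`Right` versus `_L`/`_R` suffix difference mentioned above.

```javascript
// Candidate spellings for one canonical ARKit name, most likely first.
function nameCandidates(canonical) {
  const candidates = [canonical];
  const m = canonical.match(/^(.+)(Left|Right)$/);
  if (m) candidates.push(`${m[1]}_${m[2][0]}`); // eyeBlinkLeft -> eyeBlink_L
  return candidates;
}

// Find the morph-target index for a canonical name.
// `dictionary` has the shape of a three.js mesh.morphTargetDictionary: { name: index }.
// Returns -1 when the head has no matching shape.
function resolveIndex(dictionary, canonical) {
  for (const name of nameCandidates(canonical)) {
    if (Object.prototype.hasOwnProperty.call(dictionary, name)) return dictionary[name];
  }
  return -1;
}

// Two stand-in dictionaries, one per convention.
const longNames = { eyeBlinkLeft: 0, jawOpen: 1 };
const shortNames = { eyeBlink_L: 4, jawOpen: 9 };
console.log(resolveIndex(longNames, 'eyeBlinkLeft'), resolveIndex(shortNames, 'eyeBlinkLeft'));
```

Trying the canonical spelling first means names that are identical in both conventions resolve without any special casing.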

### Exporting a MetaHuman → web GLB

`tools/metahuman_to_web_glb.py` is a Blender script that imports an Unreal MetaHuman FBX, renames
the facial shape keys to the ARKit convention if needed, decimates to a web LOD, and exports a GLB
with morph targets. Verified end-to-end: a real MetaHuman head (head + teeth + eyes, **52 ARKit
blendshapes**, 59k tris → 9 MB GLB) imports, exports, and drives in the gallery. The test asset
itself is licensed "study purposes only," so it is **gitignored, never committed** — point the
script at your own licensed MetaHuman export and add it to `lib/heads.js`. Note the export carries
geometry + blendshapes; MetaHuman skin textures bake separately (they're Unreal-specific).

**Where it lands:** a tuned three.js head reaches roughly the quality of a strong real-time game
cinematic. The last ~15% (true subsurface skin, strand hair, Lumen-grade GI) is what Pixel Streaming
a real MetaHuman (06) buys you, at the cost of a GPU server per viewer. Demos 10 (WebGPU) and 11
(splatting) are the two directions along which that ceiling keeps rising.

## How to use your own MetaHuman

The 3D head in demos 01–04 is `assets/facecap.glb`, the face-capture head from the three.js
examples, chosen because it carries the **same 52-name ARKit blendshape standard** that MetaHuman,
Ready Player Me, and iPhone ARKit all share. To swap in your MetaHuman:

1. In Unreal, export the MetaHuman head as **glTF/GLB** (or FBX → Blender → GLB), keeping the
   ARKit facial pose / blendshapes. Decimate to a web-friendly LOD.
2. Drop it in `assets/` and point `loadGLB(...)` at it. The blendshape-driving code
   (`lib/arkit.js`) looks shapes up by name, so anything ARKit-named just works.
3. Expect to lose the heavy skin shaders and strand hair — that fidelity is what demo 06 (Pixel
   Streaming) exists for.
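
The name-based lookup in step 2 can be sketched as follows. This is an illustration under stated assumptions rather than the contents of `lib/arkit.js`: `setBlendshape` is a hypothetical helper, and the plain objects stand in for three.js meshes, which expose `morphTargetDictionary` and `morphTargetInfluences` when they carry morph targets.

```javascript
// Set one blendshape by name on every mesh that has it.
// Returns how many meshes were updated, so a caller can spot a misnamed shape.
function setBlendshape(meshes, name, weight) {
  const clamped = Math.min(1, Math.max(0, weight));
  let applied = 0;
  for (const mesh of meshes) {
    const index = mesh.morphTargetDictionary?.[name];
    if (index === undefined) continue; // this mesh has no such shape
    mesh.morphTargetInfluences[index] = clamped;
    applied++;
  }
  return applied;
}

// Stand-ins for a multi-mesh head (head + teeth), so the sketch runs without three.js.
const head = { morphTargetDictionary: { jawOpen: 0, eyeBlinkLeft: 1 }, morphTargetInfluences: [0, 0] };
const teeth = { morphTargetDictionary: { jawOpen: 0 }, morphTargetInfluences: [0] };

const updated = setBlendshape([head, teeth], 'jawOpen', 0.6);
console.log(updated, head.morphTargetInfluences, teeth.morphTargetInfluences);
```

With a real GLB, the mesh list would typically be gathered once after loading, for example with `gltf.scene.traverse(o => { if (o.isMesh && o.morphTargetDictionary) meshes.push(o); })`.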

## Repo layout

```
index.html              hub / landing page (the six cards)
compare.html + .js      comparison matrix + live in-browser FPS benchmark
assets/facecap.glb      head model with the 52 ARKit blendshapes
lib/
  styles.css            shared light theme (tokenized :root vars)
  scene.js              createStage() + loadGLB() — shared three.js setup
  arkit.js              52 ARKit names, expression presets, viseme→blendshape maps
  rig.js                shared rig: bridges ARKit naming and single- vs multi-mesh heads
  heads.js              head list for the gallery (one line per head)
  perf-hud.js           PerfHUD — FPS, frame ms, triangles, draw calls, heap, GPU
  ui.js                 DOM helpers (topbar, panel, sliders, buttons, status)
demos/
  01-threejs-blendshapes/
  02-audio-lipsync/
  03-viseme-tts/
  04-conversational/      (+ tree.json dialogue tree)
  05-talkinghead-rpm/
  06-pixel-streaming/
```

Each demo folder has a `NOTES.md` with its specifics, perf characteristics, and limitations.

## Evaluating performance

Every in-browser demo shows the shared **PerfHUD** (top-left): FPS, CPU frame time, triangle
count, draw calls, JS heap, and the GPU name where the browser exposes it. `compare.html` runs a
**stress benchmark** — it clones the head into 1→64 animated instances and graphs where your
machine drops from 60→30 fps, so you can judge how much avatar your target hardware can carry.
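
The arithmetic behind such a benchmark is small enough to sketch. The helpers below are hypothetical (they are not taken from `compare.js` or `perf-hud.js`), and the sample numbers are made up purely to show the calculation.

```javascript
// Average FPS from frame timestamps in milliseconds,
// the kind requestAnimationFrame passes to its callback.
function averageFps(timestamps) {
  if (timestamps.length < 2) return 0;
  const elapsed = timestamps[timestamps.length - 1] - timestamps[0];
  return elapsed > 0 ? ((timestamps.length - 1) * 1000) / elapsed : 0;
}

// Largest instance count whose measured FPS still meets `targetFps`.
// `results` is [{ instances, fps }, ...] sorted by ascending instance count.
function maxInstancesAt(results, targetFps) {
  let best = 0;
  for (const { instances, fps } of results) {
    if (fps < targetFps) break;
    best = instances;
  }
  return best;
}

// Made-up measurements for illustration only, not real benchmark output.
const results = [
  { instances: 1, fps: 60 },
  { instances: 2, fps: 60 },
  { instances: 4, fps: 58 },
  { instances: 8, fps: 41 },
  { instances: 16, fps: 24 },
];

console.log('frames at 20 ms spacing:', averageFps([0, 20, 40]), 'fps');
console.log('max instances at 30 fps:', maxInstancesAt(results, 30));
```

Stopping at the first count that misses the target keeps a single noisy recovery at a higher count from inflating the answer.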

## Tech notes

- three.js loaded from the jsDelivr CDN via an import map (r160 for demos 01–04 / compare,
  r170 for demo 05 to match TalkingHead 1.4).
- No bundler, no npm install, no framework — plain ES modules.
- Demo 05 fetches the TalkingHead library and the Ready Player Me model over the network; apart
  from the three.js CDN import, the others are self-contained once `facecap.glb` is present.