Comparison matrix

Six ways to put a talking MetaHuman-style face in a web page, side by side. Pick by what you can host (server / no server), whether you need a network or API key, the fidelity you need, and how much you want to build. Each name links to its live demo.

| Pipeline | Where it runs | Fidelity | Lipsync source | Needs server? | Needs network? | Needs API key? | Effort to build | Best for |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01 · three.js ARKit blendshapes | Client-side (browser) | Medium | None (manual / baked animation) | No | No | No | Low | The portable core: manual expression control, presets, baked facial capture. No talking. |
| 02 · Audio-driven lipsync | Client-side (browser) | Medium | Audio amplitude / bands (Audio2Face stand-in) | No | No | No | Low–Medium | Lipsyncing to an existing audio file or mic, no phoneme/TTS stack. |
| 03 · TTS→viseme lipsync | Client-side (Web Speech API) | Medium | Text → visemes (Web Speech synthesis) | No | No* | No | Medium | Make the head speak typed text in-browser. |
| 04 · Conversational loop | Client-side (+ optional LLM) | Medium | STT → dialogue tree → TTS → visemes | No† | Yes | If LLM | High | A full speak-back avatar: listen, decide, reply, lipsync. |
| 05 · TalkingHead.js + Ready Player Me | Client-side (browser) | High | TalkingHead viseme engine (TTS-driven) | No | Yes | For TTS | Medium | Polished ready-made avatar + lipsync library; RPM avatars, good mouth shapes, little plumbing. |
| 06 · Unreal Pixel Streaming | GPU server → video to browser | Highest | Full MetaHuman rig (engine-side, any source) | Yes (GPU) | Yes | Often | Very high | True MetaHuman quality streamed as video. Needs a GPU host per session; browser just plays the stream. |

*Some browsers route TTS voices via the cloud.
†Server only if you swap the tree for a hosted LLM.
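
The selection advice above (pick by hosting, network, API key and fidelity) can be sketched as a filter over the matrix. This is a minimal illustration, not part of any of the six demos: the `Pipeline` and `Constraints` shapes, the field names and the `pickPipelines` function are all hypothetical, and the conditional cells ("If LLM", "For TTS", "Often") are collapsed into a single `"conditional"` value that is treated conservatively as requiring a key.

```typescript
// Hypothetical encoding of the comparison matrix; names are illustrative only.
type Fidelity = "Medium" | "High" | "Highest";

interface Pipeline {
  id: string;
  label: string;
  fidelity: Fidelity;
  needsServer: boolean;
  needsNetwork: boolean;
  apiKey: "no" | "conditional"; // "If LLM", "For TTS", "Often" -> "conditional"
}

interface Constraints {
  canHostServer: boolean;
  hasNetwork: boolean;
  canUseApiKey: boolean;
  minFidelity: Fidelity;
}

const FIDELITY_RANK: Record<Fidelity, number> = { Medium: 0, High: 1, Highest: 2 };

// Footnotes are simplified: 03 is "No*" for network, 04 is "No†" for server.
const PIPELINES: Pipeline[] = [
  { id: "01", label: "three.js ARKit blendshapes", fidelity: "Medium", needsServer: false, needsNetwork: false, apiKey: "no" },
  { id: "02", label: "Audio-driven lipsync", fidelity: "Medium", needsServer: false, needsNetwork: false, apiKey: "no" },
  { id: "03", label: "TTS→viseme lipsync", fidelity: "Medium", needsServer: false, needsNetwork: false, apiKey: "no" },
  { id: "04", label: "Conversational loop", fidelity: "Medium", needsServer: false, needsNetwork: true, apiKey: "conditional" },
  { id: "05", label: "TalkingHead.js + Ready Player Me", fidelity: "High", needsServer: false, needsNetwork: true, apiKey: "conditional" },
  { id: "06", label: "Unreal Pixel Streaming", fidelity: "Highest", needsServer: true, needsNetwork: true, apiKey: "conditional" },
];

// Keep only the pipelines whose requirements fit the given constraints.
function pickPipelines(c: Constraints, all: Pipeline[] = PIPELINES): Pipeline[] {
  return all.filter((p) =>
    (c.canHostServer || !p.needsServer) &&
    (c.hasNetwork || !p.needsNetwork) &&
    (c.canUseApiKey || p.apiKey === "no") &&
    FIDELITY_RANK[p.fidelity] >= FIDELITY_RANK[c.minFidelity]
  );
}
```

For example, a fully offline page with no server and no key would be left with pipelines 01 to 03, while asking for at least High fidelity without a server would leave only 05.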

Live benchmark — run it on your machine

Pipeline 01 is the client-side core, so its rendering cost is what most of these pipelines inherit. This stress test clones the facecap head into a growing grid (1 → 64), animates every clone with ARKit blendshapes, and records how your FPS holds up. Watch for the head count at which the frame rate falls from 60 fps to 30 fps.
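
As a rough sketch of the "growing grid" idea, the helper below computes centered grid slots for a given number of heads. It is an assumption about how such a layout could be done, not the benchmark's actual code: `gridPositions`, `GridSlot` and the `spacing` parameter are hypothetical names, and the near-square column count is just one reasonable choice.

```typescript
// Hypothetical layout helper: centered, near-square grid for `count` heads.
interface GridSlot {
  x: number;
  y: number;
}

function gridPositions(count: number, spacing = 1): GridSlot[] {
  const slots: GridSlot[] = [];
  if (count <= 0) return slots;

  // Near-square grid: 1 -> 1x1, 4 -> 2x2, 16 -> 4x4, 64 -> 8x8.
  const cols = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / cols);

  for (let i = 0; i < count; i++) {
    const col = i % cols;
    const row = Math.floor(i / cols);
    slots.push({
      // Shift by half the grid extent so the grid is centered on the origin.
      x: (col - (cols - 1) / 2) * spacing,
      // Row 0 is placed at the top.
      y: ((rows - 1) / 2 - row) * spacing,
    });
  }
  return slots;
}
```

In a three.js scene, each slot would typically be applied to a cloned head with something like `clone.position.set(slot.x, slot.y, 0)`, with every clone keeping its own blendshape weights so the heads animate independently.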

Results columns: Heads | Avg FPS | Triangles | Draws | CPU/frame | Heap
What it measures: steady-state FPS, triangle count, draw calls, CPU time per frame, and JS heap as the number of independently-animated heads ramps up. Each step runs ~2 s before sampling.
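
One way to read "each step runs ~2 s before sampling" is as a warm-up period whose frames are discarded before averaging. The sketch below follows that reading and is only an assumption about the method: `summarizeStep`, `firstStepBelow` and the `warmupMs` default are hypothetical, and the timestamps are presumed to come from something like `requestAnimationFrame` callbacks. Note that it reports the average interval between frames, which is not the same thing as the CPU time per frame shown in the table.

```typescript
// Hypothetical per-step summary; not the benchmark's actual implementation.
interface StepSample {
  avgFps: number;
  avgFrameMs: number; // mean interval between frames, not CPU time
  frames: number;
}

// Average FPS over the frames recorded after the warm-up window.
function summarizeStep(timestampsMs: number[], warmupMs = 2000): StepSample | null {
  if (timestampsMs.length === 0) return null;

  const sampleStart = timestampsMs[0] + warmupMs;
  const kept = timestampsMs.filter((t) => t >= sampleStart);
  if (kept.length < 2) return null; // need at least one interval

  const elapsed = kept[kept.length - 1] - kept[0];
  if (elapsed <= 0) return null;

  const intervals = kept.length - 1;
  return {
    avgFps: (intervals * 1000) / elapsed,
    avgFrameMs: elapsed / intervals,
    frames: kept.length,
  };
}

// Head count of the first step whose average FPS is under `fps`, or null.
function firstStepBelow(steps: { heads: number; avgFps: number }[], fps: number): number | null {
  for (const step of steps) {
    if (step.avgFps < fps) return step.heads;
  }
  return null;
}
```

Calling `firstStepBelow(results, 60)` and `firstStepBelow(results, 30)` on the per-step results would give the two head counts at which this machine drops out of each frame-rate band.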

Caveat: results are device-specific. They depend on your GPU, browser, devicePixelRatio, window size and thermal state — meaningful only relative to each other on this machine, not as a cross-device score.