TPAC 2025
Kobe, Japan & online
10–14 November 2025
Notes.
Today's Agenda
The Problems We're Facing (and What We Currently Have,
Why That's Not Enough)
What Solutions or APIs Are Required to Solve These
Problems
Discussion and To-Dos
Real-World Practices
Notes.
1. The Problems We're Facing (and What We
Currently Have, Why That's Not Enough)
Notes.
Big Problem 1 of 3: AI and Next-Gen Personal Computing Are
Reshaping Apps, While Web Is Falling Behind
What's Happening?
Notes.
What's Happening?
Because AI is in such high demand, next-gen personal computing
devices and OSes need much better multimodal interaction.
Notes.
AI Glasses
Why is the next-gen personal device for AI basically a spatial computing
device?
AR Glasses
Natural language fits coarse-grained interaction tasks.
Fine-grained interactions and intuitive feedback still require a
GUI.
GUI needs to integrate
both with the context of AI interactions and with the surrounding
physical space.
GUI is moving beyond screens.
This image shows the key point - it highlights why AI demands more from
the web and introduces the emergence of spatial GUI environments.
MR Goggles (MR Headsets)
XR headsets are evolving to align with AR glasses, with
more full-featured third-party apps and
better spatial integration.
AR glasses replace phones,
MR goggles replace laptops/tablets.
The more well-rounded, near-term app needs we're about to face:
All apps - not just 3D games - need to shift to spatial use
cases
The 2D app ecosystem needs to be preserved and continued.
2D GUIs still fit best in most cases, but
parts of their content, details, and how multiple 2D GUIs
work together
need to break free from the limits of flat screens.
Notes.
Split the Web Page, Free the UI
Elevate HTML Elements, Unlock Depth
Multiple Scene Containers, Native Power
Add True 3D Content, Blend Dimensions
Notes.
Next-gen operating systems have a far greater need than mobile OSes for
"install-free apps"
based on open standards.
* "Install-free apps" are large-scale and hard to catalog; they launch
via a link, run on demand, are disposable by default, and can be
upgraded to installed apps when needed
Notes.
Why
Client-side AI agents are increasingly choosing "tools" on their own.
The kinds of apps for "Tool Use" tend to be
vast, unknown, and infrequently used.
That means they are unsuitable to be
pre-installed, installed on the spot, or retained on the device
afterward.
App discovery and launching from spatial environments work the same
way
Just like how people in China and Japan scan QR codes to join
events or place orders.
Notes.
The sole "super app" of the desktop era - the browser - is making a
comeback, but in new forms
ChatGPT app
Chat boxes are replacing address bars.
Message feeds are replacing tabs.
TikTok/Snapchat camera
XR see-through views are replacing the address
bars.
Window containers with spatial layout are replacing tabs.
Notes.
"Install-free apps" are not inherently bound to the Open Web
We already saw this split happen in China's mobile internet
market, where
non-standard "mini-app" ecosystems inside super apps like
WeChat haven't just taken over most native app needs, but have also
pretty much wiped out the Open Web in China.
Notes.
What's the Problem
Apple and Google are
extending their native 2D GUI frameworks and platform-specific app
ecosystems
to support experiences that go beyond screens, blend with spatial
environments, and still meet mainstream app needs, whereas the Open Web
today cannot satisfy these demands concurrently.
visionOS
Compatible iPhone and iPad apps -> visionOS apps
SwiftUI + RealityKit + ARKit
Android XR
XR-compatible large-screen apps -> XR-differentiated apps
Jetpack Compose + SceneCore + ARCore
Switch to Siyaman
The Web might once again fall way behind native apps
The web fell behind during the paradigm shift
from desktop to mobile.
If the status quo persists, as AI/AR glasses and visionOS/Android XR
devices take off, web devs will be forced to switch stacks - moving to
native 2D GUI stacks or to hybrid ecosystems like
React Native (React Native visionOS) / Mini-apps that diverge from the mainstream web.
Native
Development outcomes accumulate in
closed, platform-exclusive walled gardens.
Native / React Native
Loss of the web's core advantages, like URLs, no-install,
on-demand access.
Mini-apps
They neither inherit from nor integrate with the
existing web ecosystem; they start anew.
Notes.
Big Problem 2 of 3: Mainstream Web Stack Lacks
New UI Capabilities for Spatial Apps
What's Happening?
Switch to Siyaman
Paradigm shift in XR OS:
From Compositor-based Architecture to
Unified Rendering
Architecture
Through the visionOS platform, Apple pioneered a Unified Rendering
architecture and a Shared Space model for multi-app coexistence, setting
a new standard in the industry.
Notes.
Shared Space
Notes.
Unified Rendering
Notes.
To meet mainstream needs and general-purpose use cases, the OS should
default to multitasking - multiple apps coexisting
Each app handles just a portion of the display/space instead of
taking over the whole thing.
Allow quick switching and combined use.
To integrate with and fully leverage spatial environments, coexisting
apps need to
go beyond 2D windows and integrate into one shared space
Rather than existing solely as overlays/HUDs
3D/Spatialized content from different apps shares spatial
positional relationships, is affected by the same spatial
environment (e.g., lighting, frosted-glass background), and
enables consistent spatial interaction.
These coexisting apps aren't 2D apps or immersive apps anymore;
they're called "spatial apps".
They can
mix 2D content and content with spatial relationships or 3D
volume, and they can still all share the same space.
Notes.
With visionOS, Apple has established industry design patterns for
spatial 2D+3D hybrid GUIs:
The OS is responsible for implementing and managing
spatial scene containers - Windows for GUI,
Volumetric Windows for simulating objects, or fully immersive
spaces.
Each running app must place its content inside these containers to
integrate with the same space.
Apps can't control these containers directly; they can only
provide their desired configurations to the OS during container
initialization
(like type, default size, resize constraints - sketched below)
Windows and Volumes are like bounding boxes, and all 2D/3D content
sticks to the box's back face and is treated as
2D frames managed by a 2D layout system, but can
move or morph along the Z-axis when needed.
Some of these 2D frames are 3D content containers, rendering
volumetric content in the space in front of and within their bounds.
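As a thought experiment, that init-only configuration could be expressed as a plain config object. This is a minimal sketch; `SceneContainerConfig` and `requestSceneContainer` are hypothetical names, not a real visionOS or web API.

```ts
// Hypothetical sketch of init-only scene-container configuration.
// None of these names exist in visionOS or any web spec; they only
// illustrate the pattern: the app states its preferences once, and
// the OS owns the container afterwards.
type SceneContainerConfig = {
  type: "window" | "volume" | "immersive";          // container kind
  defaultSize: { width: number; height: number; depth?: number };
  resizable: boolean;
  minSize?: { width: number; height: number };      // resize constraints
};

declare function requestSceneContainer(cfg: SceneContainerConfig): Promise<void>;

// The app can only express its desired configuration up front...
await requestSceneContainer({
  type: "volume",
  defaultSize: { width: 0.5, height: 0.5, depth: 0.5 },
  resizable: false,
});
// ...afterwards, placement and sizing are managed by the OS, not the app.
```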
Notes.
Spatial apps cannot render themselves in isolation. The OS should
render them uniformly
which means it needs to understand what the apps contain, instead
of compositing pre-rendered, information-losing frames.
Spatial apps cannot implement arbitrary low-level interactions
independently (e.g., rendering the user's hands). The OS should
provide unified visual cues (such as hover effects) and hit
testing, while apps just receive and handle gesture events.
With visionOS, Apple has established industry design patterns
for natural interaction:
Eye-hand–based indirect interaction (selection by
gaze and confirmation via finger gestures)
Touch-based direct interaction
Both support
basic spatial gestures like drag, rotate, and zoom.
Notes.
What We Currently Have for Spatial UI and Unified Rendering
WebXR
HTML/CSS/DOM
Spatial Browsing
PWA
Switch to Ruoya
The current Immersive Web Working Group focuses on the
WebXR API
standard.
Switch back to Siyaman
Why We Should Do More
Like OpenXR, WebXR API takes over the XR device's full stereo view and
the entire space, renders on its own using low-level 3D graphics APIs
(WebGL/WebGPU), submits only final frames to the OS compositor, and
requires building natural interaction from scratch.
WebXR sessions
can't coexist with their host page window or other app windows.
WebXR sessions can't render only part of a Shared Space, and
multiple WebXR sessions
can't blend into the same Shared Space.
3D graphics APIs like WebGL/WebGPU, which work on the final frame
pixel by pixel instead of describing spatial content,
may fundamentally conflict with the Unified Rendering
architecture.
Notes.
What We Currently Have for Spatial UI and Unified Rendering
HTML/CSS/DOM remain screen-based, even when there is no
screen.
Notes.
Why Screen-based HTML/CSS Isn't Enough
Current layout systems in HTML/CSS/DOM only support the X and Y
axes.
z-index is about stacking order, not a
Z-axis API.
CSS Transform API can have a Z axis, but only affects appearance after
2D projection.
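To make the gap concrete, here is a small CSS-in-JS sketch in TypeScript of what today's properties actually do; the `.card` selector is just an example element.

```ts
// Both mechanisms below are still purely 2D in today's CSS.
const stacked: Partial<CSSStyleDeclaration> = {
  position: "absolute",
  zIndex: "10", // paint/stacking order only - no physical depth
};

const projected: Partial<CSSStyleDeclaration> = {
  // Z exists in the transform math, but the element is projected back
  // onto the screen plane before painting - it just looks bigger.
  transform: "perspective(500px) translateZ(50px)",
};

Object.assign(document.querySelector<HTMLElement>(".card")!.style, projected);
```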
Notes.
Why Screen-based HTML/CSS Isn't Enough
All HTML elements are just flat 2D panels with no volume.
Web 3D content ultimately projects onto a 2D canvas plane.
Notes.
Why Screen-based HTML/CSS Isn't Enough
Only fixed, solid colors are available, and CSS styles can only
be manually authored based on static device states (media queries)
Element backgrounds and text colors cannot
dynamically track the surrounding environment, making them
unsuitable for the complexity of the real world (coexisting hues and
light/dark conditions) and for next-gen platforms where app backdrops
change continuously.
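For reference, a sketch of the full extent of today's adaptation mechanism; the colors are example presets chosen by the author, with no way to track the live backdrop.

```ts
// Today's only hook is static device state via media queries. There is
// no query that follows the live hue or brightness of whatever is
// physically behind the app.
const css = `
  .panel { background: #ffffff; color: #111111; }   /* fixed, solid colors */
  @media (prefers-color-scheme: dark) {
    .panel { background: #1c1c1e; color: #f2f2f7; } /* still fixed - just a second preset */
  }
`;
const style = document.createElement("style");
style.textContent = css;
document.head.append(style);
```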
Notes.
Why Screen-based HTML/CSS Isn't Enough
Existing web window–related APIs are insufficient for spatial
scene containers.
Opening a new page window via <a> allows no
initialization configuration.
When a new page window is opened via `window.open`, the configuration
options control the exact window size and resize permission, which is
not the same as the initialization semantics of spatial scene containers
(contrast sketched at the end of this list).
If it runs as an installed PWA (a standalone app with its own window,
outside the browser), you also can't set any
initial options for the first page window.
Web page windows don't have a type concept, so they can't support
new spatial scene container types like Volume.
Even standalone PWA windows have a fixed, solid background and visible
border, preventing UI elements from
appearing spatially separated.
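The `window.open` contrast mentioned above, sketched minimally (the comment lists hypothetical needs, not proposed API names):

```ts
// What the web offers today: exact pixel geometry for a flat window.
window.open("https://example.com/app", "_blank", "popup,width=800,height=600");

// What a spatial scene container needs at init time is different in kind:
// - a container *type* (window / volume / immersive)
// - a *default* size in physical units that the OS may adjust
// - resize *constraints*, not just a resize on/off switch
// window.open's feature string has no vocabulary for any of this.
```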
Notes.
Why Screen-based HTML/CSS Isn't Enough
Supports only
low-level JS interaction events based on 2D positions, without
natural gestures or spatial position tracking.
Current JS interaction events (e.g., Pointer Events) are very
low-level, and even basic gestures have to be implemented in JS.
It's hard to build a great natural interaction experience just from
low-level events.
Spatial OSes can't freely expose sensitive underlying interaction data
- like eye tracking - to apps because of privacy concerns.
Spatial OSes need to ensure consistent interaction in the Shared Space
by uniformly implementing core natural interaction gestures.
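As an illustration of how low-level the current events are, a minimal TypeScript sketch of a hand-rolled drag gesture (`#draggable` is an example element):

```ts
// Even a basic drag has to be assembled by hand from 2D pointer
// positions - and no event could ever carry gaze or hand-pose data,
// which the OS cannot expose to pages anyway.
const target = document.querySelector<HTMLElement>("#draggable")!;
let startX = 0, startY = 0, dragging = false;

target.addEventListener("pointerdown", (e) => {
  dragging = true;
  startX = e.clientX;
  startY = e.clientY;
  target.setPointerCapture(e.pointerId);
});
target.addEventListener("pointermove", (e) => {
  if (!dragging) return;
  // All the page ever receives is a stream of 2D screen coordinates.
  target.style.transform =
    `translate(${e.clientX - startX}px, ${e.clientY - startY}px)`;
});
target.addEventListener("pointerup", () => { dragging = false; });
```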
Notes.
The spatial web features added to Safari on visionOS
Notes.
The spatial web features added to Safari on visionOS
Notes.
Why We Should Do More
Model element
The <model> element can only render volumetric 3D
content inside a "hole" in the page plane. It can't appear in
the space in front of the page like a native SwiftUI app's Model3D view.
The 3D content in the <model> element can only come
from pre-made 3D model files; you can't
program it dynamically (no mainstream Web 3D engine features)
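Roughly, usage looks like this (based on the current proposal; the exact syntax may evolve):

```ts
// The <model> element only displays a pre-made asset, recessed into a
// "hole" in the page plane.
document.body.insertAdjacentHTML("beforeend", `
  <model style="width: 300px; height: 300px;">
    <source src="teapot.usdz" type="model/vnd.usdz+zip">
  </model>
`);
// There is no API here for building or mutating the 3D scene in code -
// the content can only come from the referenced model file.
```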
Notes.
Why We Should Do More
Immersive Media
Like a WebXR session, you have to call the Fullscreen API to switch into
a special mode to view spatial photos and videos, instead of
viewing that spatial content right in the webpage window.
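In code, that mode switch is the familiar fullscreen call; a sketch where `#spatialVideo` is an example element:

```ts
// Spatial playback today means leaving the page flow: the media element
// must first be promoted into a special fullscreen mode.
const video = document.querySelector<HTMLVideoElement>("#spatialVideo")!;
video.requestFullscreen(); // only here does the spatial version appear;
                           // inline playback in the window stays flat
```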
Notes.
Spatial Browsing
The Spatial Browsing capability introduced in Safari for visionOS 26
doesn't introduce or rely on new Web APIs, so it's merely an
app-level, reader-mode-like feature available only on a
small set of qualifying, article-centric pages.
Notes.
Why We Should Do More
Spatial Browsing-like auto-conversion of 2D pages into spatial
UIs
Limitation: Without new Web APIs for expressing spatial intent, we're
stuck recognizing a few known patterns in 2D pages, and only in
matched cases can the UI be spatialized automatically.
Even with more powerful generative AI, it still wouldn't be a
general-purpose solution.
Notes.
Why We Should Do More
Spatial features in traditional browser UIs, like Spatial
Browsing:
Limitation: Whether the spatial UI is auto-generated or precisely
built by developers, the end result
conflicts with the existing browser app's Window/Tab UI and can't
coexist.
For example, the address bar and tab bar are tightly coupled to
the browser window's frame and solid-color background.
If we heavily rework the browser UI to suit a borderless,
transparent-background spatial web UI, it
could make browsing traditional 2D pages less efficient.
Even Apple needs users to trigger a
special dedicated mode for Spatial Browsing in current
Safari, just like with WebXR.
Notes.
A Current Technology Bypasses Traditional Browser UI Limits
PWA
Notes.
Why We Should Do More
PWA
Once installed, PWAs can run in
standalone windows without browser UI
such as the address/tab bars, so they're a natural fit for spatial web
UIs.
But uninstalled PWAs can only run as browser tabs, still
constrained by the browser UI.
If a web app with a spatial UI has to be installed first to enable the
spatial experience, it's basically
no different from a native app.
Still, this loses the web's core advantages - URLs, no-install,
on-demand access.
It doesn't meet AI/MR OS needs.
Notes.
Big Problem 3 of 3: Web 3D is Still Hard and Lacks Developers for
General Computing Needs
What's Happening?
Notes.
Paradigm shift in XR Development: From 3D containing 2D to
2D containing 3D
Notes.
Traditional 3D development: "3D containing 2D"
Everything defaults to 3D, and the app is built with a
3D engine.
Some parts can still be plain 2D (like a 2D GUI), but those 2D pieces
can't exist or be developed separately. You
can't just plug in mainstream 2D GUI frameworks or component
libraries. Instead, they need to be folded into the 3D development approach,
providing dedicated APIs on top of the 3D engine that imitate typical
2D GUI development.
Unity UGUI / CSS3D in Three.js / Dear ImGui
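A minimal Three.js example of this paradigm: the 3D engine owns the canvas, the camera, and the frame loop, and any 2D GUI would have to be folded into this world.

```ts
import * as THREE from "three";

// Traditional "3D containing 2D": the engine drives everything.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(60, innerWidth / innerHeight, 0.1, 100);
camera.position.z = 3;

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

const cube = new THREE.Mesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshNormalMaterial()
);
scene.add(cube);

// The app renders every frame itself - the opposite of Unified
// Rendering, where the OS understands the scene and renders it.
function frame() {
  cube.rotation.y += 0.01;
  renderer.render(scene, camera);
  requestAnimationFrame(frame);
}
frame();
```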
Notes.
Most regular web developers find it hard to take part in developing
this kind of app.
Most apps only need 3D for a few parts. Making the whole app
with a 3D engine and 3D mindset just for those bits - and relying on
the small pool of 3D developers - is inefficient.
As multimodal AI devices drive apps toward mixed 2D+3D spatial
forms, the
current number of 3D developers and the efficiency of 3D
development can't scale to match.
Notes.
Spatial Development: "2D containing 3D"
Notes.
Spatial Development: "2D containing 3D"
The outermost Shared Space is built with a 3D engine, but it's
implemented and managed by the OS.
All spatial app content lives inside spatial scene containers within
the Shared Space. Inside these containers, everything follows a
2D worldview - the content is
composed of UI components based on a
2D layout system, not 3D graphics or entities rendered in a
3D coordinate space.
Everything defaults to 2D views and is developed using a
2D GUI framework.
All UI components can be laid out and transformed not just along the X
and Y axes but also along the Z axis.
With a 2D GUI development mindset alone, developers can
now create content and effects that were
once possible only with 3D engines.
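A hypothetical TSX sketch of this worldview; the Z-axis layout syntax here illustrates the paradigm, not a real browser or SDK API.

```tsx
// Hypothetical sketch only - ordinary 2D components in a 2D layout,
// each offset along Z to float in front of its neighbors.
export function Toolbar() {
  return (
    <nav className="toolbar">
      <button style={{ transform: "translateZ(20px)" }}>Back</button>
      {/* The active control floats forward - an effect that previously
          required a 3D engine. */}
      <button style={{ transform: "translateZ(60px)" }}>Play</button>
      <button style={{ transform: "translateZ(20px)" }}>Next</button>
    </nav>
  );
}
```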
Notes.
Some UI components act as 3D content containers. The
containers themselves are still used as 2D views, but inside, they
follow a 3D worldview, rendering 3D entities and assets in a
bounded 3D coordinate space.
3D content containers
bridge the outer 2D world and inner 3D worlds, allowing
2D to include 3D.
Since this paradigm mainly targets 2D GUI developers,
it's best to
blend 3D code into the 2D GUI system as much as possible - to
keep the 2D mindset, lower the learning curve, and give
developers a smoother experience.
3D code no longer runs on its own main loop or controls every
frame. Instead, updates to 3D content are triggered by the
outer 2D GUI system when appropriate.
This kind of 3D code
can't rely on low-level graphics APIs or arbitrary 3D
engines, because under a Unified Rendering architecture, all content,
including content in 3D containers, must be OS-understood and
rendered by the OS's single rendering server.
Notes.
This new development paradigm enables the
broad community of 2D GUI developers to join in building
these kinds of apps, using familiar tools and mindset, and
handling most localized 3D needs
on their own without mastering full 3D engine stacks.
With visionOS, Apple was the first in the industry to define this new
development paradigm:
Notes.
building spatial apps mainly with a
2D GUI framework (SwiftUI),
and providing 3D content containers like Model3D View and
RealityView.
Model3D View: uses USD assets for its content
RealityView: uses the
RealityKit API (a 3D engine API)
to build dynamic 3D content
RealityKit code can only run through
two hooks - init and update - both of which are
controlled by SwiftUI's declarative framework mechanism.
The RealityKit APIs available in RealityView are
high-level ECS-based APIs. Even for custom material
needs, you have to use high-level APIs like MaterialX, not
hand-written shaders.
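For web developers, a hypothetical TSX analogue of the same two-hook pattern might look like this; `RealityView3D`, `SceneContent`, and their methods are invented for illustration, not a shipping API.

```tsx
// Hypothetical analogue of the RealityView pattern: 3D code has no main
// loop of its own and runs only in framework-controlled hooks.
interface SceneContent {
  add(assetUrl: string): void;   // attach an entity/asset to the scene
  setScale(s: number): void;
}
declare function RealityView3D(props: {
  init: (content: SceneContent) => void;    // runs once, on creation
  update: (content: SceneContent) => void;  // re-run when 2D state changes
}): JSX.Element;

export function Viewer({ zoom }: { zoom: number }) {
  return (
    <RealityView3D
      init={(content) => content.add("/assets/globe.usdz")}
      update={(content) => content.setScale(zoom)}
    />
  );
}
```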
Notes.
What We Currently Have for 3D Development
<canvas> element
Web 3D Engines (Three.js, R3F, AFrame, …)
WebXR
Notes.
Why Canvas Isn't Enough
The existing <canvas> element in HTML isn't really
part of this new "2D including 3D" paradigm.
The <canvas> element's content is rendered
independently using low-level 3D graphics APIs or an arbitrary Web 3D
engine, so the OS can't render it in a unified way.
When <canvas> content is shown alongside the 2D HTML elements that
contain it, the 3D content in the <canvas> is
always projected onto a flat surface when rendered, so it
can't
display truly volumetric 3D content in actual space while
"contained within" a 2D context.
Developing content for the <canvas> element isn't
mainly for 2D GUI developers but rather
targets developers who are familiar with 3D development mindset, 3D
graphics APIs, and 3D engines.
Notes.
Why Web 3D Engines Aren't Enough
Web 3D Engines
Existing Web 3D engines are built on the
<canvas> element and low-level 3D graphics APIs.
Inside the <canvas>,
existing Web 3D engines follow a traditional "3D containing 2D"
paradigm.
Like Unity, most Web 3D engines
can't directly support Unified Rendering and are also
hard for regular web developers.
AFrame and R3F are much better since they both try to
keep the declarative UI component style familiar from 2D
development
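For example, React Three Fiber already lets web developers express a scene declaratively, though underneath it still renders into a flat <canvas> via Three.js:

```tsx
import { Canvas } from "@react-three/fiber";

// Declarative, component-style 3D that feels like ordinary React -
// but it cannot support Unified Rendering, because everything still
// ends up as pixels in a <canvas>.
export function Spinner() {
  return (
    <Canvas camera={{ position: [0, 0, 3] }}>
      <ambientLight intensity={0.6} />
      <mesh rotation={[0.4, 0.6, 0]}>
        <boxGeometry args={[1, 1, 1]} />
        <meshStandardMaterial color="orange" />
      </mesh>
    </Canvas>
  );
}
```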
Notes.
Why WebXR Isn't Enough
WebXR
WebXR follows a traditional "3D containing 2D" paradigm.
In WebXR, 3D content is built on the
<canvas> element and low-level 3D graphics APIs.
Since the <canvas> content for WebXR sessions
doesn't coexist with the surrounding HTML elements that contain
it, it can display true volumetric 3D content. However, it still
has two <canvas> issues: it
doesn't support Unified Rendering, and it's
hard for regular 2D developers to use.
Notes.
2. What Solutions or APIs Are Required to Solve These
Problems
Solution Pillar 1 of 3: A New Dev Paradigm Enabling
Unified Rendering and Developer Friendliness
We need a new XR development system, distinct from WebXR - one that
completely avoids custom rendering, lets the OS understand the
content, fits into the Unified Rendering architecture and Shared Space,
and is designed for regular 2D web developers by building on
the familiar mainstream web ecosystem and mindset. It should continue to
offer ease of use and high efficiency, while
still offering enough spatial and 3D development power.
Solution Pillar 2 of 3: Enabling Spatial UI in the Mainstream HTML/CSS
Web Ecosystem
The only solution is to introduce these spatial UI capabilities, which
are currently exclusive to native 2D GUI frameworks, into the
standardized, open, and mainstream HTML/CSS-based Web ecosystem
at the earliest opportunity.
Keep as much of the innovation and progress made in developing spatial
apps as possible within the open ecosystem.
It means that popular 2D web UI frameworks like React, which
are based on HTML and CSS, would also gain spatial UI features.
This also means that
existing websites and web apps built with HTML/CSS should be
able to transition into the new app ecosystem, gaining
enhanced UI capabilities in spatial platforms
without disrupting their current screen-based interfaces and
primary user scenarios
(desktop and mobile).
Modern native 2D GUI frameworks like SwiftUI are actually very similar
to HTML/CSS and web-based frameworks like React. They have comparable
2D UI capabilities and face the same challenges. So when bringing in
spatial UI features, we can
build on what these frameworks have already achieved, reusing their
progress and lessons to avoid trial and error and reduce the cost of
reaching consensus.
Solution Pillar 3 of 3: Blend into the 2D HTML/CSS/DOM APIs with Minimal
Additions
A minimal but systematic HTML/CSS/JS API set, similar to
SwiftUI's new APIs in visionOS, is required to extend existing 2D
HTML/CSS/JS APIs specifically for spatial UI requirements. This
extension should
integrate seamlessly with the original 2D APIs, enabling
spatial functionality to be implemented with precision and flexibility
while
preserving existing content and development experiences with minimal
cost and disruption.
Reuse and extend the standalone window experience provided by
the Web App Manifest and PWAs, without sacrificing the
web's advantage in delivering install-free apps.
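A sketch of what such reuse-plus-extension could look like, expressed as a TypeScript object for readability; `xr_main_scene` is a made-up field, not part of any spec or the SDK.

```ts
// The existing Web App Manifest already captures standalone-window
// intent; a spatial extension could build on that. `xr_main_scene`
// below is purely illustrative.
const manifest = {
  name: "Spatial Notes",
  start_url: "/",
  display: "standalone",            // today: a borderless 2D window
  // Hypothetical extension: initial spatial scene container config.
  xr_main_scene: {
    type: "window",
    default_size: { width: 900, height: 700 },
    resizable: true,
  },
};
export default manifest;
```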
Build an open-source SDK project called
WebSpatial SDK based on
existing declarative web frameworks and web build tools. This
way, developers can start using WebSpatial APIs right away in
HTML/CSS-style APIs like JSX and CSS-in-JS, without having to wait for
browser engines to support them.
Hybrid technology will be used to implement the
WebSpatial Runtime across different spatial app platforms. On
platforms where the WebSpatial Runtime can be integrated into the
browser, web developers just need to run the site URL as a PWA to enable
spatial UI. For platforms where the browser can't be modified,
developers can add a packaging step to their workflow (similar to using
Electron) to pre-package the site as a PWA, with the WebSpatial Runtime
bundled in.
Already supports React and various frameworks and web build tools in
the React ecosystem
Already supports visionOS
Planned support for Android XR and more.
Depth Layout, Spatial Transform, Material Background, and Spatial Scene
are already supported (a rough sketch follows at the end of this list).
APIs related to 3D content - like
Spatial Events, Volume, 3D Container Elements, and the 3D Engine
API
- have been merged into the main branch and are currently being
tested.
High priority on
minimizing the integration cost for existing websites
Ready-to-use optimization solutions are provided to ensure the
original UI and main user scenarios (desktop and mobile) remain
unaffected.
A new solution will make it even simpler - one website and one URL
can support both screen-based and spatial devices without
interfering with each other.
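The sketch referenced above, covering the four supported capability areas in CSS-in-JS form; the property names and values are illustrative stand-ins, not the SDK's actual syntax (see the WebSpatial docs for that).

```ts
// Illustrative only - not the SDK's real property names/values.
const spatialCard = {
  // Depth Layout + Spatial Transform: lift a 2D element off the
  // window's back face and tilt it in space.
  transform: "translateZ(60px) rotateX(10deg)",
  // Material Background: a translucent, environment-aware material
  // instead of a fixed solid color.
  background: "transparent",
  backdropFilter: "blur(20px)",
};
// Spatial Scene: opening a secondary scene container would replace a
// plain window.open call with type/size initialization options.
```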
Notes.
WebSpatial Builder provides full support for
packaging, simulator-based debugging, real device testing, and
submission to the visionOS App Store, eliminating the need to interact with Xcode or native app shells
throughout the entire process.
Fluid's founder highly praised WebSpatial, believing it can
efficiently transform their Next.js + React smart TV interface
into a 3D spatial media application.
Each of the real apps shown earlier was
built solo in a little over a week of spare time.
Piloted a hackathon event called "Web to Spatial"
95% of developers have no issue following the "Quick Example" to
start a WebSpatial app
Not too many problems with using spatialized 2D HTML APIs.
Many requests on 3D APIs (we hadn’t yet supported APIs like
Spatial Events, <reality>, and <entity> during the
event)
The main blocker was the complex App Store submission process,
which needs detailed docs.
Notes.