Mapping the Physical Reality of AI
AI gets discussed in terms of capabilities: benchmarks, parameters, reasoning, alignment. The infrastructure underneath is treated as background detail — engineers worry about it, but it rarely enters the narrative.
observe-ai is the opposite. An interactive globe showing the physical reality: where the servers are, which fab made the chip, who financed the facility, which government has jurisdiction, what the supply chain looks like end to end. The abstraction ends at the map.
What it shows
Fourteen layers on a deck.gl GlobeView, browsable on a timeline from 2005 to 2026.
Compute infrastructure: Google Cloud, AWS, and Azure data center locations. AI facilities: labs, research hubs, training clusters. Semiconductor fabrication plants, sourced from Wikipedia, CHIPS Act filings, and SIA data. Regulatory zones from OECD.AI and Stanford HAI. Export control listings from the U.S. Consolidated Screening List.
Then the connective tissue. Supply arcs trace fab-to-customer relationships — click a fab and see which companies buy from it, rendered as great-circle lines. Trade flows come from UN Comtrade HS 8542 bilateral data (chip-specific). Money flows from Stanford HAI’s AI Index. Patents from PatentsView, co-authorship networks from OpenAlex institution pairs. ESG data prorated from Google, Microsoft, AWS, Meta, and TSMC sustainability reports. AI job postings by metro from Lightcast, Stanford HAI, and BLS OEWS.
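The supply arcs are ultimately just a data join: one fab record fanned out into one arc per customer, each arc carrying a source and target coordinate for the renderer. A minimal sketch of that shape (the types and names here are illustrative, not the project's actual code) might look like:

```typescript
// Hypothetical types: a fab with known customers, flattened into
// arc segments of the kind a deck.gl ArcLayer-style renderer consumes.
interface Customer {
  id: string;
  position: [number, number]; // [lng, lat]
}

interface Fab {
  id: string;
  position: [number, number];
  customers: Customer[];
}

interface SupplyArc {
  sourceId: string;
  targetId: string;
  sourcePosition: [number, number];
  targetPosition: [number, number];
}

// One arc per fab-to-customer relationship.
function buildSupplyArcs(fab: Fab): SupplyArc[] {
  return fab.customers.map((c) => ({
    sourceId: fab.id,
    targetId: c.id,
    sourcePosition: fab.position,
    targetPosition: c.position,
  }));
}

// Example: a fab with two customers yields two great-circle arcs.
const fab18: Fab = {
  id: "tsmc-fab-18",
  position: [120.27, 23.11], // Tainan, approximate
  customers: [
    { id: "apple", position: [-122.01, 37.33] },
    { id: "nvidia", position: [-121.96, 37.37] },
  ],
};
const arcs = buildSupplyArcs(fab18);
```

The renderer draws the great-circle geometry itself; the data layer only needs endpoints.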
Each layer is a different lens on the same system. What you learn comes from having them together.
The correlation engine
The central interaction in Phase 5: click any entity and the globe dims the unrelated world.
Every first-order neighbor lights up. Click TSMC Fab 18: supply arcs to its customers appear, the regulatory zone that governs it brightens, investment flows anchor to it, patents in the cluster surface, job postings in the metro activate, ESG data shows in the panel. A pulsing anchor ring marks the selected coordinate even when its own layer is off.
The detail panel shows a Related section grouped by relation kind. Each row is a click-through that reselects the neighbor, cascading through URL state and back through the correlation engine. You walk the graph one hop at a time.
The implementation is a pure graph traversal in src/utils/correlate.ts — O(N) per click across the ~1-2K total features currently loaded. No graph database, no precomputed adjacency index. Fast enough at this scale, and extensible if that ever changes.
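In the spirit of that traversal, here is a hypothetical sketch of a one-hop correlation scan. The types, relation kinds, and edge encoding are assumptions for illustration; the actual `src/utils/correlate.ts` may be structured differently.

```typescript
// Hypothetical sketch: each loaded feature declares outgoing edges to
// other feature ids, grouped by relation kind. A click triggers one
// linear pass that collects outgoing edges from the selection plus
// incoming edges from every other feature.
type RelationKind = "supply" | "regulatory" | "investment" | "patent" | "jobs";

interface Feature {
  id: string;
  layer: string;
  relations: Partial<Record<RelationKind, string[]>>;
}

interface Related {
  kind: RelationKind;
  id: string;
}

function correlate(selectedId: string, features: Feature[]): Related[] {
  const selected = features.find((f) => f.id === selectedId);
  if (!selected) return [];
  const out: Related[] = [];
  const seen = new Set<string>();
  const push = (kind: RelationKind, id: string) => {
    const key = `${kind}:${id}`;
    if (id !== selectedId && !seen.has(key)) {
      seen.add(key);
      out.push({ kind, id });
    }
  };
  // Outgoing edges declared by the selected feature.
  for (const [kind, ids] of Object.entries(selected.relations)) {
    for (const id of ids ?? []) push(kind as RelationKind, id);
  }
  // Incoming edges: scan every feature's edge lists, O(N) in features.
  for (const f of features) {
    for (const [kind, ids] of Object.entries(f.relations)) {
      if ((ids ?? []).includes(selectedId)) push(kind as RelationKind, f.id);
    }
  }
  return out;
}

// Example: one outgoing supply edge, one incoming regulatory edge.
const features: Feature[] = [
  { id: "fab-18", layer: "fabs", relations: { supply: ["apple"] } },
  { id: "apple", layer: "facilities", relations: {} },
  { id: "tainan-sez", layer: "regulatory", relations: { regulatory: ["fab-18"] } },
];
const related = correlate("fab-18", features);
```

At a few thousand features this scan is effectively instant; a precomputed adjacency map would only matter at a much larger scale.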
The data pipeline
Static SPA — no backend, no API container. All data bundled at build time, with DuckDB-WASM handling browser-side Parquet queries as an opt-in path.
The pipeline is Python 3.12, run on a weekly GitHub Actions cron. It fetches from upstream sources, geocodes, dedupes, validates against a schema, and exports to GeoJSON and Parquet in public/data/. Every feature carries provenance metadata:
"provenance": {
"sources": ["https://sustainability.google/reports/…"],
"updated": "2026-04-19",
"confidence": 0.65
}
Confidence 1.0 is curated or manually verified. 0.7-0.95 is live-scraped. Below 0.7 is best-effort derivation — the ESG layer sits at 0.65 because hyperscalers publish fleet totals that we prorate across individual sites. That number is visible in the detail panel, not hidden.
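The proration itself is simple arithmetic: split a fleet-wide total across sites in proportion to some weight. The pipeline is Python, but the same calculation sketched in TypeScript (with a hypothetical capacity weight, since the real weighting scheme isn't specified here) looks like:

```typescript
// Hypothetical sketch: prorate a fleet-wide ESG total (e.g. annual
// energy use) across individual sites by a capacity weight.
interface Site {
  id: string;
  capacityMw: number;
}

function prorate(fleetTotal: number, sites: Site[]): Map<string, number> {
  const totalCapacity = sites.reduce((sum, s) => sum + s.capacityMw, 0);
  const out = new Map<string, number>();
  for (const site of sites) {
    out.set(site.id, fleetTotal * (site.capacityMw / totalCapacity));
  }
  return out;
}

// A company reporting 100 GWh across two sites of 30 MW and 70 MW:
const shares = prorate(100, [
  { id: "site-a", capacityMw: 30 },
  { id: "site-b", capacityMw: 70 },
]);
// site-a ≈ 30 GWh, site-b ≈ 70 GWh
```

The 0.65 confidence reflects exactly this: the per-site numbers are derived, not reported.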
The weekly run opens a pull request for human review before anything merges. Data changes are inspectable before they go live.
Phase 5 extras
Compare mode splits the viewport into two independent globes, each with its own full URL state (cmp_lng, cmp_lat, cmp_z, cmp_layers, cmp_sel, cmp_t0, cmp_t1). A deep link encodes both viewports, so you can share a specific side-by-side comparison.
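Reading those parameters back out of a deep link is plain query-string parsing. A minimal sketch, assuming the parameter names above and an illustrative state shape (the time-range params `cmp_t0`/`cmp_t1` would be handled the same way):

```typescript
// Hypothetical parser for the compare-viewport query parameters.
interface CompareState {
  lng: number;
  lat: number;
  zoom: number;
  layers: string[];
  selection: string | null;
}

function parseCompareState(search: string): CompareState | null {
  const p = new URLSearchParams(search);
  // No compare coordinates means no second viewport.
  if (!p.has("cmp_lng") || !p.has("cmp_lat")) return null;
  return {
    lng: Number(p.get("cmp_lng")),
    lat: Number(p.get("cmp_lat")),
    zoom: Number(p.get("cmp_z") ?? "0"),
    layers: (p.get("cmp_layers") ?? "").split(",").filter(Boolean),
    selection: p.get("cmp_sel"),
  };
}

const state = parseCompareState(
  "cmp_lng=120.27&cmp_lat=23.11&cmp_z=6&cmp_layers=fabs,supply&cmp_sel=tsmc-fab-18"
);
```

Keeping both viewports entirely in the URL means a shared link is the whole comparison; there is no server-side session to reconstruct.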
Embeddable mini-globes: ?embed=1 strips the sidebar, scrubber, and footer. ?focus=<feature_id> recenters the camera and auto-selects the entity. The intent is per-post embeds in the blog — drop an iframe that opens on the exact facility or regulatory zone the post is discussing.
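Composing one of those embed links is a two-parameter URL edit. A small sketch, using an assumed base URL for illustration:

```typescript
// Hypothetical helper: build an embed deep link from the query
// parameters described above (?embed=1 and ?focus=<feature_id>).
function embedUrl(base: string, featureId: string): string {
  const url = new URL(base);
  url.searchParams.set("embed", "1");
  url.searchParams.set("focus", featureId);
  return url.toString();
}

const link = embedUrl("https://example.com/", "tsmc-fab-18");
// → "https://example.com/?embed=1&focus=tsmc-fab-18"
```

The resulting URL goes straight into an iframe `src`, which is all a per-post embed needs.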
Phase 6 in the backlog: facility-level energy trajectories over time (once the ESG layer carries multiple years), patent-to-fab IP linkage via assignee matching, and a Sankey-style flow cross-section orthogonal to the globe.
What you actually learn
The standard AI narrative moves fast and stays abstract. Models improve, companies grow, the future arrives or doesn’t.
The map slows that down. TSMC makes chips for almost everyone. A handful of U.S. export control listings are load-bearing for the whole global supply chain. The data center footprint of three companies covers most of the planet’s AI compute. Most of the money comes from a small geography.
None of this is secret. It’s just easier to ignore when it isn’t in front of you.