<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Apache SkyWalking – Cloud Native</title>
    <link>/tags/cloud-native/</link>
    <description>Recent content in Cloud Native on Apache SkyWalking</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Tue, 23 Jun 2026 00:00:00 +0000</lastBuildDate>
    
	  <atom:link href="/tags/cloud-native/feed.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Blog: Meet Horizon UI · 7/16: The Log Explorer</title>
      <link>/blog/2026-06-23-horizon-ui-log-explorer/</link>
      <pubDate>Tue, 23 Jun 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-06-23-horizon-ui-log-explorer/</guid>
      <description>
        
        
        &lt;p&gt;This is the seventh post in the &lt;a href=&#34;/blog/2026-06-21-skywalking-horizon-ui-introduction/&#34;&gt;Meet Horizon UI&lt;/a&gt; series. &lt;a href=&#34;/blog/2026-06-22-horizon-ui-trace-explorer/&#34;&gt;Part 6&lt;/a&gt; was one request&amp;rsquo;s spans; this one is the log lines around it. Horizon surfaces logs through &lt;strong&gt;two distinct tabs&lt;/strong&gt;, because there are really two different questions: &lt;em&gt;&amp;ldquo;what did this service log over the last half hour?&amp;rdquo;&lt;/em&gt; and &lt;em&gt;&amp;ldquo;what is this pod printing to stdout right now?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Logs&lt;/strong&gt; tab queries the logs SkyWalking has already &lt;strong&gt;collected and stored&lt;/strong&gt; — indexed, filterable, correlated with traces.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Pod Logs&lt;/strong&gt; tab &lt;strong&gt;live-tails&lt;/strong&gt; a Kubernetes pod&amp;rsquo;s container logs &lt;em&gt;on demand&lt;/em&gt; — these aren&amp;rsquo;t stored logs at all: OAP reads them straight from the &lt;strong&gt;Kubernetes API server&lt;/strong&gt; (the &lt;code&gt;kubectl logs&lt;/code&gt; path), Horizon shows the window, and it&amp;rsquo;s discarded. Nothing is persisted, and SkyWalking&amp;rsquo;s log storage is never involved.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Which tabs a layer shows is up to its template: the &lt;strong&gt;Logs&lt;/strong&gt; tab appears on layers that enable it (General, Mesh, Nginx, the Envoy AI Gateway, the mobile and mini-program layers); the &lt;strong&gt;Pod Logs&lt;/strong&gt; tab appears only on the Kubernetes-aware layers (Kubernetes Service, Mesh, Mesh data plane). Browser JavaScript errors are a different thing again — not service logs but client-side error events the browser agent reports, with their own categories and their own &lt;strong&gt;source-map de-obfuscation&lt;/strong&gt; (turning a minified &lt;code&gt;app.min.js:1:…&lt;/code&gt; frame back into your original &lt;code&gt;file:line&lt;/code&gt;). That&amp;rsquo;s a separate tab on the Browser layer, and its own post later in the series.&lt;/p&gt;
&lt;h2 id=&#34;the-stored-log-stream&#34;&gt;The stored log stream&lt;/h2&gt;
&lt;p&gt;Open a layer that has a &lt;strong&gt;Logs&lt;/strong&gt; tab, pick a service in the header, and its stored log stream loads newest-first. Like the trace explorer, this tab &lt;strong&gt;owns its own time range&lt;/strong&gt; — the global topbar picker is paused while you&amp;rsquo;re here, so auto-refresh can&amp;rsquo;t shift the window out from under an investigation. Pick a rolling preset (last 15 minutes through 24 hours, default 30) or a custom absolute window; queries run at &lt;strong&gt;second precision&lt;/strong&gt; so the most recent lines are never rounded off the minute.&lt;/p&gt;
&lt;p&gt;A conditions bar narrows the stream, and every filter is optional and AND-joined:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Instance&lt;/strong&gt; — restrict to one service instance (labelled &lt;strong&gt;Sidecar&lt;/strong&gt; on a sidecar layer).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Endpoint&lt;/strong&gt; — type to search the service&amp;rsquo;s endpoints, click to pin, &lt;strong&gt;×&lt;/strong&gt; to clear.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Trace ID&lt;/strong&gt; — show only the lines correlated with one trace. This is also how a log lands when you arrive &lt;em&gt;from&lt;/em&gt; a trace: the field pre-fills and the stream is already scoped.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tags&lt;/strong&gt; — a single &lt;code&gt;key=value&lt;/code&gt; field with autocomplete; start a key to see suggestions, type &lt;code&gt;=&lt;/code&gt; to switch to known values, Enter to commit. Committed tags ride along as removable chips.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Level&lt;/strong&gt; — the &lt;strong&gt;Levels&lt;/strong&gt; strip above the stream doubles as a filter: click &lt;code&gt;error&lt;/code&gt;, &lt;code&gt;warn&lt;/code&gt;, &lt;code&gt;info&lt;/code&gt;, or &lt;code&gt;debug&lt;/code&gt; to keep only that level, click again to clear.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&amp;rsquo;s no log query language here — no LogQL box to learn. The conditions above are the whole surface, and &lt;strong&gt;edits refresh the stream as you make them&lt;/strong&gt;; &lt;strong&gt;Run query&lt;/strong&gt; is just the explicit &amp;ldquo;I&amp;rsquo;m done editing, refresh now&amp;rdquo; button that resets to the first page.&lt;/p&gt;
&lt;h2 id=&#34;reading-the-stream&#34;&gt;Reading the stream&lt;/h2&gt;
&lt;p&gt;The point of a log view isn&amp;rsquo;t to &lt;em&gt;list&lt;/em&gt; lines, it&amp;rsquo;s to find the shape in them — so the stream comes with two pieces of orientation above it.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;density histogram&lt;/strong&gt; plots log count over time, each bar &lt;strong&gt;stacked by level&lt;/strong&gt; in the legend&amp;rsquo;s colors; hover a bar for that bucket&amp;rsquo;s time range and per-level counts. It&amp;rsquo;s drawn from the page currently on screen, so it shows the shape of what you&amp;rsquo;re looking at. And the &lt;strong&gt;Levels&lt;/strong&gt; strip carries a running count per level — sampled across the window, not just the visible page, so the error/warn/info mix reflects the whole window you&amp;rsquo;re querying.&lt;/p&gt;
&lt;p&gt;Each row then shows the timestamp, the level (the row is color-keyed to it), the service, an &lt;strong&gt;↗ trace&lt;/strong&gt; link when the line is trace-correlated, a &lt;strong&gt;&lt;code&gt;JSON&lt;/code&gt; / &lt;code&gt;YAML&lt;/code&gt; / &lt;code&gt;TEXT&lt;/code&gt; format chip&lt;/strong&gt;, and a one-line preview of the content. Horizon decides that chip by what the payload actually &lt;em&gt;is&lt;/em&gt;: OAP labels a body JSON or plain text, and on top of that Horizon sniffs for JSON and YAML structure, so an unlabelled-but-structured line still gets the right treatment — JSON flattened to one line in the preview, YAML keeping its keys, plain text whitespace-collapsed.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p07-logs-01-stream.webp&#34; alt=&#34;Figure 1: The stored Logs stream — the conditions bar, the level-stacked density histogram and Levels strip above, and the log rows below, each with its level color, service, format chip, and an ↗ trace link.&#34;&gt;
Figure 1: A service&amp;rsquo;s stored log stream — a level histogram and level counts over the window, then the rows, each tagged JSON / YAML / TEXT and linked to its trace.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;into-a-single-line&#34;&gt;Into a single line&lt;/h2&gt;
&lt;p&gt;Click a row and the full payload opens in a popout: the complete content with &lt;strong&gt;format-aware pretty-printing&lt;/strong&gt; — JSON and YAML laid out properly, plain text given the whole canvas instead of a cramped strip — plus a &lt;strong&gt;Copy&lt;/strong&gt; button, the service / instance / endpoint / trace context, and a table of every tag on the line. When the line is trace-correlated, an &lt;strong&gt;↗ trace&lt;/strong&gt; button opens the related &lt;a href=&#34;/blog/2026-06-22-horizon-ui-trace-explorer/&#34;&gt;trace&amp;rsquo;s waterfall&lt;/a&gt; in an overlay without leaving the stream — and it passes the row&amp;rsquo;s timestamp along, so the trace is found even when it has aged into a colder storage tier. Escape or a backdrop click closes it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p07-logs-02-payload.webp&#34; alt=&#34;Figure 2: A log line&amp;rsquo;s payload popout — the full content pretty-printed by format (a JSON access log here), a Copy button, the service and instance context, and the line&amp;rsquo;s tag table (here status.code).&#34;&gt;
Figure 2: One line in full — its payload pretty-printed by format, with its context and every tag laid out beside it.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;pod-logs-tailing-whats-printing-right-now&#34;&gt;Pod Logs: tailing what&amp;rsquo;s printing right now&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;Pod Logs&lt;/strong&gt; tab answers the other question, and it&amp;rsquo;s a fundamentally different source: not SkyWalking&amp;rsquo;s stored logs, but the pod&amp;rsquo;s container output read live from the &lt;strong&gt;Kubernetes API server&lt;/strong&gt; through OAP — the exact thing &lt;code&gt;kubectl logs -f&lt;/code&gt; reads. There&amp;rsquo;s no stored history to page through; each refresh pulls the trailing window, shows it, and throws it away.&lt;/p&gt;
&lt;p&gt;Starting a tail is a few picks: choose a &lt;strong&gt;Pod&lt;/strong&gt; (one service instance, pinned), a &lt;strong&gt;Container&lt;/strong&gt; (Horizon lists the pod&amp;rsquo;s containers and selects the first), a look-back &lt;strong&gt;Window&lt;/strong&gt; (last 30s, 1m, 5m, 15m, or 30m — how far back each poll reaches), and a poll &lt;strong&gt;Interval&lt;/strong&gt; (2s, 5s, 10s, or 30s — how often it re-fetches). Press &lt;strong&gt;Start&lt;/strong&gt; and the window streams into a read-only viewer that keeps the newest line in view and re-polls until you &lt;strong&gt;Pause&lt;/strong&gt;; a header strip shows the container, the line count, a live indicator, and how long ago it last updated. Two &lt;strong&gt;Include&lt;/strong&gt; / &lt;strong&gt;Exclude&lt;/strong&gt; filter rows narrow what you see — each chip is a &lt;strong&gt;full-line regular expression&lt;/strong&gt; (&lt;code&gt;.*error.*&lt;/code&gt;) evaluated by OAP, so they match the whole line rather than a substring, and they stack.&lt;/p&gt;
&lt;p&gt;One thing to know going in: on-demand pod logs are &lt;strong&gt;disabled by default on OAP&lt;/strong&gt;, because container output can carry secrets. When the feature is off — or when the pod you picked has been rolled or scaled away — OAP returns a &lt;em&gt;reason&lt;/em&gt;, and Horizon shows it in a banner rather than an empty pane, so you can tell &amp;ldquo;turn this on&amp;rdquo; apart from &amp;ldquo;that pod is gone.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p07-logs-03-pod-logs.webp&#34; alt=&#34;Figure 3: The Pod Logs tab — the pod and container pickers, the look-back window and poll interval, the live-indicator header strip, the Include/Exclude regex chips, and the read-only tail pane streaming the container&amp;rsquo;s recent output.&#34;&gt;
Figure 3: A live tail of one pod&amp;rsquo;s container — windowed, interval-polled, regex-filterable, never persisted.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;where-to-go-next&#34;&gt;Where to go next&lt;/h2&gt;
&lt;p&gt;Both tabs — the stored queries, the tag and container autocomplete, and the live tail — are gated by a single &lt;code&gt;logs:read&lt;/code&gt; permission, so granting &amp;ldquo;can read logs&amp;rdquo; is one switch. For the field reference — every condition, the histogram, the Pod Logs windows and filters — see the &lt;a href=&#34;https://skywalking.apache.org/docs/skywalking-horizon-ui/next/operate/logs/&#34;&gt;Logs docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Next up: &lt;strong&gt;Browser &amp;amp; RUM monitoring&lt;/strong&gt; — the browser agent&amp;rsquo;s own error stream, and de-obfuscating a minified stack with source maps.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Meet Horizon UI · 5/16: The 3D Infrastructure Map</title>
      <link>/blog/2026-06-22-horizon-ui-3d-infrastructure-map/</link>
      <pubDate>Mon, 22 Jun 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-06-22-horizon-ui-3d-infrastructure-map/</guid>
      <description>
        
        
        &lt;p&gt;This is the fifth post in the &lt;a href=&#34;/blog/2026-06-21-skywalking-horizon-ui-introduction/&#34;&gt;Meet Horizon UI&lt;/a&gt; series. &lt;a href=&#34;/blog/2026-06-21-horizon-ui-topology-and-dependency/&#34;&gt;Part 3&lt;/a&gt; drew the map &lt;em&gt;between&lt;/em&gt; services and &lt;a href=&#34;/blog/2026-06-21-horizon-ui-deployment-and-banyandb/&#34;&gt;Part 4&lt;/a&gt; drew it &lt;em&gt;inside&lt;/em&gt; one service. This post zooms all the way out: a single &lt;strong&gt;WebGL&lt;/strong&gt; view of your &lt;em&gt;entire&lt;/em&gt; deployment at once — every SkyWalking layer&amp;rsquo;s services rendered as cubes, stacked in 3D, with live traffic, alarms, and the calls between them. It&amp;rsquo;s the &amp;ldquo;stand back and look at everything&amp;rdquo; companion to the per-layer dashboards.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also genuinely interactive, so rather than describe it cold, here it is — the real map running on the demo&amp;rsquo;s sample data. Drag to rotate, scroll to zoom, click a cube:&lt;/p&gt;
&lt;figure class=&#34;map3d-stage&#34;&gt;
  &lt;a class=&#34;map3d-poster&#34; href=&#34;/3d-map-app/&#34; aria-label=&#34;Open the interactive 3D infrastructure map&#34;&gt;&lt;span class=&#34;map3d-badge&#34;&gt;Interactive · sample data&lt;/span&gt;
    &lt;img loading=&#34;lazy&#34; src=&#34;/images/home/horizon-3d-map.png&#34;
         alt=&#34;Apache SkyWalking topology rendered as an interactive 3D scene you can orbit, zoom and click&#34;&gt;
    &lt;span class=&#34;map3d-play&#34; aria-hidden=&#34;true&#34;&gt;
      &lt;span class=&#34;map3d-play-btn&#34;&gt;
        &lt;svg viewBox=&#34;0 0 24 24&#34; fill=&#34;none&#34;&gt;&lt;path d=&#34;M8 5v14l11-7-11-7z&#34; fill=&#34;currentColor&#34;/&gt;&lt;/svg&gt;
      &lt;/span&gt;
      &lt;span class=&#34;map3d-play-label&#34;&gt;Open the 3D map&lt;/span&gt;
    &lt;/span&gt;
  &lt;/a&gt;
&lt;/figure&gt;


&lt;h2 id=&#34;one-scene-for-the-whole-estate&#34;&gt;One scene for the whole estate&lt;/h2&gt;
&lt;p&gt;The 3D map is a standalone, full-screen view at &lt;code&gt;/3d/map&lt;/code&gt;, opened from the &lt;strong&gt;3D Infra&lt;/strong&gt; pill in the topbar. It deliberately drops the rest of the console — no sidebar, no topbar, no global time picker — so the scene gets the whole viewport. The SkyWalking mark sits bottom-left; the &lt;code&gt;×&lt;/code&gt; top-right returns you to Horizon. Everything in it is read live from the &lt;em&gt;same&lt;/em&gt; OAP the rest of Horizon talks to: the service roster, each layer&amp;rsquo;s topology, the per-service traffic, and the active alarms.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p05-3dmap-01-overview.webp&#34; alt=&#34;Figure 1: The 3D Infrastructure Map — every layer&amp;rsquo;s services as cubes, grouped into brand-colored per-layer zones and stacked onto horizontal tiers, with a tier/layer panel on the right.&#34;&gt;
Figure 1: The whole deployment in one scene — services as cubes, layers as colored zones, roles as stacked tiers.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;tiers-are-the-spine&#34;&gt;Tiers are the spine&lt;/h2&gt;
&lt;p&gt;A &lt;strong&gt;tier&lt;/strong&gt; is a horizontal plane that groups related layers by their role in the system. Tiers read top-to-bottom the way a request flows — from the apps a user touches down to the platform everything runs on. Horizon ships four bundled tiers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Apps&lt;/strong&gt; (top) — application surfaces and the dependencies as the app sees them: General agent services, Browser/RUM, mobile, and the &lt;code&gt;Virtual*&lt;/code&gt; targets (database, cache, MQ, gateway, GenAI).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Middleware&lt;/strong&gt; — the data and messaging services, gateways, and self-observability: MySQL, PostgreSQL, Redis, Kafka, RocketMQ, APISIX, Nginx, the SkyWalking SO11Y components, and cloud-managed data services.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Service Mesh&lt;/strong&gt; — the mesh that fronts the apps: Istio control/data plane, Cilium, the Envoy AI Gateway.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Infra&lt;/strong&gt; (bottom) — the platform the rest runs on: Kubernetes, hosts, VMs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Every layer OAP reports lands on exactly one tier. A layer Horizon hasn&amp;rsquo;t classified yet — a brand-new OAP layer, say — falls to the &lt;strong&gt;failover tier&lt;/strong&gt; (Middleware by default) with an &amp;ldquo;unclassified&amp;rdquo; mark, so it shows up and an operator can re-assign it rather than silently dropping off the map. The panel on the right mirrors the stack: click a tier row to fly the camera to it, use the eye toggle to show or hide a whole tier (or a single layer) at once, and read off how many of its services are currently visible.&lt;/p&gt;
&lt;h2 id=&#34;reading-the-map-cubes-zones-traffic&#34;&gt;Reading the map: cubes, zones, traffic&lt;/h2&gt;
&lt;p&gt;Each &lt;strong&gt;cube is one service&lt;/strong&gt;. Cubes are grouped into their layer&amp;rsquo;s &lt;strong&gt;zone&lt;/strong&gt; on the tier — a translucent rectangle painted in the layer&amp;rsquo;s brand color and stamped with the project&amp;rsquo;s logo (Istio&amp;rsquo;s sail, the Kubernetes helm, a database cylinder, a queue) so you can pick a zone out from any camera angle. Layers that ship a topology (General, Service Mesh, Kubernetes Service, Cilium) lay their cubes out &lt;strong&gt;by call dependency&lt;/strong&gt; — upstream callers on one side, downstream services on the other, the 3D analogue of the 2D service map. Layers without a topology pack their cubes into a tidy grid.&lt;/p&gt;
&lt;p&gt;A small &lt;strong&gt;traffic pill&lt;/strong&gt; under a cube shows that service&amp;rsquo;s live headline throughput — requests per minute for app and mesh services, queries or operations per second for data services, each with its own unit. The pills appear on cubes close enough to read and fade away as you zoom out to keep the scene clean, then return as you come back in; a selected cube always shows its number.&lt;/p&gt;
&lt;h2 id=&#34;alarms-and-beacon-mode-for-incidents&#34;&gt;Alarms, and Beacon mode for incidents&lt;/h2&gt;
&lt;p&gt;When a service has a &lt;strong&gt;currently-firing alarm&lt;/strong&gt; (Horizon polls the last 20 minutes and counts only service-scoped, still-firing ones), its cube &lt;strong&gt;pulses red&lt;/strong&gt; — a beacon you can spot from clear across the scene, while the alarm feed refreshes on its own.&lt;/p&gt;
&lt;p&gt;On a busy map, one more red cube among hundreds can still be hard to find — so there&amp;rsquo;s &lt;strong&gt;Beacon mode&lt;/strong&gt;. Toggle it from the toolbar and every &lt;em&gt;healthy&lt;/em&gt; cube dims to a dark wireframe ghost, leaving only the alarming services lit and glowing. The shape of the deployment stays legible, but the services that are actually on fire are the only thing with color. It turns the bird&amp;rsquo;s-eye view into an incident triage surface in one click.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p05-3dmap-02-beacon-mode.webp&#34; alt=&#34;Figure 2: Beacon mode — every healthy cube dimmed to a wireframe ghost so only the services with a firing alarm stay lit and glowing red.&#34;&gt;
Figure 2: Beacon mode dims everything healthy to a ghost, so the firing services are the only thing you see.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;the-lines-between-things&#34;&gt;The lines between things&lt;/h2&gt;
&lt;p&gt;The map draws the call graph, not just the nodes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;In-layer calls&lt;/strong&gt; — light cyan tubes between two services in the same layer, with packets animating along them. This is each layer&amp;rsquo;s internal call graph, always on.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cross-layer calls&lt;/strong&gt; — soft amber arrows between services in &lt;em&gt;different&lt;/em&gt; layers on the same tier (a Browser app calling a Frontend, a Frontend calling a Virtual Database), pointing from caller to callee.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hierarchy links&lt;/strong&gt; — and here&amp;rsquo;s the one that makes the 3D layout earn its keep. Select a cube and thicker gray tubes connect the &lt;strong&gt;different faces of the same logical service&lt;/strong&gt; across tiers — the service as its agent sees it, as the mesh sees it, as a Kubernetes service. These represent &lt;em&gt;identity&lt;/em&gt;, not traffic, so they stay hidden until you select a cube and then show just that cube&amp;rsquo;s relatives, climbing the stack from tier to tier. It&amp;rsquo;s the &lt;a href=&#34;/blog/2026-06-21-horizon-ui-topology-and-dependency/&#34;&gt;Smartscape&lt;/a&gt; idea from Part 3, drawn in the dimension it was always meant for.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p05-3dmap-03-selected-hierarchy.webp&#34; alt=&#34;Figure 3: A selected service — its detail card beside the cube, and its cross-tier hierarchy links lit up, connecting the same logical service across the Apps, Service Mesh, and Infra tiers.&#34;&gt;
Figure 3: Select a cube and its identity links climb the tiers — one service, seen by its agent, the mesh, and Kubernetes.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;moving-around&#34;&gt;Moving around&lt;/h2&gt;
&lt;p&gt;Drag to rotate, scroll to zoom, and &lt;strong&gt;arrow keys&lt;/strong&gt; or &lt;strong&gt;WASD&lt;/strong&gt; pan the view (hold &lt;strong&gt;Shift&lt;/strong&gt; for a bigger step); a top-left toolbar offers the same gestures as buttons for trackpads. There&amp;rsquo;s one deliberate rule worth knowing: &lt;strong&gt;clicks inside the 3D scene never move the camera&lt;/strong&gt; — they only select. Click a cube and it highlights, a detail card appears beside it (the service&amp;rsquo;s name, its tier and layer, and an &lt;strong&gt;Open dashboard&lt;/strong&gt; button that jumps into that service&amp;rsquo;s layer dashboard in a new tab), and its hierarchy links light up. The camera-move surface is the &lt;em&gt;side panel&lt;/em&gt; and the &lt;em&gt;toolbar&lt;/em&gt; — click a layer row to glide the camera to its zone. Keeping those two jobs separate is what makes selecting a small cube feel reliable instead of having the cube slide out from under your cursor.&lt;/p&gt;
&lt;h2 id=&#34;how-it-builds&#34;&gt;How it builds&lt;/h2&gt;
&lt;p&gt;A whole deployment is too much to fetch in one request, so the map loads in stages, and a slim &lt;strong&gt;timeline strip&lt;/strong&gt; along the bottom shows the progress live: &lt;strong&gt;Services&lt;/strong&gt; (the roster and their layers) → &lt;strong&gt;Templates&lt;/strong&gt; (which layers carry a topology) → &lt;strong&gt;Topologies&lt;/strong&gt; (each topology-bearing layer&amp;rsquo;s call graph) → &lt;strong&gt;Hierarchy&lt;/strong&gt; (the cross-tier identity links) → &lt;strong&gt;Layout&lt;/strong&gt; (placing the cubes) → &lt;strong&gt;Metrics&lt;/strong&gt; (the per-service traffic, fetched in batches so the cubes light up progressively). Click any step for a drawer with its detail, or hit refresh to re-run the whole sequence.&lt;/p&gt;
&lt;p&gt;Two touches make refreshes cheap. The &lt;strong&gt;hierarchy&lt;/strong&gt; step is incremental — only services that are &lt;em&gt;new&lt;/em&gt; since the last run get probed, the rest reused from cache, so a steady deployment costs nothing there. And the scene is re-keyed on a per-layer structure hash, so an unchanged refresh keeps your camera exactly where it was; only a real roster or edge change rebuilds the layout. Under the hood it&amp;rsquo;s &lt;a href=&#34;https://threejs.org/&#34;&gt;Three.js&lt;/a&gt; with a thin Vue wrapper, every geometry and material shared across cubes of the same kind — the kind of detail that keeps a few hundred services rendering smoothly in a browser tab.&lt;/p&gt;
&lt;h2 id=&#34;configured-not-coded&#34;&gt;Configured, not coded&lt;/h2&gt;
&lt;p&gt;None of the above is a hard-coded &amp;ldquo;3D screen.&amp;rdquo; What the map shows is driven by a single configuration an administrator edits in a &lt;strong&gt;structured form editor&lt;/strong&gt; at &lt;code&gt;/admin/3d-map&lt;/code&gt; — tiers, layers, colors, and metrics through form controls, not raw JSON. From it you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Filter layers&lt;/strong&gt; with one global regex — anything it excludes drops off the map entirely.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arrange tiers&lt;/strong&gt; — rename them, reorder them top-to-bottom, and pin each layer to a tier (with a nominated failover tier so nothing falls off silently).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Group layers&lt;/strong&gt; — cluster several related layers (the SkyWalking self-observability components, say) into one labelled block, each member keeping its own color.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Color each layer&lt;/strong&gt; and &lt;strong&gt;choose its traffic metric&lt;/strong&gt; — the MQE expression, a label, and a unit, seeded by default from that layer&amp;rsquo;s dashboard template so most layers show a sensible number out of the box.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Horizon ships a bundled default, so the map is useful immediately; your edits live as a local draft until you &lt;strong&gt;Check diff &amp;amp; push&lt;/strong&gt; them to OAP — the same draft → preview → publish model behind every dashboard, and the same Export/Import for backup or moving a configuration between deployments. The map itself is a read-only &lt;strong&gt;observe&lt;/strong&gt; surface that runs against your current OAP; publishing the config that shapes it is part of the config-driven customization story a later post in this series covers end to end.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p05-3dmap-04-config-editor.webp&#34; alt=&#34;Figure 4: The 3D-map config — a structured editor at /admin/3d-map for tiers, per-layer color and traffic metric, layer groups, and the global filter, with the usual draft → Check diff &amp;amp; push publish flow.&#34;&gt;
Figure 4: The map is configuration, not code — tiers, colors, and per-layer traffic metrics edited as a form, then published to OAP.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;where-to-go-next&#34;&gt;Where to go next&lt;/h2&gt;
&lt;p&gt;The 3D map is the bird&amp;rsquo;s-eye summary; the 2D per-layer pages stay the authoritative service maps. Viewing it needs only read access (&lt;code&gt;infra-3d:read&lt;/code&gt;, held by the built-in viewer role and up); shaping it needs the same write permission as the dashboards. For the field reference — tiers, the config shape, the loading stages — see the &lt;a href=&#34;https://skywalking.apache.org/docs/skywalking-horizon-ui/next/operate/infra-3d-map/&#34;&gt;3D Infrastructure Map docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Next up: &lt;strong&gt;the Trace Explorer&lt;/strong&gt; — from the bird&amp;rsquo;s-eye view of the whole deployment back down to a single request, drawn three different ways.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Meet Horizon UI · 6/16: The Trace Explorer</title>
      <link>/blog/2026-06-22-horizon-ui-trace-explorer/</link>
      <pubDate>Mon, 22 Jun 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-06-22-horizon-ui-trace-explorer/</guid>
      <description>
        
        
        &lt;p&gt;This is the sixth post in the &lt;a href=&#34;/blog/2026-06-21-skywalking-horizon-ui-introduction/&#34;&gt;Meet Horizon UI&lt;/a&gt; series. The last few were maps — &lt;a href=&#34;/blog/2026-06-21-horizon-ui-topology-and-dependency/&#34;&gt;topology&lt;/a&gt; between services, the &lt;a href=&#34;/blog/2026-06-21-horizon-ui-deployment-and-banyandb/&#34;&gt;deployment&lt;/a&gt; inside one, the &lt;a href=&#34;/blog/2026-06-22-horizon-ui-3d-infrastructure-map/&#34;&gt;3D view&lt;/a&gt; of the whole estate. Those answer &amp;ldquo;what does my system look like.&amp;rdquo; This post is about the opposite move: from an aggregate down to &lt;em&gt;one request&lt;/em&gt; — its spans, its timing, the exact hop that went slow. That&amp;rsquo;s the &lt;strong&gt;Traces&lt;/strong&gt; tab.&lt;/p&gt;
&lt;h2 id=&#34;built-for-triage-not-tailing&#34;&gt;Built for triage, not tailing&lt;/h2&gt;
&lt;p&gt;The Traces tab is a distributed-trace explorer that lives &lt;strong&gt;inside a layer&lt;/strong&gt;: pick a service, set your conditions, and read a single trace&amp;rsquo;s span timeline. And it behaves differently from the rest of the console on purpose — because traces are triage data, not a live feed.&lt;/p&gt;
&lt;p&gt;It &lt;strong&gt;owns its own time range and conditions&lt;/strong&gt;. It does &lt;em&gt;not&lt;/em&gt; follow the global topbar time picker, and it does &lt;em&gt;not&lt;/em&gt; auto-refresh. You stage what you&amp;rsquo;re looking for and press &lt;strong&gt;Run query&lt;/strong&gt;; nothing is fetched until you do. Before the first run the list simply says &lt;em&gt;&amp;ldquo;Pick your conditions, then click Run query.&amp;rdquo;&lt;/em&gt; When you&amp;rsquo;re chasing one bad trace from twenty minutes ago, the last thing you want is the page sliding forward under you every few seconds — so it doesn&amp;rsquo;t.&lt;/p&gt;
&lt;h2 id=&#34;conditions-not-a-query-language&#34;&gt;Conditions, not a query language&lt;/h2&gt;
&lt;p&gt;The whole filter surface is structured form controls — selects, number ranges, and tag chips — staged in a toolbar that only takes effect on &lt;strong&gt;Run query&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Instance&lt;/strong&gt; and &lt;strong&gt;Endpoint&lt;/strong&gt; narrow within the service (the endpoint dropdown lists the service&amp;rsquo;s own endpoints).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Status&lt;/strong&gt; — &lt;code&gt;ALL&lt;/code&gt; / &lt;code&gt;SUCCESS&lt;/code&gt; / &lt;code&gt;ERROR&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Order&lt;/strong&gt; — Newest (by start time) or Slowest (by duration).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limit&lt;/strong&gt; — how many rows to pull (30 by default); the BFF caps the page size server-side so a client can&amp;rsquo;t ask OAP for the world.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Time range&lt;/strong&gt; — a rolling preset (last 15 minutes through 24 hours) or a custom absolute window, evaluated at &lt;strong&gt;second precision&lt;/strong&gt; so a trace that &lt;em&gt;just&lt;/em&gt; finished still falls inside it instead of being rounded off the minute.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Trace ID&lt;/strong&gt; — paste one to look it up directly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Duration range&lt;/strong&gt; — a min–max in milliseconds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tag&lt;/strong&gt; — free-form span tags as &lt;code&gt;key=value&lt;/code&gt; (e.g. &lt;code&gt;http.status_code=500&lt;/code&gt;), added with Enter as removable chips and AND-joined; keys and values get typeahead from the backend.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One thing this is &lt;em&gt;not&lt;/em&gt;: a query language. There&amp;rsquo;s no TraceQL box anywhere in Horizon — the structured conditions above are the entire surface. (TraceQL is a separate path: SkyWalking&amp;rsquo;s backend can &lt;em&gt;serve&lt;/em&gt; traces to Grafana over TraceQL, which is &lt;a href=&#34;/blog/2026-04-08-traceql/&#34;&gt;its own story&lt;/a&gt;. Horizon&amp;rsquo;s explorer is forms, not a DSL.)&lt;/p&gt;
&lt;h2 id=&#34;a-distribution-you-can-box-select&#34;&gt;A distribution you can box-select&lt;/h2&gt;
&lt;p&gt;When the results come back, the toolbar is joined by a &lt;strong&gt;Distribution&lt;/strong&gt; chart: one dot per trace, plotted with start time on the X axis and duration as its height — slower traces sit higher. Dots are colored by &lt;strong&gt;status&lt;/strong&gt; — errors in red, successes in the accent color — so a cluster of red high up is exactly the &amp;ldquo;slow &lt;em&gt;and&lt;/em&gt; failing&amp;rdquo; corner you came to find.&lt;/p&gt;
&lt;p&gt;The chart is also a filter. Click a dot to pick it, or &lt;strong&gt;drag a box across a band&lt;/strong&gt; of them, and the result list narrows to just that selection — the header switches to an &lt;em&gt;&amp;ldquo;N picked&amp;rdquo;&lt;/em&gt; count with a Reset. This is a client-side filter over what&amp;rsquo;s already loaded; it never issues a new query. Dragging a band across the slow-and-erroring corner and reading only those rows is the fastest way to go from &amp;ldquo;200 traces&amp;rdquo; to &amp;ldquo;these six.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p06-traces-01-explorer.webp&#34; alt=&#34;Figure 1: The Traces explorer — the staged condition toolbar, the Distribution chart (one dot per trace, height = duration, red = error) with a band box-selected (8 picked), and the result list below.&#34;&gt;
Figure 1: Stage conditions, run the query, then box-select a band of the distribution to whittle the list down to the traces worth opening.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;Each result row shows the trace&amp;rsquo;s root endpoint, an OK/ERR flag, the duration, and a bar sized against the slowest trace in the set. What a row &lt;em&gt;represents&lt;/em&gt; depends on the storage backend, and Horizon detects it for you: a banner reads either &lt;em&gt;&amp;ldquo;Full traces are returned inline&amp;rdquo;&lt;/em&gt; (the backend returns whole traces with their spans, so a click opens immediately) or &lt;em&gt;&amp;ldquo;Each row is a trace segment — click one to fetch its full trace.&amp;rdquo;&lt;/em&gt; You never configure this; the banner just tells you which you&amp;rsquo;re looking at.&lt;/p&gt;
&lt;h2 id=&#34;three-ways-to-read-one-trace&#34;&gt;Three ways to read one trace&lt;/h2&gt;
&lt;p&gt;Click a row and the trace opens with a three-way view toggle — &lt;strong&gt;Default&lt;/strong&gt;, &lt;strong&gt;Tree&lt;/strong&gt;, and &lt;strong&gt;Statistics&lt;/strong&gt; — over the same spans:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Default&lt;/strong&gt; is the span waterfall: one indented row per span, each carrying a service-colored bar positioned and sized by the span&amp;rsquo;s start offset and duration on a shared timeline, a span-kind glyph, the component&amp;rsquo;s icon (the same icon set the topology map uses), the endpoint or peer name, and the span&amp;rsquo;s own duration. Errored spans are highlighted, and a flag marks any span carrying attached events. Crucially, the waterfall &lt;strong&gt;stitches spans across segments&lt;/strong&gt; using their parent references, so a request that crossed five services renders as one connected timeline rather than five disjoint ones.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tree&lt;/strong&gt; draws those same spans as a zoomable, pannable node graph — root on the left, callees flowing right — for when you care about the &lt;em&gt;shape&lt;/em&gt; of the call tree more than the exact timing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Statistics&lt;/strong&gt; rolls the spans up &lt;strong&gt;by name&lt;/strong&gt;: a sortable table of count and total / average / maximum duration per operation, so &amp;ldquo;which span am I spending all my time in, across every occurrence in this trace&amp;rdquo; is one sort away.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p06-traces-02-waterfall.webp&#34; alt=&#34;Figure 2: The Default view — the span waterfall, each span a service-colored bar on a shared timeline with its kind glyph, component icon, and duration; errored spans highlighted, cross-service spans stitched into one timeline.&#34;&gt;
Figure 2: The waterfall (Default) — one connected timeline across every service the request touched.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p06-traces-03-tree.webp&#34; alt=&#34;Figure 3: The Tree view — the same trace as a zoomable node graph, root on the left and callees flowing right.&#34;&gt;
Figure 3: The Tree view — the same spans as the call tree&amp;rsquo;s shape, zoom and pan to explore.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;inside-a-span&#34;&gt;Inside a span&lt;/h2&gt;
&lt;p&gt;Click any span and a detail panel opens beside it. &lt;strong&gt;Meta&lt;/strong&gt; lays out the essentials — service, instance, endpoint, kind (entry / exit / local / producer / consumer), component, peer, layer, start time, duration, and the error flag. Below it, when they apply:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cross-trace refs&lt;/strong&gt; — when a span&amp;rsquo;s parent lives in a &lt;em&gt;different&lt;/em&gt; trace (an async hop, a message consumed later), the reference is listed with the parent&amp;rsquo;s trace id, segment, and span — and the trace id is a &lt;strong&gt;link that swaps you straight into that other trace&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tags&lt;/strong&gt;, &lt;strong&gt;Logs&lt;/strong&gt; (timestamped per-span entries), and &lt;strong&gt;Attached Events&lt;/strong&gt; (named events with start/end times and summary key/values).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The detail header reports the trace&amp;rsquo;s start, total duration, span count, and how many distinct services it touched; from there you can copy the trace id or a shareable link. Opening a shared &lt;code&gt;?traceId=…&lt;/code&gt; URL lands you directly on that trace in an overlay — which is what makes a trace something you can paste into an incident channel and have a teammate land on the exact same view.&lt;/p&gt;
&lt;h2 id=&#34;native-and-zipkin-side-by-side&#34;&gt;Native and Zipkin, side by side&lt;/h2&gt;
&lt;p&gt;Not every layer&amp;rsquo;s traces come from SkyWalking&amp;rsquo;s own agents. A layer template carries a &lt;code&gt;traces.source&lt;/code&gt; setting — &lt;code&gt;native&lt;/code&gt;, &lt;code&gt;zipkin&lt;/code&gt;, or &lt;code&gt;both&lt;/code&gt; — and Horizon routes accordingly. Agent-instrumented layers (like General Service) use the &lt;strong&gt;native&lt;/strong&gt; explorer above; service-mesh and Kubernetes-flavored layers, where spans arrive as Zipkin/OpenTelemetry data, use the &lt;strong&gt;Zipkin&lt;/strong&gt; explorer; and a layer set to &lt;code&gt;both&lt;/code&gt; simply gets &lt;strong&gt;two sidebar tabs&lt;/strong&gt;, since native and Zipkin spans have genuinely different shapes and conditions.&lt;/p&gt;
&lt;p&gt;The Zipkin tab queries an upstream Zipkin store &lt;strong&gt;through OAP&amp;rsquo;s Zipkin query API&lt;/strong&gt; (the same compatibility surface OAP exposes for any Zipkin client — not a GraphQL or TraceQL path). Because Zipkin organizes data by its own service universe (the &lt;code&gt;serviceName&lt;/code&gt; on each span, which can drift from SkyWalking&amp;rsquo;s service list), the tab carries its own service controls instead of binding to the page&amp;rsquo;s service picker — with Zipkin-native conditions like &lt;em&gt;Remote service&lt;/em&gt;, &lt;em&gt;Span name&lt;/em&gt;, and an &lt;em&gt;Annotations&lt;/em&gt; query (&lt;code&gt;error&lt;/code&gt; or &lt;code&gt;key=value&lt;/code&gt;). The two stores fail independently: if Zipkin is unreachable, the native traces are unaffected, and vice versa.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p06-traces-04-zipkin.webp&#34; alt=&#34;Figure 4: The Zipkin trace tab on a mesh layer — its own service / remote-service / span-name / annotations conditions, with a Zipkin trace open in its waterfall, querying the upstream Zipkin store through OAP.&#34;&gt;
Figure 4: A layer set to Zipkin gets its own tab and its own service universe, querying the upstream Zipkin store through OAP.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;Open a Zipkin span and its detail keeps that Zipkin shape — the &lt;code&gt;CLIENT&lt;/code&gt; / &lt;code&gt;SERVER&lt;/code&gt; kind, the local and remote endpoints, and the raw Zipkin/OpenTelemetry tags (the &lt;code&gt;istio.*&lt;/code&gt; set, &lt;code&gt;http.status_code&lt;/code&gt;, the sidecar &lt;code&gt;node_id&lt;/code&gt;) exactly as Zipkin recorded them, with no translation into SkyWalking&amp;rsquo;s span model.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p06-traces-04-zipkin-span.webp&#34; alt=&#34;Figure 5: A Zipkin span&amp;rsquo;s detail panel — its Meta (Kind CLIENT, peer, duration) and the raw Zipkin/OTel tags, including the Istio sidecar set, kept in their native shape.&#34;&gt;
Figure 5: A Zipkin span keeps its native fields — the kind, the endpoints, and the raw &lt;code&gt;istio.*&lt;/code&gt; / HTTP tags.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;from-a-slow-row-to-the-trace-behind-it&#34;&gt;From a slow row to the trace behind it&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s a second way into a trace that doesn&amp;rsquo;t go through the explorer at all. A trace overlay is mounted once for the whole app, and several things open it by &lt;strong&gt;trace id&lt;/strong&gt;: a &lt;code&gt;?traceId=&lt;/code&gt; link, a cross-trace ref, a log line, and — as &lt;a href=&#34;/blog/2026-06-21-horizon-ui-dashboards-and-mqe/&#34;&gt;Part 2&lt;/a&gt; showed — the &lt;strong&gt;jump-to-trace&lt;/strong&gt; icon on a record widget&amp;rsquo;s slow-statement row. Because it resolves by id rather than by layer, it works even when the trace belongs to a &lt;em&gt;different&lt;/em&gt; layer than the one you&amp;rsquo;re looking at: a Virtual Database, Cache, or MQ service has no traces tab of its own, yet its slow statement still lands you on the originating trace. And when the jump carries a timestamp (a log row knows when its line was written), the lookup widens around that moment — so a trace that has already aged into cold storage still resolves instead of quietly missing.&lt;/p&gt;
&lt;h2 id=&#34;where-to-go-next&#34;&gt;Where to go next&lt;/h2&gt;
&lt;p&gt;The Traces tab is one request in full detail; the dashboards and maps from the earlier posts are where you notice something&amp;rsquo;s wrong in the first place. For the field reference — every condition, the native-vs-Zipkin split, the span detail panel — see the &lt;a href=&#34;https://skywalking.apache.org/docs/skywalking-horizon-ui/next/operate/traces/&#34;&gt;Traces docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Next up: &lt;strong&gt;the Log Explorer&lt;/strong&gt; — the same triage instincts, applied to log streams instead of spans.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Meet Horizon UI · 1/16: SkyWalking&#39;s New Observability Console</title>
      <link>/blog/2026-06-21-skywalking-horizon-ui-introduction/</link>
      <pubDate>Sun, 21 Jun 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-06-21-skywalking-horizon-ui-introduction/</guid>
      <description>
        
        
        &lt;p&gt;Apache SkyWalking Horizon UI is the next-generation web console for SkyWalking. It talks to the same OAP backend SkyWalking already runs — the same GraphQL query protocol, the same admin REST surface, the same MQE language, the same &lt;code&gt;Layer&lt;/code&gt; concept — so you can point it at a running OAP and log in without changing a thing on the backend. What changes is everything in front of that contract.&lt;/p&gt;
&lt;p&gt;This is the first post in a series. Across it we will walk through the dashboards and the metric query language, the topology views and a WebGL 3D map of your whole deployment, the trace and log explorers, profiling, the operations surface, access control, and the config-driven customization that ties it all together. This post sets the stage: what Horizon is, the one idea the whole UI is built around, and how to put it in front of your OAP today.&lt;/p&gt;
&lt;p&gt;Horizon is built around four verbs. You &lt;strong&gt;observe&lt;/strong&gt; — topology, traces, logs, all five flavors of profiling, read-only alarms, and per-layer dashboards. Then you &lt;strong&gt;operate&lt;/strong&gt; what you observe, &lt;strong&gt;govern&lt;/strong&gt; who is allowed to touch it, and &lt;strong&gt;customize&lt;/strong&gt; the whole console without writing UI code. Observe, operate, govern, customize: that is the arc of this series. We start where every session starts — the sidebar.&lt;/p&gt;
&lt;h2 id=&#34;the-sidebar-is-your-estate&#34;&gt;The sidebar is your estate&lt;/h2&gt;
&lt;p&gt;Open Horizon and the left sidebar is not a hand-built menu — it is a live reflection of what your OAP is actually reporting. Horizon asks OAP which layers exist and which of them have services, and renders exactly those, refreshing on a 60-second cadence. A layer starts reporting, it shows up; it goes quiet, it falls away. The menu can&amp;rsquo;t drift from reality because the menu &lt;em&gt;is&lt;/em&gt; reality, polled.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p01-intro-01-sidebar-is-the-estate.webp&#34; alt=&#34;Figure 1: Horizon&amp;rsquo;s home — the estate on the left (layers grouped into Virtual Targets, Istio, Kubernetes and MQ, with a live &amp;ldquo;13 with services&amp;rdquo; count) and the cross-layer Services overview on the right.&#34;&gt;
Figure 1: Horizon&amp;rsquo;s home — the live estate on the left, the cross-layer Services overview on the right.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;A few things are happening in that sidebar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Live service counts.&lt;/strong&gt; The &lt;strong&gt;Layers&lt;/strong&gt; heading shows how many layers currently have services — &lt;em&gt;13 with services&lt;/em&gt; in Figure 1 — and each layer&amp;rsquo;s own count surfaces as you open it. Those counts come from a single server-side catalog the whole UI shares, refreshed once a minute, so the sidebar, the alarm layer-tagger, and the landing pages never disagree by a stale poll.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Grouped, not a flat list.&lt;/strong&gt; Layers cluster under their group — &lt;em&gt;Virtual Targets&lt;/em&gt;, &lt;em&gt;Istio&lt;/em&gt;, &lt;em&gt;Kubernetes&lt;/em&gt;, &lt;em&gt;MQ&lt;/em&gt; — beneath the &lt;strong&gt;Overviews&lt;/strong&gt; and &lt;strong&gt;Alarms&lt;/strong&gt; entries up top. Operate and admin areas (Cluster Status, Alerting rules, DSL Management, Users, Roles &amp;amp; permissions) appear lower down, and only for the roles allowed to open them — access control is woven into the menu itself, not bolted on after.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Nothing silently hidden.&lt;/strong&gt; Every layer OAP reports now appears — including ones with no bundled template, which fall back to a plain Service page. A hard-coded &amp;ldquo;hidden layers&amp;rdquo; list used to quietly drop layers like &lt;code&gt;BanyanDB&lt;/code&gt;; that&amp;rsquo;s gone. Hiding a layer is now an explicit choice in &lt;code&gt;horizon.yaml&lt;/code&gt; via &lt;code&gt;layers.excluded&lt;/code&gt; (which ships defaulting to &lt;code&gt;FAAS&lt;/code&gt; and &lt;code&gt;VIRTUAL_GATEWAY&lt;/code&gt; — clear the list to surface everything).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Click into any layer and Horizon opens its first available tab, following a consistent spine across every layer:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;service → instance → endpoint → topology → trace → logs → profiling
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The slot names follow the layer. The General Service layer in Figure 2 labels its endpoint slot &lt;strong&gt;API&lt;/strong&gt;, adds an &lt;strong&gt;API dependency&lt;/strong&gt; view, and fans &lt;strong&gt;Profiling&lt;/strong&gt; into the engines it actually has — Trace, eBPF, pprof (Go), and Async. Tabs a layer can&amp;rsquo;t support are simply turned off in its template, so you never land on an empty page. Pick a service and its dashboard fills the canvas to the right — a header strip of KPIs (RPM, Apdex, error rate, each with its own sparkline) over a widget grid scoped to exactly that entity.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p01-intro-02-layer-drilldown.webp&#34; alt=&#34;Figure 2: Expand a layer and it fans out into its full workflow — here General Service shows Service, Instances, API, Topology, API dependency, Traces, Logs and four profiling engines, with the selected service&amp;rsquo;s dashboard filling the canvas.&#34;&gt;
Figure 2: Expand a layer and it fans out into its full workflow — the tab spine on the left, the selected service&amp;rsquo;s dashboard on the right.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;never-a-blank-page&#34;&gt;Never a blank page&lt;/h2&gt;
&lt;p&gt;A console that follows live data has to handle the moments when there isn&amp;rsquo;t any — a fresh install, a partially configured deployment, an OAP that just restarted. Horizon treats those as first-class states instead of dead ends.&lt;/p&gt;
&lt;p&gt;Open the app at &lt;code&gt;/&lt;/code&gt; and Horizon cascades to a real destination: the first available public overview dashboard, or failing that the first layer with services, and only if neither exists, the empty landing. When it does land on the empty page, it tells you which problem you have in plain language — &lt;strong&gt;&amp;ldquo;No data is flowing yet&amp;rdquo;&lt;/strong&gt; (nothing is reporting) versus &lt;strong&gt;&amp;ldquo;No dashboard configured yet&amp;rdquo;&lt;/strong&gt; (services exist, but no overview is set up) — and points you at your operations team rather than dropping you on a blank grid. As soon as a service reports or an operator publishes a dashboard, the next 60-second refresh replaces the empty page with the real one.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p01-intro-03-empty-no-dashboard-configured.webp&#34; alt=&#34;Figure 3: The empty landing names the actual situation — here &amp;ldquo;No dashboard configured yet&amp;rdquo; (services are reporting, but no overview is set up) — instead of showing a blank dashboard.&#34;&gt;
Figure 3: The empty landing names the actual situation — here, services are reporting but no overview is configured — instead of a blank dashboard.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;The same instinct shows up when OAP itself blips. If the backend goes briefly unreachable, Horizon keeps the last known sidebar shape on screen and raises an &amp;ldquo;OAP unreachable&amp;rdquo; banner, with service counts marked unknown until it recovers — so a short outage never looks like your configuration vanished.&lt;/p&gt;
&lt;p&gt;When data &lt;em&gt;is&lt;/em&gt; flowing, that landing is the war-room you already saw in Figure 1 — a cross-layer overview with per-kind service tiles (General services, Virtual databases, caches, MQs, GenAI), the live topology, and the active-alarm rail. Per-layer landings rank the true top-N services by your chosen column (no more capping at an arbitrary first 25 before ranking) and tell you &amp;ldquo;top N of M&amp;rdquo; so the trim is never silent.&lt;/p&gt;
&lt;p&gt;And because long layer names and deep namespaces happen, the shell gets out of the way: drag the divider to resize the sidebar (double-click to reset), or fold it to a thin icon rail to give the canvas every horizontal pixel — and the width is remembered per browser.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p01-intro-04-resize.webp&#34; alt=&#34;Figure 4: Drag the divider to widen the sidebar so long layer and namespace names stop truncating.&#34;&gt;
Figure 4: Drag the divider to widen the sidebar — long names stop truncating, and the width is remembered per browser.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p01-intro-04-fold-rail.webp&#34; alt=&#34;Figure 5: Fold the sidebar to a thin icon rail to give the canvas every horizontal pixel.&#34;&gt;
Figure 5: Fold the sidebar to a thin icon rail when you want the canvas to have every pixel.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;one-new-tier-makes-the-rest-possible&#34;&gt;One new tier makes the rest possible&lt;/h2&gt;
&lt;p&gt;Until now, SkyWalking&amp;rsquo;s web UI talked straight to OAP from the browser. Horizon introduces one small piece of infrastructure in between: a &lt;strong&gt;Backend-for-Frontend (BFF)&lt;/strong&gt;, a Fastify service on Node.js that serves the UI and proxies every call to OAP.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p01-intro-architecture.svg&#34; alt=&#34;Architecture diagram: the browser talks only to the Horizon BFF (a Fastify service on Node.js), which handles authentication and sessions, RBAC enforcement, audit logging, capability probing and caching, and server-side i18n and widget gating, then proxies to OAP&amp;rsquo;s GraphQL query host (port 12800) and admin host (port 17128).&#34;&gt;
&lt;em&gt;The browser talks only to the BFF; the BFF owns auth, RBAC, audit, capability probing, and server-side i18n / widget gating, then proxies to OAP&amp;rsquo;s query host (&lt;code&gt;:12800&lt;/code&gt;) and admin host (&lt;code&gt;:17128&lt;/code&gt;).&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;That tier is why the later posts in this series exist at all. Authentication, role-based access control, and the audit trail are enforced on the server, where a forged request can&amp;rsquo;t get past them. The BFF probes OAP&amp;rsquo;s GraphQL schema once at startup and degrades gracefully when a capability is missing — which is exactly how Horizon supports two OAP generations from one build (more on that below). It caches the service catalog once a minute so the whole UI shares one view of the estate. We&amp;rsquo;ll come back to each of these in the posts on operations, security, and customization; for now the thing to know is that there&amp;rsquo;s a server here now, and it&amp;rsquo;s doing real work.&lt;/p&gt;
&lt;h2 id=&#34;a-3d-look-at-the-whole-thing&#34;&gt;A 3D look at the whole thing&lt;/h2&gt;
&lt;p&gt;One surface is worth previewing up front, because it captures the &amp;ldquo;stand back and look at everything at once&amp;rdquo; idea better than any screenshot can: the &lt;strong&gt;3D Infrastructure Map&lt;/strong&gt;. Every layer&amp;rsquo;s services become cubes, stacked onto tiers that read top-to-bottom the way a request flows, with live traffic, alarms, and call relationships drawn between them. Drag to orbit it:&lt;/p&gt;
&lt;figure class=&#34;map3d-stage&#34;&gt;
  &lt;a class=&#34;map3d-poster&#34; href=&#34;/3d-map-app/&#34; aria-label=&#34;Open the interactive 3D infrastructure map&#34;&gt;&lt;span class=&#34;map3d-badge&#34;&gt;Interactive · sample data&lt;/span&gt;
    &lt;img loading=&#34;lazy&#34; src=&#34;/images/home/horizon-3d-map.png&#34;
         alt=&#34;Apache SkyWalking topology rendered as an interactive 3D scene you can orbit, zoom and click&#34;&gt;
    &lt;span class=&#34;map3d-play&#34; aria-hidden=&#34;true&#34;&gt;
      &lt;span class=&#34;map3d-play-btn&#34;&gt;
        &lt;svg viewBox=&#34;0 0 24 24&#34; fill=&#34;none&#34;&gt;&lt;path d=&#34;M8 5v14l11-7-11-7z&#34; fill=&#34;currentColor&#34;/&gt;&lt;/svg&gt;
      &lt;/span&gt;
      &lt;span class=&#34;map3d-play-label&#34;&gt;Open the 3D map&lt;/span&gt;
    &lt;/span&gt;
  &lt;/a&gt;
&lt;/figure&gt;


&lt;p&gt;A dedicated post later in the series takes the 3D map apart properly — the tiers, the alarm beacons, &amp;ldquo;Beacon mode&amp;rdquo; that ghosts everything healthy so only what&amp;rsquo;s firing glows, and the structured editor that configures it. For now, it&amp;rsquo;s a fair picture of the ambition: your whole deployment, in one view, alive.&lt;/p&gt;
&lt;h2 id=&#34;whats-in-this-series&#34;&gt;What&amp;rsquo;s in this series&lt;/h2&gt;
&lt;p&gt;Fifteen posts follow this one, each a standalone tour of one corner of Horizon — read them in any order; each links back here for the lay of the land. They fall into four arcs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;See your data&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Dashboards &amp;amp; MQE&lt;/strong&gt; — widgets that query only what&amp;rsquo;s relevant to the entity in front of you, value formatting humans can read, synced crosshairs, and multi-entity compare.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Topology &amp;amp; service dependency&lt;/strong&gt; — one topology engine that repaints for every layer, the de-noising filter, and the multi-hop API-dependency graph.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Deployment tab &amp;amp; BanyanDB self-observability&lt;/strong&gt; — a view that looks &lt;em&gt;inside&lt;/em&gt; a single clustered service, and SkyWalking finally watching its own database the way it watches everything else.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The 3D Infrastructure Map&lt;/strong&gt; — the full treatment of the view above.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Trace explorer&lt;/strong&gt; — lasso the slow traces on a duration scatter, then read each one three ways.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Log explorer&lt;/strong&gt; — a Loki-style stream with facets, top patterns, and structured payloads.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Browser &amp;amp; RUM monitoring&lt;/strong&gt; — front-end error logs, and de-obfuscating a minified stack back to its original source line.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Five profilers, one flame graph&lt;/strong&gt; — trace, async-profiler, eBPF, Go pprof, and network profiling, unified behind one workflow.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Operate it&lt;/strong&gt;&lt;/p&gt;
&lt;ol start=&#34;9&#34;&gt;
&lt;li&gt;&lt;strong&gt;Alarms &amp;amp; incident triage&lt;/strong&gt; — incident-centric active alarms that ship the chart that fired them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runtime rules, live debugging &amp;amp; inspect&lt;/strong&gt; — hot-reloadable rules and a live debugger that steps OAL (traces), MAL (metrics), and LAL (logs) against live samples.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Platform &amp;amp; cluster introspection&lt;/strong&gt; — read the OAP cluster&amp;rsquo;s health, resolved config, and data retention from the UI.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Govern &amp;amp; secure it&lt;/strong&gt;&lt;/p&gt;
&lt;ol start=&#34;12&#34;&gt;
&lt;li&gt;&lt;strong&gt;Access control &amp;amp; security&lt;/strong&gt; — server-enforced RBAC, LDAP/AD, an audit trail, and break-glass — the capabilities that make the UI enterprise-deployable.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Make it yours, and adopt it&lt;/strong&gt;&lt;/p&gt;
&lt;ol start=&#34;13&#34;&gt;
&lt;li&gt;&lt;strong&gt;Customization: config-driven layer templates&lt;/strong&gt; — draft-to-publish, and adding a whole new monitored layer with zero UI code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Localization&lt;/strong&gt; — every dashboard in eight languages, translated by clicking widgets in a live preview.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Getting started &amp;amp; migration&lt;/strong&gt; — install, the OAP version matrix, and swapping an existing UI for Horizon.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;try-it-against-your-oap-today&#34;&gt;Try it against your OAP today&lt;/h2&gt;
&lt;p&gt;Horizon runs against the OAP you already have — and on today&amp;rsquo;s &lt;strong&gt;OAP 10.x&lt;/strong&gt; nearly all of it works. Every dashboard, the topology, traces (native and &lt;strong&gt;Zipkin&lt;/strong&gt;), logs, alarms, and all five profilers render off OAP&amp;rsquo;s query host (&lt;code&gt;:12800&lt;/code&gt;), and Horizon&amp;rsquo;s access control, audit, and themes run in the BFF, independent of the OAP version. What waits for &lt;strong&gt;OAP 11.0&lt;/strong&gt; — the admin host (&lt;code&gt;:17128&lt;/code&gt;), releasing soon — is the &lt;em&gt;operate&lt;/em&gt; layer: runtime-rule (DSL) management, the Live Debugger, Metrics Inspect, the alarm-rule editor, the Cluster Status admin pane, and publishing template edits back to OAP. Horizon detects each admin module by its presence and simply hides the pages 10.x can&amp;rsquo;t serve, so the full observability console runs on 10.x today and the operate tooling lights up the moment you move to 11.0.&lt;/p&gt;
&lt;p&gt;Point Horizon at your existing cluster and bring it up — no backend changes, the same OAP your deployment already talks to:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker run -d --name horizon &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -p 8081:8081 &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -v &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#953800&#34;&gt;$PWD&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;/horizon.yaml:/app/horizon.yaml:ro&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -v horizon-state:/data &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ghcr.io/apache/skywalking-horizon-ui:&amp;lt;version&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A minimal &lt;code&gt;horizon.yaml&lt;/code&gt; is just where OAP lives and one local user to log in as:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;queryUrl&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;http://&amp;lt;oap-host&amp;gt;:12800&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;adminUrl&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;http://&amp;lt;oap-host&amp;gt;:17128&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;auth&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;backend&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;local&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;local&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;users&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0550ae&#34;&gt;username&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;admin&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;passwordHash&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;$argon2id$v=19$...&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;   &lt;/span&gt;&lt;span style=&#34;color:#57606a&#34;&gt;# generated, never plaintext&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;roles&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;admin]&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Open &lt;code&gt;http://&amp;lt;host&amp;gt;:8081/&lt;/code&gt;, log in, and the first stop is &lt;strong&gt;Cluster Status&lt;/strong&gt; to confirm Horizon and OAP are talking. From there, the sidebar fills in with your estate.&lt;/p&gt;
&lt;p&gt;For the full setup path — binary tarball, Kubernetes, LDAP, TLS, and the production checklist — see the &lt;a href=&#34;https://skywalking.apache.org/docs/skywalking-horizon-ui/next/readme/&#34;&gt;Horizon UI documentation&lt;/a&gt;, which covers setup, compatibility, access control, customization, components, and operations from its left-side menu.&lt;/p&gt;
&lt;h2 id=&#34;other-notable-points&#34;&gt;Other notable points&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Drop-in against your existing OAP.&lt;/strong&gt; Horizon is a greenfield rewrite that keeps every backend contract — the same GraphQL query protocol, admin REST surface, MQE language, and &lt;code&gt;Layer&lt;/code&gt; concept — so you point it at a running cluster with no backend change. On today&amp;rsquo;s &lt;strong&gt;OAP 10.x&lt;/strong&gt; the whole observability console works (dashboards, topology, traces including &lt;strong&gt;Zipkin&lt;/strong&gt;, logs, alarms, profiling) along with Horizon&amp;rsquo;s BFF-side access control, audit, and themes; only the &lt;em&gt;operate&lt;/em&gt; tooling — runtime rules, Live Debugger, Inspect, the Cluster Status admin pane, and publishing template edits — waits on OAP&amp;rsquo;s &lt;strong&gt;admin host&lt;/strong&gt; (&lt;code&gt;:17128&lt;/code&gt;), which ships with &lt;strong&gt;OAP 11.0 — releasing soon&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dark-first and dense.&lt;/strong&gt; A 12-column grid built for incident scanning — more signal above the fold, less whitespace.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Built on a modern stack.&lt;/strong&gt; Vue 3 + TypeScript on Vite, Pinia, Apache ECharts, D3, and Monaco on the front end; Fastify on Node.js for the BFF.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It&amp;rsquo;s Apache-licensed and community-built.&lt;/strong&gt; Horizon UI lives at &lt;a href=&#34;https://github.com/apache/skywalking-horizon-ui&#34;&gt;apache/skywalking-horizon-ui&lt;/a&gt;. Try it against your cluster, and tell us what&amp;rsquo;s missing — issues and pull requests are welcome.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Next up in the series: the dashboards — and why a widget can decide, on the server, that it shouldn&amp;rsquo;t even run its query for the entity you&amp;rsquo;re looking at.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Meet Horizon UI · 2/16: Dashboards That Adapt — MQE, Smart Widgets, and Numbers Humans Can Read</title>
      <link>/blog/2026-06-21-horizon-ui-dashboards-and-mqe/</link>
      <pubDate>Sun, 21 Jun 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-06-21-horizon-ui-dashboards-and-mqe/</guid>
      <description>
        
        
        &lt;p&gt;This is the second post in the &lt;a href=&#34;/blog/2026-06-21-skywalking-horizon-ui-introduction/&#34;&gt;Apache SkyWalking Horizon UI&lt;/a&gt; series. The first one introduced the console and its layered navigation; this one is about the surface you spend most of your day on — the &lt;strong&gt;dashboards&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Every dashboard in Horizon is the same machine: a grid of widgets, each one an &lt;strong&gt;MQE expression&lt;/strong&gt; the BFF resolves against OAP. What makes the surface worth a post of its own is what that machine does &lt;em&gt;around&lt;/em&gt; the numbers — it hides the widgets that don&amp;rsquo;t apply to the entity you&amp;rsquo;re looking at (and skips their queries entirely), it renders coded values and raw byte counts as things a human can read, it ties every chart on the page to one cursor, and it drops you from a slow-SQL row straight into the trace that produced it. Let&amp;rsquo;s walk through it.&lt;/p&gt;
&lt;h2 id=&#34;every-widget-is-one-mqe-expression&#34;&gt;Every widget is one MQE expression&lt;/h2&gt;
&lt;p&gt;A layer dashboard is a dense &lt;strong&gt;12-column grid&lt;/strong&gt; (120px rows, gaps backfilled so a wall of tiles has no holes, collapsing to one column under ~1100px). Each tile is one of five widget types, and the type follows the &lt;em&gt;shape&lt;/em&gt; of its MQE expression:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;card&lt;/code&gt;&lt;/strong&gt; — the expression collapses to a single scalar (&lt;code&gt;latest(...)&lt;/code&gt;, &lt;code&gt;avg(...)&lt;/code&gt;, &lt;code&gt;service_sla/100&lt;/code&gt;). One big number.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;line&lt;/code&gt;&lt;/strong&gt; — a time series; one line per expression, optional dual y-axis for mixed units (throughput on the left, latency on the right).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;top&lt;/code&gt;&lt;/strong&gt; — a ranked list from &lt;code&gt;top_n(endpoint_cpm, 20, des)&lt;/code&gt;, with a small tab switcher to flip the ranking between Traffic / Slow / Successful Rate.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;record&lt;/code&gt;&lt;/strong&gt; — record-shaped output like slow database statements or slow cache commands: rows of text + value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;table&lt;/code&gt;&lt;/strong&gt; — a labeled &lt;code&gt;latest(...)&lt;/code&gt; metric, one row per label combination (pod phase per service, node condition, replicas per deployment).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You don&amp;rsquo;t pick the chart; you write the MQE and the right widget renders it. And it&amp;rsquo;s the &lt;em&gt;same&lt;/em&gt; grid system at every altitude — the &lt;code&gt;dashboards.&amp;lt;scope&amp;gt;&lt;/code&gt; map on a layer template carries a different widget set for the &lt;strong&gt;service&lt;/strong&gt;, &lt;strong&gt;instance&lt;/strong&gt;, and &lt;strong&gt;endpoint&lt;/strong&gt; pages, so drilling down swaps the whole dashboard to the right scope. (All of these run on the BFF tier introduced in &lt;a href=&#34;/blog/2026-06-21-skywalking-horizon-ui-introduction/&#34;&gt;part 1&lt;/a&gt; — the browser never talks to OAP directly.)&lt;/p&gt;
&lt;h2 id=&#34;a-dashboard-that-adapts-to-the-entity-in-front-of-you&#34;&gt;A dashboard that adapts to the entity in front of you&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s the feature that changes how a dashboard &lt;em&gt;feels&lt;/em&gt;. A widget can carry a &lt;strong&gt;&lt;code&gt;visibleWhen&lt;/code&gt;&lt;/strong&gt; gate, and when the gate doesn&amp;rsquo;t hold, the widget doesn&amp;rsquo;t render — and, crucially, &lt;strong&gt;its query never runs&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;There are two kinds of gate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MQE metric&lt;/strong&gt; — show the widget only when an expression &lt;em&gt;has value&lt;/em&gt; (&lt;code&gt;op: exists&lt;/code&gt;), or when its value crosses a threshold (&lt;code&gt;gt&lt;/code&gt; / &lt;code&gt;lt&lt;/code&gt;). Point a widget at its own metric and it self-gates: the JVM widgets carry &lt;code&gt;&amp;quot;visibleWhen&amp;quot;: { &amp;quot;kind&amp;quot;: &amp;quot;mqe&amp;quot;, &amp;quot;expression&amp;quot;: &amp;quot;instance_jvm_cpu&amp;quot;, &amp;quot;op&amp;quot;: &amp;quot;exists&amp;quot; }&lt;/code&gt;, so they appear on a Java instance and vanish on a Go one.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Entity attribute&lt;/strong&gt; — on the Instance scope, gate on the selected instance&amp;rsquo;s attributes (&lt;code&gt;language eq JAVA&lt;/code&gt;, or an attribute simply being present).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because the gate is evaluated &lt;strong&gt;server-side&lt;/strong&gt;, a non-JVM instance doesn&amp;rsquo;t just hide the JVM tiles — the BFF never sends their queries to OAP at all. Open the same Instance dashboard for a JVM service and a non-JVM one and you&amp;rsquo;re looking at one template adapting itself, not two hand-built pages:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p02-dashboards-01-instance-jvm.webp&#34; alt=&#34;Figure 1: A Java service&amp;rsquo;s Instance dashboard — the JVM widget group (CPU, heap, GC, threads, classes) is present because instance_jvm_cpu returns data.&#34;&gt;
Figure 1: On a JVM instance the JVM widgets render — their &lt;code&gt;visibleWhen&lt;/code&gt; gate holds.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p02-dashboards-02-instance-go.webp&#34; alt=&#34;Figure 2: The same Instance dashboard on the Go &amp;ldquo;rating&amp;rdquo; service — the JVM group is simply absent and the grid has reflowed; those widgets&amp;rsquo; queries were never sent to OAP.&#34;&gt;
Figure 2: The same dashboard on a Go instance — the JVM widgets aren&amp;rsquo;t there, and their queries never ran. One template, adapting to the entity.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;numbers-humans-can-read&#34;&gt;Numbers humans can read&lt;/h2&gt;
&lt;p&gt;Raw metrics are not always readable metrics. Horizon&amp;rsquo;s widgets format three cases that used to make operators do math in their heads:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;enum&lt;/code&gt;&lt;/strong&gt; — a value→label map turns a coded gauge into words: a &lt;code&gt;1/0&lt;/code&gt; success metric renders &lt;strong&gt;&lt;code&gt;OK&lt;/code&gt;&lt;/strong&gt; / &lt;strong&gt;&lt;code&gt;Failed&lt;/code&gt;&lt;/strong&gt; instead of the bare number. The labels are translatable per locale.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;duration&lt;/code&gt;&lt;/strong&gt; — a metric in seconds renders as a human time-ago: &lt;strong&gt;&lt;code&gt;5m 20s ago&lt;/code&gt;&lt;/strong&gt;, compact to &lt;code&gt;5m&lt;/code&gt; / &lt;code&gt;2h&lt;/code&gt; on an axis.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SI suffixes&lt;/strong&gt; — large magnitudes on chart axes and tooltips read as &lt;strong&gt;&lt;code&gt;45.1k&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;1.34M&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;2.5G&lt;/code&gt;&lt;/strong&gt; rather than &lt;code&gt;4.51e4&lt;/code&gt;, with the axis tick and its hovered value sharing one notation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p02-dashboards-03-si-suffix-axis.webp&#34; alt=&#34;Figure 3: A line chart whose y-axis and crosshair tooltip both read in compact SI suffixes (45.1k, 1.34M) — the tick and the hovered value in one notation.&#34;&gt;
Figure 3: Dense byte and count series get compact SI suffixes, axis and tooltip in step.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p02-dashboards-04-enum-duration-cards.webp&#34; alt=&#34;Figure 4: Two BanyanDB cards — &amp;ldquo;Last Sync&amp;rdquo; showing OK (from a coded 1 via an enum map) and &amp;ldquo;Time Since Last Sync&amp;rdquo; showing &amp;ldquo;5m 20s ago&amp;rdquo; (from a seconds metric via the duration format).&#34;&gt;
Figure 4: The &lt;code&gt;enum&lt;/code&gt; and &lt;code&gt;duration&lt;/code&gt; formats in action on BanyanDB&amp;rsquo;s lifecycle cards — &lt;code&gt;OK&lt;/code&gt; instead of &lt;code&gt;1&lt;/code&gt;, &amp;ldquo;5m 20s ago&amp;rdquo; instead of a seconds count.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;read-the-whole-grid-as-one-timeline&#34;&gt;Read the whole grid as one timeline&lt;/h2&gt;
&lt;p&gt;Every &lt;code&gt;line&lt;/code&gt; chart on a page shares &lt;strong&gt;one hover cursor&lt;/strong&gt;. Point at minute 32 on the throughput chart and minute 32 lights up on the latency chart, the error-rate chart, and every sparkline tile. The contract is enforced at the chart-wrapper level — no widget can opt out — so the page reads as a single coordinated view of one moment, not a dozen independent charts. The multi-series tooltip is a fixed, aligned table that shows each series&amp;rsquo; &lt;strong&gt;title&lt;/strong&gt; (never the raw MQE), with values in one right-aligned column.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p02-dashboards-05-synced-crosshair.webp&#34; alt=&#34;Figure 5: The synced crosshair sweeping across the throughput, error-rate and latency panels — one cursor, one moment, every chart aligned.&#34;&gt;
Figure 5: One cursor moves across every line chart on the page, so you read the same instant everywhere at once.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;from-a-slow-row-to-its-trace-in-one-click&#34;&gt;From a slow row to its trace, in one click&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;record&lt;/code&gt; widgets — Slow Statements, Slow Commands, Slow Database Statements — are lists of sampled records, and each row that carries a trace id gets a &lt;strong&gt;jump-to-trace&lt;/strong&gt; icon at its head that opens the originating trace&amp;rsquo;s waterfall. It resolves the trace &lt;strong&gt;by id, not by layer&lt;/strong&gt;, which matters: the Slow Statements on a Virtual Database / Cache / MQ service belong to the &lt;em&gt;caller&lt;/em&gt; on another layer, and a virtual-target layer has no traces tab of its own — yet the jump still lands. The statement text itself is &lt;strong&gt;click-to-copy&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p02-dashboards-06-record-jump-to-trace.webp&#34; alt=&#34;Figure 6: A Slow Statements record widget — each sampled row with a jump-to-trace icon (where the sample has a trace id), the statement text click-to-copy, one row showing the &amp;ldquo;copied&amp;rdquo; flash.&#34;&gt;
Figure 6: From a slow statement to the trace that ran it — resolved by trace id, so it works even on a virtual layer with no traces tab of its own.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;pin-and-compare-entities&#34;&gt;Pin and compare entities&lt;/h2&gt;
&lt;p&gt;Sometimes one entity isn&amp;rsquo;t enough. Horizon lets you &lt;strong&gt;lock several services, instances, or endpoints — even ones from different services — and compare them in place&lt;/strong&gt;. Pin entities from the picker or the instance/endpoint list; the one you&amp;rsquo;re viewing is always part of the cohort (tagged &lt;code&gt;CURRENT&lt;/code&gt;) and still drives the header, and each pin adds its own hue. Every widget then compares inline — line widgets overlay one series per entity, cards show a row each, &lt;code&gt;top&lt;/code&gt; and &lt;code&gt;record&lt;/code&gt; widgets get per-entity tabs, tables gain an Entity column. A persistent comparison bar holds the cohort no matter how the underlying list paginates or which entity you&amp;rsquo;re currently viewing, and each entity loads as its own request, so one slow one never blanks the others.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p02-dashboards-07-pinned-entities.webp&#34; alt=&#34;Figure 7: Comparing two instances from different services — app (tagged CURRENT) and rating — overlaid hue-by-hue across the Load, Latency and Success Rate line widgets, with the comparison bar above.&#34;&gt;
Figure 7: Lock entities — even across services — and every line widget overlays them hue-by-hue; the comparison bar holds the cohort while the CURRENT entity still drives the header.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;the-time-picker-moves-the-whole-dashboard&#34;&gt;The time picker moves the &lt;em&gt;whole&lt;/em&gt; dashboard&lt;/h2&gt;
&lt;p&gt;The topbar time range drives everything on the page — the header KPI strip, the widget body, and (on BanyanDB&amp;rsquo;s tiered hot/warm/cold storage) the &lt;strong&gt;Cold&lt;/strong&gt; pill flow end-to-end. Earlier the landing and topology routes were pinned to the last 60 minutes, so picking &amp;ldquo;12 days ago&amp;rdquo; quietly kept showing recent numbers; now the picker is honored everywhere, and when an upstream control changes, each dependent tile visibly resets and shows a &amp;ldquo;Reading data…&amp;rdquo; hint rather than leaving a stale value under a spinner.&lt;/p&gt;
&lt;h2 id=&#34;where-to-go-next&#34;&gt;Where to go next&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s worth stressing that this is &lt;em&gt;one&lt;/em&gt; system. The same five widgets, the same MQE, the same gating render every layer&amp;rsquo;s dashboard — the JVM panels above, BanyanDB&amp;rsquo;s lifecycle cards, the percentile latency on a mesh service, and purpose-built panels for things like an Envoy AI Gateway (token throughput, time-to-first-token) or a GenAI virtual layer (per-model estimated cost). What changes from layer to layer is the MQE, not the machinery.&lt;/p&gt;
&lt;p&gt;Everything above is the &lt;em&gt;reading&lt;/em&gt; experience. Each widget&amp;rsquo;s MQE, its &lt;code&gt;visibleWhen&lt;/code&gt; gate, its format, and the per-scope grids are all editable from the &lt;strong&gt;Layer dashboards&lt;/strong&gt; admin — but that authoring story (draft → preview → publish, with the inline-and-expand MQE editor) is its own post later in the series. For the field-level reference, see the docs on &lt;a href=&#34;https://skywalking.apache.org/docs/skywalking-horizon-ui/next/components/dashboard-widgets/&#34;&gt;dashboard widgets&lt;/a&gt; and &lt;a href=&#34;https://skywalking.apache.org/docs/skywalking-horizon-ui/next/components/charts/&#34;&gt;charts&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Next up: &lt;strong&gt;topology and service dependency&lt;/strong&gt; — the same data Horizon charts here, drawn as a map you can walk.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Meet Horizon UI · 3/16: Topology &amp; Service Dependency</title>
      <link>/blog/2026-06-21-horizon-ui-topology-and-dependency/</link>
      <pubDate>Sun, 21 Jun 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-06-21-horizon-ui-topology-and-dependency/</guid>
      <description>
        
        
        &lt;p&gt;This is the third post in the &lt;a href=&#34;/blog/2026-06-21-skywalking-horizon-ui-introduction/&#34;&gt;Meet Horizon UI&lt;/a&gt; series. &lt;a href=&#34;/blog/2026-06-21-horizon-ui-dashboards-and-mqe/&#34;&gt;Part 2&lt;/a&gt; was about reading your services as numbers on a dashboard; this one is about reading them as a &lt;strong&gt;map&lt;/strong&gt; — who calls whom, how hard, and how that same logical service looks across every layer it reports through.&lt;/p&gt;
&lt;p&gt;The call data behind a SkyWalking topology is one thing; the &lt;em&gt;views&lt;/em&gt; Horizon draws from it are several. There&amp;rsquo;s the per-layer service map, a drill-down into the instances behind a single call, an endpoint-level dependency graph, and a cross-layer overlay that ties a service&amp;rsquo;s faces together. They&amp;rsquo;re all the same engine, pointed at different questions. (The Deployment tab — the map of one clustered service&amp;rsquo;s &lt;em&gt;own&lt;/em&gt; instances — and the WebGL 3D map are big enough to get their own posts next.)&lt;/p&gt;
&lt;h2 id=&#34;one-topology-engine-repainted-per-layer&#34;&gt;One topology engine, repainted per layer&lt;/h2&gt;
&lt;p&gt;Open any layer&amp;rsquo;s &lt;strong&gt;Topology&lt;/strong&gt; tab and you get a left-to-right hierarchical service map: &lt;code&gt;User&lt;/code&gt; (when present) seeds the left, and each service sits in a column by its call depth — within a column, nodes keep the order the graph walk reached them, so the dominant chain reads top-down. Each service is a &lt;strong&gt;hexagonal node&lt;/strong&gt;, and everything it shows is driven by the layer&amp;rsquo;s config — nothing is hardcoded:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the hexagon&amp;rsquo;s &lt;strong&gt;border&lt;/strong&gt; carries the node&amp;rsquo;s &lt;strong&gt;ring&lt;/strong&gt; metric as an SLA-style health band (green → red);&lt;/li&gt;
&lt;li&gt;the component &lt;strong&gt;icon&lt;/strong&gt; sits inside the hex — the same icon set the trace waterfall uses, so a PostgreSQL node looks like PostgreSQL, a Kafka node like Kafka;&lt;/li&gt;
&lt;li&gt;the node&amp;rsquo;s headline number — its &lt;strong&gt;center&lt;/strong&gt; metric — prints just &lt;strong&gt;above&lt;/strong&gt; the hex, with its unit;&lt;/li&gt;
&lt;li&gt;the service &lt;strong&gt;name&lt;/strong&gt; prints &lt;strong&gt;below&lt;/strong&gt; it, and a &lt;strong&gt;secondary&lt;/strong&gt; metric (latency, by default) sits beneath the name;&lt;/li&gt;
&lt;li&gt;each &lt;strong&gt;edge&lt;/strong&gt; carries the call&amp;rsquo;s throughput as an &lt;strong&gt;RPM chip&lt;/strong&gt; (the server-side metric, falling back to the client side).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here&amp;rsquo;s the part worth stressing: every one of those is just the &lt;strong&gt;General layer&amp;rsquo;s bundled default&lt;/strong&gt;. The node&amp;rsquo;s center / ring / secondary metrics and the edge metrics each live in the &lt;strong&gt;Layer dashboards admin → Topology scope&lt;/strong&gt; as an MQE expression with a unit and a role — so you can point any slot at a different metric, and the same engine paints a different map for a different layer, or for this one your way. (The choices travel with the layer template&amp;rsquo;s export/import, like everything else.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p03-topology-01-service-map.webp&#34; alt=&#34;Figure 1: A per-layer service map — hexagonal nodes whose border is an SLA health band, with the headline metric above each node, the component icon inside, the name below, and an RPM chip on every edge.&#34;&gt;
Figure 1: One template-driven topology engine — health-banded hex nodes, RPM-chipped edges, real component icons; every metric on it is configured per layer.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;cut-the-noise&#34;&gt;Cut the noise&lt;/h2&gt;
&lt;p&gt;Real topologies are noisy — a dense map fills with conjectured peers OAP couldn&amp;rsquo;t fully resolve (a bare &lt;code&gt;rcmd:80&lt;/code&gt;, an un-instrumented address). Horizon&amp;rsquo;s &lt;strong&gt;Filter&lt;/strong&gt; control turns those off without hiding your real dependencies. It derives one facet automatically — &lt;strong&gt;by layer&lt;/strong&gt; — and presents it exactly as the sidebar does: each row carries the layer&amp;rsquo;s own icon and localized name (&lt;em&gt;Virtual Database&lt;/em&gt;, &lt;em&gt;Java Agent&lt;/em&gt;, …), plus an &lt;strong&gt;Others&lt;/strong&gt; bucket for nodes OAP couldn&amp;rsquo;t classify and a standalone &lt;strong&gt;User&lt;/strong&gt; toggle. Uncheck &lt;em&gt;Others&lt;/em&gt; and the uninstrumented clutter — and its dangling edges — disappears, while your databases, caches and queues (separated by their own &lt;code&gt;VIRTUAL_*&lt;/code&gt; rows) stay on the map. Filtering is client-side and the rows re-derive on every refresh, so it never goes stale.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p03-topology-02-filter-denoise.webp&#34; alt=&#34;Figure 2: The topology Filter — one auto-derived by-layer facet (each row with its layer icon and name), an Others bucket, and a User toggle — cutting the conjectured peers out of a dense map.&#34;&gt;
Figure 2: De-noise in one click — drop the unresolved &amp;ldquo;Others&amp;rdquo; peers and keep your real dependencies.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;When a layer&amp;rsquo;s services fall into OAP service groups, the map&amp;rsquo;s service-focus selector groups by them too, and clicking a group header &lt;strong&gt;batch-selects or clears every service in that group&lt;/strong&gt; — so you can focus a whole team&amp;rsquo;s slice of a busy map at once.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p03-topology-03-group-selector.webp&#34; alt=&#34;Figure 3: The group-aware service selector — services grouped by OAP Service.group, with a group header that focuses or clears the whole group in one click.&#34;&gt;
Figure 3: Focus a whole service group at once from the map&amp;rsquo;s selector.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;drill-from-a-call-into-its-instances&#34;&gt;Drill from a call into its instances&lt;/h2&gt;
&lt;p&gt;A service-to-service edge is an aggregate — behind it are real instances talking to real instances. Click a call on the map and choose &lt;strong&gt;Instance map →&lt;/strong&gt;, and Horizon draws exactly that: the client service&amp;rsquo;s instances in the left column, the server service&amp;rsquo;s in the right, with the instance-level calls between them, animated client→server. It reuses everything from the service map — the health-ring nodes, the per-call client/server metric sidebar, a node popover with &lt;strong&gt;Open instance dashboard&lt;/strong&gt; — and labels the columns in the layer&amp;rsquo;s own vocabulary (&lt;em&gt;Pods&lt;/em&gt; on Kubernetes, &lt;em&gt;Sidecars&lt;/em&gt; on the data plane). The two service pickers are &lt;strong&gt;relationship-aware&lt;/strong&gt;: the server list is the chosen client&amp;rsquo;s callees, the client list is the chosen server&amp;rsquo;s callers, each re-deriving as you change the other.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p03-topology-04-instance-map.webp&#34; alt=&#34;Figure 4: The instance map — the client service&amp;rsquo;s instances (left) and the server service&amp;rsquo;s (right) with the instance-level calls between them, plus the per-call metric sidebar.&#34;&gt;
Figure 4: Drill from an aggregate call into the instance-to-instance traffic behind it.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;walk-the-request-chain-endpoint-by-endpoint&#34;&gt;Walk the request chain, endpoint by endpoint&lt;/h2&gt;
&lt;p&gt;Service topology answers &amp;ldquo;which services call this one.&amp;rdquo; The &lt;strong&gt;API dependency&lt;/strong&gt; tab answers the sharper question — &amp;ldquo;which &lt;em&gt;endpoints&lt;/em&gt; call this endpoint, and which does it call.&amp;rdquo; Pick an endpoint and it lays out in columns by direction: callers on the left, the focus endpoint in the centre, callees on the right, with the same SLA-coloured node border, the RPM and latency on every edge, and the heaviest edge labeled. A selected node shows a single &lt;strong&gt;+&lt;/strong&gt; handle that pulls in &lt;em&gt;its&lt;/em&gt; own callers and callees in one click, so you walk the chain one hop at a time instead of drowning in the whole graph; drag nodes apart, and the drill-out links (&lt;strong&gt;Open endpoint&lt;/strong&gt;, &lt;strong&gt;Service →&lt;/strong&gt;) open in a new tab so you keep the graph you&amp;rsquo;re exploring.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p03-topology-05-api-dependency.webp&#34; alt=&#34;Figure 5: The API-dependency graph — callers left, the focus endpoint centre, callees right, expanded one hop with the + handle, latency and SLA-coloured RPM on every edge.&#34;&gt;
Figure 5: Walk the endpoint call chain a hop at a time, latency on every edge.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;one-service-every-layer-it-reports-through&#34;&gt;One service, every layer it reports through&lt;/h2&gt;
&lt;p&gt;A single logical service often reports through several layers at once — a General agent, a Service Mesh sidecar, the mesh data plane, a Kubernetes pod. SkyWalking has modeled that cross-layer hierarchy since OAP 10; what Horizon adds is making it &lt;strong&gt;one click from anywhere on the map&lt;/strong&gt;. Select a node and Horizon lazily probes its hierarchy — if the service has cross-layer peers, a small &lt;strong&gt;chevron-stack chip&lt;/strong&gt; clips to the node.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p03-topology-06-smartscape.webp&#34; alt=&#34;Figure 6: A selected service node carrying the chevron-stack chip — the cue that this service also reports through other layers.&#34;&gt;
Figure 6: The chevron-stack chip on a selected node — lazily probed on selection, it shows only when the service has cross-layer peers.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;Click the chip and the topology dims under a &lt;strong&gt;Smartscape&lt;/strong&gt; overlay: the focused node re-renders bright in place, and its peers fan out vertically by OAP&amp;rsquo;s layer order — request-near layers above, infra-near below. From there a two-step click opens any peer in its own layer, pre-selected. (Auto-refresh pauses while the overlay is open so nothing shifts under you.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p03-topology-06-smartscape-expand.webp&#34; alt=&#34;Figure 7: The Smartscape overlay — one logical service projected across its layers (General agent, Service Mesh, mesh data plane, Kubernetes), fanned by layer order, each peer one click from its own layer.&#34;&gt;
Figure 7: One service, every layer it reports through — the cross-layer hierarchy as a one-click overlay on the map.&lt;/br&gt;&lt;/p&gt;
&lt;h2 id=&#34;where-to-go-next&#34;&gt;Where to go next&lt;/h2&gt;
&lt;p&gt;Every metric, threshold, and edge weight on these maps lives in the layer template&amp;rsquo;s &lt;code&gt;topology&lt;/code&gt; block — which means you tune them the same config-driven way you tune dashboards, the subject of a later post in this series. For the field reference, see the &lt;a href=&#34;https://skywalking.apache.org/docs/skywalking-horizon-ui/next/customization/layer-templates/&#34;&gt;layer-template&lt;/a&gt; topology docs.&lt;/p&gt;
&lt;p&gt;Next up: &lt;strong&gt;the Deployment tab and BanyanDB self-observability&lt;/strong&gt; — where the same map technique turns &lt;em&gt;inward&lt;/em&gt; to show how one clustered service&amp;rsquo;s own instances are deployed and talk to each other.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Meet Horizon UI · 4/16: The Deployment Tab &amp; BanyanDB Self-Observability</title>
      <link>/blog/2026-06-21-horizon-ui-deployment-and-banyandb/</link>
      <pubDate>Sun, 21 Jun 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-06-21-horizon-ui-deployment-and-banyandb/</guid>
      <description>
        
        
        &lt;p&gt;This is the fourth post in the &lt;a href=&#34;/blog/2026-06-21-skywalking-horizon-ui-introduction/&#34;&gt;Meet Horizon UI&lt;/a&gt; series. &lt;a href=&#34;/blog/2026-06-21-horizon-ui-topology-and-dependency/&#34;&gt;Part 3&lt;/a&gt; drew the map &lt;em&gt;between&lt;/em&gt; services. This one turns the same map &lt;strong&gt;inward&lt;/strong&gt; — onto the instances &lt;em&gt;inside&lt;/em&gt; one clustered service — and uses it for something SkyWalking has never shown well before: &lt;strong&gt;its own storage engine&lt;/strong&gt;, BanyanDB, modeled as the cluster it actually is.&lt;/p&gt;
&lt;h2 id=&#34;the-deployment-tab-a-map-of-one-services-own-instances&#34;&gt;The Deployment tab: a map of one service&amp;rsquo;s own instances&lt;/h2&gt;
&lt;p&gt;The service map answers &amp;ldquo;who calls this service.&amp;rdquo; The new per-layer &lt;strong&gt;Deployment&lt;/strong&gt; tab answers a different question: &amp;ldquo;how is &lt;em&gt;this one&lt;/em&gt; service deployed, and how do its own instances talk to each other?&amp;rdquo; Pick a service and the tab draws its instances as nodes with the instance-to-instance calls between them — the same pan/zoom canvas, health-ring nodes, animated edge flow and per-call metric sidebar you know from the service map, but scoped to a single service&amp;rsquo;s internals.&lt;/p&gt;
&lt;p&gt;Three things make it more than a flat node cloud:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Instances render as hexagons that bundle into pods.&lt;/strong&gt; A pod&amp;rsquo;s &lt;strong&gt;main&lt;/strong&gt; container is a full hex with its &lt;strong&gt;sibling&lt;/strong&gt; containers attached as smaller hexes around its edge — so a main process and its sidecars read as one unit, and a cross-pod sidecar link connects the exact small hex it belongs to.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pods cluster into labelled boxes&lt;/strong&gt; by a rule you choose — a single instance attribute (role), several attributes combined (e.g. &lt;code&gt;node_role&lt;/code&gt; + &lt;code&gt;node_type&lt;/code&gt;), or a name regex — so a fleet of mixed-role nodes reads as one box per role instead of a cloud.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The layout is tiered.&lt;/strong&gt; Each cluster box lays its pods out by call depth — sources on the left, what they call to the right — so an upstream→downstream chain reads left-to-right; drag any pod and its box re-flows to keep everything enclosed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Edges are keyed by the &lt;strong&gt;(source-role → target-role)&lt;/strong&gt; pair, so each kind of link shows its &lt;em&gt;own&lt;/em&gt; metrics rather than one flat set, prints its headline number inline on the edge, and lists in full in a &lt;strong&gt;Flows&lt;/strong&gt; sub-tab — one aligned table per role-pair. It&amp;rsquo;s off by default and, like the service map, entirely configured from the &lt;strong&gt;Layer dashboards admin → Deployment scope&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;banyandb-watched-like-everything-else&#34;&gt;BanyanDB, watched like everything else&lt;/h2&gt;
&lt;p&gt;That machinery exists for a reason. SkyWalking&amp;rsquo;s native database, &lt;strong&gt;BanyanDB&lt;/strong&gt;, is a clustered, role- and tier-aware system — and until now SkyWalking couldn&amp;rsquo;t really observe it as one. The new &lt;strong&gt;BanyanDB&lt;/strong&gt; layer (under &lt;strong&gt;Self-Observability&lt;/strong&gt;), pairing with OAP backend SWIP-15, models the whole deployment from metrics scraped through BanyanDB&amp;rsquo;s FODC proxy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the &lt;strong&gt;cluster&lt;/strong&gt; is one &lt;strong&gt;Cluster&lt;/strong&gt; (a service),&lt;/li&gt;
&lt;li&gt;each container is one &lt;strong&gt;Container&lt;/strong&gt; (an instance, carrying its &lt;code&gt;container_name&lt;/code&gt; &lt;strong&gt;role&lt;/strong&gt; and &lt;code&gt;node_type&lt;/code&gt; &lt;strong&gt;tier&lt;/strong&gt; as attributes),&lt;/li&gt;
&lt;li&gt;and each storage &lt;strong&gt;Group&lt;/strong&gt; is an &lt;strong&gt;endpoint&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the same Service / Instance / Endpoint spine every other layer uses now means &lt;strong&gt;Cluster / Container / Group&lt;/strong&gt; for BanyanDB — and the Deployment tab on top of it draws the cluster itself.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p04-deployment-01-banyandb-deployment.webp&#34; alt=&#34;Figure 1: The Deployment tab on BanyanDB — containers grouped into role/tier boxes (liaison, data hot/warm/cold) as hexagon pods, with role-pair call edges between them and each role&amp;rsquo;s own health-ring metric.&#34;&gt;
Figure 1: SkyWalking watching its own database — the BanyanDB cluster drawn by role and tier, with liaison→data and lifecycle→data edges between the pods.&lt;/br&gt;&lt;/p&gt;
&lt;h3 id=&#34;cluster-container-group&#34;&gt;Cluster, Container, Group&lt;/h3&gt;
&lt;p&gt;Each scope is a purpose-built dashboard:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Cluster&lt;/strong&gt; dashboard is the war-room for the whole database: write / query / error-rate KPIs, CPU / memory / disk capacity, throughput and error trends, and a &lt;strong&gt;Containers by Role&lt;/strong&gt; table.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Container&lt;/strong&gt; dashboard &lt;strong&gt;adapts to the selected container&amp;rsquo;s role&lt;/strong&gt;. Every container shows CPU / memory / Go-runtime resources; a &lt;strong&gt;liaison&lt;/strong&gt; adds ingestion, query, gRPC errors and the tier-2 publish pipeline and write-queue depth; a &lt;strong&gt;data&lt;/strong&gt; node adds storage totals, merge / compaction, the inverted index, the subscribe queue and retention; a &lt;strong&gt;lifecycle&lt;/strong&gt; sidecar shows migration cycles and last-run time / status. The role-specific panels are gated on the container&amp;rsquo;s role attribute, so you only ever see what applies to the node in front of you.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Group&lt;/strong&gt; dashboard splits &lt;strong&gt;per data-model&lt;/strong&gt; — measure, stream, trace, property — and because a BanyanDB group stores exactly one catalog, only the matching model&amp;rsquo;s panels render: a &lt;code&gt;measure&lt;/code&gt; group shows write-rate / query-latency / merge panels, a &lt;code&gt;property&lt;/code&gt; group its index-write / term-search / series panels, and so on.&lt;/li&gt;
&lt;/ul&gt;
&lt;figure style=&#34;display:flex;flex-wrap:wrap;gap:1.1rem;align-items:center;margin:1.5rem 0;&#34;&gt;
&lt;img src=&#34;/screenshots/horizon-0.7.0/p04-deployment-02-cluster-dashboard.webp&#34; alt=&#34;The BanyanDB Cluster dashboard — write/query/error KPIs, CPU/memory/disk capacity, and the Containers by Role table.&#34; style=&#34;max-height:320px;max-width:100%;width:auto;border-radius:4px;&#34;&gt;
&lt;figcaption style=&#34;flex:1 1 240px;color:#5f6b7a;font-size:.92em;line-height:1.6;&#34;&gt;Figure 2: The Cluster scope — the whole database at a glance, with a roll-call of containers by role.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The role-gating is easiest to see by opening the &lt;em&gt;same&lt;/em&gt; Container dashboard on two different roles:&lt;/p&gt;
&lt;figure style=&#34;display:flex;flex-wrap:wrap;gap:1.1rem;align-items:center;margin:1.5rem 0;&#34;&gt;
&lt;img src=&#34;/screenshots/horizon-0.7.0/p04-deployment-03-container-dashboard.webp&#34; alt=&#34;A Container dashboard for a data/liaison node — role-specific ingestion / storage panels on top of the shared CPU / memory / Go-runtime resources.&#34; style=&#34;max-height:320px;max-width:100%;width:auto;border-radius:4px;&#34;&gt;
&lt;figcaption style=&#34;flex:1 1 240px;color:#5f6b7a;font-size:.92em;line-height:1.6;&#34;&gt;Figure 3: A data/liaison Container — ingestion, query, storage and compaction panels on top of the shared resource panels.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure style=&#34;display:flex;flex-wrap:wrap;gap:1.1rem;align-items:center;margin:1.5rem 0;&#34;&gt;
&lt;img src=&#34;/screenshots/horizon-0.7.0/p04-deployment-03-container-lifecycle-dashboard.webp&#34; alt=&#34;The same Container dashboard opened on the lifecycle sidecar — only its migration-cycle and last-run panels show.&#34; style=&#34;max-height:320px;max-width:100%;width:auto;border-radius:4px;&#34;&gt;
&lt;figcaption style=&#34;flex:1 1 240px;color:#5f6b7a;font-size:.92em;line-height:1.6;&#34;&gt;Figure 4: The same dashboard on the lifecycle node — just the migration panels. Same template, gated by role.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;And the &lt;strong&gt;Group&lt;/strong&gt; scope gives each storage catalog its own page:&lt;/p&gt;
&lt;figure style=&#34;display:flex;flex-wrap:wrap;gap:1.1rem;align-items:center;margin:1.5rem 0;&#34;&gt;
&lt;img src=&#34;/screenshots/horizon-0.7.0/p04-deployment-04-group-dashboard.webp&#34; alt=&#34;The Group dashboard — per-data-model panels for one storage catalog.&#34; style=&#34;max-height:320px;max-width:100%;width:auto;border-radius:4px;&#34;&gt;
&lt;figcaption style=&#34;flex:1 1 240px;color:#5f6b7a;font-size:.92em;line-height:1.6;&#34;&gt;Figure 5: The Group scope — one storage catalog at a time, its panels gated to the group&#39;s data model.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id=&#34;edges-that-know-their-role-pair-and-a-flows-table&#34;&gt;Edges that know their role pair, and a Flows table&lt;/h3&gt;
&lt;p&gt;On the Deployment tab, the call edges between containers carry &lt;strong&gt;role-pair-specific&lt;/strong&gt; metrics off the SWIP-15 instance-relation families: a &lt;strong&gt;liaison → data&lt;/strong&gt; edge shows write / query / part-sync throughput and p99; a &lt;strong&gt;liaison → liaison&lt;/strong&gt; edge shows write-forward and control; a &lt;strong&gt;lifecycle → data&lt;/strong&gt; edge shows tier-migration volume / rate / p99. Each edge prints up to three of its pair&amp;rsquo;s metrics inline, the selected-edge panel keeps the full client-vs-server breakdown, and the &lt;strong&gt;Flows&lt;/strong&gt; sub-tab lays every edge out as one aligned table per role-pair.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/screenshots/horizon-0.7.0/p04-deployment-05-flows-table.webp&#34; alt=&#34;Figure 6: The Flows sub-tab — every container-to-container edge laid out as one aligned table per role-pair (liaison→data, lifecycle→data, …).&#34;&gt;
Figure 6: Flows — the same role-pair edges as a sortable table, one block per pair.&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;(Two preconditions. The edges and the role-specific panels assume a real &lt;strong&gt;clustered&lt;/strong&gt; BanyanDB — a single-process standalone instance shows only the shared resource and Go-runtime panels, with the rest lighting up as the cluster&amp;rsquo;s roles report. And the container-to-container edges in particular need the OAP build to expose the &lt;code&gt;SERVICE_INSTANCE_RELATION&lt;/code&gt; scope; until it does, the Deployment tab still draws the full inventory — just without the edges between pods.)&lt;/p&gt;
&lt;h2 id=&#34;configured-not-coded&#34;&gt;Configured, not coded&lt;/h2&gt;
&lt;p&gt;None of the above is a hand-built &amp;ldquo;BanyanDB screen.&amp;rdquo; The clustering rules, the per-role node metrics, and the role-pair edge metrics are all a self-contained block on the layer template, edited from the &lt;strong&gt;Layer dashboards admin → Deployment scope&lt;/strong&gt; and carried with the template&amp;rsquo;s export/import — the same config-driven model behind every other layer, which a later post covers end to end.&lt;/p&gt;
&lt;p&gt;Next up: &lt;strong&gt;the 3D Infrastructure Map&lt;/strong&gt; — where this same deployment, and every other layer, lifts off the page into a WebGL view of your whole estate.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Monitoring Envoy AI Gateway with Apache SkyWalking</title>
      <link>/blog/2026-04-02-envoy-ai-gateway-monitoring/</link>
      <pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-04-02-envoy-ai-gateway-monitoring/</guid>
      <description>
        
        
        &lt;h2 id=&#34;the-problem-flying-blind-with-llm-traffic&#34;&gt;The Problem: Flying Blind with LLM Traffic&lt;/h2&gt;
&lt;p&gt;LLM traffic is becoming a first-class citizen in production infrastructure. Teams are calling OpenAI, Anthropic,
AWS Bedrock, Azure OpenAI, Google Gemini — often multiple providers at once. But most organizations have
no unified visibility into this traffic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Token costs spiral&lt;/strong&gt; without knowing which teams, models, or providers drive the spend.
A single misconfigured prompt template can burn through thousands of dollars before anyone notices.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Provider outages cause cascading failures.&lt;/strong&gt; When OpenAI has a bad hour, your application goes down
with it — and you have no failover visibility to understand what happened or switch providers automatically.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No unified metrics&lt;/strong&gt; across heterogeneous LLM calls. Latency, Time to First Token (TTFT),
Time Per Output Token (TPOT), token usage, error rates — each provider reports these differently,
if at all. There is no single dashboard to compare them.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the same observability gap that microservices faced a decade ago. The solution then was
service meshes and API gateways with built-in telemetry. For AI workloads, the answer is an AI gateway.&lt;/p&gt;
&lt;h2 id=&#34;why-an-ai-gateway&#34;&gt;Why an AI Gateway&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://aigateway.envoyproxy.io/&#34;&gt;Envoy AI Gateway&lt;/a&gt; is an open-source AI gateway built on top of
&lt;a href=&#34;https://www.envoyproxy.io/&#34;&gt;Envoy Proxy&lt;/a&gt; and &lt;a href=&#34;https://gateway.envoyproxy.io/&#34;&gt;Envoy Gateway&lt;/a&gt;.
It is not a standalone SaaS product or a Python proxy — it is infrastructure-grade software built on
the same Envoy that already handles traffic for a large portion of cloud-native deployments.&lt;/p&gt;
&lt;p&gt;Key capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Multi-provider routing&lt;/strong&gt; — supports 16+ AI providers (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI,
Google Gemini, Mistral, Cohere, DeepSeek, and more) behind a unified API.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Token-based rate limiting&lt;/strong&gt; — rate limit by token consumption, not just request count.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Provider fallback&lt;/strong&gt; — automatic failover when a provider is down or slow.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model virtualization&lt;/strong&gt; — abstract model names so applications are decoupled from specific providers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Two-tier architecture&lt;/strong&gt; — a reference architecture with a centralized entry gateway (Tier 1) for
auth and global routing, and per-cluster gateways (Tier 2) for inference optimization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CNCF ecosystem native&lt;/strong&gt; — runs on Kubernetes, composes with existing Envoy filters, WASM plugins,
and standard Kubernetes Gateway API resources.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because Envoy AI Gateway natively emits GenAI metrics and access logs via OTLP following
&lt;a href=&#34;https://opentelemetry.io/docs/specs/semconv/gen-ai/&#34;&gt;OpenTelemetry GenAI Semantic Conventions&lt;/a&gt;,
it plugs directly into any OpenTelemetry-compatible backend.&lt;/p&gt;
&lt;p&gt;Starting from SkyWalking 10.4.0, the OAP server natively receives and analyzes Envoy AI Gateway&amp;rsquo;s
OTLP metrics and access logs — no OpenTelemetry Collector needed in between.&lt;/p&gt;
&lt;h2 id=&#34;data-flow&#34;&gt;Data Flow&lt;/h2&gt;
&lt;p&gt;The AI Gateway pushes telemetry directly to SkyWalking via OTLP gRPC:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;workflow.jpg&#34; alt=&#34;Data flow&#34;&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Application&lt;/strong&gt; sends LLM API requests through the Envoy AI Gateway.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Envoy AI Gateway&lt;/strong&gt; routes requests to AI providers (or local models like Ollama)
and records GenAI metrics (token usage, latency, TTFT, TPOT) and access logs.&lt;/li&gt;
&lt;li&gt;The gateway pushes metrics and logs via &lt;strong&gt;OTLP gRPC&lt;/strong&gt; directly to &lt;strong&gt;SkyWalking OAP&lt;/strong&gt; on port 11800.&lt;/li&gt;
&lt;li&gt;SkyWalking OAP parses metrics with MAL rules and access logs with LAL rules,
then stores everything in &lt;strong&gt;BanyanDB&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;No OpenTelemetry Collector is needed. SkyWalking OAP&amp;rsquo;s built-in OTLP receiver handles everything.&lt;/p&gt;
&lt;h2 id=&#34;try-it-locally&#34;&gt;Try It Locally&lt;/h2&gt;
&lt;p&gt;This demo uses &lt;a href=&#34;https://ollama.com/&#34;&gt;Ollama&lt;/a&gt; as a local LLM backend so you can try
everything without an API key. The &lt;a href=&#34;https://github.com/envoyproxy/ai-gateway/tree/main/cmd/aigw&#34;&gt;Envoy AI Gateway CLI&lt;/a&gt;
(&lt;code&gt;aigw&lt;/code&gt;) provides a standalone mode that runs outside Kubernetes — perfect for local testing.&lt;/p&gt;
&lt;h3 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Docker and Docker Compose&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://ollama.com/&#34;&gt;Ollama&lt;/a&gt; installed on your host&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;step-1-start-ollama&#34;&gt;Step 1: Start Ollama&lt;/h3&gt;
&lt;p&gt;Start Ollama on all interfaces so Docker containers can reach it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#953800&#34;&gt;OLLAMA_HOST&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;0.0.0.0 ollama serve
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Pull a small model for testing:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ollama pull llama3.2:1b
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;step-2-start-the-stack&#34;&gt;Step 2: Start the Stack&lt;/h3&gt;
&lt;p&gt;Create a &lt;code&gt;docker-compose.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;services&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;banyandb&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;apache/skywalking-banyandb:0.10.0&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;17912:17912&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;command&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;healthcheck&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;test&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;CMD-SHELL&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;wget -qO- http://localhost:17913/api/healthz || exit 1&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;interval&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;5s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;timeout&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;3s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;retries&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;apache/skywalking-oap-server:10.4.0&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;oap&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;banyandb&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;11800:11800&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;12800:12800&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_STORAGE&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_STORAGE_BANYANDB_TARGETS&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb:17912&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;healthcheck&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;test&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;CMD-SHELL&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;bash -c &amp;#39;echo &amp;gt; /dev/tcp/localhost/12800&amp;#39; || exit 1&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;interval&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;10s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;timeout&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;5s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;retries&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;30&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;start_period&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;60s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ui&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;apache/skywalking-ui:10.4.0&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;ui&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;8080:8080&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_OAP_ADDRESS&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;http://oap:12800&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;aigw&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;envoyproxy/ai-gateway-cli:latest&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;aigw&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OPENAI_BASE_URL=http://host.docker.internal:11434/v1&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OPENAI_API_KEY=unused&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_SERVICE_NAME=my-ai-gateway&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_EXPORTER_OTLP_ENDPOINT=http://oap:11800&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_EXPORTER_OTLP_PROTOCOL=grpc&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_METRICS_EXPORTER=otlp&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_LOGS_EXPORTER=otlp&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_METRIC_EXPORT_INTERVAL=5000&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway,service.instance.id=aigw-1,service.layer=ENVOY_AI_GATEWAY&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;1975:1975&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;extra_hosts&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;host.docker.internal:host-gateway&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;command&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;run&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Start everything:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker compose up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Wait for all services to become healthy (BanyanDB starts first, then OAP, then UI and AI Gateway):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker compose ps
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The key OTLP configuration on the &lt;code&gt;aigw&lt;/code&gt; service:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Env Var&lt;/th&gt;
          &lt;th&gt;Value&lt;/th&gt;
          &lt;th&gt;Purpose&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_SERVICE_NAME&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;my-ai-gateway&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Service name in SkyWalking&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;http://oap:11800&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;SkyWalking OAP gRPC endpoint&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_EXPORTER_OTLP_PROTOCOL&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;grpc&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;OTLP transport&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_METRICS_EXPORTER&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;otlp&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Enable metrics push&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_LOGS_EXPORTER&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;otlp&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Enable access log push&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The &lt;code&gt;OTEL_RESOURCE_ATTRIBUTES&lt;/code&gt; must include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;job_name=envoy-ai-gateway&lt;/code&gt; — routing tag for MAL/LAL rules&lt;/li&gt;
&lt;li&gt;&lt;code&gt;service.instance.id=&amp;lt;id&amp;gt;&lt;/code&gt; — instance identity&lt;/li&gt;
&lt;li&gt;&lt;code&gt;service.layer=ENVOY_AI_GATEWAY&lt;/code&gt; — routes logs to AI Gateway LAL rules&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The MAL and LAL rules are enabled by default in SkyWalking OAP. No OAP-side configuration is needed.&lt;/p&gt;
&lt;h3 id=&#34;step-3-run-the-demo-app&#34;&gt;Step 3: Run the Demo App&lt;/h3&gt;
&lt;p&gt;Create a simple Python application that sends requests through the AI Gateway (&lt;code&gt;app.py&lt;/code&gt;).
It mixes normal requests, streaming requests (for TTFT/TPOT metrics), and error requests
(non-existent model → HTTP 404, always captured by the LAL sampling policy):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#24292e&#34;&gt;time&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#24292e&#34;&gt;random&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#24292e&#34;&gt;requests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;GATEWAY &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;http://localhost:1975&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;HEADERS &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Authorization&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Bearer unused&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Content-Type&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;application/json&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;questions &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;What is Apache SkyWalking? Answer in one sentence.&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;What is Envoy Proxy used for? Answer in one sentence.&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;What are the benefits of an AI gateway? Answer in two sentences.&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Explain observability in three sentences.&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#6639ba&#34;&gt;chat&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;model&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; question&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; stream&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;False&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    resp &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; requests&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;post&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;GATEWAY&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;/v1/chat/completions&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        json&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;model&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; model&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;messages&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[{&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; question&lt;span style=&#34;color:#1f2328&#34;&gt;}],&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;stream&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; stream&lt;span style=&#34;color:#1f2328&#34;&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        headers&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;HEADERS&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; timeout&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;60&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; stream&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;stream&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; stream&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        chunks &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cf222e&#34;&gt;for&lt;/span&gt; line &lt;span style=&#34;color:#0550ae&#34;&gt;in&lt;/span&gt; resp&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;iter_lines&lt;span style=&#34;color:#1f2328&#34;&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; line&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                chunks&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;append&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;line&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;decode&lt;span style=&#34;color:#1f2328&#34;&gt;())&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cf222e&#34;&gt;return&lt;/span&gt; resp&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;status_code&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;[streamed &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;len&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;chunks&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt; chunks]&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;return&lt;/span&gt; resp&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;status_code&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; resp&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;json&lt;span style=&#34;color:#1f2328&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#cf222e&#34;&gt;True&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    r &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; random&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;random&lt;span style=&#34;color:#1f2328&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; r &lt;span style=&#34;color:#0550ae&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;0.2&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#57606a&#34;&gt;# Error request: non-existent model triggers 404&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        status&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; body &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; chat&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;non-existent-model&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;hello&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6639ba&#34;&gt;print&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;[error] model=non-existent-model status=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;status&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;elif&lt;/span&gt; r &lt;span style=&#34;color:#0550ae&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;0.5&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#57606a&#34;&gt;# Streaming request — generates TTFT and TPOT metrics&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        q &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; random&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;choice&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;questions&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        status&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; info &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; chat&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;llama3.2:1b&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; q&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; stream&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;True&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6639ba&#34;&gt;print&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;[stream] status=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;status&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;info&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;else&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#57606a&#34;&gt;# Normal non-streaming request&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        q &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; random&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;choice&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;questions&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        status&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; body &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; chat&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;llama3.2:1b&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; q&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        answer &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; body&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;choices&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[{}])[&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;message&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{})&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)[:&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        tokens &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; body&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;usage&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6639ba&#34;&gt;print&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;[ok] status=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;status&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt; tokens=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;tokens&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt; answer=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;answer&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;...&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    time&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;sleep&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;random&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;randint&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;20&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;30&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Run it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install requests
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;python app.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The application talks to the AI Gateway on port 1975, which routes to Ollama.
Each request generates GenAI metrics (token usage, latency, TTFT, TPOT) and access logs
that the gateway pushes to SkyWalking via OTLP.&lt;/p&gt;
&lt;p&gt;The error requests (non-existent model → HTTP 404) are always captured by the access log
sampling policy, so you will see them in the SkyWalking log view.&lt;/p&gt;
&lt;h3 id=&#34;step-4-view-in-skywalking-ui&#34;&gt;Step 4: View in SkyWalking UI&lt;/h3&gt;
&lt;p&gt;Open &lt;a href=&#34;http://localhost:8080&#34;&gt;http://localhost:8080&lt;/a&gt; and select the &lt;strong&gt;GenAI &amp;gt; Envoy AI Gateway&lt;/strong&gt; menu.&lt;/p&gt;
&lt;p&gt;The service list shows &lt;code&gt;my-ai-gateway&lt;/code&gt; with CPM, latency, and token rates at a glance:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-1.png&#34; alt=&#34;Service list&#34;&gt;&lt;/p&gt;
&lt;p&gt;Click into the service to see the full dashboard — Request CPM, Latency (average + percentiles),
Input/Output Token Rates, TTFT, and TPOT:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-2.png&#34; alt=&#34;Service dashboard&#34;&gt;&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Providers&lt;/strong&gt; tab breaks down metrics by AI provider:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-3.png&#34; alt=&#34;Provider breakdown&#34;&gt;&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Models&lt;/strong&gt; tab shows per-model metrics including TTFT and TPOT (streaming only).
Note the &lt;code&gt;unknown&lt;/code&gt; model entries — these are the error requests with non-existent models:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-4.png&#34; alt=&#34;Model breakdown&#34;&gt;&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Log&lt;/strong&gt; tab shows access logs. The sampling policy drops normal successful responses
but always captures errors (HTTP 404) and high-token requests:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-5.png&#34; alt=&#34;Access logs&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;cleanup&#34;&gt;Cleanup&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker compose down
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;deploying-on-kubernetes&#34;&gt;Deploying on Kubernetes&lt;/h2&gt;
&lt;p&gt;For production deployments, Envoy AI Gateway runs as a full Kubernetes controller with
Envoy Gateway as the control plane. See the
&lt;a href=&#34;https://aigateway.envoyproxy.io/docs/getting-started/&#34;&gt;Envoy AI Gateway getting started guide&lt;/a&gt;
for Kubernetes installation.&lt;/p&gt;
&lt;p&gt;The OTLP configuration is the same — set the &lt;code&gt;OTEL_*&lt;/code&gt; environment variables on the
AI Gateway&amp;rsquo;s external processor to point at SkyWalking OAP&amp;rsquo;s gRPC port (11800).
See the &lt;a href=&#34;https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/&#34;&gt;SkyWalking Envoy AI Gateway Monitoring&lt;/a&gt;
documentation for details.&lt;/p&gt;
&lt;h2 id=&#34;genai-observability-without-an-ai-gateway&#34;&gt;GenAI Observability Without an AI Gateway&lt;/h2&gt;
&lt;p&gt;Not every deployment uses an AI gateway. If your applications call LLM providers directly,
SkyWalking 10.4.0 also provides GenAI observability through the
&lt;a href=&#34;https://skywalking.apache.org/docs/main/next/en/setup/service-agent/virtual-genai/&#34;&gt;Virtual GenAI&lt;/a&gt; layer.&lt;/p&gt;
&lt;p&gt;This works with any SkyWalking-instrumented, OpenTelemetry-instrumented, or Zipkin-instrumented application.
When traces carry &lt;code&gt;gen_ai.*&lt;/code&gt; tags (following
&lt;a href=&#34;https://opentelemetry.io/docs/specs/semconv/gen-ai/&#34;&gt;OpenTelemetry GenAI Semantic Conventions&lt;/a&gt;),
SkyWalking derives per-provider and per-model metrics from the client side:
latency, token usage, success rate, and estimated cost.&lt;/p&gt;
&lt;p&gt;For Java applications, the SkyWalking Java Agent (9.7+) includes a Spring AI plugin that automatically
instruments calls to 13+ providers (OpenAI, Anthropic, AWS Bedrock, Google GenAI, DeepSeek, Mistral, etc.)
with the correct &lt;code&gt;gen_ai.*&lt;/code&gt; span tags — no code changes needed.&lt;/p&gt;
&lt;p&gt;This is a different use case from the Envoy AI Gateway monitoring covered above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Envoy AI Gateway layer&lt;/strong&gt;: infrastructure-level observability — what the gateway sees across all traffic.
Best for platform teams managing centralized AI routing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Virtual GenAI layer&lt;/strong&gt;: application-level observability — what each instrumented app sees for its own LLM calls.
Best for teams without a centralized gateway, or for per-application cost tracking.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://aigateway.envoyproxy.io/&#34;&gt;Envoy AI Gateway&lt;/a&gt; — project site and documentation&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/envoyproxy/ai-gateway/tree/main/cmd/aigw&#34;&gt;Envoy AI Gateway CLI&lt;/a&gt; — standalone mode for local development&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/&#34;&gt;SkyWalking Envoy AI Gateway Monitoring&lt;/a&gt; — OAP setup doc&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://skywalking.apache.org/docs/main/next/en/setup/service-agent/virtual-genai/&#34;&gt;SkyWalking Virtual GenAI&lt;/a&gt; — client-side GenAI observability&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://opentelemetry.io/docs/specs/semconv/gen-ai/&#34;&gt;OpenTelemetry GenAI Semantic Conventions&lt;/a&gt; — the metric/attribute standard both projects follow&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Zh: 用 Apache SkyWalking 监控 Envoy AI Gateway</title>
      <link>/zh/2026-04-02-envoy-ai-gateway-monitoring/</link>
      <pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate>
      <guid>/zh/2026-04-02-envoy-ai-gateway-monitoring/</guid>
      <description>
        
        
        &lt;h2 id=&#34;问题llm-流量缺乏统一观测&#34;&gt;问题：LLM 流量缺乏统一观测&lt;/h2&gt;
&lt;p&gt;LLM 流量正在成为生产基础设施中不可忽视的一部分。团队同时在调用 OpenAI、Anthropic、AWS Bedrock、Azure OpenAI、Google Gemini——往往还不止一个提供商。但大多数组织对这些流量缺乏统一的可见性：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Token 费用失控&lt;/strong&gt;，却不知道哪个团队、哪个模型、哪个提供商在烧钱。一个配置不当的 prompt 模板就可能在无人察觉的情况下烧掉几千美元。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;提供商故障引发连锁反应。&lt;/strong&gt; OpenAI 出问题的那一小时，你的应用也跟着挂——而你既没有故障切换的可见性，也无法自动切换提供商。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;缺乏统一指标。&lt;/strong&gt; 延迟、首 Token 耗时（TTFT）、每 Token 输出耗时（TPOT）、Token 用量、错误率——每个提供商的报告方式都不一样，有些甚至不提供。没有一个统一的面板能做对比。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;这和十年前微服务面临的可观测性困境如出一辙。当时的解法是服务网格和内置遥测的 API 网关。对 AI 工作负载来说，答案就是 AI 网关。&lt;/p&gt;
&lt;h2 id=&#34;为什么选择-ai-网关&#34;&gt;为什么选择 AI 网关&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://aigateway.envoyproxy.io/&#34;&gt;Envoy AI Gateway&lt;/a&gt; 是一个开源 AI 网关，构建在 &lt;a href=&#34;https://www.envoyproxy.io/&#34;&gt;Envoy Proxy&lt;/a&gt; 和 &lt;a href=&#34;https://gateway.envoyproxy.io/&#34;&gt;Envoy Gateway&lt;/a&gt; 之上。底层就是云原生世界里已经广泛部署的 Envoy，天然具备基础设施级的稳定性和性能。&lt;/p&gt;
&lt;p&gt;核心能力：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;多提供商路由&lt;/strong&gt; —— 支持 16+ AI 提供商（OpenAI、Anthropic、AWS Bedrock、Azure OpenAI、Google Gemini、Mistral、Cohere、DeepSeek 等），统一 API 接入。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;基于 Token 的限流&lt;/strong&gt; —— 按 Token 消耗限流，而不只是按请求数。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;提供商故障切换&lt;/strong&gt; —— 某个提供商宕机或响应慢时自动切换。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;模型虚拟化&lt;/strong&gt; —— 抽象模型名称，让应用与具体提供商解耦。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;两层架构&lt;/strong&gt; —— 参考架构包含一个集中入口网关（Tier 1）负责认证和全局路由，以及每集群网关（Tier 2）负责推理优化。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CNCF 生态原生&lt;/strong&gt; —— 运行在 Kubernetes 上，兼容现有的 Envoy Filter、WASM 插件和标准 Kubernetes Gateway API 资源。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Envoy AI Gateway 原生支持通过 OTLP 发送 GenAI 指标和访问日志，遵循 &lt;a href=&#34;https://opentelemetry.io/docs/specs/semconv/gen-ai/&#34;&gt;OpenTelemetry GenAI 语义约定&lt;/a&gt;，可以直接接入任何兼容 OpenTelemetry 的后端。&lt;/p&gt;
&lt;p&gt;从 SkyWalking 10.4.0 开始，OAP 原生接收和分析 Envoy AI Gateway 的 OTLP 指标和访问日志——中间不需要部署 OpenTelemetry Collector。&lt;/p&gt;
&lt;h2 id=&#34;数据流&#34;&gt;数据流&lt;/h2&gt;
&lt;p&gt;AI Gateway 通过 OTLP gRPC 直接将遥测数据推送到 SkyWalking：&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;workflow.jpg&#34; alt=&#34;数据流&#34;&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;应用&lt;/strong&gt; 通过 Envoy AI Gateway 发送 LLM API 请求。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Envoy AI Gateway&lt;/strong&gt; 将请求路由到 AI 提供商（或 Ollama 这样的本地模型），同时记录 GenAI 指标（Token 用量、延迟、TTFT、TPOT）和访问日志。&lt;/li&gt;
&lt;li&gt;网关通过 &lt;strong&gt;OTLP gRPC&lt;/strong&gt; 直接将指标和日志推送到 &lt;strong&gt;SkyWalking OAP&lt;/strong&gt; 的 11800 端口。&lt;/li&gt;
&lt;li&gt;SkyWalking OAP 用 MAL 规则解析指标、用 LAL 规则解析访问日志，然后统一存储到 &lt;strong&gt;BanyanDB&lt;/strong&gt;。&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;不需要 OpenTelemetry Collector。SkyWalking OAP 内置的 OTLP 接收器可以直接处理所有数据。&lt;/p&gt;
&lt;h2 id=&#34;本地体验&#34;&gt;本地体验&lt;/h2&gt;
&lt;p&gt;这个 Demo 使用 &lt;a href=&#34;https://ollama.com/&#34;&gt;Ollama&lt;/a&gt; 作为本地 LLM 后端，不需要任何 API Key 就能跑起来。&lt;a href=&#34;https://github.com/envoyproxy/ai-gateway/tree/main/cmd/aigw&#34;&gt;Envoy AI Gateway CLI&lt;/a&gt;（&lt;code&gt;aigw&lt;/code&gt;）提供独立运行模式，不依赖 Kubernetes，非常适合本地测试。&lt;/p&gt;
&lt;h3 id=&#34;前置条件&#34;&gt;前置条件&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Docker 和 Docker Compose&lt;/li&gt;
&lt;li&gt;主机上已安装 &lt;a href=&#34;https://ollama.com/&#34;&gt;Ollama&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;第一步启动-ollama&#34;&gt;第一步：启动 Ollama&lt;/h3&gt;
&lt;p&gt;让 Ollama 监听所有网络接口，以便 Docker 容器能访问到：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#953800&#34;&gt;OLLAMA_HOST&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;0.0.0.0 ollama serve
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;拉取一个小模型用于测试：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ollama pull llama3.2:1b
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;第二步启动服务栈&#34;&gt;第二步：启动服务栈&lt;/h3&gt;
&lt;p&gt;创建 &lt;code&gt;docker-compose.yaml&lt;/code&gt;：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;services&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;banyandb&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;apache/skywalking-banyandb:0.10.0&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;17912:17912&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;command&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;healthcheck&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;test&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;CMD-SHELL&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;wget -qO- http://localhost:17913/api/healthz || exit 1&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;interval&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;5s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;timeout&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;3s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;retries&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;apache/skywalking-oap-server:10.4.0&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;oap&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;banyandb&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;11800:11800&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;12800:12800&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_STORAGE&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_STORAGE_BANYANDB_TARGETS&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb:17912&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;healthcheck&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;test&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;CMD-SHELL&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;bash -c &amp;#39;echo &amp;gt; /dev/tcp/localhost/12800&amp;#39; || exit 1&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;interval&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;10s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;timeout&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;5s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;retries&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;30&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;start_period&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;60s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ui&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;apache/skywalking-ui:10.4.0&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;ui&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;8080:8080&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_OAP_ADDRESS&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;http://oap:12800&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;aigw&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;envoyproxy/ai-gateway-cli:latest&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;aigw&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OPENAI_BASE_URL=http://host.docker.internal:11434/v1&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OPENAI_API_KEY=unused&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_SERVICE_NAME=my-ai-gateway&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_EXPORTER_OTLP_ENDPOINT=http://oap:11800&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_EXPORTER_OTLP_PROTOCOL=grpc&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_METRICS_EXPORTER=otlp&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_LOGS_EXPORTER=otlp&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_METRIC_EXPORT_INTERVAL=5000&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway,service.instance.id=aigw-1,service.layer=ENVOY_AI_GATEWAY&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;1975:1975&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;extra_hosts&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;host.docker.internal:host-gateway&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;command&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;run&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;启动所有服务：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker compose up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;等待所有服务变为健康状态（BanyanDB 先启动，然后是 OAP，最后是 UI 和 AI Gateway）：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker compose ps
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;aigw&lt;/code&gt; 服务的关键 OTLP 配置：&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;环境变量&lt;/th&gt;
          &lt;th&gt;值&lt;/th&gt;
          &lt;th&gt;用途&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_SERVICE_NAME&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;my-ai-gateway&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;SkyWalking 中的服务名&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;http://oap:11800&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;SkyWalking OAP gRPC 端点&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_EXPORTER_OTLP_PROTOCOL&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;grpc&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;OTLP 传输协议&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_METRICS_EXPORTER&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;otlp&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;启用指标推送&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;OTEL_LOGS_EXPORTER&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;otlp&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;启用访问日志推送&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;OTEL_RESOURCE_ATTRIBUTES&lt;/code&gt; 必须包含：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;job_name=envoy-ai-gateway&lt;/code&gt; —— MAL/LAL 规则的路由标签&lt;/li&gt;
&lt;li&gt;&lt;code&gt;service.instance.id=&amp;lt;id&amp;gt;&lt;/code&gt; —— 实例标识&lt;/li&gt;
&lt;li&gt;&lt;code&gt;service.layer=ENVOY_AI_GATEWAY&lt;/code&gt; —— 将日志路由到 AI Gateway LAL 规则&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;MAL 和 LAL 规则在 SkyWalking OAP 中默认启用，不需要额外配置。&lt;/p&gt;
&lt;h3 id=&#34;第三步运行-demo-应用&#34;&gt;第三步：运行 Demo 应用&lt;/h3&gt;
&lt;p&gt;创建一个简单的 Python 应用，通过 AI Gateway 发送请求（&lt;code&gt;app.py&lt;/code&gt;）。
它混合了普通请求、流式请求（用于产生 TTFT/TPOT 指标）和错误请求（不存在的模型 → HTTP 404，始终会被 LAL 采样策略捕获）：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#24292e&#34;&gt;time&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#24292e&#34;&gt;random&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#24292e&#34;&gt;requests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;GATEWAY &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;http://localhost:1975&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;HEADERS &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Authorization&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Bearer unused&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Content-Type&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;application/json&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;questions &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;What is Apache SkyWalking? Answer in one sentence.&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;What is Envoy Proxy used for? Answer in one sentence.&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;What are the benefits of an AI gateway? Answer in two sentences.&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Explain observability in three sentences.&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#6639ba&#34;&gt;chat&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;model&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; question&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; stream&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;False&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    resp &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; requests&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;post&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;GATEWAY&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;/v1/chat/completions&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        json&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;model&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; model&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;messages&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[{&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; question&lt;span style=&#34;color:#1f2328&#34;&gt;}],&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;stream&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt; stream&lt;span style=&#34;color:#1f2328&#34;&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        headers&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;HEADERS&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; timeout&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;60&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; stream&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;stream&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; stream&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        chunks &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cf222e&#34;&gt;for&lt;/span&gt; line &lt;span style=&#34;color:#0550ae&#34;&gt;in&lt;/span&gt; resp&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;iter_lines&lt;span style=&#34;color:#1f2328&#34;&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; line&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                chunks&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;append&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;line&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;decode&lt;span style=&#34;color:#1f2328&#34;&gt;())&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cf222e&#34;&gt;return&lt;/span&gt; resp&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;status_code&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;[streamed &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;len&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;chunks&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt; chunks]&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;return&lt;/span&gt; resp&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;status_code&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; resp&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;json&lt;span style=&#34;color:#1f2328&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#cf222e&#34;&gt;True&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    r &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; random&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;random&lt;span style=&#34;color:#1f2328&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; r &lt;span style=&#34;color:#0550ae&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;0.2&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#57606a&#34;&gt;# Error request: non-existent model triggers 404&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        status&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; body &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; chat&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;non-existent-model&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;hello&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6639ba&#34;&gt;print&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;[error] model=non-existent-model status=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;status&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;elif&lt;/span&gt; r &lt;span style=&#34;color:#0550ae&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;0.5&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#57606a&#34;&gt;# Streaming request — generates TTFT and TPOT metrics&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        q &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; random&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;choice&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;questions&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        status&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; info &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; chat&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;llama3.2:1b&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; q&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; stream&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;True&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6639ba&#34;&gt;print&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;[stream] status=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;status&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;info&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;else&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#57606a&#34;&gt;# Normal non-streaming request&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        q &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; random&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;choice&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;questions&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        status&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; body &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; chat&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;llama3.2:1b&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; q&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        answer &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; body&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;choices&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[{}])[&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;message&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{})&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)[:&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        tokens &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; body&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;usage&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6639ba&#34;&gt;print&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;[ok] status=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;status&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt; tokens=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;tokens&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt; answer=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;answer&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;...&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    time&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;sleep&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;random&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;randint&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;20&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;30&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;运行：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install requests
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;python app.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;应用通过 1975 端口与 AI Gateway 通信，AI Gateway 再路由到 Ollama。每次请求都会产生 GenAI 指标（Token 用量、延迟、TTFT、TPOT）和访问日志，由网关通过 OTLP 推送到 SkyWalking。&lt;/p&gt;
&lt;p&gt;错误请求（不存在的模型 → HTTP 404）始终会被访问日志采样策略捕获，所以在 SkyWalking 的日志视图中一定能看到。&lt;/p&gt;
&lt;h3 id=&#34;第四步在-skywalking-ui-中查看&#34;&gt;第四步：在 SkyWalking UI 中查看&lt;/h3&gt;
&lt;p&gt;打开 &lt;a href=&#34;http://localhost:8080&#34;&gt;http://localhost:8080&lt;/a&gt;，选择 &lt;strong&gt;GenAI &amp;gt; Envoy AI Gateway&lt;/strong&gt; 菜单。&lt;/p&gt;
&lt;p&gt;服务列表显示 &lt;code&gt;my-ai-gateway&lt;/code&gt;，可以一览 CPM、延迟和 Token 速率：&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-1.png&#34; alt=&#34;服务列表&#34;&gt;&lt;/p&gt;
&lt;p&gt;点击进入服务详情，查看完整仪表盘——请求 CPM、延迟（平均值 + 百分位数）、输入/输出 Token 速率、TTFT 和 TPOT：&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-2.png&#34; alt=&#34;服务仪表盘&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Providers&lt;/strong&gt; 标签页按 AI 提供商维度展示指标：&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-3.png&#34; alt=&#34;提供商维度&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Models&lt;/strong&gt; 标签页展示每个模型的指标，包括 TTFT 和 TPOT（仅流式请求）。注意 &lt;code&gt;unknown&lt;/code&gt; 模型条目——这些就是使用不存在模型的错误请求：&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-4.png&#34; alt=&#34;模型维度&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Log&lt;/strong&gt; 标签页展示访问日志。采样策略会丢弃正常的成功响应，但始终保留错误（HTTP 404）和高 Token 消耗的请求：&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screen-5.png&#34; alt=&#34;访问日志&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;清理&#34;&gt;清理&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker compose down
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;kubernetes-生产部署&#34;&gt;Kubernetes 生产部署&lt;/h2&gt;
&lt;p&gt;生产环境中，Envoy AI Gateway 作为完整的 Kubernetes 控制器运行，以 Envoy Gateway 作为控制面。详见 &lt;a href=&#34;https://aigateway.envoyproxy.io/docs/getting-started/&#34;&gt;Envoy AI Gateway 入门指南&lt;/a&gt;。&lt;/p&gt;
&lt;p&gt;OTLP 配置方式相同——在 AI Gateway 的 External Processor 上设置 &lt;code&gt;OTEL_*&lt;/code&gt; 环境变量，指向 SkyWalking OAP 的 gRPC 端口（11800）。详见 &lt;a href=&#34;https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/&#34;&gt;SkyWalking Envoy AI Gateway 监控文档&lt;/a&gt;。&lt;/p&gt;
&lt;h2 id=&#34;不用-ai-网关也能做-genai-可观测&#34;&gt;不用 AI 网关也能做 GenAI 可观测&lt;/h2&gt;
&lt;p&gt;并非所有场景都需要 AI 网关。如果你的应用直接调用 LLM 提供商，SkyWalking 10.4.0 也提供了基于 &lt;a href=&#34;https://skywalking.apache.org/docs/main/next/en/setup/service-agent/virtual-genai/&#34;&gt;Virtual GenAI&lt;/a&gt; 层的 GenAI 可观测方案。&lt;/p&gt;
&lt;p&gt;任何接入了 SkyWalking、OpenTelemetry 或 Zipkin 探针的应用都能使用这个功能。只要 Trace 中携带 &lt;code&gt;gen_ai.*&lt;/code&gt; 标签（遵循 &lt;a href=&#34;https://opentelemetry.io/docs/specs/semconv/gen-ai/&#34;&gt;OpenTelemetry GenAI 语义约定&lt;/a&gt;），SkyWalking 就能从客户端视角推导出每提供商、每模型的指标：延迟、Token 用量、成功率和预估费用。&lt;/p&gt;
&lt;p&gt;对于 Java 应用，SkyWalking Java Agent（9.7+）内置了 Spring AI 插件，自动为 13+ 提供商（OpenAI、Anthropic、AWS Bedrock、Google GenAI、DeepSeek、Mistral 等）的调用注入正确的 &lt;code&gt;gen_ai.*&lt;/code&gt; Span 标签——不需要改代码。&lt;/p&gt;
&lt;p&gt;这与上面介绍的 Envoy AI Gateway 监控是不同的使用场景：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Envoy AI Gateway 层&lt;/strong&gt;：基础设施级可观测——网关视角，覆盖所有流量。适合负责集中 AI 路由的平台团队。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Virtual GenAI 层&lt;/strong&gt;：应用级可观测——每个应用自己看到的 LLM 调用情况。适合没有集中网关的团队，或者需要按应用维度跟踪费用的场景。&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;参考资料&#34;&gt;参考资料&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://aigateway.envoyproxy.io/&#34;&gt;Envoy AI Gateway&lt;/a&gt; —— 项目官网和文档&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/envoyproxy/ai-gateway/tree/main/cmd/aigw&#34;&gt;Envoy AI Gateway CLI&lt;/a&gt; —— 本地开发用的独立运行模式&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/&#34;&gt;SkyWalking Envoy AI Gateway 监控&lt;/a&gt; —— OAP 配置文档&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://skywalking.apache.org/docs/main/next/en/setup/service-agent/virtual-genai/&#34;&gt;SkyWalking Virtual GenAI&lt;/a&gt; —— 客户端侧 GenAI 可观测&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://opentelemetry.io/docs/specs/semconv/gen-ai/&#34;&gt;OpenTelemetry GenAI 语义约定&lt;/a&gt; —— 两个项目共同遵循的指标/属性标准&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Zh: AI Coding 如何重塑软件架构师的工作方式</title>
      <link>/zh/2026-03-13-how-ai-changed-the-economics-of-architecture/</link>
      <pubDate>Sun, 15 Mar 2026 00:00:00 +0000</pubDate>
      <guid>/zh/2026-03-13-how-ai-changed-the-economics-of-architecture/</guid>
      <description>
        
        
        &lt;p&gt;&lt;em&gt;以 SkyWalking GraalVM Distro 为例，看 AI Coding 如何把一批探索性 PoC 打磨成一条可重复的迁移流水线。&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;./graph.jpg&#34; alt=&#34;graph.jpg&#34;&gt;&lt;/p&gt;
&lt;p&gt;这个项目给我最大的启发，不是 AI 能写多少代码，而是 AI Coding 改变了架构设计的试错成本。当一个想法可以很快做成 PoC、跑起来验证、不行就推翻重来时，架构师就更有机会逼近自己真正想要的设计，而不是过早停在“团队现在做得出来”的折中方案上。&lt;/p&gt;
&lt;p&gt;这种变化在成熟开源系统里尤其重要。Apache SkyWalking OAP 长期以来一直是一个功能强大且经过生产验证的可观测性后端，但大型 Java 平台该有的问题它一个不少：运行时字节码生成、重反射初始化、classpath 扫描、基于 SPI 的模块装配，以及动态 DSL 执行——这些机制方便扩展，但做 GraalVM Native Image 时全是障碍。&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SkyWalking GraalVM Distro&lt;/strong&gt; 的出现，源于我们把这个挑战当成一个架构设计问题来处理，而不是一次性的移植工程。目标不仅是让 OAP 能以原生二进制运行，更是把 GraalVM 迁移本身做成一条可重复执行、能够持续跟上上游演进的自动化流水线。&lt;/p&gt;
&lt;p&gt;如果你想看完整的技术设计、基准数据和上手方式，请阅读配套文章：&lt;a href=&#34;/zh/2026-03-13-skywalking-graalvm-distro-design-and-benchmarks/&#34;&gt;SkyWalking GraalVM Distro：设计与基准测试&lt;/a&gt;。&lt;/p&gt;
&lt;h2 id=&#34;从停滞的想法到可运行的系统&#34;&gt;从停滞的想法到可运行的系统&lt;/h2&gt;
&lt;p&gt;这件事其实很多年前就开始了。在这个仓库创建不久之后，&lt;a href=&#34;https://github.com/yswdqz&#34;&gt;yswdqz&lt;/a&gt; 曾花了数个月探索迁移方案。真正做下来才发现，这个项目远比 GraalVM 文档里列出的那些单点限制复杂得多，这项工作最终也因此搁置了很多年。&lt;/p&gt;
&lt;p&gt;这段停滞很重要。缺少的并不是想法。成熟维护者通常从来不缺想法，真正稀缺的，是把这些想法真正做出来的时间、人力和精力。即使架构师已经看到了几条很有前景的路线，有限的开发资源也会迫使大家更早做出权衡：优先选择实现成本最低的方案，而不是那个更干净、更可复用、更经得起未来变化的方案。&lt;/p&gt;
&lt;p&gt;这种情况非常普遍，并不特殊。在开源社区里，很多工作依赖志愿者或有限的企业赞助；在商业产品里，约束的形式不同，但本质仍然一样：路线图承诺、团队规模和交付压力都会让工程资源始终紧张。在这两种环境里，很多好想法被放弃，并不是因为它们错了，而是因为要把它们真正验证清楚、实现完整，成本太高。&lt;/p&gt;
&lt;p&gt;还有一个同样重要的约束：架构师通常同时也是非常资深的工程师，而不是一个可以全职扑在实现细节上的人。问题在于个人编码精力有限、时间高度碎片化，同时还要在代码尚未出现之前，不断向其他资深工程师解释自己的设计意图。传统上，这种解释主要通过图、文档和沟通完成。它很慢、信息损失大，而且充满不确定性。我们都体验过“传话游戏”：哪怕是很简单的意思，也很容易被误解，而等误解真正暴露出来时，时间已经过去很多了。&lt;/p&gt;
&lt;p&gt;到了 2025 年末，AI Coding 让”同时尝试多条路线”这件事终于变得现实。我们不必再因为实现能力稀缺而过早接受折中，而是可以在多个设计之间来回切换，用代码验证，快速淘汰弱方案，持续迭代，直到架构本身变得足够稳固、足够实用、足够高效。&lt;/p&gt;
&lt;p&gt;这种设计自由度至关重要。GraalVM 文档对单个限制讲得很清楚，但成熟 OSS 平台遇到的是一整套彼此牵连的系统性问题。只修补一个动态机制远远不够。要让 native image 真正落地，我们必须把整类运行时行为改造成构建期产物和自动生成的元数据。&lt;/p&gt;
&lt;p&gt;在这条路的早期历史中，还有一座非常具体的大山。那时上游 SkyWalking 仍然大量依赖 Groovy 来处理 LAL、MAL 和 Hierarchy 脚本。理论上，这只不过是另一个“不支持运行时动态行为”的例子；但在实践中，Groovy 是整条路径上最大的障碍。它不仅意味着脚本执行，还意味着一整套在 JVM 里极其便利、在 native image 里却极其不友好的动态模型。&lt;/p&gt;
&lt;p&gt;为了跨过这道坎，我们围绕 AOT-first 模式重新设计了 OAP 的核心引擎。早期实验必须直接面对 Groovy 时代的运行时行为，并尝试不同的脚本编译方案来绕过去。最终方案走得更远：对齐上游编译器流水线，把动态生成前移到构建期，并引入自动化机制，让这条迁移路径在上游持续演进时依然保持可控。具体来说，就是把 OAL、MAL、LAL 和 Hierarchy 的生成过程变成构建期预编译器的输出，而不是继续保留为启动期的动态行为。&lt;/p&gt;
&lt;h2 id=&#34;ai-coding-如何改写架构迭代&#34;&gt;AI Coding 如何改写架构迭代&lt;/h2&gt;
&lt;p&gt;这次转变的关键，并不只是“写代码更快了”。AI 真正改变的，是想法、原型、验证和重设计之间来回迭代的速度。围绕同一个问题，我们可以很快做出几个可运行的 PoC，迅速淘汰不成立的方向，再把值得保留的抽象慢慢沉淀成一套连贯的迁移系统。&lt;/p&gt;
&lt;p&gt;这并不会削弱人的架构价值，反而会放大它。哪些行为应该前移到构建期，哪些地方应该保留可配置性，哪里应该引入 same-FQCN 替换，如何让上游同步保持可控，以及哪些抽象值得不惜代价保留下来，这些判断仍然只能由人来做。不同的是，AI 的速度让我们终于有机会把这些更好的设计真正做出来，而不是过早退回到更简单、也更差的折中方案。&lt;/p&gt;
&lt;p&gt;这才是软件架构师工作方式真正发生变化的地方。过去，架构师往往已经知道更干净的方向在哪里，但有限的工程产能会逼着那个愿景退回到一个更便宜的妥协方案。现在，架构师在某种意义上又重新变回了“能快速动手的人”：可以直接用代码把思路搭出来，把高层抽象落成接口，再用真实运行的实现去证明设计。&lt;/p&gt;
&lt;p&gt;这不仅改变了实现，也改变了沟通方式。在开源里，我们常说：&lt;code&gt;talk is cheap, show me the code&lt;/code&gt;。在 AI Coding 时代，“把代码拿出来”这件事变得容易多了。设计不再那么依赖一个缓慢的、自上而下的翻译过程：从想法到文档，再到解释，再到实现。代码可以更早出现，也可以更早跑起来。&lt;/p&gt;
&lt;p&gt;这也让其他资深工程师受益。他们不必只靠图、会议或长篇解释来还原整个设计，而是可以直接审查抽象、阅读真实代码、运行它、质疑它，并在具体实现上一起打磨。这让架构协作更快、更清晰，也少了很多沟通误差。&lt;/p&gt;
&lt;p&gt;也正因为如此，我总觉得今天很多 AI 讨论有点跑偏。很多项目确实很有趣、也很好玩，拿来体验当然没问题，但高级工程工作并不会因为“给代码库接了个 agent”就自然变好。真正重要的，不是哪个 demo 看起来最炫，而是哪些工程能力真的被放大了，同时软件开发本身的纪律有没有被保留下来。&lt;/p&gt;
&lt;p&gt;对于架构师和资深工程师来说，这里真正重要的能力包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;快速做对比式原型验证&lt;/strong&gt;：不是只用 slides 和文档去论证某个想法，而是直接把多个方案做成可运行代码来比较。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;大规模代码理解能力&lt;/strong&gt;：能在大量模块之间快速阅读，同时保持对整个系统的全局认识。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;系统性的重构能力&lt;/strong&gt;：把基于反射、依赖运行时动态行为的路径，系统性地改造成适配 AOT 约束的设计。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;搭建自动化的能力&lt;/strong&gt;：当一个迁移步骤在每次上游同步时都必须重做一次，靠手工处理本身就很费时费力，而且越往后只会越累。AI 让我们真正有条件去投资生成器、清单、一致性检查和漂移检测，把重复的人力劳动变成可重复的自动化流程。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;大范围审查能力&lt;/strong&gt;：在很大的代码面上检查边界条件、兼容性约束，以及方案是否经得起反复执行。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;这些能力也都体现在最终的设计结果里。same-FQCN 替换为 GraalVM 特定行为建立了清晰、受控的边界；反射元数据不再依赖手工维护的猜测清单，而是直接从构建产物中生成；各种清单机制和漂移检测，则把原本模糊的“上游同步风险”变成了显式的工程工作流。&lt;/p&gt;
&lt;p&gt;对于初级工程师，我觉得这里的启发同样重要。AI 不会让架构设计、系统约束、接口设计、测试和可维护性这些基本功变得不重要。恰恰相反，这些能力只会变得更重要，因为它们决定了“被加速的实现”最终产出的是一个可持续演进的系统，还是只是更快地制造出更多代码。真正的杠杆来自工程判断力，而不是新鲜感。&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Claude Code&lt;/strong&gt; 和 &lt;strong&gt;Gemini AI&lt;/strong&gt; 在整个过程中都扮演了工程加速器的角色。在 GraalVM Distro 这个项目里，它们具体帮我们做了几件事：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;把迁移思路直接做成可运行代码&lt;/strong&gt;：不是争论哪个方向可能行得通，而是把多个真实原型做出来、跑起来、比较掉，把不成立的方向淘汰掉。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;重构重反射、重动态的代码路径&lt;/strong&gt;：把不适合运行时的模式系统性替换成 AOT 友好的实现方式。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;让上游同步真正可持续&lt;/strong&gt;：每次 distro 从上游 SkyWalking 拉取变更后，元数据扫描、配置再生成和重新编译都必须再来一次。AI 帮助我们把这些过程做成流水线，使每次同步都变成一个可控、且大部分自动化的过程，而不是一次比一次更长的手工重复劳动。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;在大范围内审查逻辑和边界情况&lt;/strong&gt;：特别是在功能对等性比纯实现速度更重要的地方。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;最终产出的，不只是一次大重写，而是一套可重复的系统：预编译器、manifest 驱动的加载、反射配置生成、替换边界，以及让上游迁移可审查、可自动化的漂移检测机制。&lt;/p&gt;
&lt;p&gt;如果你想看这种开发方法背后的更广泛背景，可以读这篇文章：&lt;a href=&#34;/zh/2026-03-08-agentic-vibe-coding/&#34;&gt;在成熟开源大型项目中实践 Agentic Vibe Coding：软件工程与工程控制论还在延续&lt;/a&gt;。这篇文章则是这个故事的下一步：不仅是在一个成熟代码库里增强功能，而是重新激活一项曾经停滞的工作，并把它真正做成可运行系统。&lt;/p&gt;
&lt;h2 id=&#34;真正改变的到底是什么&#34;&gt;真正改变的到底是什么&lt;/h2&gt;
&lt;p&gt;这个项目最重要的结果，并不是一张 benchmark 表。基准数据当然属于 distro 本身，而且它们很重要，因为它们证明这套系统是真实可运行的。但对这篇文章来说，更深层的变化发生在方法论层面：AI Coding 改变了我们探索、验证和打磨架构方案的方式。&lt;/p&gt;
&lt;p&gt;过去，架构往往更像一项以文档为主、后面拖着漫长而昂贵实现过程的活动。现在，我们可以更快地在想法、原型、比较和重设计之间切换。这让我们真正有机会去追求更高抽象层次的方案，保留更干净的边界，并建设那些让迁移过程可持续维护的自动化机制。&lt;/p&gt;
&lt;p&gt;这项工作的技术证据，就是 SkyWalking GraalVM Distro 本身：它不仅是一个可运行的系统，更是一条由预编译器、自动生成的反射元数据、受控替换边界和漂移检查组成的迁移流水线。基准数据之所以重要，是因为它们证明这套系统在实践里是成立的；但从架构角度看，真正的结果是：这次迁移不再是一场一次性的移植，而是变成了一套可重复执行的系统工程。关于完整测试方法、原始数据和技术设计，请阅读配套文章：&lt;a href=&#34;/zh/2026-03-13-skywalking-graalvm-distro-design-and-benchmarks/&#34;&gt;SkyWalking GraalVM Distro：设计与基准测试&lt;/a&gt;。&lt;/p&gt;
&lt;p&gt;项目仓库位于 &lt;a href=&#34;https://github.com/apache/skywalking-graalvm-distro&#34;&gt;apache/skywalking-graalvm-distro&lt;/a&gt;。我们欢迎社区成员测试这个新发行版、提交 issue，并帮助它逐步走向生产可用。&lt;/p&gt;
&lt;p&gt;对我来说，更深层的启发并不止于这个发行版。AI Coding 不会让架构变得不重要，反而会让架构更值得被认真追求。当实现速度提升到一定程度时，我们终于有机会在真实代码里验证更多想法，保留那些真正好的抽象，并把那些过去常常因为投入太大而半途妥协的系统真正做出来。&lt;/p&gt;
&lt;p&gt;对于资深工程师来说，瓶颈正在从单纯的代码实现速度，转向品味、系统判断力，以及定义稳定边界的能力。对于初级工程师来说，真正该走的路不是追逐每一种看上去都很刺激的 AI 工作流，而是把基础能力练得更扎实，让加速真正产生复利：理解需求、阅读陌生系统、质疑假设，并识别出在系统快速变化时仍然必须保持正确的那些部分。AI Coding 降低了验证好设计的代价，但并没有降低工程判断本身的门槛。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Zh: SkyWalking GraalVM Distro：设计与基准测试</title>
      <link>/zh/2026-03-13-skywalking-graalvm-distro-design-and-benchmarks/</link>
      <pubDate>Sun, 15 Mar 2026 00:00:00 +0000</pubDate>
      <guid>/zh/2026-03-13-skywalking-graalvm-distro-design-and-benchmarks/</guid>
      <description>
        
        
        &lt;p&gt;&lt;em&gt;这篇文章会完整介绍我们如何把 Apache SkyWalking OAP 迁移到 GraalVM Native Image。目标不是做一次性移植，而是把这件事做成一套能持续跟上上游演进的流程。&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;./graph.jpg&#34; alt=&#34;graph.jpg&#34;&gt;&lt;/p&gt;
&lt;p&gt;如果你想看这项工作的更大背景，以及 AI Coding 如何让这个项目真正做得出来，请阅读：&lt;a href=&#34;/zh/2026-03-13-how-ai-changed-the-economics-of-architecture/&#34;&gt;AI Coding 如何重塑软件架构师的工作方式&lt;/a&gt;。&lt;/p&gt;
&lt;h2 id=&#34;为什么-graalvm-在这里是刚需&#34;&gt;为什么 GraalVM 在这里是刚需&lt;/h2&gt;
&lt;p&gt;GraalVM Native Image 可以把 Java 应用做 Ahead-of-Time（AOT）编译，生成独立可执行文件。对于像 SkyWalking OAP 这样的可观测性后端来说，这不是“锦上添花”的性能优化，而是明确的工程刚需。&lt;/p&gt;
&lt;p&gt;可观测性平台必须是基础设施中最可靠的部分。它必须在自己要观测的那些故障发生时依然存活。在云原生环境里，工作负载会不断扩缩容、迁移和重启，负责观测一切的后端本身不能还是那个启动慢、空闲占用大、恢复缓慢的重型进程。&lt;/p&gt;
&lt;p&gt;我们的基准测试结果让这个结论变得非常具体：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;**启动时间：**约 5 ms 对比约 635 ms。在 Kubernetes 集群里，当 OAP Pod 被驱逐或重新调度时，635 ms 的差距意味着这段时间里的遥测数据可能会丢失。5 ms 的情况下，新 Pod 往往在大部分客户端还没感知到中断之前就已经重新开始接收数据了。&lt;/li&gt;
&lt;li&gt;**空闲内存：**约 41 MiB 对比约 1.2 GiB。可观测性后端是 24/7 常驻运行的。在多租户或边缘部署场景里，基础 RSS 降了 97%，可以放进更小的节点，而不再必须占用一台专用机器。&lt;/li&gt;
&lt;li&gt;**负载下内存：**在 20 RPS 下约 629 MiB 对比约 2.0 GiB。生产级负载下内存降了 70%，直接对应更少的节点、更低的云账单，以及在后端本身成为扩容瓶颈之前更多的余量。&lt;/li&gt;
&lt;li&gt;**没有预热惩罚：**峰值吞吐可以更早发挥出来。JVM 的 JIT 编译器往往需要数分钟流量才能完成热点优化，在这段时间里，尾延迟更差，数据处理也会滞后。原生二进制没有同样的阶段。&lt;/li&gt;
&lt;li&gt;**更小的攻击面：**不再需要完整 JDK 运行时，需要跟踪和修补的 CVE 也就少了很多。对于一个会接收整个集群所有服务数据的组件来说，这一点很重要。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;这些都不是“小修小补”。它们直接改变了哪些部署形态开始变得可行：无服务器形态的可观测性后端、边车式采集模型、内存预算极其紧张的边缘节点。只有当后端足够轻、足够快时，这些方案才真正有落地空间。&lt;/p&gt;
&lt;h2 id=&#34;挑战一个成熟动态特性很多的-java-平台&#34;&gt;挑战：一个成熟、动态特性很多的 Java 平台&lt;/h2&gt;
&lt;p&gt;SkyWalking OAP 身上有大型 Java 平台的所有典型问题：运行时字节码生成、重反射初始化、classpath 扫描、基于 SPI 的模块装配，以及动态 DSL 执行。这些机制方便扩展，但做 GraalVM native image 时全是障碍。&lt;/p&gt;
&lt;p&gt;GraalVM 文档中列出的限制，只是问题的开始。在一个成熟的 OSS 平台里，这些限制会深深缠绕在多年积累下来的运行时设计决策中。常规的 GraalVM native image 很难处理运行时类生成、反射、动态发现和脚本执行，而这些在 SkyWalking OAP 中都不是零散存在的，它们本来就是系统设计的一部分。&lt;/p&gt;
&lt;p&gt;在这个发行版的早期历史里，还有一座非常具体的大山。那时上游 SkyWalking 仍然高度依赖 Groovy 来处理 LAL、MAL 和 Hierarchy 脚本。理论上，它只是另一个“不支持运行时动态”的组件；但在实践里，Groovy 是整条路径上最大的障碍。它不仅仅是脚本执行问题，而是代表着一整套在 JVM 世界里极其便利、在 native image 世界里极其不友好的动态模型。&lt;/p&gt;
&lt;h2 id=&#34;设计目标让迁移这件事可以重复做&#34;&gt;设计目标：让迁移这件事可以重复做&lt;/h2&gt;
&lt;p&gt;设计目标不是”把 native-image 跑通一次就完”，而是做出一套能反复用、能长期维护的迁移系统：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;把运行时生成的产物前移到构建期。&lt;/strong&gt; OAL、MAL、LAL、Hierarchy 规则，以及 meter 相关的生成类，都在构建期完成编译并打包，而不是等到启动时才动态生成。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;用确定性的加载机制替代动态发现。&lt;/strong&gt; classpath 扫描和运行时注册路径被转换为基于 manifest 的加载方式。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;减少运行时反射，并在构建期生成 native 元数据。&lt;/strong&gt; 反射配置不再依赖人工维护的猜测清单，而是根据真实 manifest 和扫描结果生成。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;让上游同步边界保持清晰。&lt;/strong&gt; same-FQCN replacements 会被显式打包、列清单，并通过陈旧性检查守住边界。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;让变化第一时间暴露出来。&lt;/strong&gt; 一旦上游 provider、规则文件或被替换的源文件发生变化，测试就会失败，迫使我们做显式审查。&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;这才是最关键的架构转变。好的抽象和前瞻性，在 AI 时代并没有变得不重要，反而变得更重要了，因为它们决定了 AI 带来的速度，最终产出的是一个可维护的系统，还是一堆膨胀得更快的代码。&lt;/p&gt;
&lt;h2 id=&#34;把运行时动态行为变成构建期产物&#34;&gt;把运行时动态行为变成构建期产物&lt;/h2&gt;
&lt;p&gt;SkyWalking OAP 里有多个在 JVM 世界里很自然、但在 native image 里很棘手的动态子系统：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OAL 会在运行时生成类。&lt;/li&gt;
&lt;li&gt;LAL、MAL 和 Hierarchy 在历史上与大量基于 Groovy 的运行时行为绑定在一起，这也是早期 distro 工作中最难处理的阻碍之一。&lt;/li&gt;
&lt;li&gt;MAL、LAL 和 Hierarchy 规则依赖运行时编译行为。&lt;/li&gt;
&lt;li&gt;基于 Guava 的 classpath 扫描会发现注解、dispatcher、decorator 和 meter function。&lt;/li&gt;
&lt;li&gt;基于 SPI 的模块和 provider 发现依赖更动态的运行时环境。&lt;/li&gt;
&lt;li&gt;YAML/config 初始化和框架集成依赖反射访问。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;在 SkyWalking GraalVM Distro 里，这些问题不是靠零散补丁一个个修掉的，而是被统一收敛到一条构建期流水线里。&lt;/p&gt;
&lt;p&gt;预编译器会在构建过程中运行 DSL 引擎、导出生成类、写入 manifest、序列化配置数据，并生成 native-image 元数据。这样一来，启动时只需要做类加载和注册，不再需要运行时代码生成。运行期之所以能变得更简单，是因为原本的复杂性被前移到了构建期。&lt;/p&gt;
&lt;p&gt;这也是为什么这个项目不只是一次性能优化。我们的设计目标，是把复杂性前移到一个更容易验证、更容易自动化、也更便于反复执行的位置。&lt;/p&gt;
&lt;h2 id=&#34;same-fqcn-替换一条可控的边界&#34;&gt;same-FQCN 替换：一条可控的边界&lt;/h2&gt;
&lt;p&gt;这个发行版里最实用的设计选择之一，就是使用 same-FQCN 替换类。我们没有依赖模糊的启动技巧，也没有依赖未文档化的加载顺序假设。相反，我们会重新打包 GraalVM 特定 jar，排除原本的上游类，再让替换类占据完全相同的 fully-qualified class name。&lt;/p&gt;
&lt;p&gt;这对可维护性非常关键，因为它建立了一条非常清晰的边界：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;上游类仍然定义行为契约；&lt;/li&gt;
&lt;li&gt;GraalVM 侧的替换类提供兼容的实现策略；&lt;/li&gt;
&lt;li&gt;打包过程则让这次替换变得显式可见。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;例如，OAL 的加载过程从运行时编译变成了基于 manifest 的预编译类加载。类似的替换也处理了 MAL 和 LAL DSL 加载、模块装配、配置初始化，以及多个对反射敏感的路径。目标不是把一切都 fork 出去，而是只替换那些运行时模型从根本上不适合 native image 的部分。&lt;/p&gt;
&lt;p&gt;随后，这条边界还会通过测试来守护：测试会对照与 replacement 对应的上游源文件做哈希。当上游改动了这些文件中的任何一个，构建就会失败，并明确告诉我们哪个 replacement 需要重新审查。这样一来，“如何跟上上游”就不再是一个充满焦虑的抽象问题，而变成一项明确、可落地的工程工作。&lt;/p&gt;
&lt;h2 id=&#34;反射配置不是猜出来的而是生成出来的&#34;&gt;反射配置不是猜出来的，而是生成出来的&lt;/h2&gt;
&lt;p&gt;在很多 GraalVM 迁移项目里，&lt;code&gt;reflect-config.json&lt;/code&gt; 最终会变成一个靠经验不断累积的工件。它会越来越大，越来越陈旧，最后没有人真正清楚它是不是完整，也不清楚每一项配置为什么存在。这种模式在一个持续演化的大型 OSS 平台里是无法扩展的。&lt;/p&gt;
&lt;p&gt;在这个发行版里，反射元数据直接从构建产物和扫描结果中生成，包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OAL、MAL、LAL、Hierarchy 以及 meter 生成类的 manifest；&lt;/li&gt;
&lt;li&gt;注解扫描得到的类；&lt;/li&gt;
&lt;li&gt;Armeria HTTP handler；&lt;/li&gt;
&lt;li&gt;GraphQL resolver 和 schema 映射类型；&lt;/li&gt;
&lt;li&gt;被接受的 &lt;code&gt;ModuleConfig&lt;/code&gt; 类。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;这是一种健康得多的模式。我们不再依赖人去记住所有可能触发反射访问的路径，而是让系统根据真实迁移流水线推导出反射元数据。构建过程本身，成为了事实来源。&lt;/p&gt;
&lt;h2 id=&#34;让上游同步变得现实可行&#34;&gt;让上游同步变得现实可行&lt;/h2&gt;
&lt;p&gt;如果这个发行版只是一次性的工程冲刺，那它的意义会小很多。真正困难的事情，是在上游 SkyWalking 继续演进的同时，让它还能持续维护下去。&lt;/p&gt;
&lt;p&gt;这也是为什么仓库里会有一整套显式的清单和漂移检测机制：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;provider 清单，用来强制新上游 provider 被分类；&lt;/li&gt;
&lt;li&gt;规则文件清单，用来强制新 DSL 输入被显式确认；&lt;/li&gt;
&lt;li&gt;预编译 YAML 输入的 SHA watcher；&lt;/li&gt;
&lt;li&gt;带 GraalVM 特定 replacement 的上游源文件 SHA watcher。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;好的抽象不仅仅是代码结构优雅，更在于你是否选择了一种能在未来变化面前继续成立的迁移设计。&lt;/p&gt;
&lt;h2 id=&#34;基准测试结果&#34;&gt;基准测试结果&lt;/h2&gt;
&lt;p&gt;我们在一台 Apple M3 Max（macOS、Docker Desktop、10 CPUs / 62.7 GB）上，对标准 JVM OAP 和 GraalVM Distro 做了对比测试，两者都连接到 BanyanDB。&lt;/p&gt;
&lt;h3 id=&#34;启动测试docker-compose无流量3-次取中位数&#34;&gt;启动测试（Docker Compose，无流量，3 次取中位数）&lt;/h3&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;指标&lt;/th&gt;
          &lt;th&gt;JVM OAP&lt;/th&gt;
          &lt;th&gt;GraalVM OAP&lt;/th&gt;
          &lt;th&gt;差异&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;冷启动时间&lt;/td&gt;
          &lt;td&gt;635 ms&lt;/td&gt;
          &lt;td&gt;5 ms&lt;/td&gt;
          &lt;td&gt;约快 127 倍&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;热启动时间&lt;/td&gt;
          &lt;td&gt;630 ms&lt;/td&gt;
          &lt;td&gt;5 ms&lt;/td&gt;
          &lt;td&gt;约快 126 倍&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;空闲 RSS&lt;/td&gt;
          &lt;td&gt;约 1.2 GiB&lt;/td&gt;
          &lt;td&gt;约 41 MiB&lt;/td&gt;
          &lt;td&gt;约降低 97%&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;启动时间的测量方式，是从 OAP 第一条应用日志时间戳开始，到出现 &lt;code&gt;listening on 11800&lt;/code&gt; 日志（即 gRPC 服务 ready）为止。&lt;/p&gt;
&lt;h3 id=&#34;持续负载下kind--istio-1252--bookinfo约-20-rps2-个-oap-副本&#34;&gt;持续负载下（Kind + Istio 1.25.2 + Bookinfo，约 20 RPS，2 个 OAP 副本）&lt;/h3&gt;
&lt;p&gt;在 60 秒预热之后，每 10 秒采样一次，共 30 个样本。&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;指标&lt;/th&gt;
          &lt;th&gt;JVM OAP&lt;/th&gt;
          &lt;th&gt;GraalVM OAP&lt;/th&gt;
          &lt;th&gt;差异&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;CPU 中位数（millicores）&lt;/td&gt;
          &lt;td&gt;101&lt;/td&gt;
          &lt;td&gt;68&lt;/td&gt;
          &lt;td&gt;-33%&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CPU 平均值（millicores）&lt;/td&gt;
          &lt;td&gt;107&lt;/td&gt;
          &lt;td&gt;67&lt;/td&gt;
          &lt;td&gt;-37%&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;内存中位数（MiB）&lt;/td&gt;
          &lt;td&gt;2068&lt;/td&gt;
          &lt;td&gt;629&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;-70%&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;内存平均值（MiB）&lt;/td&gt;
          &lt;td&gt;2082&lt;/td&gt;
          &lt;td&gt;624&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;-70%&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;两个版本报告的 entry-service CPM 一致，说明在这个测试负载下，两者的流量处理能力相同。&lt;/p&gt;
&lt;p&gt;我们每 30 秒通过 swctl 对所有已发现服务收集这些指标：
&lt;code&gt;service_cpm&lt;/code&gt;、&lt;code&gt;service_resp_time&lt;/code&gt;、&lt;code&gt;service_sla&lt;/code&gt;、&lt;code&gt;service_apdex&lt;/code&gt;、&lt;code&gt;service_percentile&lt;/code&gt;。&lt;/p&gt;
&lt;p&gt;完整的基准测试脚本和原始数据位于发行版仓库中的 &lt;a href=&#34;https://github.com/apache/skywalking-graalvm-distro/tree/main/benchmark&#34;&gt;benchmark/&lt;/a&gt; 目录。&lt;/p&gt;
&lt;h2 id=&#34;当前状态&#34;&gt;当前状态&lt;/h2&gt;
&lt;p&gt;这个项目已经是一个可运行的实验性发行版，托管在独立仓库中：&lt;a href=&#34;https://github.com/apache/skywalking-graalvm-distro&#34;&gt;apache/skywalking-graalvm-distro&lt;/a&gt;。&lt;/p&gt;
&lt;p&gt;当前发行版有意聚焦在一种现代、高性能的运行模式上：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;存储：&lt;/strong&gt; BanyanDB&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;集群模式：&lt;/strong&gt; Standalone 和 Kubernetes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;配置方式：&lt;/strong&gt; 无配置或 Kubernetes ConfigMap&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;运行模型：&lt;/strong&gt; 固定模块集合、预编译产物和 AOT 友好的装配方式&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;这种聚焦是刻意的。要把迁移做成一套可重复的系统，第一步必须先把边界收清楚，做出一个真正能跑起来的版本，然后再在不失控的前提下逐步扩展。&lt;/p&gt;
&lt;h2 id=&#34;快速开始&#34;&gt;快速开始&lt;/h2&gt;
&lt;p&gt;由于 SkyWalking GraalVM Distro 的设计目标就是追求极致性能，它目前最适合与 &lt;strong&gt;BanyanDB&lt;/strong&gt; 存储后端搭配使用。当前发布的镜像已经可以在 Docker Hub 获取，你可以直接用下面这个 &lt;code&gt;docker-compose.yml&lt;/code&gt; 启动整套系统。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;version&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;3.8&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;services&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;banyandb&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;ghcr.io/apache/skywalking-banyandb:e1ba421bd624727760c7a69c84c6fe55878fb526&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;restart&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;always&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;17912:17912&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;17913:17913&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;command&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data --measure-metadata-cache-wait-duration 1m --stream-metadata-cache-wait-duration 1m&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;healthcheck&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;test&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;CMD&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;sh&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;-c&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;nc -nz 127.0.0.1 17912&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;interval&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;5s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;timeout&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;10s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;retries&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;120&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;apache/skywalking-graalvm-distro:0.1.1&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;oap&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;banyandb&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;restart&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;always&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;11800:11800&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;12800:12800&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_STORAGE&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_STORAGE_BANYANDB_TARGETS&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb:17912&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_HEALTH_CHECKER&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;default&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;healthcheck&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;test&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;CMD-SHELL&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;nc -nz 127.0.0.1 11800 || exit 1&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;interval&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;5s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;timeout&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;10s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;retries&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;120&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ui&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;ghcr.io/apache/skywalking/ui:10.3.0&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;ui&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;restart&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;always&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;8080:8080&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_OAP_ADDRESS&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;http://oap:12800&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;只需要执行：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker compose up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;欢迎社区来测试这个新发行版、提交 issue，并帮助我们推动它走向生产可用。&lt;/p&gt;
&lt;p&gt;&lt;em&gt;特别感谢 GraalVM 团队提供的技术基础。&lt;/em&gt;&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: How AI Changed the Economics of Architecture</title>
      <link>/blog/2026-03-13-how-ai-changed-the-economics-of-architecture/</link>
      <pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-03-13-how-ai-changed-the-economics-of-architecture/</guid>
      <description>
        
        
        &lt;p&gt;&lt;em&gt;SkyWalking GraalVM Distro: A case study in turning runnable PoCs into a repeatable migration pipeline.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;./graph.jpg&#34; alt=&#34;graph.jpg&#34;&gt;&lt;/p&gt;
&lt;p&gt;The most important lesson from this project is not that AI can generate a large amount of code. It is that AI changes the economics of architecture. When runnable PoCs become cheap to build, compare, discard, and rebuild, architects can push further toward the design they actually want instead of stopping early at a compromise they can afford to implement.&lt;/p&gt;
&lt;p&gt;That shift matters a lot in mature open source systems. Apache SkyWalking OAP has long been a powerful and production-proven observability backend, but it also carries all the realities of a large Java platform: runtime bytecode generation, reflection-heavy initialization, classpath scanning, SPI-based module wiring, and dynamic DSL execution that are friendly to extensibility but hostile to GraalVM native image.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SkyWalking GraalVM Distro&lt;/strong&gt; is the result of treating that challenge as a design-system problem instead of a one-off porting exercise. The goal was not only to make OAP run as a native binary, but to turn GraalVM migration itself into a repeatable automation pipeline that can stay aligned with upstream evolution.&lt;/p&gt;
&lt;p&gt;For the full technical design, benchmark data, and getting-started guide, see the companion post: &lt;a href=&#34;../2026-03-13-skywalking-graalvm-distro-design-and-benchmarks/index.md&#34;&gt;SkyWalking GraalVM Distro: Design and Benchmarks&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;from-paused-idea-to-runnable-system&#34;&gt;From Paused Idea to Runnable System&lt;/h2&gt;
&lt;p&gt;This journey actually began years ago. Shortly after this repository was created, &lt;a href=&#34;https://github.com/yswdqz&#34;&gt;yswdqz&lt;/a&gt; spent several months exploring the transition. The project proved much harder in practice than the individual GraalVM limitations sounded on paper, and the work eventually paused for years.&lt;/p&gt;
&lt;p&gt;That pause is important. The missing ingredient was not ideas. Mature maintainers usually have more ideas than time. The real constraint was implementation economics. Even when the architect can see several promising directions, limited developer resources force an earlier trade-off: choose the path that is cheapest to implement, not necessarily the path that is cleanest, most reusable, or most future-proof.&lt;/p&gt;
&lt;p&gt;This is a very common reality, not an exceptional one. In open source communities, much of the work depends on volunteers or limited company sponsorship. In commercial products, the pressure is different but the constraint is still real: roadmap commitments, staffing limits, and delivery deadlines keep engineering resources tight. In both worlds, good ideas are often abandoned not because they are wrong, but because they are too expensive to validate and implement thoroughly.&lt;/p&gt;
&lt;p&gt;There is another constraint that matters just as much: the architect is usually also a very senior engineer, not a full-time implementation machine. That means limited personal coding energy, fragmented time, and a constant need to explain ideas to other senior engineers before the code exists. Traditionally, that explanation happens through diagrams, documents, and conversations. It is slow, lossy, and unpredictable. We all know some version of the Telephone Game: even simple words are easy to misunderstand, and by the time the misunderstanding becomes visible, a lot of time has already passed.&lt;/p&gt;
&lt;p&gt;What changed in late 2025 was that AI engineering made multiple runnable ideas affordable. Instead of picking an early compromise because implementation capacity was scarce, we could switch repeatedly between designs, validate them with code, discard weak directions quickly, and keep iterating until the architecture became solid, practical, and efficient enough to hold.&lt;/p&gt;
&lt;p&gt;That design freedom was critical. GraalVM documentation gives clear guidance on isolated limitations, but a mature OSS platform hits them as a connected system. Fixing only one dynamic mechanism is not enough. To make native image practical, we had to turn whole categories of runtime behavior into build-time artifacts and automated metadata generation.&lt;/p&gt;
&lt;p&gt;There was also a very concrete mountain in front of us in the early history of this distro. In the first several commits of the repository, upstream SkyWalking still relied heavily on Groovy for LAL, MAL, and Hierarchy scripts. In theory, that was just one more unsupported runtime-heavy component. In practice, Groovy was the biggest obstacle in the whole path. It represented not only script execution, but a whole dynamic model that was deeply convenient on the JVM side and deeply unfriendly to native image.&lt;/p&gt;
&lt;p&gt;To bridge the gap, we re-architected the core engines of OAP around an AOT-first model. Earlier experiments had to confront Groovy-era runtime behavior directly and explore alternative script-compilation approaches to get around it. The finalized direction went further: align with the upstream compiler pipeline, move dynamic generation to build time, and add automation so the migration stays controllable as upstream keeps moving. Concretely, that meant turning OAL, MAL, LAL, and Hierarchy generation into build-time precompiler outputs instead of leaving them as startup-time dynamic behavior.&lt;/p&gt;
&lt;h2 id=&#34;ai-speed-changed-the-design-loop&#34;&gt;AI Speed Changed the Design Loop&lt;/h2&gt;
&lt;p&gt;The scale of this transformation was not only about coding faster. AI changed the loop between idea, prototype, validation, and redesign. We could build runnable PoCs for different approaches, throw away weak ones quickly, and preserve the promising abstractions until they formed a coherent migration system.&lt;/p&gt;
&lt;p&gt;That does not reduce the role of human architecture. It raises the value of it. Human judgment was still required to decide what should become build-time, what should stay configurable, where to introduce same-FQCN replacements, how to keep upstream sync controllable, and which abstractions were worth preserving. But AI speed made it realistic to pursue those better designs instead of settling for a simpler compromise too early.&lt;/p&gt;
&lt;p&gt;This is the real change in the economics of architecture. In the past, an architect might already know the cleaner direction, but limited engineering capacity often forced that vision back toward a cheaper compromise. Now the architect can return much closer to being a fast developer again: building code, shaping high-abstraction interfaces, and using design patterns to prove the vision directly in the real world.&lt;/p&gt;
&lt;p&gt;That changes communication as much as implementation. In open source, we often say, &lt;code&gt;talk is cheap, show me the code&lt;/code&gt;. With AI engineering, showing the code becomes much more straightforward. The design no longer depends so heavily on a slow top-down translation from idea to documents to interpretation to implementation. The code can appear earlier, and it can run earlier.&lt;/p&gt;
&lt;p&gt;Other senior engineers benefit from this too. They do not need to reconstruct the whole design only from diagrams, meetings, or long explanations. They can review the actual abstraction, see the behavior in code, run it, challenge it, and refine it from something concrete. That makes architectural collaboration faster, clearer, and less lossy.&lt;/p&gt;
&lt;p&gt;This is also where I think the current AI discussion is often noisy. Many projects are fun, surprising, and worth exploring, but advanced engineering work is not improved merely by attaching an agent to a codebase. The important question is not which demo looks most magical. The important question is which engineering capabilities are actually being accelerated without losing the discipline of software development itself.&lt;/p&gt;
&lt;p&gt;For architects and senior engineers, the capabilities that mattered most here were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fast comparative prototyping:&lt;/strong&gt; Building several runnable approaches in code instead of defending one idea with slides and documents.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Large-scale code comprehension:&lt;/strong&gt; Reading across many modules quickly enough to keep the whole system in view.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Systematic refactoring:&lt;/strong&gt; Converting reflection-heavy or runtime-dynamic paths into designs that fit AOT constraints.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automation construction:&lt;/strong&gt; When a migration step must be repeated every upstream sync, doing it manually once is already expensive. Doing it manually again next time is even more expensive. AI made it practical to invest in generators, inventories, consistency checks, and drift detectors that turn repeated manual work into repeatable automation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Review at breadth:&lt;/strong&gt; Checking edge cases, compatibility boundaries, and repeatability across a large surface area.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Those capabilities were visible in the resulting design. Same-FQCN replacements created a controlled boundary for GraalVM-specific behavior. Reflection metadata was generated from build outputs instead of maintained as a hand-written guess list. Inventories and drift detectors turned upstream sync from a vague maintenance risk into an explicit engineering workflow.&lt;/p&gt;
&lt;p&gt;For junior engineers, I think the lesson is equally important. AI does not remove the need to learn architecture, invariants, interfaces, testing, or maintenance. It makes those skills more valuable, because they determine whether accelerated implementation produces a durable system or just more code faster. The leverage comes from engineering judgment, not from novelty.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Claude Code&lt;/strong&gt; and &lt;strong&gt;Gemini AI&lt;/strong&gt; acted as engineering accelerators throughout this process. In the GraalVM Distro specifically, they helped us:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Explore migration strategies as running code:&lt;/strong&gt; Instead of debating which approach might work, we built and compared multiple real prototypes, discarded the weak ones, and kept what held up.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Refactor reflection-heavy and dynamic code paths:&lt;/strong&gt; Replace runtime-hostile patterns with AOT-friendly alternatives across the codebase.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Make upstream sync sustainable:&lt;/strong&gt; Every time the distro pulls from upstream SkyWalking, metadata scanning, config regeneration, and recompilation must happen again. AI helped build the pipeline so that each sync is a controlled, largely automated process rather than a fresh manual effort that grows longer each time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Review logic and edge cases at scale:&lt;/strong&gt; Especially in places where feature parity mattered more than raw implementation speed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result was not just a large rewrite. It was a repeatable system: precompilers, manifest-driven loading, reflection-config generation, replacement boundaries, and drift detectors that make upstream migration reviewable and automatable.&lt;/p&gt;
&lt;p&gt;For the broader methodology behind this style of development, see &lt;a href=&#34;https://builder.aws.com/content/3AgtzlikuD9bSUJrWDCjGW5Q5nW/agentic-vibe-coding-in-a-mature-oss-project-what-worked-what-didnt&#34;&gt;Agentic Vibe Coding in a Mature OSS Project&lt;/a&gt;. This post is the next step in that story: not only enhancing an active mature codebase, but reviving a paused effort and making it actually runnable.&lt;/p&gt;
&lt;h2 id=&#34;what-actually-changed&#34;&gt;What Actually Changed&lt;/h2&gt;
&lt;p&gt;The most important outcome of this project is not a benchmark table. The benchmark results belong to the distro itself, and they matter because they prove the system is real. But for this post, the deeper result is methodological: AI engineering changed how architecture could be explored, validated, and refined.&lt;/p&gt;
&lt;p&gt;Instead of treating architecture as a mostly document-driven activity followed by a long and expensive implementation phase, we were able to move much faster between idea, prototype, comparison, and redesign. That made it realistic to pursue higher-abstraction solutions, preserve cleaner boundaries, and build the automation needed to keep the migration maintainable over time.&lt;/p&gt;
&lt;p&gt;The technical evidence for that work is the SkyWalking GraalVM Distro itself: not only a runnable system, but a migration pipeline expressed as precompilers, generated reflection metadata, controlled replacement boundaries, and drift checks. The benchmark data matter because they prove the system works in practice, but the architectural result is that the migration became a repeatable system rather than a one-time port. For detailed benchmark methodology, per-pod data, and the full technical design, see &lt;a href=&#34;https://skywalking.apache.org/blog/2026-03-13-how-ai-changed-the-economics-of-architecture/&#34;&gt;SkyWalking GraalVM Distro: Design and Benchmarks&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The project is hosted at &lt;a href=&#34;https://github.com/apache/skywalking-graalvm-distro&#34;&gt;apache/skywalking-graalvm-distro&lt;/a&gt;. We invite the community to test it, report issues, and help move it toward production readiness.&lt;/p&gt;
&lt;p&gt;For me, the deeper takeaway is broader than this distro. AI engineering does not make architecture less important. It makes architecture more worth pursuing. When implementation speed rises enough, we can afford to test more ideas in code, keep the good abstractions, and build systems that would previously have been judged too expensive to finish well.&lt;/p&gt;
&lt;p&gt;For senior engineers, that means the bottleneck shifts away from raw typing speed and toward taste, system judgment, and the ability to define stable boundaries. For junior engineers, it means the path forward is not to chase every exciting AI workflow, but to become stronger at the fundamentals that let acceleration compound: understanding requirements, reading unfamiliar systems, questioning assumptions, and recognizing what must remain correct as everything around it changes. AI changed the economics of architecture because it lowered the cost of validating better designs without lowering the bar for engineering judgment.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: SkyWalking GraalVM Distro: Design and Benchmarks</title>
      <link>/blog/2026-03-13-skywalking-graalvm-distro-design-and-benchmarks/</link>
      <pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate>
      <guid>/blog/2026-03-13-skywalking-graalvm-distro-design-and-benchmarks/</guid>
      <description>
        
        
        &lt;p&gt;&lt;em&gt;A technical deep-dive into how we migrated Apache SkyWalking OAP to GraalVM Native Image — not as a one-off port, but as a repeatable pipeline that stays aligned with upstream.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;./graph.jpg&#34; alt=&#34;graph.jpg&#34;&gt;&lt;/p&gt;
&lt;p&gt;For the broader story of how AI engineering made this project economically viable, see &lt;a href=&#34;/blog/2026-03-13-how-ai-changed-the-economics-of-architecture/&#34;&gt;How AI Changed the Economics of Architecture&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;why-graalvm-is-not-optional&#34;&gt;Why GraalVM Is Not Optional&lt;/h2&gt;
&lt;p&gt;GraalVM Native Image compiles Java applications Ahead-of-Time (AOT) into standalone executables. For an observability backend like SkyWalking OAP, this is not a performance optimization — it is an operational necessity.&lt;/p&gt;
&lt;p&gt;An observability platform must be the most reliable component in the infrastructure. It has to survive the failures it is supposed to observe. In cloud-native environments where workloads scale, migrate, and restart constantly, the backend that watches everything cannot itself be the slow, heavy process that takes seconds to recover and gigabytes to idle.&lt;/p&gt;
&lt;p&gt;Our benchmarks make the case concrete:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Startup:&lt;/strong&gt; ~5 ms vs ~635 ms. In a Kubernetes cluster where an OAP pod gets evicted or rescheduled, a 635 ms gap means lost telemetry — traces, metrics, and logs that arrive during that window are simply dropped. At 5 ms, the new pod is receiving data before most clients even notice the disruption.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Idle memory:&lt;/strong&gt; ~41 MiB vs ~1.2 GiB. Observability backends run 24/7. In a multi-tenant or edge deployment, a 97% reduction in baseline RSS is the difference between fitting the observability stack on a small node and needing a dedicated one.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory under load:&lt;/strong&gt; ~629 MiB vs ~2.0 GiB at 20 RPS. A 70% reduction at production-like traffic means fewer nodes, lower cloud bills, and more headroom before the backend itself becomes a scaling bottleneck.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No warm-up penalty:&lt;/strong&gt; Peak throughput is available from the first request. The JVM&amp;rsquo;s JIT compiler needs minutes of traffic before it optimizes hot paths — during that window, tail latency is worse and data processing lags behind. A native binary has no such phase.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Smaller attack surface:&lt;/strong&gt; No JDK runtime means fewer CVEs to track and patch. For a component that ingests data from every service in the cluster, that matters.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are not incremental improvements. They change what deployment topologies are practical. Serverless observability backends, sidecar-model collectors, edge nodes with tight memory budgets — all become realistic when the backend is this light and this fast.&lt;/p&gt;
&lt;h2 id=&#34;the-challenge-a-mature-dynamic-java-platform&#34;&gt;The Challenge: A Mature, Dynamic Java Platform&lt;/h2&gt;
&lt;p&gt;SkyWalking OAP carries all the realities of a large Java platform: runtime bytecode generation, reflection-heavy initialization, classpath scanning, SPI-based module wiring, and dynamic DSL execution. These patterns are friendly to extensibility but hostile to GraalVM native image.&lt;/p&gt;
&lt;p&gt;The documented GraalVM limitations are only the beginning. In a mature OSS platform, those limitations are deeply entangled with years of runtime design decisions. Standard GraalVM native images struggle with runtime class generation, reflection, dynamic discovery, and script execution — all of which had deep roots in SkyWalking OAP.&lt;/p&gt;
&lt;p&gt;There was also a very concrete mountain in the early history of this distro. Upstream SkyWalking relied heavily on Groovy for LAL, MAL, and Hierarchy scripts. In theory, that was just one more unsupported runtime-heavy component. In practice, Groovy was the biggest obstacle in the whole path. It represented not only script execution, but a whole dynamic model that was deeply convenient on the JVM side and deeply unfriendly to native image.&lt;/p&gt;
&lt;h2 id=&#34;the-design-goal-make-migration-repeatable&#34;&gt;The Design Goal: Make Migration Repeatable&lt;/h2&gt;
&lt;p&gt;The final design is not just &amp;ldquo;run native-image successfully.&amp;rdquo; It is a system that keeps migration work repeatable:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Pre-compile runtime-generated assets at build time.&lt;/strong&gt; OAL, MAL, LAL, Hierarchy rules, and meter-related generated classes are compiled during the build and packaged as artifacts instead of being generated at startup.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Replace dynamic discovery with deterministic loading.&lt;/strong&gt; Classpath scanning and runtime registration paths are converted into manifest-driven loading.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduce runtime reflection and generate native metadata from the build.&lt;/strong&gt; Reflection configuration is produced from actual manifests and scanned classes instead of being maintained as a hand-written guess list.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keep the upstream sync boundary explicit.&lt;/strong&gt; Same-FQCN replacements are intentionally packaged, inventoried, and guarded with staleness checks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Make drift visible immediately.&lt;/strong&gt; If upstream providers, rule files, or replaced source files change, tests fail and force explicit review.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That is the architectural shift that matters most. Reusable abstraction and foresight did not become less important in the AI era. They became more important, because they determine whether AI speed produces a maintainable system or just a fast-growing pile of code.&lt;/p&gt;
&lt;h2 id=&#34;turning-runtime-dynamism-into-build-time-assets&#34;&gt;Turning Runtime Dynamism into Build-Time Assets&lt;/h2&gt;
&lt;p&gt;SkyWalking OAP has several dynamic subsystems that are natural in a JVM world but problematic for native image:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OAL generates classes at runtime.&lt;/li&gt;
&lt;li&gt;LAL, MAL, and Hierarchy were historically tied to Groovy-heavy runtime behavior, which became one of the biggest practical blockers in the early distro work.&lt;/li&gt;
&lt;li&gt;MAL, LAL, and Hierarchy rules depend on runtime compilation behavior.&lt;/li&gt;
&lt;li&gt;Guava-based classpath scanning discovers annotations, dispatchers, decorators, and meter functions.&lt;/li&gt;
&lt;li&gt;SPI-based module/provider discovery expects a more dynamic runtime environment.&lt;/li&gt;
&lt;li&gt;YAML/config initialization and framework integrations depend on reflective access.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In SkyWalking GraalVM Distro, these are not solved one by one as isolated patches. They are pulled into a build-time pipeline.&lt;/p&gt;
&lt;p&gt;The precompiler runs the DSL engines during the build, exports generated classes, writes manifests, serializes config data, and generates native-image metadata. That means startup becomes class loading and registration, not runtime code generation. The runtime path is simpler because the build path became richer.&lt;/p&gt;
&lt;p&gt;This is also why the project is more than a performance exercise. The design goal was to move complexity into a place where it is easier to verify, easier to automate, and easier to repeat.&lt;/p&gt;
&lt;h2 id=&#34;same-fqcn-replacements-as-a-controlled-boundary&#34;&gt;Same-FQCN Replacements as a Controlled Boundary&lt;/h2&gt;
&lt;p&gt;One of the most practical design choices in this distro is the use of same-FQCN replacement classes. We do not rely on vague startup tricks or undocumented ordering assumptions. Instead, the GraalVM-specific jars are repackaged so the original upstream classes are excluded and the replacement classes occupy the exact same fully-qualified names.&lt;/p&gt;
&lt;p&gt;This matters for maintainability. It creates a very clear boundary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the upstream class still defines the behavior contract,&lt;/li&gt;
&lt;li&gt;the GraalVM replacement provides a compatible implementation strategy,&lt;/li&gt;
&lt;li&gt;and the packaging makes that swap explicit.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, OAL loading changes from runtime compilation into manifest-driven loading of precompiled classes. Similar replacements handle MAL and LAL DSL loading, module wiring, config initialization, and several reflection-sensitive paths. The goal is not to fork everything. The goal is to replace only the places where the runtime model is fundamentally unfriendly to native image.&lt;/p&gt;
&lt;p&gt;That boundary is then guarded by tests that hash the upstream source files corresponding to the replacements. When upstream changes one of those files, the build fails and tells us exactly which replacement needs review. This is what turns &amp;ldquo;keeping up with upstream&amp;rdquo; from an anxiety problem into a visible engineering task.&lt;/p&gt;
&lt;h2 id=&#34;reflection-config-is-generated-not-guessed&#34;&gt;Reflection Config Is Generated, Not Guessed&lt;/h2&gt;
&lt;p&gt;In many GraalVM migrations, &lt;code&gt;reflect-config.json&lt;/code&gt; becomes a manually accumulated artifact. It grows over time, gets stale, and nobody is fully sure whether it is complete or why each entry exists. That approach does not scale well for a large, evolving OSS platform.&lt;/p&gt;
&lt;p&gt;In this distro, reflection metadata is generated from the build outputs and scanned classes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;manifests for OAL, MAL, LAL, Hierarchy, and meter-generated classes,&lt;/li&gt;
&lt;li&gt;annotation-scanned classes,&lt;/li&gt;
&lt;li&gt;Armeria HTTP handlers,&lt;/li&gt;
&lt;li&gt;GraphQL resolvers and schema-mapped types,&lt;/li&gt;
&lt;li&gt;and accepted &lt;code&gt;ModuleConfig&lt;/code&gt; classes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a much healthier model. Instead of asking people to remember every reflective access path, the system derives reflection metadata from the actual migration pipeline. The build becomes the source of truth.&lt;/p&gt;
&lt;h2 id=&#34;keeping-upstream-sync-practical&#34;&gt;Keeping Upstream Sync Practical&lt;/h2&gt;
&lt;p&gt;If this distro were only a one-time engineering sprint, it would be much less interesting. The real challenge is keeping it alive while upstream SkyWalking continues to evolve.&lt;/p&gt;
&lt;p&gt;That is why the repo includes explicit inventories and drift detectors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;provider inventories that force new upstream providers to be categorized,&lt;/li&gt;
&lt;li&gt;rule-file inventories that force new DSL inputs to be acknowledged,&lt;/li&gt;
&lt;li&gt;SHA watchers for precompiled YAML inputs,&lt;/li&gt;
&lt;li&gt;and SHA watchers for upstream source files with GraalVM-specific replacements.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Good abstraction is not only about elegant code structure. It is about choosing a migration design that can survive contact with future change.&lt;/p&gt;
&lt;h2 id=&#34;benchmark-results&#34;&gt;Benchmark Results&lt;/h2&gt;
&lt;p&gt;We benchmarked the standard JVM OAP against the GraalVM Distro on an Apple M3 Max (macOS, Docker Desktop, 10 CPUs / 62.7 GB), both connecting to BanyanDB.&lt;/p&gt;
&lt;h3 id=&#34;boot-test-docker-compose-no-traffic-median-of-3-runs&#34;&gt;Boot Test (Docker Compose, no traffic, median of 3 runs)&lt;/h3&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Metric&lt;/th&gt;
          &lt;th&gt;JVM OAP&lt;/th&gt;
          &lt;th&gt;GraalVM OAP&lt;/th&gt;
          &lt;th&gt;Delta&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Cold boot startup&lt;/td&gt;
          &lt;td&gt;635 ms&lt;/td&gt;
          &lt;td&gt;5 ms&lt;/td&gt;
          &lt;td&gt;~127x faster&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Warm boot startup&lt;/td&gt;
          &lt;td&gt;630 ms&lt;/td&gt;
          &lt;td&gt;5 ms&lt;/td&gt;
          &lt;td&gt;~126x faster&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Idle RSS&lt;/td&gt;
          &lt;td&gt;~1.2 GiB&lt;/td&gt;
          &lt;td&gt;~41 MiB&lt;/td&gt;
          &lt;td&gt;~97% reduction&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Boot time is measured from OAP&amp;rsquo;s first application log timestamp to the &lt;code&gt;listening on 11800&lt;/code&gt; log line (gRPC server ready).&lt;/p&gt;
&lt;h3 id=&#34;under-sustained-load-kind--istio-1252--bookinfo-at-20-rps-2-oap-replicas&#34;&gt;Under Sustained Load (Kind + Istio 1.25.2 + Bookinfo at ~20 RPS, 2 OAP replicas)&lt;/h3&gt;
&lt;p&gt;30 samples at 10s intervals after 60s warmup.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Metric&lt;/th&gt;
          &lt;th&gt;JVM OAP&lt;/th&gt;
          &lt;th&gt;GraalVM OAP&lt;/th&gt;
          &lt;th&gt;Delta&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;CPU median (millicores)&lt;/td&gt;
          &lt;td&gt;101&lt;/td&gt;
          &lt;td&gt;68&lt;/td&gt;
          &lt;td&gt;-33%&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CPU avg (millicores)&lt;/td&gt;
          &lt;td&gt;107&lt;/td&gt;
          &lt;td&gt;67&lt;/td&gt;
          &lt;td&gt;-37%&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Memory median (MiB)&lt;/td&gt;
          &lt;td&gt;2068&lt;/td&gt;
          &lt;td&gt;629&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;-70%&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Memory avg (MiB)&lt;/td&gt;
          &lt;td&gt;2082&lt;/td&gt;
          &lt;td&gt;624&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;-70%&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Both variants reported identical entry-service CPM, confirming equivalent traffic processing capability.&lt;/p&gt;
&lt;p&gt;Service metrics collected every 30s via swctl for all discovered services:
&lt;code&gt;service_cpm&lt;/code&gt;, &lt;code&gt;service_resp_time&lt;/code&gt;, &lt;code&gt;service_sla&lt;/code&gt;, &lt;code&gt;service_apdex&lt;/code&gt;, &lt;code&gt;service_percentile&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Full benchmark scripts and raw data are in the &lt;a href=&#34;https://github.com/apache/skywalking-graalvm-distro/tree/main/benchmark&#34;&gt;benchmark/&lt;/a&gt; directory of the distro repository.&lt;/p&gt;
&lt;h2 id=&#34;current-status&#34;&gt;Current Status&lt;/h2&gt;
&lt;p&gt;The project is a runnable experimental distribution, hosted in its own repository: &lt;a href=&#34;https://github.com/apache/skywalking-graalvm-distro&#34;&gt;apache/skywalking-graalvm-distro&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The current distro intentionally focuses on a modern, high-performance operating model:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Storage:&lt;/strong&gt; BanyanDB&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cluster modes:&lt;/strong&gt; Standalone and Kubernetes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Configuration:&lt;/strong&gt; none or Kubernetes ConfigMap&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runtime model:&lt;/strong&gt; fixed module set, precompiled assets, and AOT-friendly wiring&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This focus is deliberate. A repeatable migration system starts by making a clear scope runnable, then expanding without losing control.&lt;/p&gt;
&lt;h2 id=&#34;getting-started&#34;&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;Because the SkyWalking GraalVM Distro is designed for peak performance, it is optimized to work with &lt;strong&gt;BanyanDB&lt;/strong&gt; as its storage backend. The current published image is available on Docker Hub, and you can boot the stack using the following &lt;code&gt;docker-compose.yml&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;version&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;3.8&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;services&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;banyandb&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;ghcr.io/apache/skywalking-banyandb:e1ba421bd624727760c7a69c84c6fe55878fb526&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;restart&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;always&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;17912:17912&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;17913:17913&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;command&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data --measure-metadata-cache-wait-duration 1m --stream-metadata-cache-wait-duration 1m&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;healthcheck&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;test&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;CMD&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;sh&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;-c&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;nc -nz 127.0.0.1 17912&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;interval&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;5s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;timeout&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;10s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;retries&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;120&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;apache/skywalking-graalvm-distro:0.1.1&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;oap&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;banyandb&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;restart&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;always&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;11800:11800&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;12800:12800&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_STORAGE&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_STORAGE_BANYANDB_TARGETS&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;banyandb:17912&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_HEALTH_CHECKER&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;default&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;healthcheck&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;test&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;CMD-SHELL&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;nc -nz 127.0.0.1 11800 || exit 1&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;interval&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;5s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;timeout&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;10s&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;retries&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;120&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ui&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;image&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;ghcr.io/apache/skywalking/ui:10.3.0&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;container_name&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;ui&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;depends_on&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;oap&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;service_healthy&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;restart&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;always&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;ports&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;8080:8080&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;environment&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#fff&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;SW_OAP_ADDRESS&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#fff&#34;&gt; &lt;/span&gt;http://oap:12800&lt;span style=&#34;color:#fff&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Simply run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker compose up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We invite the community to test this new distribution, report issues, and help us move it toward a production-ready state.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Special thanks to the GraalVM team for the technology foundation.&lt;/em&gt;&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: SkyWalking 10 Release: Service Hierarchy, Kubernetes Network Monitoring by eBPF, BanyanDB, and More</title>
      <link>/blog/2024-05-13-skywalking-10-release/</link>
      <pubDate>Mon, 13 May 2024 00:00:00 +0000</pubDate>
      <guid>/blog/2024-05-13-skywalking-10-release/</guid>
      <description>
        
        
        &lt;p&gt;The Apache SkyWalking team today announced the 10 release. SkyWalking 10 provides a host of groundbreaking features and enhancements.
The introduction of Layer and Service Hierarchy streamlines monitoring by organizing services and metrics into distinct layers and providing seamless navigation across them.
Leveraging eBPF technology, Kubernetes Network Monitoring delivers granular insights into network traffic, topology, and TCP/HTTP metrics.
BanyanDB emerges as a high-performance native storage solution, while expanded monitoring support encompasses Apache RocketMQ, ClickHouse,
and Apache ActiveMQ Classic. Support for Multiple Labels Names enhances flexibility in metrics analysis,
while enhanced exporting and querying capabilities streamline data dissemination and processing.&lt;/p&gt;
&lt;p&gt;This release blog briefly introduces these new features and enhancements as well as some other notable changes.&lt;/p&gt;
&lt;h2 id=&#34;layer-and-service-hierarchy&#34;&gt;Layer and Service Hierarchy&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;Layer&lt;/code&gt; concept was introduced in SkyWalking 9.0.0, it represents an abstract framework in computer science,
such as Operating System(OS_LINUX layer), Kubernetes(k8s layer). It organizes services and metrics into different layers based on their roles
and responsibilities in the system. SkyWalking provides a suite of monitoring and diagnostic tools for each layer, but there is a gap between the layers,
which can not easily bridge the data across different layers.&lt;/p&gt;
&lt;p&gt;In SkyWalking 10, SkyWalking provides new abilities to jump/connect across different layers and provide a seamless monitoring experience for users.&lt;/p&gt;
&lt;h3 id=&#34;layer-jump&#34;&gt;Layer Jump&lt;/h3&gt;
&lt;p&gt;In the topology graph, users can click on a service node to jump to the dashboard of the service in another layer.
The following figures show the jump from the &lt;code&gt;GENERAL&lt;/code&gt; layer service topology to the &lt;code&gt;VIRTUAL_DATABASE&lt;/code&gt; service layer dashboard by clicking the topology node.
&lt;img src=&#34;layer_jump.jpg&#34; alt=&#34;Figure 1: Layer Jump&#34;&gt;
Figure 1: Layer Jump&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;layer_jump2.jpg&#34; alt=&#34;Figure 2: Layer jump Dashboard&#34;&gt;
Figure 2: Layer jump Dashboard&lt;/p&gt;
&lt;h3 id=&#34;service-hierarchy&#34;&gt;Service Hierarchy&lt;/h3&gt;
&lt;p&gt;SkyWalking 10 introduces a new concept called &lt;code&gt;Service Hierarchy&lt;/code&gt;, which defines the relationships of existing logically same services in various layers.
OAP will detect the services from different layers, and try to build the connections.
Users can click the &lt;code&gt;Hierarchy Services&lt;/code&gt; in any layer&amp;rsquo;s service topology node or service dashboard to get the &lt;code&gt;Hierarchy Topology&lt;/code&gt;.
In this topology graph, users can see the relationships between the services in different layers and the summary of the metrics and also can jump to the service dashboard in the layer.
When a service occurs performance issue, users can easily analyze the metrics from different layers and track down the root cause:&lt;/p&gt;
&lt;p&gt;The examples of the &lt;code&gt;Service Hierarchy&lt;/code&gt; relationships:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The application &lt;code&gt;song&lt;/code&gt; deployed in the Kubernetes cluster with SkyWalking agent and Service Mesh at the same time.
So the application &lt;code&gt;song&lt;/code&gt; across the &lt;code&gt;GENERAL&lt;/code&gt;, &lt;code&gt;MESH&lt;/code&gt;, &lt;code&gt;MESH_DP&lt;/code&gt; and &lt;code&gt;K8S_SERVICE&lt;/code&gt; layers which could be monitored by SkyWalking,
the &lt;code&gt;Service Hierarchy&lt;/code&gt; topology as below:
&lt;img src=&#34;song.jpg&#34; alt=&#34;Figure 3: Service Hierarchy Agent With K8s Service And Mesh With K8s Service&#34;&gt;
Figure 3: Service Hierarchy Agent With K8s Service And Mesh With K8s Service.&lt;/br&gt;
&lt;/br&gt;
And can also have the &lt;code&gt;Service Instance Hierarchy&lt;/code&gt; topology to get the single instance status across the layers as below:
&lt;img src=&#34;song_instance.jpg&#34; alt=&#34;Figure 4: Instance Hierarchy Agent With K8s Service(Pod)&#34;&gt;
Figure 4: Instance Hierarchy Agent With K8s Service(Pod)&lt;/br&gt;
&lt;/br&gt;&lt;/li&gt;
&lt;li&gt;The PostgreSQL database &lt;code&gt;psql&lt;/code&gt; deployed in the Kubernetes cluster and used by the application &lt;code&gt;song&lt;/code&gt;.
So the database &lt;code&gt;psql&lt;/code&gt; across the &lt;code&gt;VIRTUAL_DATABASE&lt;/code&gt;, &lt;code&gt;POSTGRESQL&lt;/code&gt; and &lt;code&gt;K8S_SERVICE&lt;/code&gt; layers which could be monitored by SkyWalking,
the &lt;code&gt;Service Hierarchy&lt;/code&gt; topology as below:
&lt;img src=&#34;postgre.jpg&#34; alt=&#34;Figure 5: Service Hierarchy Agent(Virtual Database) With Real Database And K8s Service&#34;&gt;
Figure 5: Service Hierarchy Agent(Virtual Database) With Real Database And K8s Service&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For more supported layers and how to detect the relationships between services in different layers please refer to the &lt;a href=&#34;https://skywalking.apache.org/docs/main/latest/en/concepts-and-designs/service-hierarchy/#service-hierarchy&#34;&gt;Service Hierarchy&lt;/a&gt;.
how to configure the &lt;code&gt;Service Hierarchy&lt;/code&gt; in SkyWalking, please refer to the &lt;a href=&#34;https://skywalking.apache.org/docs/main/latest/en/concepts-and-designs/service-hierarchy-configuration/&#34;&gt;Service Hierarchy Configuration&lt;/a&gt; section.&lt;/p&gt;
&lt;h2 id=&#34;monitoring-kubernetes-network-traffic-by-using-ebpf&#34;&gt;Monitoring Kubernetes Network Traffic by using eBPF&lt;/h2&gt;
&lt;p&gt;In the previous version, skyWalking provides &lt;a href=&#34;https://skywalking.apache.org/docs/main/latest/en/setup/backend/backend-k8s-monitoring-metrics-cadvisor/&#34;&gt;Kubernetes (K8s) monitoring from kube-state-metrics and cAdvisor&lt;/a&gt;,
which can monitor the Kubernetes cluster status and the metrics of the Kubernetes resources.&lt;/p&gt;
&lt;p&gt;In SkyWalking 10, by leverage &lt;a href=&#34;https://skywalking.apache.org/docs/skywalking-rover/latest/readme/&#34;&gt;Apache SkyWalking Rover&lt;/a&gt; 0.6+,
SkyWalking has the ability to monitor the Kubernetes network traffic by using eBPF, which can collect and map access logs from applications in Kubernetes environments.
Through these data, SkyWalking can analyze and provide the Service Traffic, Topology, TCP/HTTP level metrics from the Kubernetes aspect.&lt;/p&gt;
&lt;p&gt;The following figures show the Topology and TCP Dashboard of the Kubernetes network traffic:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;k8s_topology.jpg&#34; alt=&#34;Figure 6: Kubernetes Network Traffic Topology&#34;&gt;
Figure 6: Kubernetes Network Traffic Topology&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;k8s_dashboard.jpg&#34; alt=&#34;Figure 7: Kubernetes Network Traffic TCP Dashboard&#34;&gt;
Figure 7: Kubernetes Network Traffic TCP Dashboard&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;More details about how to monitor the Kubernetes network traffic by using eBPF, please refer to the &lt;a href=&#34;https://skywalking.apache.org/blog/2024-03-18-monitor-kubernetes-network-by-ebpf/&#34;&gt;Monitoring Kubernetes Network Traffic by using eBPF&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;banyandb---native-apm-database&#34;&gt;BanyanDB - Native APM Database&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://skywalking.apache.org/docs/skywalking-banyandb/latest/readme/&#34;&gt;BanyanDB&lt;/a&gt; 0.6.0 and &lt;a href=&#34;https://github.com/apache/skywalking-banyandb-java-client&#34;&gt;BanyanDB Java client&lt;/a&gt; 0.6.0 are released with SkyWalking 10,
As a native storage solution for SkyWalking, BanyanDB is going to be SkyWalking&amp;rsquo;s next-generation storage solution. This is recommended to use for medium-scale deployments from 0.6 until 1.0.&lt;br&gt;
It has shown high potential performance improvement. Less than 50% CPU usage and 50% memory usage with 40% disk volume compared to Elasticsearch in the same scale.&lt;/p&gt;
&lt;h2 id=&#34;apache-rocketmq-server-monitoring&#34;&gt;Apache RocketMQ Server Monitoring&lt;/h2&gt;
&lt;p&gt;Apache RocketMQ is an open-source distributed messaging and streaming platform, which is widely used in various scenarios including Internet, big data, mobile Internet, IoT, and other fields.
SkyWalking provides a basic monitoring dashboard for RocketMQ, which includes the following metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cluster Metrics: including messages produced/consumed today, total producer/consumer TPS, producer/consumer message size, messages produced/consumed until yesterday, max consumer latency, max commitLog disk ratio, commitLog disk ratio, pull/send threadPool queue head wait time, topic count, and broker count.&lt;/li&gt;
&lt;li&gt;Broker Metrics: including produce/consume TPS, producer/consumer message size.&lt;/li&gt;
&lt;li&gt;Topic Metrics: including max producer/consumer message size, consumer latency, producer/consumer TPS, producer/consumer offset, producer/consumer message size, consumer group count, and broker count.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following figure shows the RocketMQ Cluster Metrics dashboard:
&lt;img src=&#34;rocket_mq.jpg&#34; alt=&#34;Figure 8: Apache RocketMQ Server Monitoring&#34;&gt;
Figure 8: Apache RocketMQ Server Monitoring&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;For more metrics and details about the RocketMQ monitoring, please refer to the &lt;a href=&#34;https://skywalking.apache.org/docs/main/latest/en/setup/backend/backend-rocketmq-monitoring/&#34;&gt;Apache RocketMQ Server Monitoring&lt;/a&gt;,&lt;/p&gt;
&lt;h2 id=&#34;clickhouse-server-monitoring&#34;&gt;ClickHouse Server Monitoring&lt;/h2&gt;
&lt;p&gt;ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real-time, it is widely used for online analytical processing (OLAP).
ClickHouse monitoring provides monitoring of the metrics 、events and asynchronous metrics of the ClickHouse server, which includes the following parts of metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Server Metrics&lt;/li&gt;
&lt;li&gt;Query Metrics&lt;/li&gt;
&lt;li&gt;Network Metrics&lt;/li&gt;
&lt;li&gt;Insert Metrics&lt;/li&gt;
&lt;li&gt;Replica Metrics&lt;/li&gt;
&lt;li&gt;MergeTree Metrics&lt;/li&gt;
&lt;li&gt;ZooKeeper Metrics&lt;/li&gt;
&lt;li&gt;Embedded ClickHouse Keeper Metrics&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following figure shows the ClickHouse Cluster Metrics dashboard:
&lt;img src=&#34;clickhouse.jpg&#34; alt=&#34;Figure 9: ClickHouse Server Monitoring&#34;&gt;
Figure 9: ClickHouse Server Monitoring&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;For more metrics and details about the ClickHouse monitoring, please refer to the &lt;a href=&#34;https://skywalking.apache.org/docs/main/latest/en/setup/backend/backend-clickhouse-monitoring/&#34;&gt;ClickHouse Server Monitoring&lt;/a&gt;,
and here is a blog that can help for a quick start &lt;a href=&#34;https://skywalking.apache.org/blog/2024-03-12-monitoring-clickhouse-through-skywalking/&#34;&gt;Monitoring ClickHouse through SkyWalking&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;apache-activemq-server-monitoring&#34;&gt;Apache ActiveMQ Server Monitoring&lt;/h2&gt;
&lt;p&gt;Apache ActiveMQ Classic is a popular and powerful open-source messaging and integration pattern server.
SkyWalking provides a basic monitoring dashboard for ActiveMQ, which includes the following metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cluster Metrics: including memory usage, rates of write/read, and average/max duration of write.&lt;/li&gt;
&lt;li&gt;Broker Metrics: including node state, number of connections, number of producers/consumers, and rate of write/read under the broker. Depending on the cluster mode, one cluster may include one or more brokers.&lt;/li&gt;
&lt;li&gt;Destination Metrics: including number of producers/consumers, messages in different states, queues, and enqueue duration in a queue/topic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following figure shows the ActiveMQ Cluster Metrics dashboard:
&lt;img src=&#34;active_mq.jpg&#34; alt=&#34;Figure 10: Apache ActiveMQ Server Monitoring&#34;&gt;
Figure 10: Apache ActiveMQ Server Monitoring&lt;/br&gt;&lt;/p&gt;
&lt;p&gt;For more metrics and details about the ActiveMQ monitoring, please refer to the &lt;a href=&#34;https://skywalking.apache.org/docs/main/latest/en/setup/backend/backend-activemq-monitoring/&#34;&gt;Apache ActiveMQ Server Monitoring&lt;/a&gt;,
and here is a blog that can help for a quick start &lt;a href=&#34;https://skywalking.apache.org/blog/2024-04-19-monitoring-activemq-through-skywalking/&#34;&gt;Monitoring ActiveMQ through SkyWalking&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;support-multiple-labels-names&#34;&gt;Support Multiple Labels Names&lt;/h2&gt;
&lt;p&gt;Before SkyWalking 10, SkyWalking does not store the labels names in the metrics data, which makes MQE have to use &lt;code&gt;_&lt;/code&gt; as the generic label name,
it can&amp;rsquo;t query the metrics data with multiple labels names.&lt;/p&gt;
&lt;p&gt;SkyWalking 10 supports storing the labels names in the metrics data, and MQE can query or calculate the metrics data with multiple labels names.
For example:
The &lt;code&gt;k8s_cluster_deployment_status&lt;/code&gt; metric has labels &lt;code&gt;namespace&lt;/code&gt;, &lt;code&gt;deployment&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt;.
If we want to query all deployment metric values with &lt;code&gt;namespace=skywalking-showcase&lt;/code&gt; and &lt;code&gt;status=true&lt;/code&gt;, we can use the following expression:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s_cluster_deployment_status{namespace=&amp;#39;skywalking-showcase&amp;#39;, status=&amp;#39;true&amp;#39;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;related enhancement:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Since Alarm rule configuration had migrated to the MQE in SkyWalking 9.6.0, the alarm rule also supports multiple labels names.&lt;/li&gt;
&lt;li&gt;PromeQL service supports multiple labels names query.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;metrics-grpc-exporter&#34;&gt;Metrics gRPC exporter&lt;/h2&gt;
&lt;p&gt;SkyWalking 10 enhanced the &lt;a href=&#34;https://skywalking.apache.org/docs/main/latest/en/setup/backend/exporter/#grpc-exporter&#34;&gt;metrics gPRC exporter&lt;/a&gt;,
it supports exporting all types of metrics data to the gRPC server.&lt;/p&gt;
&lt;h2 id=&#34;skywalking-native-ui-metrics-query-switch-to-v3-apis&#34;&gt;SkyWalking Native UI Metrics Query Switch to V3 APIs&lt;/h2&gt;
&lt;p&gt;SkyWalking Native UI metrics query deprecate the V2 APIs, and all migrated to &lt;a href=&#34;https://skywalking.apache.org/docs/main/latest/en/api/query-protocol/#v3-apis&#34;&gt;V3 APIs&lt;/a&gt;
and &lt;a href=&#34;https://skywalking.apache.org/docs/main/next/en/api/metrics-query-expression/#metrics-query-expressionmqe-syntax&#34;&gt;MQE&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;other-notable-enhancements&#34;&gt;Other Notable Enhancements&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Support Java 21 runtime and oap-java21 image for Java 21 runtime.&lt;/li&gt;
&lt;li&gt;Remove CLI(&lt;code&gt;swctl&lt;/code&gt;) from the image.&lt;/li&gt;
&lt;li&gt;More MQE functions and operators supported.&lt;/li&gt;
&lt;li&gt;Enhance the native UI and improve the user experience.&lt;/li&gt;
&lt;li&gt;Several bugs and CVEs fixed.&lt;/li&gt;
&lt;/ol&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: How to Use SkyWalking for Distributed Tracing in Istio?</title>
      <link>/blog/how-to-use-skywalking-for-distributed-tracing-in-istio/</link>
      <pubDate>Wed, 14 Dec 2022 00:00:00 +0000</pubDate>
      <guid>/blog/how-to-use-skywalking-for-distributed-tracing-in-istio/</guid>
      <description>
        
        
        &lt;p&gt;In cloud native applications, a request often needs to be processed through a series of APIs or backend services, some of which are parallel and some serial and located on different platforms or nodes. How do we determine the service paths and nodes a call goes through to help us troubleshoot the problem? This is where distributed tracing comes into play.&lt;/p&gt;
&lt;p&gt;This article covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How distributed tracing works&lt;/li&gt;
&lt;li&gt;How to choose distributed tracing software&lt;/li&gt;
&lt;li&gt;How to use distributed tracing in Istio&lt;/li&gt;
&lt;li&gt;How to view distributed tracing data using Bookinfo and SkyWalking as examples&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;distributed-tracing-basics&#34;&gt;Distributed Tracing Basics&lt;/h2&gt;
&lt;p&gt;Distributed tracing is a method for tracing requests in a distributed system to help users better understand, control, and optimize distributed systems. There are two concepts used in distributed tracing: TraceID and SpanID. You can see them in Figure 1 below.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;TraceID&lt;/strong&gt; is a globally unique ID that identifies the trace information of a request. All traces of a request belong to the same TraceID, and the TraceID remains constant throughout the trace of the request.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SpanID&lt;/strong&gt; is a locally unique ID that identifies a request’s trace information at a certain time. A request generates different SpanIDs at different periods, and SpanIDs are used to distinguish trace information for a request at different periods.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;TraceID and SpanID are the basis of distributed tracing. They provide a uniform identifier for request tracing in distributed systems and facilitate users’ ability to query, manage, and analyze the trace information of requests.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f1.svg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Trace and span&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The following is the process of distributed tracing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When a system receives a request, the distributed tracing system assigns a TraceID to the request, which is used to chain together the entire chain of invocations.&lt;/li&gt;
&lt;li&gt;The distributed trace system generates a SpanID and ParentID for each service call within the system for the request, which is used to record the parent-child relationship of the call; a Span without a ParentID is used as the entry point of the call chain.&lt;/li&gt;
&lt;li&gt;TraceID and SpanID are to be passed during each service call.&lt;/li&gt;
&lt;li&gt;When viewing a distributed trace, query the full process of a particular request by TraceID.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-istio-implements-distributed-tracing&#34;&gt;How Istio Implements Distributed Tracing&lt;/h2&gt;
&lt;p&gt;Istio’s distributed tracing is based on information collected by the Envoy proxy in the data plane. After a service request is intercepted by Envoy, Envoy adds tracing information as headers to the request forwarded to the destination workload. The following headers are relevant for distributed tracing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;As TraceID: x-request-id&lt;/li&gt;
&lt;li&gt;Used to establish parent-child relationships for Span in the LightStep trace: x-ot-span-context&amp;lt;/li&lt;/li&gt;
&lt;li&gt;Used for Zipkin, also for Jaeger, SkyWalking, see &lt;a href=&#34;https://github.com/openzipkin/b3-propagation&#34;&gt;b3-propagation&lt;/a&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;x-b3-traceid&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;x-b3-traceid&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;x-b3-spanid&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;x-b3-parentspanid&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;x-b3-sampled&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;x-b3-flags&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;b3&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;For Datadog:
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;x-datadog-trace-id&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;x-datadog-parent-id&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;x-datadog-sampling-priority&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;For SkyWalking: &lt;em&gt;sw8&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;For AWS X-Ray: &lt;em&gt;x-amzn-trace-id&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information on how to use these headers, please see the &lt;a href=&#34;https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/headers&#34;&gt;Envoy documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Regardless of the language of your application, Envoy will generate the appropriate tracing headers for you at the Ingress Gateway and forward these headers to the upstream cluster. However, in order to utilize the distributed tracing feature, you must modify your application code to attach the tracing headers to upstream requests. Since neither the service mesh nor the application can automatically propagate these headers, you can integrate the agent for distributed tracing into the application or manually propagate these headers in the application code itself. Once the tracing headers are propagated to all upstream requests, Envoy will send the tracing data to the tracer’s back-end processing, and then you can view the tracing data in the UI.&lt;/p&gt;
&lt;p&gt;For example, look at the code of the Productpage service in the &lt;a href=&#34;https://istio.io/latest/docs/examples/bookinfo/&#34;&gt;Bookinfo application&lt;/a&gt;. You can see that it integrates the Jaeger client library and synchronizes the header generated by Envoy with the HTTP requests to the Details and Reviews services in the &lt;em&gt;getForwardHeaders (request)&lt;/em&gt; function.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#6639ba&#34;&gt;getForwardHeaders&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;request&lt;span style=&#34;color:#1f2328&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    headers &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# Using Jaeger agent to get the x-b3-* headers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    span &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; get_current_span&lt;span style=&#34;color:#1f2328&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    carrier &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    tracer&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;inject&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        span_context&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;span&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;context&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6639ba&#34;&gt;format&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;Format&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;HTTP_HEADERS&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        carrier&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;carrier&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    headers&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;update&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;carrier&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# Dealing with the non x-b3-* header manually&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;in&lt;/span&gt; session&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        headers&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;end-user&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; session&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    incoming_headers &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-request-id&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-ot-span-context&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-datadog-trace-id&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-datadog-parent-id&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-datadog-sampling-priority&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;traceparent&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;tracestate&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-cloud-trace-context&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;grpc-trace-bin&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;sw8&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;user-agent&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;cookie&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;authorization&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;jwt&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;for&lt;/span&gt; ihdr &lt;span style=&#34;color:#0550ae&#34;&gt;in&lt;/span&gt; incoming_headers&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        val &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; request&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;headers&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;ihdr&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; val &lt;span style=&#34;color:#0550ae&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#cf222e&#34;&gt;None&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            headers&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;ihdr&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; val
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;return&lt;/span&gt; headers
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For more information, the &lt;a href=&#34;https://istio.io/latest/about/faq/#distributed-tracing&#34;&gt;Istio documentation&lt;/a&gt; provides answers to frequently asked questions about distributed tracing in Istio.&lt;/p&gt;
&lt;h2 id=&#34;how-to-choose-a-distributed-tracing-system&#34;&gt;How to Choose A Distributed Tracing System&lt;/h2&gt;
&lt;p&gt;Distributed tracing systems are similar in principle. There are many such systems on the market, such as &lt;a href=&#34;https://github.com/apache/skywalking&#34;&gt;Apache SkyWalking&lt;/a&gt;, &lt;a href=&#34;https://github.com/jaegertracing/jaeger&#34;&gt;Jaeger&lt;/a&gt;, &lt;a href=&#34;https://github.com/openzipkin/zipkin/&#34;&gt;Zipkin&lt;/a&gt;, &lt;a href=&#34;https://lightstep.com/&#34;&gt;Lightstep&lt;/a&gt;, &lt;a href=&#34;https://github.com/pinpoint-apm/pinpoint&#34;&gt;Pinpoint&lt;/a&gt;, and so on. For our purposes here, we will choose three of them and compare them in several dimensions. Here are our inclusion criteria:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;They are currently the most popular open-source distributed tracing systems.&lt;/li&gt;
&lt;li&gt;All are based on the OpenTracing specification.&lt;/li&gt;
&lt;li&gt;They support integration with Istio and Envoy.&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Items&lt;/th&gt;
          &lt;th&gt;Apache SkyWalking&lt;/th&gt;
          &lt;th&gt;Jaeger&lt;/th&gt;
          &lt;th&gt;Zipkin&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Implementations&lt;/td&gt;
          &lt;td&gt;Language-based probes, service mesh probes, eBPF agent, third-party instrumental libraries (Zipkin currently supported)&lt;/td&gt;
          &lt;td&gt;Language-based probes&lt;/td&gt;
          &lt;td&gt;Language-based probes&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Database&lt;/td&gt;
          &lt;td&gt;ES, H2, MySQL, TiDB, Sharding-sphere, BanyanDB&lt;/td&gt;
          &lt;td&gt;ES, MySQL, Cassandra, Memory&lt;/td&gt;
          &lt;td&gt;ES, MySQL, Cassandra, Memory&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Supported Languages&lt;/td&gt;
          &lt;td&gt;Java, Rust, PHP, NodeJS, Go, Python, C++, .Net, Lua&lt;/td&gt;
          &lt;td&gt;Java, Go, Python, NodeJS, C#, PHP, Ruby, C++&lt;/td&gt;
          &lt;td&gt;Java, Go, Python, NodeJS, C#, PHP, Ruby, C++&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Initiator&lt;/td&gt;
          &lt;td&gt;Personal&lt;/td&gt;
          &lt;td&gt;Uber&lt;/td&gt;
          &lt;td&gt;Twitter&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Governance&lt;/td&gt;
          &lt;td&gt;Apache Foundation&lt;/td&gt;
          &lt;td&gt;CNCF&lt;/td&gt;
          &lt;td&gt;CNCF&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Version&lt;/td&gt;
          &lt;td&gt;9.3.0&lt;/td&gt;
          &lt;td&gt;1.39.0&lt;/td&gt;
          &lt;td&gt;2.23.19&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Stars&lt;/td&gt;
          &lt;td&gt;20.9k&lt;/td&gt;
          &lt;td&gt;16.8k&lt;/td&gt;
          &lt;td&gt;15.8k&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Although Apache SkyWalking’s agent does not support as many languages as Jaeger and Zipkin, SkyWalking’s implementation is richer and compatible with Jaeger and Zipkin trace data, and development is more active, so it is one of the best choices for building a telemetry platform.&lt;/p&gt;
&lt;h2 id=&#34;demo&#34;&gt;Demo&lt;/h2&gt;
&lt;p&gt;Refer to the &lt;a href=&#34;https://istio.io/latest/docs/tasks/observability/distributed-tracing/skywalking/&#34;&gt;Istio documentation&lt;/a&gt; to install and configure Apache SkyWalking.&lt;/p&gt;
&lt;h3 id=&#34;environment-description&#34;&gt;Environment Description&lt;/h3&gt;
&lt;p&gt;The following is the environment for our demo:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes 1.24.5&lt;/li&gt;
&lt;li&gt;Istio 1.16&lt;/li&gt;
&lt;li&gt;SkyWalking 9.1.0&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;install-istio&#34;&gt;Install Istio&lt;/h3&gt;
&lt;p&gt;Before installing Istio, you can check the environment for any problems:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ istioctl experimental precheck
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;✔ No issues found when checking the cluster. Istio is safe to install or upgrade!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  To get started, check out https://istio.io/latest/docs/setup/getting-started/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then install Istio and configure the destination for sending tracing messages as SkyWalking:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#57606a&#34;&gt;# Initial Istio Operator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl operator init
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#57606a&#34;&gt;# Configure tracing destination&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f - &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;lt;&amp;lt;EOF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;apiVersion: install.istio.io/v1alpha1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;kind: IstioOperator
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;metadata:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  namespace: istio-system
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  name: istio-with-skywalking
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;spec:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  meshConfig:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;    defaultProviders:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;      tracing:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;      - &amp;#34;skywalking&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;    enableTracing: true
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;    extensionProviders:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;    - name: &amp;#34;skywalking&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;      skywalking:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;        service: tracing.istio-system.svc.cluster.local
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;        port: 11800
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;EOF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;deploy-apache-skywalking&#34;&gt;Deploy Apache SkyWalking&lt;/h2&gt;
&lt;p&gt;Istio 1.16 supports distributed tracing using Apache SkyWalking. Install SkyWalking by executing the following code:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;https://raw.githubusercontent.com/istio/istio/release-1.16/samples/addons/extras/skywalking.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It will install the following components under the &lt;em&gt;istio-system&lt;/em&gt; namespace:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://skywalking.apache.org/docs/main/v9.3.0/en/concepts-and-designs/backend-overview/&#34;&gt;SkyWalking Observability Analysis Platform (OAP)&lt;/a&gt;: Used to receive trace data, supports SkyWalking native data formats, Zipkin v1 and v2 and Jaeger format.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://skywalking.apache.org/docs/main/v9.3.0/en/ui/readme/&#34;&gt;UI&lt;/a&gt;: Used to query distributed trace data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information about SkyWalking, please refer to the &lt;a href=&#34;https://skywalking.apache.org/docs/main/v9.3.0/readme/&#34;&gt;SkyWalking documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;deploy-the-bookinfo-application&#34;&gt;Deploy the Bookinfo Application&lt;/h2&gt;
&lt;p&gt;Execute the following command to install the bookinfo application:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl label namespace default istio-injection&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;enabled
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Launch the SkyWalking UI:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl dashboard skywalking
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Figure 2 shows all the services available in the bookinfo application:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f2.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: SkyWalking General Service page&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;You can also see information about instances, endpoints, topology, tracing, etc. For example, Figure 3 shows the service topology of the bookinfo application:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f3.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 3: Topology diagram of the Bookinfo application&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Tracing views in SkyWalking can be displayed in a variety of formats, including list, tree, table, and statistics. See Figure 4:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f4.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 4: SkyWalking General Service trace supports multiple display formats&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;To facilitate our examination, set the sampling rate of the trace to 100%:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f - &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;lt;&amp;lt;EOF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;apiVersion: telemetry.istio.io/v1alpha1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;kind: Telemetry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;metadata:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  name: mesh-default
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  namespace: istio-system
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;spec:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  tracing:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  - randomSamplingPercentage: 100.00
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;EOF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; &lt;em&gt;It’s generally not good practice to set the sampling rate to 100% in a production environment. To avoid the overhead of generating too many trace logs in production, please adjust the sampling strategy (sampling percentage).&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;uninstall&#34;&gt;Uninstall&lt;/h2&gt;
&lt;p&gt;After experimenting, uninstall Istio and SkyWalking by executing the following command.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;samples/bookinfo/platform/kube/cleanup.sh
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl unintall --purge
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl delete namespace istio-system
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;understanding-the-bookinfo-tracing-information&#34;&gt;Understanding the Bookinfo Tracing Information&lt;/h2&gt;
&lt;p&gt;Navigate to the General Service tab in the Apache SkyWalking UI, and you can see the trace information for the most recent &lt;em&gt;istio-ingressgateway&lt;/em&gt; service, as shown in Figure 5. Click on each span to see the details.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f5.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 5: The table view shows the basic information about each span.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Switching to the list view, you can see the execution order and duration of each span, as shown in Figure 6:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f6.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 6: List display&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;You might want to know why such a straightforward application generates so much span data. Because after we inject the Envoy proxy into the pod, every request between services will be intercepted and processed by Envoy, as shown in Figure 7:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f7.svg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 7: Envoy intercepts requests to generate a span&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The tracing process is shown in Figure 8:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f8.svg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 8: Trace of the Bookinfo application&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We give each span a label with a serial number, and the time taken is indicated in parentheses. For illustration purposes, we have summarized all spans in the table below.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;No.&lt;/th&gt;
          &lt;th&gt;Endpoint&lt;/th&gt;
          &lt;th&gt;Total Duration (ms)&lt;/th&gt;
          &lt;th&gt;Component Duration (ms)&lt;/th&gt;
          &lt;th&gt;Current Service&lt;/th&gt;
          &lt;th&gt;Description&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;/productpage&lt;/td&gt;
          &lt;td&gt;190&lt;/td&gt;
          &lt;td&gt;0&lt;/td&gt;
          &lt;td&gt;istio-ingressgateway&lt;/td&gt;
          &lt;td&gt;Envoy Outbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;/productpage&lt;/td&gt;
          &lt;td&gt;190&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;istio-ingressgateway&lt;/td&gt;
          &lt;td&gt;Ingress -&amp;gt; Productpage network transmission&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;/productpage&lt;/td&gt;
          &lt;td&gt;189&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;productpage&lt;/td&gt;
          &lt;td&gt;Envoy Inbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;4&lt;/td&gt;
          &lt;td&gt;/productpage&lt;/td&gt;
          &lt;td&gt;188&lt;/td&gt;
          &lt;td&gt;21&lt;/td&gt;
          &lt;td&gt;productpage&lt;/td&gt;
          &lt;td&gt;Application internal processing&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;5&lt;/td&gt;
          &lt;td&gt;/details/0&lt;/td&gt;
          &lt;td&gt;8&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;productpage&lt;/td&gt;
          &lt;td&gt;Envoy Outbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;6&lt;/td&gt;
          &lt;td&gt;/details/0&lt;/td&gt;
          &lt;td&gt;7&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;productpage&lt;/td&gt;
          &lt;td&gt;Productpage -&amp;gt; Details network transmission&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;7&lt;/td&gt;
          &lt;td&gt;/details/0&lt;/td&gt;
          &lt;td&gt;4&lt;/td&gt;
          &lt;td&gt;0&lt;/td&gt;
          &lt;td&gt;details&lt;/td&gt;
          &lt;td&gt;Envoy Inbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;8&lt;/td&gt;
          &lt;td&gt;/details/0&lt;/td&gt;
          &lt;td&gt;4&lt;/td&gt;
          &lt;td&gt;4&lt;/td&gt;
          &lt;td&gt;details&lt;/td&gt;
          &lt;td&gt;Application internal processing&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;9&lt;/td&gt;
          &lt;td&gt;/reviews/0&lt;/td&gt;
          &lt;td&gt;159&lt;/td&gt;
          &lt;td&gt;0&lt;/td&gt;
          &lt;td&gt;productpage&lt;/td&gt;
          &lt;td&gt;Envoy Outbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;10&lt;/td&gt;
          &lt;td&gt;/reviews/0&lt;/td&gt;
          &lt;td&gt;159&lt;/td&gt;
          &lt;td&gt;14&lt;/td&gt;
          &lt;td&gt;productpage&lt;/td&gt;
          &lt;td&gt;Productpage -&amp;gt; Reviews network transmission&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;11&lt;/td&gt;
          &lt;td&gt;/reviews/0&lt;/td&gt;
          &lt;td&gt;145&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;reviews&lt;/td&gt;
          &lt;td&gt;Envoy Inbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;12&lt;/td&gt;
          &lt;td&gt;/reviews/0&lt;/td&gt;
          &lt;td&gt;144&lt;/td&gt;
          &lt;td&gt;109&lt;/td&gt;
          &lt;td&gt;reviews&lt;/td&gt;
          &lt;td&gt;Application internal processing&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;13&lt;/td&gt;
          &lt;td&gt;/ratings/0&lt;/td&gt;
          &lt;td&gt;35&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;reviews&lt;/td&gt;
          &lt;td&gt;Envoy Outbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;14&lt;/td&gt;
          &lt;td&gt;/ratings/0&lt;/td&gt;
          &lt;td&gt;33&lt;/td&gt;
          &lt;td&gt;16&lt;/td&gt;
          &lt;td&gt;reviews&lt;/td&gt;
          &lt;td&gt;Reviews -&amp;gt; Ratings network transmission&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;15&lt;/td&gt;
          &lt;td&gt;/ratings/0&lt;/td&gt;
          &lt;td&gt;17&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;ratings&lt;/td&gt;
          &lt;td&gt;Envoy Inbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;16&lt;/td&gt;
          &lt;td&gt;/ratings/0&lt;/td&gt;
          &lt;td&gt;16&lt;/td&gt;
          &lt;td&gt;16&lt;/td&gt;
          &lt;td&gt;ratings&lt;/td&gt;
          &lt;td&gt;Application internal processing&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;From the above information, it can be seen that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The total time consumed for this request is 190 ms.&lt;/li&gt;
&lt;li&gt;In Istio sidecar mode, each traffic flow in and out of the application container must pass through the Envoy proxy once, each time taking 0 to 2 ms.&lt;/li&gt;
&lt;li&gt;Network requests between Pods take between 1 and 16ms.&lt;/li&gt;
&lt;li&gt;This is because the data itself has errors and the start time of the Span is not necessarily equal to the end time of the parent Span.&lt;/li&gt;
&lt;li&gt;We can see that the most time-consuming part is the Reviews application, which takes 109 ms so that we can optimize it for that application.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;Distributed tracing is an indispensable tool for analyzing performance and troubleshooting modern distributed applications. In this tutorial, we’ve seen how, with just a few minor changes to your application code to propagate tracing headers, Istio makes distributed tracing simple to use. We’ve also reviewed &lt;a href=&#34;https://skywalking.apache.org/&#34;&gt;Apache SkyWalking&lt;/a&gt; as one of the best distributed tracing systems that Istio supports. It is a fully functional platform for cloud native application analytics, with features such as metrics and log collection, alerting, Kubernetes monitoring, &lt;a href=&#34;https://skywalking.apache.org/blog/diagnose-service-mesh-network-performance-with-ebpf/&#34;&gt;service mesh performance diagnosis using eBPF&lt;/a&gt;, and more.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;If you’re new to service mesh and Kubernetes security, we have a bunch of free online courses &lt;a href=&#34;https://tetr8.io/academy&#34;&gt;available at Tetrate Academy&lt;/a&gt; that will quickly get you up to speed with Istio and Envoy.&lt;/p&gt;
&lt;p&gt;If you’re looking for a fast way to get to production with Istio, check out &lt;a href=&#34;https://tetr8.io/tid&#34;&gt;Tetrate Istio Distribution (TID)&lt;/a&gt;. TID is Tetrate’s hardened, fully upstream Istio distribution, with FIPS-verified builds and support available. It’s a great way to get started with Istio knowing you have a trusted distribution to begin with, have an expert team supporting you, and also have the option to get to FIPS compliance quickly if you need to.&lt;/p&gt;
&lt;p&gt;Once you have Istio up and running, you will probably need simpler ways to manage and secure your services beyond what’s available in Istio, that’s where Tetrate Service Bridge comes in. You can learn more about how Tetrate Service Bridge makes service mesh more secure, manageable, and resilient &lt;a href=&#34;https://tetr8.io/tsb&#34;&gt;here&lt;/a&gt;, or &lt;a href=&#34;https://tetr8.io/contact&#34;&gt;contact us for a quick demo&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Zh: 如何在 Istio 中使用 SkyWalking 进行分布式追踪？</title>
      <link>/zh/how-to-use-skywalking-for-distributed-tracing-in-istio/</link>
      <pubDate>Wed, 14 Dec 2022 00:00:00 +0000</pubDate>
      <guid>/zh/how-to-use-skywalking-for-distributed-tracing-in-istio/</guid>
      <description>
        
        
        &lt;p&gt;在云原生应用中，一次请求往往需要经过一系列的 API 或后台服务处理才能完成，这些服务有些是并行的，有些是串行的，而且位于不同的平台或节点。那么如何确定一次调用的经过的服务路径和节点以帮助我们进行问题排查？这时候就需要使用到分布式追踪。&lt;/p&gt;
&lt;p&gt;本文将向你介绍：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;分布式追踪的原理&lt;/li&gt;
&lt;li&gt;如何选择分布式追踪软件&lt;/li&gt;
&lt;li&gt;在 Istio 中如何使用分布式追踪&lt;/li&gt;
&lt;li&gt;以 Bookinfo 和 SkyWalking 为例说明如何查看分布式追踪数据&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;分布式追踪基础&#34;&gt;分布式追踪基础&lt;/h2&gt;
&lt;p&gt;分布式追踪是一种用来跟踪分布式系统中请求的方法，它可以帮助用户更好地理解、控制和优化分布式系统。分布式追踪中用到了两个概念：TraceID 和 SpanID。&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;TraceID 是一个全局唯一的 ID，用来标识一个请求的追踪信息。一个请求的所有追踪信息都属于同一个 TraceID，TraceID 在整个请求的追踪过程中都是不变的；&lt;/li&gt;
&lt;li&gt;SpanID 是一个局部唯一的 ID，用来标识一个请求在某一时刻的追踪信息。一个请求在不同的时间段会产生不同的 SpanID，SpanID 用来区分一个请求在不同时间段的追踪信息；&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;TraceID 和 SpanID 是分布式追踪的基础，它们为分布式系统中请求的追踪提供了一个统一的标识，方便用户查询、管理和分析请求的追踪信息。&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f1.svg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;下面是分布式追踪的过程：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;当一个系统收到请求后，分布式追踪系统会为该请求分配一个 TraceID，用于串联起整个调用链；&lt;/li&gt;
&lt;li&gt;分布式追踪系统会为该请求在系统内的每一次服务调用生成一个 SpanID 和 ParentID，用于记录调用的父子关系，没有 ParentID 的 Span 将作为调用链的入口；&lt;/li&gt;
&lt;li&gt;每个服务调用过程中都要传递 TraceID 和 SpanID；&lt;/li&gt;
&lt;li&gt;在查看分布式追踪时，通过 TraceID 查询某次请求的全过程；&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;istio-如何实现分布式追踪&#34;&gt;Istio 如何实现分布式追踪&lt;/h2&gt;
&lt;p&gt;Istio 中的分布式追踪是基于数据平面中的 Envoy 代理实现的。服务请求在被劫持到 Envoy 中后，Envoy 在转发请求时会附加大量 Header，其中与分布式追踪相关的有：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;作为 TraceID：&lt;code&gt;x-request-id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;用于在 LightStep 追踪系统中建立 Span 的父子关系：&lt;code&gt;x-ot-span-context&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;用于 Zipkin，同时适用于 Jaeger、SkyWalking，详见 &lt;a href=&#34;https://github.com/openzipkin/b3-propagation&#34;&gt;b3-propagation&lt;/a&gt;：
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;x-b3-traceid&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;x-b3-spanid&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;x-b3-parentspanid&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;x-b3-sampled&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;x-b3-flags&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;b3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;用于 Datadog：
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;x-datadog-trace-id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;x-datadog-parent-id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;x-datadog-sampling-priority&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;用于 SkyWalking：&lt;code&gt;sw8&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;用于 AWS X-Ray：&lt;code&gt;x-amzn-trace-id&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;关于这些 Header 的详细用法请参考 &lt;a href=&#34;https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/headers&#34;&gt;Envoy 文档&lt;/a&gt; 。&lt;/p&gt;
&lt;p&gt;Envoy 会在 Ingress Gateway 中为你产生用于追踪的 Header，不论你的应用程序使用何种语言开发，Envoy 都会将这些 Header 转发到上游集群。但是，你还要对应用程序代码做一些小的修改，才能为使用分布式追踪功能。这是因为应用程序无法自动传播这些 Header，可以在程序中集成分布式追踪的 Agent，或者在代码中手动传播这些 Header。Envoy 会将追踪数据发送到 tracer 后端处理，然后就可以在 UI 中查看追踪数据了。&lt;/p&gt;
&lt;p&gt;例如在 Bookinfo 应用中的 Productpage 服务，如果你查看它的代码可以发现，其中集成了 Jaeger 客户端库，并在 &lt;code&gt;getForwardHeaders (request)&lt;/code&gt; 方法中将 Envoy 生成的 Header 同步给对 Details 和 Reviews 服务的 HTTP 请求：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#6639ba&#34;&gt;getForwardHeaders&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;request&lt;span style=&#34;color:#1f2328&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    headers &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# 使用 Jaeger agent 获取 x-b3-* header&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    span &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; get_current_span&lt;span style=&#34;color:#1f2328&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    carrier &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    tracer&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;inject&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        span_context&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;span&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;context&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#6639ba&#34;&gt;format&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;Format&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;HTTP_HEADERS&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        carrier&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;carrier&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    headers&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;update&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;carrier&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# 手动处理非 x-b3-* header&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;in&lt;/span&gt; session&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        headers&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;end-user&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; session&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    incoming_headers &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-request-id&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-ot-span-context&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-datadog-trace-id&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-datadog-parent-id&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-datadog-sampling-priority&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;traceparent&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;tracestate&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;x-cloud-trace-context&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;grpc-trace-bin&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;sw8&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;user-agent&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;cookie&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;authorization&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;jwt&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;for&lt;/span&gt; ihdr &lt;span style=&#34;color:#0550ae&#34;&gt;in&lt;/span&gt; incoming_headers&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        val &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; request&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;headers&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;get&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;ihdr&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cf222e&#34;&gt;if&lt;/span&gt; val &lt;span style=&#34;color:#0550ae&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#cf222e&#34;&gt;None&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            headers&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;ihdr&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; val
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;return&lt;/span&gt; headers
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;关于 Istio 中分布式追踪的常见问题请见 &lt;a href=&#34;https://istio.io/latest/zh/about/faq/#distributed-tracing&#34;&gt;Istio 文档&lt;/a&gt; 。&lt;/p&gt;
&lt;h2 id=&#34;分布式追踪系统如何选择&#34;&gt;分布式追踪系统如何选择&lt;/h2&gt;
&lt;p&gt;分布式追踪系统的原理类似，市面上也有很多这样的系统，例如 &lt;a href=&#34;https://github.com/apache/skywalking&#34;&gt;Apache SkyWalking&lt;/a&gt; 、&lt;a href=&#34;https://github.com/jaegertracing/jaeger&#34;&gt;Jaeger&lt;/a&gt; 、&lt;a href=&#34;https://github.com/openzipkin/zipkin/&#34;&gt;Zipkin&lt;/a&gt; 、&lt;a href=&#34;https://lightstep.com/&#34;&gt;LightStep&lt;/a&gt; 、&lt;a href=&#34;https://github.com/pinpoint-apm/pinpoint&#34;&gt;Pinpoint&lt;/a&gt; 等。我们将选择其中三个，从多个维度进行对比。之所以选择它们是因为：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;它们是当前最流行的开源分布式追踪系统；&lt;/li&gt;
&lt;li&gt;都是基于 OpenTracing 规范；&lt;/li&gt;
&lt;li&gt;都支持与 Istio 及 Envoy 集成；&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;类别&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Apache SkyWalking&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Jaeger&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Zipkin&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;实现方式&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;基于语言的探针、服务网格探针、eBPF agent、第三方指标库（当前支持 Zipkin）&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;基于语言的探针&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;基于语言的探针&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;数据存储&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;ES、H2、MySQL、TiDB、Sharding-sphere、BanyanDB&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;ES、MySQL、Cassandra、内存&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;ES、MySQL、Cassandra、内存&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;支持语言&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Java、Rust、PHP、NodeJS、Go、Python、C++、.NET、Lua&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Java、Go、Python、NodeJS、C#、PHP、Ruby、C++&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Java、Go、Python、NodeJS、C#、PHP、Ruby、C++&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;发起者&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;个人&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Uber&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Twitter&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;治理方式&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Apache Foundation&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;CNCF&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;CNCF&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;版本&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;9.3.0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1.39.0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;2.23.19&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Star 数量&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;20.9k&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;16.8k&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;15.8k&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;分布式追踪系统对比表（数据截止时间 2022-12-07）&lt;/p&gt;
&lt;p&gt;虽然 Apache SkyWalking 的 Agent 支持的语言没有 Jaeger 和 Zipkin 多，但是 SkyWalking 的实现方式更丰富，并且与 Jaeger、Zipkin 的追踪数据兼容，开发更为活跃，且为国人开发，中文资料丰富，是构建遥测平台的最佳选择之一。&lt;/p&gt;
&lt;h2 id=&#34;实验&#34;&gt;实验&lt;/h2&gt;
&lt;p&gt;参考 &lt;a href=&#34;https://istio.io/latest/docs/tasks/observability/distributed-tracing/skywalking/&#34;&gt;Istio 文档&lt;/a&gt; 来安装和配置 Apache SkyWalking。&lt;/p&gt;
&lt;h3 id=&#34;环境说明&#34;&gt;环境说明&lt;/h3&gt;
&lt;p&gt;以下是我们实验的环境：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes 1.24.5&lt;/li&gt;
&lt;li&gt;Istio 1.16&lt;/li&gt;
&lt;li&gt;SkyWalking 9.1.0&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;安装-istio&#34;&gt;安装 Istio&lt;/h3&gt;
&lt;p&gt;安装之前可以先检查下环境是否有问题:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ istioctl experimental precheck
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;✔ No issues found when checking the cluster. Istio is safe to install or upgrade!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  To get started, check out https://istio.io/latest/docs/setup/getting-started/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;然后安装 Istio 同时配置发送追踪信息的目的地为 SkyWalking：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#57606a&#34;&gt;# 初始化 Istio Operator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl operator init
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#57606a&#34;&gt;# 安装 Istio 并配置使用 SkyWalking&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f - &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;lt;&amp;lt;EOF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;apiVersion: install.istio.io/v1alpha1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;kind: IstioOperator
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;metadata:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  namespace: istio-system
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  name: istio-with-skywalking
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;spec:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  meshConfig:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;    defaultProviders:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;      tracing:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;      - &amp;#34;skywalking&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;    enableTracing: true
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;    extensionProviders:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;    - name: &amp;#34;skywalking&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;      skywalking:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;        service: tracing.istio-system.svc.cluster.local
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;        port: 11800
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;EOF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;部署-apache-skywalking&#34;&gt;部署 Apache SkyWalking&lt;/h3&gt;
&lt;p&gt;Istio 1.16 支持使用 Apache SkyWalking 进行分布式追踪，执行下面的代码安装 SkyWalking：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.16/samples/addons/extras/skywalking.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;它将在 &lt;code&gt;istio-system&lt;/code&gt; 命名空间下安装：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://skywalking.apache.org/docs/main/v9.3.0/en/concepts-and-designs/backend-overview/&#34;&gt;SkyWalking OAP&lt;/a&gt; (Observability Analysis Platform) ：用于接收追踪数据，支持 SkyWalking 原生数据格式，Zipkin v1 和 v2 以及 Jaeger 格式。&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://skywalking.apache.org/docs/main/v9.3.0/en/ui/readme/&#34;&gt;UI&lt;/a&gt; ：用于查询分布式追踪数据。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;关于 SkyWalking 的详细信息请参考 &lt;a href=&#34;https://skywalking.apache.org/docs/main/v9.3.0/readme/&#34;&gt;SkyWalking 文档&lt;/a&gt; 。&lt;/p&gt;
&lt;h3 id=&#34;部署-bookinfo-应用&#34;&gt;部署 Bookinfo 应用&lt;/h3&gt;
&lt;p&gt;执行下面的命令安装 bookinfo 示例：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl label namespace default istio-injection&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;enabled
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;打开 SkyWalking UI：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl dashboard skywalking
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;SkyWalking 的 General Service 页面展示了 bookinfo 应用中的所有服务。&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f2.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;你还可以看到实例、端点、拓扑、追踪等信息。例如下图展示了 bookinfo 应用的服务拓扑。&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f3.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;SkyWalking 的追踪视图有多种显示形式，如列表、树形、表格和统计。&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f4.jpg&#34; alt=&#34;img&#34;&gt;SkyWalking 通用服务追踪支持多种显示样式&lt;/p&gt;
&lt;p&gt;为了方便我们检查，将追踪的采样率设置为 100%：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f - &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;lt;&amp;lt;EOF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;apiVersion: telemetry.istio.io/v1alpha1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;kind: Telemetry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;metadata:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  name: mesh-default
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  namespace: istio-system
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;spec:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  tracing:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;  - randomSamplingPercentage: 100.00
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;EOF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;卸载&#34;&gt;卸载&lt;/h3&gt;
&lt;p&gt;在实验完后，执行下面的命令卸载 Istio 和 SkyWalking：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;samples/bookinfo/platform/kube/cleanup.sh
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl unintall --purge
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl delete namespace istio-system
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;bookinfo-demo-追踪信息说明&#34;&gt;Bookinfo demo 追踪信息说明&lt;/h2&gt;
&lt;p&gt;在 Apache SkyWalking UI 中导航到 General Service 分页，查看最近的 &lt;code&gt;istio-ingressgateway&lt;/code&gt; 服务的追踪信息，表视图如下所示。图中展示了此次请求所有 Span 的基本信息，点击每个 Span 可以查看详细信息。&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f5.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;切换为列表视图，可以看到每个 Span 的执行顺序及持续时间，如下图所示。&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f6.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;你可能会感到困惑，为什么这么简单的一个应用会产生如此多的 Span 信息？因为我们为 Pod 注入了 Envoy 代理之后，每个服务间的请求都会被 Envoy 拦截和处理，如下图所示。&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f7.svg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;整个追踪流程如下图所示。&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;f8.svg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;图中给每一个 Span 标记了序号，并在括号里注明了耗时。为了便于说明我们将所有 Span 汇总在下面的表格中。&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;序号&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;方法&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;总耗时（ms）&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;组件耗时（ms）&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;当前服务&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;说明&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;190&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;istio-ingressgateway&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Envoy Outbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;2&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;190&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;istio-ingressgateway&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Ingress -&amp;gt; Productpage 网络传输&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;3&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;189&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Envoy Inbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;4&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;188&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;21&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;应用内部处理&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;5&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/details/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;8&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Envoy Outbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;6&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/details/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;7&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;3&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Productpage -&amp;gt; Details 网络传输&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;7&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/details/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;4&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;details&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Envoy Inbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;8&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/details/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;4&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;4&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;details&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;应用内部&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;9&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/reviews/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;159&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Envoy Outbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;10&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/reviews/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;159&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;14&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;productpage&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Productpage -&amp;gt; Reviews 网络传输&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;11&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/reviews/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;145&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;reviews&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Envoy Inbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;12&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/reviews/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;144&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;109&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;reviews&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;应用内部处理&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;13&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/ratings/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;35&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;2&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;reviews&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Envoy Outbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;14&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/ratings/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;33&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;16&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;reviews&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Reviews -&amp;gt; Ratings 网络传输&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;15&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/ratings/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;17&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;ratings&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Envoy Inbound&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;16&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;/ratings/0&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;16&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;16&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;ratings&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;应用内部处理&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;从以上信息可以发现：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;本次请求总耗时 190ms；&lt;/li&gt;
&lt;li&gt;在 Istio sidecar 模式下，每次流量在进出应用容器时都需要经过一次 Envoy 代理，每次耗时在 0 到 2 ms；&lt;/li&gt;
&lt;li&gt;在 Pod 间的网络请求耗时在 1 到 16ms 之间；&lt;/li&gt;
&lt;li&gt;将耗时做多的调用链 Ingress Gateway -&amp;gt; Productpage -&amp;gt; Reviews -&amp;gt; Ratings 上的所有耗时累计 182 ms，小于请求总耗时 190ms，这是因为数据本身有误差，以及 Span 的开始时间并不一定等于父 Span 的结束时间，如果你在 SkyWalking 的追踪页面，选择「列表」样式查看追踪数据（见图 2）可以更直观的发现这个问题；&lt;/li&gt;
&lt;li&gt;我们可以查看到最耗时的部分是 Reviews 应用，耗时 109ms，因此我们可以针对该应用进行优化；&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;总结&#34;&gt;总结&lt;/h2&gt;
&lt;p&gt;只要对应用代码稍作修改就可以在 Istio 很方便的使用分布式追踪功能。在 Istio 支持的众多分布式追踪系统中，&lt;a href=&#34;https://skywalking.apache.org/&#34;&gt;Apache SkyWalking&lt;/a&gt; 是其中的佼佼者。它不仅支持分布式追踪，还支持指标和日志收集、报警、Kubernetes 和服务网格监控，&lt;a href=&#34;https://skywalking.apache.org/zh/diagnose-service-mesh-network-performance-with-ebpf/&#34;&gt;使用 eBPF 诊断服务网格性能&lt;/a&gt; 等功能，是一个功能完备的云原生应用分析平台。本文中为了方便演示，将追踪采样率设置为了 100%，在生产使用时请根据需要调整采样策略（采样百分比），防止产生过多的追踪日志。&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;如果您不熟悉服务网格和 Kubernetes 安全性，我们在 &lt;a href=&#34;https://tetr8.io/academy&#34;&gt;Tetrate Academy&lt;/a&gt; 提供了一系列免费在线课程，可以让您快速了解 Istio 和 Envoy。&lt;/p&gt;
&lt;p&gt;如果您正在寻找一种快速将 Istio 投入生产的方法，请查看 &lt;a href=&#34;https://tetr8.io/tid&#34;&gt;Tetrate Istio Distribution (TID)&lt;/a&gt;。TID 是 Tetrate 的强化、完全上游的 Istio 发行版，具有经过 FIPS 验证的构建和支持。这是开始使用 Istio 的好方法，因为您知道您有一个值得信赖的发行版，有一个支持您的专家团队，并且如果需要，还可以选择快速获得 FIPS 合规性。&lt;/p&gt;
&lt;p&gt;一旦启动并运行 Istio，您可能需要更简单的方法来管理和保护您的服务，而不仅仅是 Istio 中可用的方法，这就是 Tetrate Service Bridge 的用武之地。您可以&lt;a href=&#34;https://tetr8.io/tsb&#34;&gt;在这里&lt;/a&gt;详细了解 Tetrate Service Bridge 如何使服务网格更安全、更易于管理和弹性，或&lt;a href=&#34;https://tetr8.io/contact&#34;&gt;联系我们进行快速演示&lt;/a&gt;。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: How to run Apache SkyWalking on AWS EKS and RDS/Aurora</title>
      <link>/blog/2022-12-13-how-to-run-apache-skywalking-on-aws-eks-rds/</link>
      <pubDate>Tue, 13 Dec 2022 00:00:00 +0000</pubDate>
      <guid>/blog/2022-12-13-how-to-run-apache-skywalking-on-aws-eks-rds/</guid>
      <description>
        
        
        &lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Apache SkyWalking is an open source APM tool for monitoring and troubleshooting distributed systems,
especially designed for microservices, cloud native and container-based (Docker, Kubernetes, Mesos)
architectures. It provides distributed tracing, service mesh observability, metric aggregation and
visualization, and alarm.&lt;/p&gt;
&lt;p&gt;In this article, I will introduce how to quickly set up Apache SkyWalking on AWS EKS and RDS/Aurora,
as well as a couple of sample services, monitoring services to observe SkyWalking itself.&lt;/p&gt;
&lt;h2 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;AWS account&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html&#34;&gt;AWS CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.terraform.io/downloads.html&#34;&gt;Terraform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://kubernetes.io/docs/tasks/tools/#kubectl&#34;&gt;kubectl&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can use the AWS web console or CLI to create all resources needed in this tutorial, but it can be
too tedious and hard to debug when something goes wrong. So in this artical I will use Terraform to
create all AWS resources, deploy SkyWalking, sample services, and load generator services (Locust).&lt;/p&gt;
&lt;h2 id=&#34;architecture&#34;&gt;Architecture&lt;/h2&gt;
&lt;p&gt;The demo architecture is as follows:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-mermaid&#34; data-lang=&#34;mermaid&#34;&gt;graph LR
    subgraph AWS
        subgraph EKS
          subgraph istio-system namespace
              direction TB
              OAP[[SkyWalking OAP]]
              UI[[SkyWalking UI]]
            Istio[[istiod]]
          end
          subgraph sample namespace
              Service0[[Service0]]
              Service1[[Service1]]
              ServiceN[[Service ...]]
          end
          subgraph locust namespace
              LocustMaster[[Locust Master]]
              LocustWorkers0[[Locust Worker 0]]
              LocustWorkers1[[Locust Worker 1]]
              LocustWorkersN[[Locust Worker ...]]
          end
        end
        RDS[[RDS/Aurora]]
    end
    OAP --&amp;gt; RDS
    Service0 -. telemetry data -.-&amp;gt; OAP
    Service1 -. telemetry data -.-&amp;gt; OAP
    ServiceN -. telemetry data -.-&amp;gt; OAP
    UI --query--&amp;gt; OAP
    LocustWorkers0 -- traffic --&amp;gt; Service0
    LocustWorkers1 -- traffic --&amp;gt; Service0
    LocustWorkersN -- traffic --&amp;gt; Service0
    Service0 --&amp;gt; Service1 --&amp;gt; ServiceN
    LocustMaster --&amp;gt; LocustWorkers0
    LocustMaster --&amp;gt; LocustWorkers1
    LocustMaster --&amp;gt; LocustWorkersN
    User --&amp;gt; LocustMaster
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;As shown in the architecture diagram, we need to create the following AWS resources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;EKS cluster&lt;/li&gt;
&lt;li&gt;RDS instance or Aurora cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Sounds simple, but there are a lot of things behind the scenes, such as VPC, subnets, security groups, etc.
You have to configure them correctly to make sure the EKS cluster can connect to RDS instance/Aurora cluster
otherwise the SkyWalking won&amp;rsquo;t work. Luckily, Terraform can help us to create and destroy all these resources
automatically.&lt;/p&gt;
&lt;p&gt;I have created a Terraform module to create all AWS resources needed in this tutorial, you can find it in the
&lt;a href=&#34;https://github.com/kezhenxu94/oap-load-test/tree/main/aws&#34;&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;create-aws-resources&#34;&gt;Create AWS resources&lt;/h2&gt;
&lt;p&gt;First, we need to clone the GitHub repository and &lt;code&gt;cd&lt;/code&gt; into the folder:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;git clone https://github.com/kezhenxu94/oap-load-test.git
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then, we need to create a file named &lt;code&gt;terraform.tfvars&lt;/code&gt; to specify the AWS region and other variables:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cat &amp;gt; terraform.tfvars &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;lt;&amp;lt;EOF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;aws_access_key = &amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;aws_secret_key = &amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;cluster_name   = &amp;#34;skywalking-on-aws&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;region         = &amp;#34;ap-east-1&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;db_type        = &amp;#34;rds-postgresql&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;EOF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you have already configured the AWS CLI, you can skip the &lt;code&gt;aws_access_key&lt;/code&gt; and &lt;code&gt;aws_secret_key&lt;/code&gt; variables.
To install SkyWalking with RDS postgresql, set the &lt;code&gt;db_type&lt;/code&gt; to &lt;code&gt;rds-postgresql&lt;/code&gt;, to install SkyWalking with
Aurora postgresql, set the &lt;code&gt;db_type&lt;/code&gt; to &lt;code&gt;aurora-postgresql&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;There are a lot of other variables you can configure, such as tags, sample services count, replicas, etc.,
you can find them in the &lt;a href=&#34;https://github.com/kezhenxu94/oap-load-test/blob/main/aws/variables.tf&#34;&gt;variables.tf&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then, we can run the following commands to initialize the Terraform module and download the required providers,
then create all AWS resources:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;terraform init
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;terraform apply -var-file&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;terraform.tfvars
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Type &lt;code&gt;yes&lt;/code&gt; to confirm the creation of all AWS resources, or add the &lt;code&gt;-auto-approve&lt;/code&gt; flag to the &lt;code&gt;terraform apply&lt;/code&gt;
to skip the confirmation:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;terraform apply -var-file&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;terraform.tfvars -auto-approve
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now what you need to do is to wait for the creation of all AWS resources to complete, it may take a few minutes.
You can check the progress of the creation in the AWS web console, and check the deployment progress of the services
inside the EKS cluster.&lt;/p&gt;
&lt;h2 id=&#34;generate-traffic&#34;&gt;Generate traffic&lt;/h2&gt;
&lt;p&gt;Besides creating necessary AWS resources, the Terraform module also deploys SkyWalking, sample services, and Locust
load generator services to the EKS cluster.&lt;/p&gt;
&lt;p&gt;You can access the Locust web UI to generate traffic to the sample services:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;open http://&lt;span style=&#34;color:#cf222e&#34;&gt;$(&lt;/span&gt;kubectl get svc -n locust -l &lt;span style=&#34;color:#953800&#34;&gt;app&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;locust-master -o &lt;span style=&#34;color:#953800&#34;&gt;jsonpath&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;{.items[0].status.loadBalancer.ingress[0].hostname}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;)&lt;/span&gt;:8089
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The command opens the browser to the Locust web UI, you can configure the number of users and hatch rate to generate
traffic.&lt;/p&gt;
&lt;h2 id=&#34;observe-skywalking&#34;&gt;Observe SkyWalking&lt;/h2&gt;
&lt;p&gt;You can access the SkyWalking web UI to observe the sample services.&lt;/p&gt;
&lt;p&gt;First you need to forward the SkyWalking UI port to local&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl -n istio-system port-forward &lt;span style=&#34;color:#cf222e&#34;&gt;$(&lt;/span&gt;kubectl -n istio-system get pod -l &lt;span style=&#34;color:#953800&#34;&gt;app&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;skywalking -l &lt;span style=&#34;color:#953800&#34;&gt;component&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;ui -o name&lt;span style=&#34;color:#cf222e&#34;&gt;)&lt;/span&gt; 8080:8080
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And then open the browser to http://localhost:8080 to access the SkyWalking web UI.&lt;/p&gt;
&lt;h2 id=&#34;observe-rdsaurora&#34;&gt;Observe RDS/Aurora&lt;/h2&gt;
&lt;p&gt;You can also access the RDS/Aurora web console to observe the performance of RDS/Aurora instance/Aurora cluste.&lt;/p&gt;
&lt;h2 id=&#34;test-results&#34;&gt;Test Results&lt;/h2&gt;
&lt;h3 id=&#34;test-1-skywalking-with-eks-and-rds-postgresql&#34;&gt;Test 1: SkyWalking with EKS and RDS PostgreSQL&lt;/h3&gt;
&lt;h4 id=&#34;service-traffic&#34;&gt;Service Traffic&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/postgresql/test1-cpm-locust.png&#34; alt=&#34;Service Traffic Locust&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-cpm.png&#34; alt=&#34;Service Traffic SW&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;rds-performance&#34;&gt;RDS Performance&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/postgresql/test1-postgresql-1.png&#34; alt=&#34;RDS Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-postgresql-2.png&#34; alt=&#34;RDS Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-postgresql-3.png&#34; alt=&#34;RDS Performance&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;skywalking-performance&#34;&gt;SkyWalking Performance&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/postgresql/test1-so11y-1.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-so11y-2.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-so11y-3.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-so11y-4.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-so11y-5.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;test-2-skywalking-with-eks-and-aurora-postgresql&#34;&gt;Test 2: SkyWalking with EKS and Aurora PostgreSQL&lt;/h3&gt;
&lt;h4 id=&#34;service-traffic-1&#34;&gt;Service Traffic&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/aurora/test1-cpm-locust.png&#34; alt=&#34;Service Traffic Locust&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-cpm-skywalking.png&#34; alt=&#34;Service Traffic SW&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;rds-performance-1&#34;&gt;RDS Performance&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/aurora/test1-postgresql-1.png&#34; alt=&#34;RDS Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-postgresql-2.png&#34; alt=&#34;RDS Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-postgresql-3.png&#34; alt=&#34;RDS Performance&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;skywalking-performance-1&#34;&gt;SkyWalking Performance&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/aurora/test1-so11y-1.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-so11y-2.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-so11y-3.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-so11y-4.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-so11y-5.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;clean-up&#34;&gt;Clean up&lt;/h2&gt;
&lt;p&gt;When you are done with the demo, you can run the following command to destroy all AWS resources:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;terraform destroy -var-file&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;terraform.tfvars -auto-approve
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
      </description>
    </item>
    
    <item>
      <title>Zh: 如何在 AWS EKS 和 RDS/Aurora 上运行 Apache SkyWalking</title>
      <link>/zh/2022-12-13-how-to-run-apache-skywalking-on-aws-eks-rds/</link>
      <pubDate>Tue, 13 Dec 2022 00:00:00 +0000</pubDate>
      <guid>/zh/2022-12-13-how-to-run-apache-skywalking-on-aws-eks-rds/</guid>
      <description>
        
        
        &lt;h2 id=&#34;介绍&#34;&gt;介绍&lt;/h2&gt;
&lt;p&gt;Apache SkyWalking 是一个开源的 APM 工具，用于监控分布式系统和排除故障，特别是为微服务、云原生和基于容器（Docker、Kubernetes、Mesos）的架构而设计。它提供分布式跟踪、服务网格可观测性、指标聚合和可视化以及警报。&lt;/p&gt;
&lt;p&gt;在本文中，我将介绍如何在 AWS EKS 和 RDS/Aurora 上快速设置 Apache SkyWalking，以及几个示例服务，监控服务以观察 SkyWalking 本身。&lt;/p&gt;
&lt;h2 id=&#34;先决条件&#34;&gt;先决条件&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;AWS 账号&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html&#34;&gt;AWS CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.terraform.io/downloads.html&#34;&gt;Terraform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://kubernetes.io/docs/tasks/tools/#kubectl&#34;&gt;kubectl&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;我们可以使用 AWS Web 控制台或 CLI 来创建本教程所需的所有资源，但是当出现问题时，它可能过于繁琐且难以调试。因此，在本文中，我将使用 Terraform 创建所有 AWS 资源、部署 SkyWalking、示例服务和负载生成器服务 (Locust)。&lt;/p&gt;
&lt;h2 id=&#34;架构&#34;&gt;架构&lt;/h2&gt;
&lt;p&gt;演示架构如下：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-mermaid&#34; data-lang=&#34;mermaid&#34;&gt;graph LR
    subgraph AWS
        subgraph EKS
          subgraph istio-system namespace
              direction TB
              OAP[[SkyWalking OAP]]
              UI[[SkyWalking UI]]
            Istio[[istiod]]
          end
          subgraph sample namespace
              Service0[[Service0]]
              Service1[[Service1]]
              ServiceN[[Service ...]]
          end
          subgraph locust namespace
              LocustMaster[[Locust Master]]
              LocustWorkers0[[Locust Worker 0]]
              LocustWorkers1[[Locust Worker 1]]
              LocustWorkersN[[Locust Worker ...]]
          end
        end
        RDS[[RDS/Aurora]]
    end
    OAP --&amp;gt; RDS
    Service0 -. telemetry data -.-&amp;gt; OAP
    Service1 -. telemetry data -.-&amp;gt; OAP
    ServiceN -. telemetry data -.-&amp;gt; OAP
    UI --query--&amp;gt; OAP
    LocustWorkers0 -- traffic --&amp;gt; Service0
    LocustWorkers1 -- traffic --&amp;gt; Service0
    LocustWorkersN -- traffic --&amp;gt; Service0
    Service0 --&amp;gt; Service1 --&amp;gt; ServiceN
    LocustMaster --&amp;gt; LocustWorkers0
    LocustMaster --&amp;gt; LocustWorkers1
    LocustMaster --&amp;gt; LocustWorkersN
    User --&amp;gt; LocustMaster
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;如架构图所示，我们需要创建以下 AWS 资源：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;EKS 集群&lt;/li&gt;
&lt;li&gt;RDS 实例或 Aurora 集群&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;听起来很简单，但背后有很多东西，比如 VPC、子网、安全组等。你必须正确配置它们以确保 EKS 集群可以连接到 RDS 实例 / Aurora 集群，否则 SkyWalking 不会不工作。幸运的是，Terraform 可以帮助我们自动创建和销毁所有这些资源。&lt;/p&gt;
&lt;p&gt;我创建了一个 Terraform 模块来创建本教程所需的所有 AWS 资源，您可以在 &lt;a href=&#34;https://github.com/kezhenxu94/oap-load-test/tree/main/aws&#34;&gt;GitHub 存储库&lt;/a&gt;中找到它。&lt;/p&gt;
&lt;h2 id=&#34;创建-aws-资源&#34;&gt;创建 AWS 资源&lt;/h2&gt;
&lt;p&gt;首先，我们需要将 GitHub 存储库克隆 &lt;code&gt;cd&lt;/code&gt; 到文件夹中：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;git clone https://github.com/kezhenxu94/oap-load-test.git
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;然后，我们需要创建一个文件 &lt;code&gt;terraform.tfvars&lt;/code&gt; 来指定 AWS 区域和其他变量：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cat &amp;gt; terraform.tfvars &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;lt;&amp;lt;EOF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;aws_access_key = &amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;aws_secret_key = &amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;cluster_name   = &amp;#34;skywalking-on-aws&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;region         = &amp;#34;ap-east-1&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;db_type        = &amp;#34;rds-postgresql&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;EOF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;如果您已经配置了 AWS CLI，则可以跳过 &lt;code&gt;aws_access_key&lt;/code&gt; 和 &lt;code&gt;aws_secret_key&lt;/code&gt; 变量。要使用 RDS postgresql 安装 SkyWalking，请将 &lt;code&gt;db_type&lt;/code&gt; 设置为 &lt;code&gt;rds-postgresql&lt;/code&gt;，要使用 Aurora postgresql 安装 SkyWalking，请将 &lt;code&gt;db_type&lt;/code&gt; 设置为 &lt;code&gt;aurora-postgresql&lt;/code&gt;。&lt;/p&gt;
&lt;p&gt;您可以配置许多其他变量，例如标签、示例服务计数、副本等，您可以在 variables.tf 中找到&lt;a href=&#34;https://github.com/kezhenxu94/oap-load-test/blob/main/aws/variables.tf&#34;&gt;它们&lt;/a&gt;。&lt;/p&gt;
&lt;p&gt;然后，我们可以运行以下命令来初始化 Terraform 模块并下载所需的提供程序，然后创建所有 AWS 资源：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;terraform init
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;terraform apply -var-file&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;terraform.tfvars
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;键入 &lt;code&gt;yes&lt;/code&gt; 以确认所有 AWS 资源的创建，或将标志 &lt;code&gt;-auto-approve&lt;/code&gt; 添加到 &lt;code&gt;terraform apply&lt;/code&gt; 以跳过确认：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;terraform apply -var-file&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;terraform.tfvars -auto-approve
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;现在你需要做的就是等待所有 AWS 资源的创建完成，这可能需要几分钟的时间。您可以在 AWS Web 控制台查看创建进度，也可以查看 EKS 集群内部服务的部署进度。&lt;/p&gt;
&lt;h2 id=&#34;产生流量&#34;&gt;产生流量&lt;/h2&gt;
&lt;p&gt;除了创建必要的 AWS 资源外，Terraform 模块还将 SkyWalking、示例服务和 Locust 负载生成器服务部署到 EKS 集群。&lt;/p&gt;
&lt;p&gt;您可以访问 Locust Web UI 以生成到示例服务的流量：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;open http://&lt;span style=&#34;color:#cf222e&#34;&gt;$(&lt;/span&gt;kubectl get svc -n locust -l &lt;span style=&#34;color:#953800&#34;&gt;app&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;locust-master -o &lt;span style=&#34;color:#953800&#34;&gt;jsonpath&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;{.items[0].status.loadBalancer.ingress[0].hostname}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;)&lt;/span&gt;:8089
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;该命令将浏览器打开到 Locust web UI，您可以配置用户数量和孵化率以生成流量。&lt;/p&gt;
&lt;h2 id=&#34;观察-skywalking&#34;&gt;观察 SkyWalking&lt;/h2&gt;
&lt;p&gt;您可以访问 SkyWalking Web UI 来观察示例服务。&lt;/p&gt;
&lt;p&gt;首先需要将 SkyWalking UI 端口转发到本地：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl -n istio-system port-forward &lt;span style=&#34;color:#cf222e&#34;&gt;$(&lt;/span&gt;kubectl -n istio-system get pod -l &lt;span style=&#34;color:#953800&#34;&gt;app&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;skywalking -l &lt;span style=&#34;color:#953800&#34;&gt;component&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;ui -o name&lt;span style=&#34;color:#cf222e&#34;&gt;)&lt;/span&gt; 8080:8080
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;然后在浏览器中打开 http://localhost:8080 访问 SkyWalking web UI。&lt;/p&gt;
&lt;h2 id=&#34;观察-rdsaurora&#34;&gt;观察 RDS/Aurora&lt;/h2&gt;
&lt;p&gt;您也可以访问 RDS/Aurora web 控制台，观察 RDS/Aurora 实例 / Aurora 集群的性能。&lt;/p&gt;
&lt;h2 id=&#34;试验结果&#34;&gt;试验结果&lt;/h2&gt;
&lt;h3 id=&#34;测试-1使用-eks-和-rds-postgresql-的-skywalking&#34;&gt;测试 1：使用 EKS 和 RDS PostgreSQL 的 SkyWalking&lt;/h3&gt;
&lt;h4 id=&#34;服务流量&#34;&gt;服务流量&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/postgresql/test1-cpm-locust.png&#34; alt=&#34;Service Traffic Locust&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-cpm.png&#34; alt=&#34;Service Traffic SW&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;rds-性能&#34;&gt;RDS 性能&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/postgresql/test1-postgresql-1.png&#34; alt=&#34;RDS Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-postgresql-2.png&#34; alt=&#34;RDS Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-postgresql-3.png&#34; alt=&#34;RDS Performance&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;skywalking-性能&#34;&gt;SkyWalking 性能&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/postgresql/test1-so11y-1.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-so11y-2.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-so11y-3.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-so11y-4.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/postgresql/test1-so11y-5.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;测试-2使用-eks-和-aurora-postgresql-的-skywalking&#34;&gt;测试 2：使用 EKS 和 Aurora PostgreSQL 的 SkyWalking&lt;/h3&gt;
&lt;h4 id=&#34;服务流量-1&#34;&gt;服务流量&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/aurora/test1-cpm-locust.png&#34; alt=&#34;Service Traffic Locust&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-cpm-skywalking.png&#34; alt=&#34;Service Traffic SW&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;rds-性能-1&#34;&gt;RDS 性能&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/aurora/test1-postgresql-1.png&#34; alt=&#34;RDS Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-postgresql-2.png&#34; alt=&#34;RDS Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-postgresql-3.png&#34; alt=&#34;RDS Performance&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;skywalking-性能-1&#34;&gt;SkyWalking 性能&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;./outputs/aurora/test1-so11y-1.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-so11y-2.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-so11y-3.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-so11y-4.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;
&lt;img src=&#34;./outputs/aurora/test1-so11y-5.png&#34; alt=&#34;SkyWalking Performance&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;清理&#34;&gt;清理&lt;/h2&gt;
&lt;p&gt;完成演示后，您可以运行以下命令销毁所有 AWS 资源：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;terraform destroy -var-file&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;terraform.tfvars -auto-approve
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
      </description>
    </item>
    
    <item>
      <title>Blog: Pinpoint Service Mesh Critical Performance Impact by using eBPF</title>
      <link>/blog/2022-07-05-pinpoint-service-mesh-critical-performance-impact-by-using-ebpf/</link>
      <pubDate>Tue, 05 Jul 2022 00:00:00 +0000</pubDate>
      <guid>/blog/2022-07-05-pinpoint-service-mesh-critical-performance-impact-by-using-ebpf/</guid>
      <description>
        
        
        &lt;h3 id=&#34;content&#34;&gt;Content&lt;/h3&gt;
&lt;h1 id=&#34;background&#34;&gt;Background&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://skywalking.apache.org/&#34;&gt;Apache SkyWalking&lt;/a&gt; observes metrics, logs, traces, and events for services deployed into the service mesh. When troubleshooting, SkyWalking error analysis can be an invaluable tool helping to pinpoint where an error occurred. However, performance problems are more difficult: It’s often impossible to locate the root cause of performance problems with pre-existing observation data. To move beyond the status quo, dynamic debugging and troubleshooting are essential service performance tools. In this article, we&amp;rsquo;ll discuss how to use eBPF technology to improve the profiling feature in SkyWalking and analyze the performance impact in the service mesh.&lt;/p&gt;
&lt;h1 id=&#34;trace-profiling-in-skywalking&#34;&gt;Trace Profiling in SkyWalking&lt;/h1&gt;
&lt;p&gt;Since SkyWalking 7.0.0, Trace Profiling has helped developers find performance problems by periodically sampling the thread stack to let developers know which lines of code take more time. However, Trace Profiling is not suitable for the following scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Thread Model&lt;/strong&gt;: Trace Profiling is most useful for profiling code that executes in a single thread. It is less useful for middleware that relies heavily on async execution models. For example Goroutines in Go or Kotlin Coroutines.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Language&lt;/strong&gt;: Currently, Trace Profiling is only supported in Java and Python, since it’s not easy to obtain the thread stack in the runtimes of some languages such as Go and Node.js.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent Binding&lt;/strong&gt;: Trace Profiling requires Agent installation, which can be tricky depending on the language (e.g., PHP has to rely on its C kernel; Rust and C/C++ require manual instrumentation to make install).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Trace Correlation&lt;/strong&gt;: Since Trace Profiling is only associated with a single request it can be hard to determine which request is causing the problem.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Short Lifecycle Services&lt;/strong&gt;: Trace Profiling doesn&amp;rsquo;t support short-lived services for (at least) two reasons:
&lt;ol&gt;
&lt;li&gt;It&amp;rsquo;s hard to differentiate system performance from class code manipulation in the booting stage.&lt;/li&gt;
&lt;li&gt;Trace profiling is linked to an endpoint to identify performance impact, but there is no endpoint to match these short-lived services.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Fortunately, there are techniques that can go further than Trace Profiling in these situations.&lt;/p&gt;
&lt;h1 id=&#34;introduce-ebpf&#34;&gt;Introduce eBPF&lt;/h1&gt;
&lt;p&gt;We have found that eBPF — a technology that can run sandboxed programs in an operating system kernel and thus safely and efficiently extend the capabilities of the kernel without requiring kernel modifications or loading kernel modules — can help us fill gaps left by Trace Profiling. eBPF is a trending technology because it breaks the traditional barrier between user and kernel space. Programs can now inject bytecode that runs in the kernel, instead of having to recompile the kernel to customize it. This is naturally a good fit for observability.&lt;/p&gt;
&lt;p&gt;In the figure below, we can see that when the system executes the execve syscalls, the eBPF program is triggered, and the current process runtime information is obtained by using function calls.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;eBPF-hook-points.png&#34; alt=&#34;eBPF Hook Point&#34;&gt;&lt;/p&gt;
&lt;p&gt;Using eBPF technology, we can expand the scope of Skywalking&amp;rsquo;s profiling capabilities:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Global Performance Analysis&lt;/strong&gt;: Before eBPF, data collection was limited to what agents can observe. Since eBPF programs run in the kernel, they can observe all threads. This is especially useful when you are not sure whether a performance problem is caused by a particular request.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Content&lt;/strong&gt;: eBPF can dump both user and kernel space thread stacks, so if a performance issue happens in kernel space, it’s easier to find.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent Binding&lt;/strong&gt;: All modern Linux kernels support eBPF, so there is no need to install anything. This means it is an orchestration-free vs an agent model. This reduces friction caused by built-in software which may not have the correct agents installed, such as Envoy in a Service Mesh.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sampling Type&lt;/strong&gt;: Unlike Trace Profiling, eBPF is event-driven and, therefore, not constrained by interval polling. For example, eBPF can trigger events and collect more data depending on a transfer size threshold. This can allow the system to triage and prioritize data collection under extreme load.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;ebpf-limitations&#34;&gt;eBPF Limitations&lt;/h2&gt;
&lt;p&gt;While eBPF offers significant advantages for hunting performance bottlenecks, no technology is perfect. eBPF has a number of limitations described below. Fortunately, since SkyWalking does not require eBPF, the impact is limited.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Linux Version Requirement&lt;/strong&gt;: eBPF programs require a Linux kernel version above 4.4, with later kernel versions offering more data to be collected. The BCC has &lt;a href=&#34;https://github.com/iovisor/bcc/blob/13b5563c11f7722a61a17c6ca0a1a387d2fa7788/docs/kernel-versions.md#main-features&#34;&gt;documented the features supported by different Linux kernel versions&lt;/a&gt;, with the differences between versions usually being what data can be collected with eBPF.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Privileges Required&lt;/strong&gt;: All processes that intend to load eBPF programs into the Linux kernel must be running in privileged mode. As such, bugs or other issues in such code may have a big impact.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weak Support for Dynamic Language&lt;/strong&gt;: eBPF has weak support for JIT-based dynamic languages, such as Java. It also depends on what data you want to collect. For Profiling, eBPF does not support parsing the symbols of the program, which is why most eBPF-based profiling technologies only support static languages like C, C++, Go, and Rust. However, symbol mapping can sometimes be solved through tools provided by the language. For example, in Java, &lt;a href=&#34;https://github.com/jvm-profiling-tools/perf-map-agent#architecture&#34;&gt;perf-map-agent&lt;/a&gt; can be used to generate the symbol mapping. However, dynamic languages don&amp;rsquo;t support the attach (uprobe) functionality that would allow us to trace execution events through symbols.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;introducing-skywalking-rover&#34;&gt;Introducing SkyWalking Rover&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/skywalking-rover&#34;&gt;SkyWalking Rover&lt;/a&gt; introduces the eBPF profiling feature into the SkyWalking ecosystem. The figure below shows the overall architecture of SkyWalking Rover. SkyWalking Rover is currently supported in Kubernetes environments and must be deployed inside a Kubernetes cluster. After establishing a connection with the SkyWalking backend server, it saves information about the processes on the current machine to SkyWalking. When the user creates an eBPF profiling task via the user interface, SkyWalking Rover receives the task and executes it in the relevant C, C++, Golang, and Rust language-based programs.&lt;/p&gt;
&lt;p&gt;Other than an eBPF-capable kernel, there are no additional prerequisites for deploying SkyWalking Rover.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;architecture.png&#34; alt=&#34;architecture&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;cpu-profiling-with-rover&#34;&gt;CPU Profiling with Rover&lt;/h2&gt;
&lt;p&gt;CPU profiling is the most intuitive way to show service performance. Inspired by &lt;a href=&#34;https://www.brendangregg.com/offcpuanalysis.html&#34;&gt;Brendan Gregg‘s blog post&lt;/a&gt;, we&amp;rsquo;ve divided CPU profiling into two types that we have implemented in Rover:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;On-CPU Profiling&lt;/strong&gt;: Where threads are spending time running on-CPU.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Off-CPU Profiling&lt;/strong&gt;: Where time is spent waiting while blocked on I/O, locks, timers, paging/swapping, etc.&lt;/li&gt;
&lt;/ol&gt;
&lt;h1 id=&#34;profiling-envoy-with-ebpf&#34;&gt;Profiling Envoy with eBPF&lt;/h1&gt;
&lt;p&gt;Envoy is a popular proxy, used as the data plane by the Istio service mesh. In a Kubernetes cluster, Istio injects Envoy into each service’s pod as a sidecar where it transparently intercepts and processes incoming and outgoing traffic. As the data plane, any performance issues in Envoy can affect all service traffic in the mesh. In this scenario, it’s more powerful to use &lt;strong&gt;eBPF profiling&lt;/strong&gt; to analyze issues in production caused by service mesh configuration.&lt;/p&gt;
&lt;h2 id=&#34;demo-environment&#34;&gt;Demo Environment&lt;/h2&gt;
&lt;p&gt;If you want to see this scenario in action, we&amp;rsquo;ve built a demo environment where we deploy an Nginx service for stress testing. Traffic is intercepted by Envoy and forwarded to Nginx. The commands to install the whole environment can be accessed through &lt;a href=&#34;https://github.com/mrproliu/skywalking-rover-profiling-demo&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&#34;on-cpu-profiling&#34;&gt;On-CPU Profiling&lt;/h1&gt;
&lt;p&gt;On-CPU profiling is suitable for analyzing thread stacks when service CPU usage is high. If the stack is dumped more times, it means that the thread stack occupies more CPU resources.&lt;/p&gt;
&lt;p&gt;When installing Istio using the demo configuration profile, we found there are two places where we can optimize performance:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Zipkin Tracing&lt;/strong&gt;: Different Zipkin sampling percentages have a direct impact on QPS.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Access Log Format&lt;/strong&gt;: Reducing the fields of the Envoy access log can improve QPS.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;zipkin-tracing&#34;&gt;Zipkin Tracing&lt;/h2&gt;
&lt;h3 id=&#34;zipkin-with-100-sampling&#34;&gt;Zipkin with 100% sampling&lt;/h3&gt;
&lt;p&gt;In the default demo configuration profile, Envoy is using 100% sampling as default tracing policy. How does that impact the performance?&lt;/p&gt;
&lt;p&gt;As shown in the figure below, using the &lt;strong&gt;on-CPU profiling&lt;/strong&gt;, we found that it takes about &lt;strong&gt;16%&lt;/strong&gt; of the CPU overhead. At a fixed consumption of &lt;strong&gt;2 CPUs&lt;/strong&gt;, its QPS can reach &lt;strong&gt;5.7K&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;zipkin-sampling-100.png&#34; alt=&#34;Zipkin with 100% sampling&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;disable-zipkin-tracing&#34;&gt;Disable Zipkin tracing&lt;/h3&gt;
&lt;p&gt;At this point, we found that if Zipkin is not necessary, the sampling percentage can be reduced or we can even disable tracing. Based on the &lt;a href=&#34;https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#Tracing&#34;&gt;Istio documentation&lt;/a&gt;, we can disable tracing when installing the service mesh using the following command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl install -y --set &lt;span style=&#34;color:#953800&#34;&gt;profile&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;demo &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   --set &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;meshConfig.enableTracing=false&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   --set &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;meshConfig.defaultConfig.tracing.sampling=0.0&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After disabling tracing, we performed on-CPU profiling again. According to the figure below, we found that Zipkin has disappeared from the flame graph. With the same &lt;strong&gt;2 CPU&lt;/strong&gt; consumption as in the previous example, the QPS reached &lt;strong&gt;9K&lt;/strong&gt;, which is an almost &lt;strong&gt;60%&lt;/strong&gt; increase.
&lt;img src=&#34;zipkin-disable-tracing.png&#34; alt=&#34;Disable Zipkin tracing&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;tracing-with-throughput&#34;&gt;Tracing with Throughput&lt;/h3&gt;
&lt;p&gt;With the same CPU usage, we&amp;rsquo;ve discovered that Envoy performance greatly improves when the tracing feature is disabled. Of course, this requires us to make trade-offs between the number of samples Zipkin collects and the desired performance of Envoy (QPS).&lt;/p&gt;
&lt;p&gt;The table below illustrates how different Zipkin sampling percentages under the same CPU usage affect QPS.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Zipkin sampling %&lt;/th&gt;
          &lt;th&gt;QPS&lt;/th&gt;
          &lt;th&gt;CPUs&lt;/th&gt;
          &lt;th&gt;Note&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;100% &lt;strong&gt;(default)&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;5.7K&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;16% used by Zipkin&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;1%&lt;/td&gt;
          &lt;td&gt;8.1K&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;0.3% used by Zipkin&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;disabled&lt;/td&gt;
          &lt;td&gt;9.2K&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;0% used by Zipkin&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;access-log-format&#34;&gt;Access Log Format&lt;/h2&gt;
&lt;h3 id=&#34;default-log-format&#34;&gt;Default Log Format&lt;/h3&gt;
&lt;p&gt;In the default demo configuration profile, &lt;a href=&#34;https://istio.io/latest/docs/tasks/observability/logs/access-log/#default-access-log-format&#34;&gt;the default Access Log format&lt;/a&gt; contains a lot of data. The flame graph below shows various functions involved in parsing the data such as request headers, response headers, and streaming the body.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;log-format-default.png&#34; alt=&#34;Default Log Format&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;simplifying-access-log-format&#34;&gt;Simplifying Access Log Format&lt;/h3&gt;
&lt;p&gt;Typically, we don’t need all the information in the access log, so we can often simplify it to get what we need. The following command simplifies the access log format to only display basic information:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl install -y --set &lt;span style=&#34;color:#953800&#34;&gt;profile&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;demo &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   --set meshConfig.accessLogFormat&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;[%START_TIME%] \&amp;#34;%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\&amp;#34; %RESPONSE_CODE%\n&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After simplifying the access log format, we found that the QPS increased from &lt;strong&gt;5.7K&lt;/strong&gt; to &lt;strong&gt;5.9K&lt;/strong&gt;. When executing the on-CPU profiling again, the CPU usage of log formatting dropped from &lt;strong&gt;2.4%&lt;/strong&gt; to &lt;strong&gt;0.7%&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Simplifying the log format helped us to improve the performance.&lt;/p&gt;
&lt;h1 id=&#34;off-cpu-profiling&#34;&gt;Off-CPU Profiling&lt;/h1&gt;
&lt;p&gt;Off-CPU profiling is suitable for performance issues that are not caused by high CPU usage. For example, when there are too many threads in one service, using off-CPU profiling could reveal which threads spend more time context switching.&lt;/p&gt;
&lt;p&gt;We provide data aggregation in two dimensions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Switch count&lt;/strong&gt;: The number of times a thread switches context. When the thread returns to the CPU, it completes one context switch. A thread stack with a higher switch count spends more time context switching.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Switch duration&lt;/strong&gt;: The time it takes a thread to switch the context. A thread stack with a higher switch duration spends more time off-CPU.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;write-access-log&#34;&gt;Write Access Log&lt;/h2&gt;
&lt;h3 id=&#34;enable-write&#34;&gt;Enable Write&lt;/h3&gt;
&lt;p&gt;Using the same environment and settings as before in the on-CPU test, we performed off-CPU profiling. As shown below, we found that access log writes accounted for about &lt;strong&gt;28%&lt;/strong&gt; of the total context switches. The &amp;ldquo;__write&amp;rdquo; shown below also indicates that this method is the Linux kernel method.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;access-log-write-enable.png&#34; alt=&#34;Enable Write Access Log&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;disable-write&#34;&gt;Disable Write&lt;/h3&gt;
&lt;p&gt;SkyWalking implements Envoy&amp;rsquo;s Access Log Service (ALS) feature which allows us to send access logs to the SkyWalking Observability Analysis Platform (OAP) using the gRPC protocol. Even by disabling the access logging, we can still use ALS to capture/aggregate the logs. We&amp;rsquo;ve disabled writing to the access log using the following command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl install -y --set &lt;span style=&#34;color:#953800&#34;&gt;profile&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;demo --set meshConfig.accessLogFile&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After disabling the Access Log feature, we performed the off-CPU profiling. File writing entries have disappeared as shown in the figure below. Envoy throughput also increased from &lt;strong&gt;5.7K&lt;/strong&gt; to &lt;strong&gt;5.9K&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;access-log-write-disable.png&#34; alt=&#34;Disable Write Access Log&#34;&gt;&lt;/p&gt;
&lt;h1 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;In this article, we&amp;rsquo;ve examined the insights Apache Skywalking&amp;rsquo;s Trace Profiling can give us and how much more can be achieved with eBPF profiling. All of these features are implemented in &lt;a href=&#34;https://github.com/apache/skywalking-rover&#34;&gt;skywalking-rover&lt;/a&gt;. In addition to on- and off-CPU profiling, you will also find the following features:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Continuous profiling&lt;/strong&gt;, helps you automatically profile without manual intervention. For example, when Rover detects that the CPU exceeds a configurable threshold, it automatically executes the on-CPU profiling task.&lt;/li&gt;
&lt;li&gt;More profiling types to enrich usage scenarios, such as network, and memory profiling.&lt;/li&gt;
&lt;/ol&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Chaos Mesh &#43; SkyWalking: Better Observability for Chaos Engineering</title>
      <link>/blog/2021-12-21-better-observability-for-chaos-engineering/</link>
      <pubDate>Tue, 21 Dec 2021 00:00:00 +0000</pubDate>
      <guid>/blog/2021-12-21-better-observability-for-chaos-engineering/</guid>
      <description>
        
        
        &lt;p&gt;&lt;img src=&#34;chaos-mesh-skywalking-banner.png&#34; alt=&#34;Chaos Mesh + SkyWalking: Better Observability for Chaos Engineering&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/chaos-mesh/chaos-mesh&#34;&gt;Chaos Mesh&lt;/a&gt; is an open-source cloud-native &lt;a href=&#34;https://en.wikipedia.org/wiki/Chaos_engineering&#34;&gt;chaos engineering&lt;/a&gt; platform. You can use Chaos Mesh to conveniently inject failures and simulate abnormalities that might occur in reality, so you can identify potential problems in your system. Chaos Mesh also offers a Chaos Dashboard which allows you to monitor the status of a chaos experiment. However, this dashboard cannot let you observe how the failures in the experiment impact the service performance of applications. This hinders us from further testing our systems and finding potential problems.&lt;/p&gt;
&lt;!--truncate--&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/skywalking&#34;&gt;Apache SkyWalking&lt;/a&gt; is an open-source application performance monitor (APM), specially designed to monitor, track, and diagnose cloud native, container-based distributed systems. It collects events that occur and then displays them on its dashboard, allowing you to observe directly the type and number of events that have occurred in your system and how different events impact the service performance.&lt;/p&gt;
&lt;p&gt;When you use SkyWalking and Chaos Mesh together during chaos experiments, you can observe how different failures impact the service performance.&lt;/p&gt;
&lt;p&gt;This tutorial will show you how to configure SkyWalking and Chaos Mesh. You’ll also learn how to leverage the two systems to monitor events and observe in real time how chaos experiments impact applications’ service performance.&lt;/p&gt;
&lt;h2 id=&#34;preparation&#34;&gt;Preparation&lt;/h2&gt;
&lt;p&gt;Before you start to use SkyWalking and Chaos Mesh, you have to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set up a SkyWalking cluster according to &lt;a href=&#34;https://github.com/apache/skywalking-kubernetes#install&#34;&gt;the SkyWalking configuration guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Deploy Chao Mesh &lt;a href=&#34;https://chaos-mesh.org/docs/production-installation-using-helm/&#34;&gt;using Helm&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Install &lt;a href=&#34;https://jmeter.apache.org/index.html&#34;&gt;JMeter&lt;/a&gt; or other Java testing tools (to increase service loads).&lt;/li&gt;
&lt;li&gt;Configure SkyWalking and Chaos Mesh according to &lt;a href=&#34;https://github.com/chaos-mesh/chaos-mesh-on-skywalking&#34;&gt;this guide&lt;/a&gt; if you just want to run a demo.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now, you are fully prepared, and we can cut to the chase.&lt;/p&gt;
&lt;h2 id=&#34;step-1-access-the-skywalking-cluster&#34;&gt;Step 1: Access the SkyWalking cluster&lt;/h2&gt;
&lt;p&gt;After you install the SkyWalking cluster, you can access its user interface (UI). However, no service is running at this point, so before you start monitoring, you have to add one and set the agents.&lt;/p&gt;
&lt;p&gt;In this tutorial, we take Spring Boot, a lightweight microservice framework, as an example to build a simplified demo environment.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a SkyWalking demo in Spring Boot by referring to &lt;a href=&#34;https://github.com/chaos-mesh/chaos-mesh-on-skywalking/blob/master/demo-deployment.yaml&#34;&gt;this document&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Execute the command &lt;code&gt;kubectl apply -f demo-deployment.yaml -n skywalking&lt;/code&gt; to deploy the demo.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After you finish deployment, you can observe the real-time monitoring results at the SkyWalking UI.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Spring Boot and SkyWalking have the same default port number: 8080. Be careful when you configure the port forwarding; otherise, you may have port conflicts. For example, you can set Spring Boot’s port to 8079 by using a command like &lt;code&gt;kubectl port-forward svc/spring-boot-skywalking-demo 8079:8080 -n skywalking&lt;/code&gt; to avoid conflicts.&lt;/p&gt;
&lt;h2 id=&#34;step-2-deploy-skywalking-kubernetes-event-exporter&#34;&gt;Step 2: Deploy SkyWalking Kubernetes Event Exporter&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/skywalking-kubernetes-event-exporter&#34;&gt;SkyWalking Kubernetes Event Exporter&lt;/a&gt; is able to watch, filter, and send Kubernetes events into the SkyWalking backend. SkyWalking then associates the events with the system metrics and displays an overview about when and how the metrics are affected by the events.&lt;/p&gt;
&lt;p&gt;If you want to deploy SkyWalking Kubernetes Event Explorer with one line of commands, refer to &lt;a href=&#34;https://github.com/chaos-mesh/chaos-mesh-on-skywalking/blob/master/exporter-deployment.yaml&#34;&gt;this document&lt;/a&gt; to create configuration files in YAML format and then customize the parameters in the filters and exporters. Now, you can use the command &lt;code&gt;kubectl apply&lt;/code&gt; to deploy SkyWalking Kubernetes Event Explorer.&lt;/p&gt;
&lt;h2 id=&#34;step-3-use-jmeter-to-increase-service-loads&#34;&gt;Step 3: Use JMeter to increase service loads&lt;/h2&gt;
&lt;p&gt;To better observe the change in service performance, you need to increase the service loads on Spring Boot. In this tutorial, we use JMeter, a widely adopted Java testing tool, to increase the service loads.&lt;/p&gt;
&lt;p&gt;Perform a stress test on &lt;code&gt;localhost:8079&lt;/code&gt; using JMeter and add five threads to continuously increase the service loads.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;jmeter-1.png&#34; alt=&#34;JMeter Dashboard 1&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;jmeter-2.png&#34; alt=&#34;JMeter Dashboard 2&#34;&gt;&lt;/p&gt;
&lt;p&gt;Open the SkyWalking Dashboard. You can see that the access rate is 100%, and that the service loads reach about 5,300 calls per minute (CPM).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;skywalking-dashboard.png&#34; alt=&#34;SkyWalking Dashboard&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;step-4-inject-failures-via-chaos-mesh-and-observe-results&#34;&gt;Step 4: Inject failures via Chaos Mesh and observe results&lt;/h2&gt;
&lt;p&gt;After you finish the three steps above, you can use the Chaos Dashboard to simulate stress scenarios and observe the change in service performance during chaos experiments.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;chaos-dashboard-stresschaos.png&#34; alt=&#34;StressChaos on Chaos Dashboard&#34;&gt;&lt;/p&gt;
&lt;p&gt;The following sections describe how service performance varies under the stress of three chaos conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CPU load: 10%;  memory load: 128 MB&lt;/p&gt;
&lt;p&gt;The first chaos experiment simulates low CPU usage. To display when a chaos experiment starts and ends, click the switching button on the right side of the dashboard. To learn whether the experiment is Applied to the system or Recovered from the system, move your cursor onto the short, green line.&lt;/p&gt;
&lt;p&gt;During the time period between the two short, green lines, the service load decreases to 4,929 CPM, but returns to normal after the chaos experiment ends.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;cpuload-1.png&#34; alt=&#34;Test 1&#34;&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CPU load: 50%; memory load: 128 MB&lt;/p&gt;
&lt;p&gt;When the application’s CPU load increases to 50%,  the service load decreases to 4,307 CPM.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;cpuload-2.png&#34; alt=&#34;Test 2&#34;&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CPU load: 100%; memory load: 128 MB&lt;/p&gt;
&lt;p&gt;When the CPU usage is at 100%, the service load decreases to only 40% of what it would be if no chaos experiments were taking place.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;cpuload-3.png&#34; alt=&#34;Test 3&#34;&gt;&lt;/p&gt;
&lt;p&gt;Because the process scheduling under the Linux system does not allow a process to occupy the CPU all the time, the deployed Spring Boot Demo can still handle 40% of the access requests even in the extreme case of a full CPU load.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;By combining SkyWalking and Chaos Mesh, you can clearly observe when and to what extent chaos experiments affect application service performance. This combination of tools lets you observe the service performance in various extreme conditions, thus boosting your confidence in your services.&lt;/p&gt;
&lt;p&gt;Chaos Mesh has grown a lot in 2021 thanks to the unremitting efforts of all PingCAP engineers and community contributors. In order to continue to upgrade our support for our wide variety of users and learn more about users’ experience in Chaos Engineering, we’d like to invite you to take&lt;a href=&#34;https://www.surveymonkey.com/r/X77BCNM&#34;&gt; this survey&lt;/a&gt; and give us your valuable feedback.&lt;/p&gt;
&lt;p&gt;If you want to know more about Chaos Mesh, you’re welcome to join &lt;a href=&#34;https://github.com/chaos-mesh&#34;&gt;the Chaos Mesh community on GitHub&lt;/a&gt; or our &lt;a href=&#34;https://slack.cncf.io/&#34;&gt;Slack discussions&lt;/a&gt; (#project-chaos-mesh). If you find any bugs or missing features when using Chaos Mesh, you can submit your pull requests or issues to our &lt;a href=&#34;https://github.com/chaos-mesh/chaos-mesh&#34;&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Observe VM Service Meshes with Apache SkyWalking and the Envoy Access Log Service</title>
      <link>/blog/obs-service-mesh-vm-with-sw-and-als/</link>
      <pubDate>Sun, 21 Feb 2021 00:00:00 +0000</pubDate>
      <guid>/blog/obs-service-mesh-vm-with-sw-and-als/</guid>
      <description>
        
        
        &lt;p&gt;&lt;img src=&#34;stone-arch.jpg&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Origin: &lt;a href=&#34;https://thenewstack.io/observe-virtual-machine-service-meshes-with-apache-skywalking-and-the-envoy-access-log-service&#34;&gt;Observe VM Service Meshes with Apache SkyWalking and the Envoy Access Log Service - The New Stack&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/skywalking&#34;&gt;Apache SkyWalking&lt;/a&gt;: an APM (application performance monitor) system, especially
designed for microservices, cloud native, and container-based (Docker, Kubernetes, Mesos) architectures.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.envoyproxy.io/docs/envoy/latest/api-v2/service/accesslog/v2/als.proto&#34;&gt;Envoy Access Log Service&lt;/a&gt;: Access
Log Service (ALS) is an Envoy extension that emits detailed access logs of all requests going through Envoy.&lt;/p&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;In the &lt;a href=&#34;/blog/obs-service-mesh-with-sw-and-als&#34;&gt;previous post&lt;/a&gt;, we talked about the observability of service mesh under
Kubernetes environment, and applied it to the bookinfo application in practice. We also mentioned that, in order to map
the IP addresses into services, SkyWalking needs access to the service metadata from a Kubernetes cluster, which is not
available for services deployed in virtual machines (VMs). In this post, we will introduce a new analyzer in SkyWalking
that leverages Envoy’s metadata exchange mechanism to decouple with Kubernetes. The analyzer is designed to work in
Kubernetes environments, VM environments, and hybrid environments. If there are virtual machines in your service mesh,
you might want to try out this new analyzer for better observability, which we will demonstrate in this tutorial.&lt;/p&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How it works&lt;/h2&gt;
&lt;p&gt;The mechanism of how the analyzer works is the same as what we discussed in
the &lt;a href=&#34;/blog/obs-service-mesh-with-sw-and-als&#34;&gt;previous post&lt;/a&gt;. What makes VMs different from Kubernetes is that, for VM
services, there are no places where we can fetch the metadata to map the IP addresses into services.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;image1.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;The basic idea we present in this article is to carry the metadata along with Envoy’s access logs, which is called
metadata-exchange mechanism in Envoy. When Istio pilot-agent starts an Envoy proxy as a sidecar of a service, it
collects the metadata of that service from the Kubernetes platform, or a file on the VM where that service is deployed,
and injects the metadata into the bootstrap configuration of Envoy. Envoy will carry the metadata transparently when
emitting access logs to the SkyWalking receiver.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;image2.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;But how does Envoy compose a piece of a complete access log that involves the client side and server side? When a
request goes out from Envoy, a plugin of istio-proxy named &amp;ldquo;metadata-exchange&amp;rdquo; injects the metadata into the http
headers (with a prefix like &lt;code&gt;x-envoy-downstream-&lt;/code&gt;), and the metadata is propagated to the server side. The Envoy sidecar
of the server side receives the request and parses the headers into metadata, and puts the metadata into the access log,
keyed by &lt;code&gt;wasm.downstream_peer&lt;/code&gt;. The server side Envoy also puts its own metadata into the access log keyed
by &lt;code&gt;wasm.upstream_peer.&lt;/code&gt; Hence the two sides of a single request are completed.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;image3.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;With the metadata-exchange mechanism, we can use the metadata directly without any extra query.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;image4.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;example&#34;&gt;Example&lt;/h2&gt;
&lt;p&gt;In this tutorial, we will use another demo
application &lt;a href=&#34;http://github.com/GoogleCloudPlatform/microservices-demo&#34;&gt;Online Boutique&lt;/a&gt; that consists of 10+ services so
that we can deploy some of them in VMs and make them communicate with other services deployed in Kubernetes.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;image5.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;Topology of Online Boutique In order to cover as many cases as possible, we will deploy &lt;code&gt;CheckoutService&lt;/code&gt;
and &lt;code&gt;PaymentService&lt;/code&gt; on VM and all the other services on Kubernetes, so that we can cover the cases like Kubernetes →
VM (e.g. &lt;code&gt;Frontend&lt;/code&gt; → &lt;code&gt;CheckoutService&lt;/code&gt;), VM → Kubernetes (e.g. &lt;code&gt;CheckoutService&lt;/code&gt; → &lt;code&gt;ShippingService&lt;/code&gt;), and VM → VM (
e.g. &lt;code&gt;CheckoutService&lt;/code&gt; → &lt;code&gt;PaymentService&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: All the commands used in this tutorial are accessible
on &lt;a href=&#34;https://github.com/SkyAPMTest/sw-als-vm-demo-scripts&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;git clone https://github.com/SkyAPMTest/sw-als-vm-demo-scripts
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;cd&lt;/span&gt; sw-als-vm-demo-scripts
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Make sure to init the &lt;code&gt;gcloud&lt;/code&gt; SDK properly before moving on. Modify the &lt;code&gt;GCP_PROJECT&lt;/code&gt; in file &lt;code&gt;env.sh&lt;/code&gt; to your own
project name. Most of the other variables should be OK to work if you keep them intact. If you would like to
use &lt;code&gt;ISTIO_VERSION&lt;/code&gt; &amp;gt;/= 1.8.0, please make sure &lt;a href=&#34;https://github.com/istio/istio/pull/28956&#34;&gt;this patch&lt;/a&gt; is included.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Prepare Kubernetes cluster and VM instances
&lt;a href=&#34;https://github.com/SkyAPMTest/sw-als-vm-demo-scripts/blob/2179d04270c98b9f87cf3998f5af775870ed53a7/00-create-cluster-and-vms.sh&#34;&gt;&lt;code&gt;00-create-cluster-and-vms.sh&lt;/code&gt;&lt;/a&gt;
creates a new GKE cluster and 2 VM instances that will be used through the entire tutorial, and sets up some necessary
firewall rules for them to communicate with each other.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install Istio and SkyWalking
&lt;a href=&#34;https://github.com/SkyAPMTest/sw-als-vm-demo-scripts/blob/2179d04270c98b9f87cf3998f5af775870ed53a7/01a-install-istio.sh&#34;&gt;&lt;code&gt;01a-install-istio.sh&lt;/code&gt;&lt;/a&gt;
installs Istio Operator with spec &lt;code&gt;resources/vmintegration.yaml&lt;/code&gt;. In the YAML file, we enable the &lt;code&gt;meshExpansion&lt;/code&gt; that
supports VM in mesh. We also enable the Envoy access log service and specify the
address &lt;code&gt;skywalking-oap.istio-system.svc.cluster.local:11800&lt;/code&gt; to which Envoy emits the access logs.
&lt;a href=&#34;https://github.com/SkyAPMTest/sw-als-vm-demo-scripts/blob/2179d04270c98b9f87cf3998f5af775870ed53a7/01b-install-skywalking.sh&#34;&gt;&lt;code&gt;01b-install-skywalking.sh&lt;/code&gt;&lt;/a&gt;
installs Apache SkyWalking and sets the analyzer to &lt;code&gt;mx-mesh&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create files to initialize the VM
&lt;a href=&#34;https://github.com/SkyAPMTest/sw-als-vm-demo-scripts/blob/2179d04270c98b9f87cf3998f5af775870ed53a7/02-create-files-to-transfer-to-vm.sh&#34;&gt;&lt;code&gt;02-create-files-to-transfer-to-vm.sh&lt;/code&gt;&lt;/a&gt;
creates necessary files that will be used to initialize the VMs.
&lt;a href=&#34;https://github.com/SkyAPMTest/sw-als-vm-demo-scripts/blob/2179d04270c98b9f87cf3998f5af775870ed53a7/03-copy-work-files-to-vm.sh&#34;&gt;&lt;code&gt;03-copy-work-files-to-vm.sh&lt;/code&gt;&lt;/a&gt;
securely transfers the generated files to the VMs with &lt;code&gt;gcloud scp&lt;/code&gt; command. Now use &lt;code&gt;./ssh.sh checkoutservice&lt;/code&gt;
and &lt;code&gt;./ssh.sh paymentservice&lt;/code&gt; to log into the two VMs respectively, and &lt;code&gt;cd&lt;/code&gt; to the &lt;code&gt;~/work&lt;/code&gt; directory,
execute &lt;code&gt;./prep-checkoutservice.sh&lt;/code&gt; on &lt;code&gt;checkoutservice&lt;/code&gt; VM instance and &lt;code&gt;./prep-paymentservice.sh&lt;/code&gt;
on &lt;code&gt;paymentservice&lt;/code&gt; VM instance. The Istio sidecar should be installed and started properly. To verify that,
use &lt;code&gt;tail -f /var/logs/istio/istio.log&lt;/code&gt; to check the Istio logs. The output should be something like:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;2020-12-12T08:07:07.348329Z	info	sds	resource:default new connection
2020-12-12T08:07:07.348401Z	info	sds	Skipping waiting for gateway secret
2020-12-12T08:07:07.348401Z	info	sds	Skipping waiting for gateway secret
2020-12-12T08:07:07.568676Z	info	cache	Root cert has changed, start rotating root cert for SDS clients
2020-12-12T08:07:07.568718Z	info	cache	GenerateSecret default
2020-12-12T08:07:07.569398Z	info	sds	resource:default pushed key/cert pair to proxy
2020-12-12T08:07:07.949156Z	info	cache	Loaded root cert from certificate ROOTCA
2020-12-12T08:07:07.949348Z	info	sds	resource:ROOTCA pushed root cert to proxy
2020-12-12T20:12:07.384782Z	info	sds	resource:default pushed key/cert pair to proxy
2020-12-12T20:12:07.384832Z	info	sds	Dynamic push for secret default
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The dnsmasq configuration &lt;code&gt;address=/.svc.cluster.local/{ISTIO_SERVICE_IP_STUB}&lt;/code&gt; also resolves the domain names ended
with &lt;code&gt;.svc.cluster.local&lt;/code&gt; to Istio service IP, so that you are able to access the Kubernetes services in the VM by
fully qualified domain name (FQDN) such as &lt;code&gt;httpbin.default.svc.cluster.local&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploy demo application Because we want to deploy &lt;code&gt;CheckoutService&lt;/code&gt; and &lt;code&gt;PaymentService&lt;/code&gt; manually on
VM, &lt;code&gt;resources/google-demo.yaml&lt;/code&gt; removes the two services
from &lt;a href=&#34;https://github.com/GoogleCloudPlatform/microservices-demo/blob/master/release/kubernetes-manifests.yaml&#34;&gt;the original YAML&lt;/a&gt;
.
&lt;a href=&#34;https://github.com/SkyAPMTest/sw-als-vm-demo-scripts/blob/2179d04270c98b9f87cf3998f5af775870ed53a7/04a-deploy-demo-app.sh&#34;&gt;&lt;code&gt;04a-deploy-demo-app.sh&lt;/code&gt;&lt;/a&gt;
deploys the other services on Kubernetes. Then log into the 2 VMs, run &lt;code&gt;~/work/deploy-checkoutservice.sh&lt;/code&gt;
and &lt;code&gt;~/work/deploy-paymentservice.sh&lt;/code&gt; respectively to deploy &lt;code&gt;CheckoutService&lt;/code&gt; and &lt;code&gt;PaymentService&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Register VMs to Istio Services on VMs can access the services on Kubernetes by FQDN, but that’s not the case when the
Kubernetes services want to talk to the VM services. The mesh has no idea where to forward the requests such
as &lt;code&gt;checkoutservice.default.svc.cluster.local&lt;/code&gt; because &lt;code&gt;checkoutservice&lt;/code&gt; is isolated in the VM. Therefore, we need to
register the services to the
mesh. &lt;a href=&#34;https://github.com/SkyAPMTest/sw-als-vm-demo-scripts/blob/2179d04270c98b9f87cf3998f5af775870ed53a7/04b-register-vm-with-istio.sh&#34;&gt;&lt;code&gt;04b-register-vm-with-istio.sh&lt;/code&gt;&lt;/a&gt;
registers the VM services to the mesh by creating a &amp;ldquo;dummy&amp;rdquo; service without running Pods, and a &lt;code&gt;WorkloadEntry&lt;/code&gt; to
bridge the &amp;ldquo;dummy&amp;rdquo; service with the VM service.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;done&#34;&gt;Done!&lt;/h2&gt;
&lt;p&gt;The demo application contains a &lt;code&gt;load generator&lt;/code&gt; service that performs requests repeatedly. We only need to wait a few
seconds, and then open the SkyWalking web UI to check the results.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;export POD_NAME=$(kubectl get pods --namespace istio-system -l &amp;#34;app=skywalking,release=skywalking,component=ui&amp;#34; -o jsonpath=&amp;#34;{.items[0].metadata.name}&amp;#34;)
echo &amp;#34;Visit http://127.0.0.1:8080 to use your application&amp;#34;
kubectl port-forward $POD_NAME 8080:8080 --namespace istio-system
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Navigate the browser to http://localhost:8080 . The metrics, topology should be there.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;image6.png&#34; alt=&#34;Topology&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;image7.png&#34; alt=&#34;Global metrics&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;image8.png&#34; alt=&#34;Metrics of CheckoutService&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;image9.png&#34; alt=&#34;Metrics of PaymentService&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;troubleshooting&#34;&gt;Troubleshooting&lt;/h2&gt;
&lt;p&gt;If you face any trouble when walking through the steps, here are some common problems and possible solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;VM service cannot access Kubernetes services? It’s likely the DNS on the VM doesn’t correctly resolve the fully
qualified domain names. Try to verify that with &lt;code&gt;nslookup istiod.istio-system.svc.cluster.local&lt;/code&gt;. If it doesn’t
resolve to the Kubernetes CIDR address, recheck the step in &lt;code&gt;prep-checkoutservice.sh&lt;/code&gt; and &lt;code&gt;prep-paymentservice.sh&lt;/code&gt;. If
the DNS works correctly, try to verify that Envoy has fetched the upstream clusters from the control plane
with &lt;code&gt;curl http://localhost:15000/clusters&lt;/code&gt;. If it doesn’t contain the target service,
recheck &lt;code&gt;prep-checkoutservice.sh&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Services are normal but nothing on SkyWalking WebUI? Check the SkyWalking OAP logs
via &lt;code&gt;kubectl -n istio-system logs -f $(kubectl get pod -A -l &amp;quot;app=skywalking,release=skywalking,component=oap&amp;quot; -o name)&lt;/code&gt;
and WebUI logs
via &lt;code&gt;kubectl -n istio-system logs -f $(kubectl get pod -A -l &amp;quot;app=skywalking,release=skywalking,component=ui&amp;quot; -o name)&lt;/code&gt;
to see whether there are any error logs . Also, make sure the time zone at the bottom-right of the browser is set
to &lt;code&gt;UTC +0&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;additional-resources&#34;&gt;Additional Resources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;/blog/obs-service-mesh-with-sw-and-als&#34;&gt;Observe a Service Mesh with Envoy ALS&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: Observe Service Mesh with SkyWalking and Envoy Access Log Service</title>
      <link>/blog/2020-12-03-obs-service-mesh-with-sw-and-als/</link>
      <pubDate>Thu, 03 Dec 2020 00:00:00 +0000</pubDate>
      <guid>/blog/2020-12-03-obs-service-mesh-with-sw-and-als/</guid>
      <description>
        
        
        &lt;p&gt;&lt;img src=&#34;canyonhorseshoe.jpg&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Author: Zhenxu Ke, Sheng Wu, and Tevah Platt. tetrate.io&lt;/li&gt;
&lt;li&gt;Original link, &lt;a href=&#34;https://www.tetrate.io/blog/observe-service-mesh-with-skywalking-and-envoy-access-log-service/&#34;&gt;Tetrate.io blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dec. 03th, 2020&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/skywalking&#34;&gt;Apache SkyWalking&lt;/a&gt;: an APM (application performance monitor) system, especially designed for microservices, cloud native, and container-based (Docker, Kubernetes, Mesos) architectures.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.envoyproxy.io/docs/envoy/latest/api-v2/service/accesslog/v2/als.proto&#34;&gt;Envoy Access Log Service&lt;/a&gt;: Access Log Service (ALS) is an Envoy extension that emits detailed access logs of all requests going through Envoy.&lt;/p&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;Apache SkyWalking has long supported observability in service mesh with Istio Mixer adapter. But since v1.5, Istio began to deprecate Mixer due to its poor performance in large scale clusters. Mixer’s functionalities have been moved into the Envoy proxies, and is supported only through the 1.7 Istio release.
On the other hand, &lt;a href=&#34;https://github.com/wu-sheng&#34;&gt;Sheng Wu&lt;/a&gt; and &lt;a href=&#34;https://github.com/lizan&#34;&gt;Lizan Zhou&lt;/a&gt; presented a better solution based on the Apache SkyWalking and Envoy ALS on &lt;a href=&#34;https://kccncosschn19eng.sched.com/event/NroB/observability-in-service-mesh-powered-by-envoy-and-apache-skywalking-sheng-wu-lizan-zhou-tetrate&#34;&gt;KubeCon China 2019&lt;/a&gt;,  to reduce the performance impact brought by Mixer, while retaining the same observability in service mesh. This solution was initially implemented by Sheng Wu, &lt;a href=&#34;https://github.com/hanahmily&#34;&gt;Hongtao Gao&lt;/a&gt;, Lizan Zhou, and &lt;a href=&#34;https://github.com/dio&#34;&gt;Dhi Aurrahman&lt;/a&gt; at Tetrate.io.
If you are looking for a more efficient solution to observe your service mesh instead of using a Mixer-based solution, this is exactly what you need. In this tutorial, we will explain a little bit how the new solution works, and apply it to the bookinfo application in practice.&lt;/p&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How it works&lt;/h2&gt;
&lt;p&gt;From a perspective of observability, Envoy can be typically deployed in 2 modes, sidecar, and router. As a sidecar, Envoy mostly represents a single service to receive and send requests (2 and 3 in the picture below). While as a proxy, Envoy may represent many services (1 in the picture below).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;Screen-Shot-2020-12-02-at-2.25.17-PM.png&#34; alt=&#34;Example of Envoy deployment, as front proxy and sidecar&#34;&gt;&lt;/p&gt;
&lt;p&gt;In both modes, the logs emitted by ALS include a node identifier. The identifier starts with &lt;code&gt;router~&lt;/code&gt; (or &lt;code&gt;ingress~&lt;/code&gt;) in router mode and &lt;code&gt;sidecar~&lt;/code&gt; in sidecar proxy mode.&lt;/p&gt;
&lt;p&gt;Apart from the node identifier, there are &lt;a href=&#34;https://github.com/envoyproxy/envoy/blob/549164c42cae84b59154ca4c36009e408aa10b52/generated_api_shadow/envoy/data/accesslog/v2/accesslog.proto&#34;&gt;several noteworthy properties in the access logs&lt;/a&gt; that will be used in this solution:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;downstream_direct_remote_address&lt;/code&gt;: This field is the downstream direct remote address on which the request from the user was received. Note: This is always the physical peer, even if the remote address is inferred from for example the &lt;code&gt;x-forwarded-for&lt;/code&gt; header, proxy protocol, etc.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;downstream_remote_address&lt;/code&gt;: The remote/origin address on which the request from the user was received.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;downstream_local_address&lt;/code&gt;: The local/destination address on which the request from the user was received.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;upstream_remote_address&lt;/code&gt;: The upstream remote/destination address that handles this exchange.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;upstream_local_address&lt;/code&gt;: The upstream local/origin address that handles this exchange.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;upstream_cluster&lt;/code&gt;: The upstream cluster that &lt;em&gt;upstream_remote_address&lt;/em&gt; belongs to.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We will discuss more about the properties in the following sections.&lt;/p&gt;
&lt;h3 id=&#34;sidecar&#34;&gt;Sidecar&lt;/h3&gt;
&lt;p&gt;When serving as a sidecar, Envoy is deployed alongside a service, and delegates all the incoming/outgoing requests to/from the service.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Delegating incoming requests:&lt;/strong&gt; in this case, Envoy acts as a server side sidecar, and sets the &lt;code&gt;upstream_cluster&lt;/code&gt; in form of &lt;code&gt;inbound|portNumber|portName|Hostname[or]SidecarScopeID&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;Screen-Shot-2020-12-02-at-2.37.49-PM.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;The SkyWalking analyzer checks whether either &lt;code&gt;downstream_remote_address&lt;/code&gt; can be mapped to a Kubernetes service:&lt;/p&gt;
&lt;p&gt;a. If there is a service (say &lt;code&gt;Service B&lt;/code&gt;) whose implementation is running in this IP(and port), then we have a service-to-service relation, &lt;code&gt;Service B -&amp;gt; Service A&lt;/code&gt;, which can be used to build the topology. Together with the &lt;code&gt;start_time&lt;/code&gt; and &lt;code&gt;duration&lt;/code&gt; fields in the access log, we have the latency metrics now.&lt;/p&gt;
&lt;p&gt;b. If there is no service that can be mapped to &lt;code&gt;downstream_remote_address&lt;/code&gt;, then the request may come from a service out of the mesh. Since SkyWalking cannot identify the source service where the requests come from, it simply generates the metrics without source service, according to the &lt;a href=&#34;https://wu-sheng.github.io/STAM/&#34;&gt;topology analysis method&lt;/a&gt;. The topology can be built as accurately as possible, and the metrics detected from server side are still correct.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Delegating outgoing requests:&lt;/strong&gt; in this case, Envoy acts as a client-side sidecar, and sets the &lt;code&gt;upstream_cluster&lt;/code&gt; in form of &lt;code&gt;outbound|&amp;lt;port&amp;gt;|&amp;lt;subset&amp;gt;|&amp;lt;serviceFQDN&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;Screen-Shot-2020-12-02-at-2.43.16-PM.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;Client side detection is relatively simpler than (1. Delegating incoming requests). If &lt;code&gt;upstream_remote_address&lt;/code&gt; is another sidecar or proxy, we simply get the mapped service name and generate the topology and metrics. Otherwise, we have no idea what it is and consider it an &lt;code&gt;UNKNOWN&lt;/code&gt; service.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;proxy-role&#34;&gt;Proxy role&lt;/h3&gt;
&lt;p&gt;When Envoy is deployed as a proxy, it is an independent service itself and doesn&amp;rsquo;t represent any other service like a sidecar does. Therefore, we can build client-side metrics as well as server-side metrics.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;Screen-Shot-2020-12-02-at-2.46.56-PM.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;example&#34;&gt;Example&lt;/h2&gt;
&lt;p&gt;In this section, we will use the typical &lt;a href=&#34;https://istio.io/latest/docs/examples/bookinfo/&#34;&gt;bookinfo application&lt;/a&gt; to demonstrate how  Apache SkyWalking 8.3.0+ (the latest version up to Nov. 30th, 2020) works together with Envoy ALS to observe a service mesh.&lt;/p&gt;
&lt;h3 id=&#34;installing-kubernetes&#34;&gt;Installing Kubernetes&lt;/h3&gt;
&lt;p&gt;SkyWalking 8.3.0 supports the Envoy ALS solution under both Kubernetes environment and virtual machines (VM) environment, in this tutorial, we’ll only focus on the Kubernetes scenario, for VM solution, please stay tuned for our next blog, so we need to install Kubernetes before taking further steps.&lt;/p&gt;
&lt;p&gt;In this tutorial, we will use the &lt;a href=&#34;https://minikube.sigs.k8s.io/docs/&#34;&gt;Minikube&lt;/a&gt; tool to quickly set up a local Kubernetes(v1.17) cluster for testing. In order to run all the needed components, including the bookinfo application, the SkyWalking OAP and WebUI, the cluster may need up to 4GB RAM and 2 CPU cores.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;minikube start --memory&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;4096&lt;/span&gt; --cpus&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, run &lt;code&gt;kubectl get pods --namespace=kube-system --watch&lt;/code&gt; to check whether all the Kubernetes components are ready. If not, wait for the readiness before going on.&lt;/p&gt;
&lt;h3 id=&#34;installing-istio&#34;&gt;Installing Istio&lt;/h3&gt;
&lt;p&gt;Istio provides a very convenient way to configure the Envoy proxy and enable the access log service. The built-in configuration profiles free us from lots of manual operations. So, for demonstration purposes, we will use Istio through this tutorial.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;export&lt;/span&gt; &lt;span style=&#34;color:#953800&#34;&gt;ISTIO_VERSION&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;1.7.1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl -L https://istio.io/downloadIstio &lt;span style=&#34;color:#1f2328&#34;&gt;|&lt;/span&gt; sh - 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;sudo mv &lt;span style=&#34;color:#953800&#34;&gt;$PWD&lt;/span&gt;/istio-&lt;span style=&#34;color:#953800&#34;&gt;$ISTIO_VERSION&lt;/span&gt;/bin/istioctl /usr/local/bin/
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl  install --set &lt;span style=&#34;color:#953800&#34;&gt;profile&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;demo
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl label namespace default istio-injection&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;enabled
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Run &lt;code&gt;kubectl get pods --namespace=istio-system --watch&lt;/code&gt; to check whether all the Istio components are ready. If not, wait for the readiness before going on.&lt;/p&gt;
&lt;h3 id=&#34;enabling-als&#34;&gt;Enabling ALS&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;demo&lt;/code&gt; profile doesn’t enable ALS by default. We need to reconfigure it to enable ALS via some configuration.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl  manifest install &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set meshConfig.enableEnvoyAccessLogService&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;true&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set meshConfig.defaultConfig.envoyAccessLogService.address&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;skywalking-oap.istio-system:11800
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The example command &lt;code&gt;--set meshConfig.enableEnvoyAccessLogService=true&lt;/code&gt; enables the Envoy access log service in the mesh. And as we said earlier, ALS is essentially a gRPC service that emits requests logs. The config &lt;code&gt;meshConfig.defaultConfig.envoyAccessLogService.address=skywalking-oap.istio-system:11800&lt;/code&gt; tells this gRPC service  where to emit the logs, say &lt;code&gt;skywalking-oap.istio-system:11800&lt;/code&gt;, where we will deploy the SkyWalking ALS receiver later.&lt;/p&gt;
&lt;p&gt;NOTE:
You can also enable the ALS when installing Istio so that you don’t need to restart Istio after installation:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;istioctl install --set &lt;span style=&#34;color:#953800&#34;&gt;profile&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;demo &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set meshConfig.enableEnvoyAccessLogService&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;true&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set meshConfig.defaultConfig.envoyAccessLogService.address&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;skywalking-oap.istio-system:11800
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl label namespace default istio-injection&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;enabled
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;deploying-apache-skywalking&#34;&gt;Deploying Apache SkyWalking&lt;/h3&gt;
&lt;p&gt;The SkyWalking community provides a &lt;a href=&#34;https://helm.sh&#34;&gt;Helm&lt;/a&gt; Chart to make it easier to deploy SkyWalking and its dependent services in Kubernetes. The Helm Chart can be found at the &lt;a href=&#34;https://github.com/apache/skywalking-kubernetes&#34;&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#57606a&#34;&gt;# Install Helm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl -sSLO https://get.helm.sh/helm-v3.0.0-linux-amd64.tar.gz
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;sudo tar xz -C /usr/local/bin --strip-components&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;1&lt;/span&gt; linux-amd64/helm -f helm-v3.0.0-linux-amd64.tar.gz
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#57606a&#34;&gt;# Clone SkyWalking Helm Chart&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;git clone https://github.com/apache/skywalking-kubernetes
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;cd&lt;/span&gt; skywalking-kubernetes/chart
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;git reset --hard dd749f25913830c47a97430618cefc4167612e75
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#57606a&#34;&gt;# Update dependencies&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;helm dep up skywalking
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#57606a&#34;&gt;# Deploy SkyWalking&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;helm -n istio-system install skywalking skywalking &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set oap.storageType&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;h2&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set ui.image.tag&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;8.3.0 &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set oap.image.tag&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;8.3.0-es7 &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set oap.replicas&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set oap.env.SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;k8s-mesh &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set oap.env.JAVA_OPTS&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#39;-Dmode=&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set oap.envoy.als.enabled&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;true&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               --set elasticsearch.enabled&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We deploy SkyWalking to the namespace &lt;code&gt;istio-system&lt;/code&gt;, so that SkyWalking OAP service can be accessed by &lt;code&gt;skywalking-oap.istio-system:11800&lt;/code&gt;, to which we told ALS to emit their logs, in the previous step.&lt;/p&gt;
&lt;p&gt;We also enable the ALS analyzer in the SkyWalking OAP: &lt;code&gt;oap.env.SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS=k8s-mesh&lt;/code&gt;. The analyzer parses the access logs and maps the IP addresses in the logs to the real service names in the Kubernetes, to build a topology.&lt;/p&gt;
&lt;p&gt;In order to retrieve the metadata (such as Pod IP and service names) from a Kubernetes cluster for IP mappings, we also set &lt;code&gt;oap.envoy.als.enabled=true&lt;/code&gt;, to apply for a &lt;code&gt;ClusterRole&lt;/code&gt; that has access to the metadata.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;export&lt;/span&gt; &lt;span style=&#34;color:#953800&#34;&gt;POD_NAME&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;$(&lt;/span&gt;kubectl get pods -A -l &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;app=skywalking,release=skywalking,component=ui&amp;#34;&lt;/span&gt; -o name&lt;span style=&#34;color:#cf222e&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;echo&lt;/span&gt; &lt;span style=&#34;color:#953800&#34;&gt;$POD_NAME&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl -n istio-system port-forward &lt;span style=&#34;color:#953800&#34;&gt;$POD_NAME&lt;/span&gt; 8080:8080
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now navigate your browser to http://localhost:8080 . You should be able to see the SkyWalking dashboard. The dashboard is empty for now, but after we deploy the demo application and generate traffic, it should be filled up later.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;Screen-Shot-2020-12-02-at-3.01.03-PM.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;deploying-bookinfo-application&#34;&gt;Deploying Bookinfo application&lt;/h3&gt;
&lt;p&gt;Run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f7f7f7;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;export&lt;/span&gt; &lt;span style=&#34;color:#953800&#34;&gt;ISTIO_VERSION&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;1.7.1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f https://raw.githubusercontent.com/istio/istio/&lt;span style=&#34;color:#953800&#34;&gt;$ISTIO_VERSION&lt;/span&gt;/samples/bookinfo/platform/kube/bookinfo.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f https://raw.githubusercontent.com/istio/istio/&lt;span style=&#34;color:#953800&#34;&gt;$ISTIO_VERSION&lt;/span&gt;/samples/bookinfo/networking/bookinfo-gateway.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl &lt;span style=&#34;color:#6639ba&#34;&gt;wait&lt;/span&gt; --for&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#953800&#34;&gt;condition&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;Ready pods --all --timeout&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;1200s
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;minikube tunnel
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then navigate your browser to http://localhost/productpage. You should be able to see the typical bookinfo application. Refresh the webpage several times to generate enough access logs.&lt;/p&gt;
&lt;h3 id=&#34;done&#34;&gt;Done!&lt;/h3&gt;
&lt;p&gt;And you’re all done! Check out the SkyWalking WebUI again. You should see the topology of the bookinfo application, as well the metrics of each individual service of the bookinfo application.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;Screen-Shot-2020-12-02-at-3.05.24-PM.png&#34; alt=&#34;&#34;&gt;
&lt;img src=&#34;Screen-Shot-2020-12-02-at-3.11.55-PM.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;troubleshooting&#34;&gt;Troubleshooting&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Check all pods status: &lt;code&gt;kubectl get pods -A&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;SkyWalking OAP logs: &lt;code&gt;kubectl -n istio-system logs -f $(kubectl get pod -A -l &amp;quot;app=skywalking,release=skywalking,component=oap&amp;quot; -o name)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;SkyWalking WebUI logs: &lt;code&gt;kubectl -n istio-system logs -f $(kubectl get pod -A -l &amp;quot;app=skywalking,release=skywalking,component=ui&amp;quot; -o name)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Make sure the time zone at the bottom-right of the WebUI is set to &lt;code&gt;UTC +0&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;customizing-service-names&#34;&gt;Customizing Service Names&lt;/h2&gt;
&lt;p&gt;The SkyWalking community brought more improvements to the ALS solution in the 8.3.0 version. You can decide how to compose the service names when mapping from the IP addresses, with variables &lt;code&gt;service&lt;/code&gt; and &lt;code&gt;pod&lt;/code&gt;. For instance, configuring &lt;code&gt;K8S_SERVICE_NAME_RULE&lt;/code&gt; to the expression &lt;code&gt;${service.metadata.name}-${pod.metadata.labels.version}&lt;/code&gt; gets service names with version label such as &lt;code&gt;reviews-v1&lt;/code&gt;, &lt;code&gt;reviews-v2&lt;/code&gt;, and &lt;code&gt;reviews-v3&lt;/code&gt;, instead of a single service &lt;code&gt;reviews&lt;/code&gt;, see &lt;a href=&#34;https://github.com/apache/skywalking/pull/5722&#34;&gt;the PR&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;working-als-with-vm&#34;&gt;Working ALS with VM&lt;/h2&gt;
&lt;p&gt;Kubernetes is popular, but what about VMs? From what we discussed above, in order to map the IPs to services, SkyWalking needs access to the Kubernetes cluster, fetching service metadata and Pod IPs. But in a VM environment, there is no source from which we can fetch those metadata.
In the next post, we will introduce another ALS analyzer based on the Envoy metadata exchange mechanism. With this analyzer, you are able to observe a service mesh in the VM environment. Stay tuned!
If you want to  have commercial support for the ALS solution or hybrid mesh observability, &lt;a href=&#34;https://www.tetrate.io/tetrate-service-bridge/&#34;&gt;Tetrate Service Bridge, TSB&lt;/a&gt; is another good option out there.&lt;/p&gt;
&lt;h2 id=&#34;additional-resources&#34;&gt;Additional Resources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;KubeCon 2019 Recorded &lt;a href=&#34;https://www.youtube.com/watch?v=tERm39ju9ew&#34;&gt;Video&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Get more SkyWalking updates on &lt;a href=&#34;https://skywalking.apache.org&#34;&gt;the official website&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Apache SkyWalking founder Sheng Wu, SkyWalking core maintainer Zhenxu Ke are Tetrate engineers, and Tevah Platt is a content writer for Tetrate. Tetrate helps organizations adopt open source service mesh tools, including Istio, Envoy, and Apache SkyWalking, so they can manage microservices, run service mesh on any infrastructure, and modernize their applications.&lt;/em&gt;&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: The Apdex Score for Measuring Service Mesh Health</title>
      <link>/blog/2020-07-26-apdex-and-skywalking/</link>
      <pubDate>Sun, 26 Jul 2020 00:00:00 +0000</pubDate>
      <guid>/blog/2020-07-26-apdex-and-skywalking/</guid>
      <description>
        
        
        &lt;ul&gt;
&lt;li&gt;Author: Srinivasan Ramaswamy, tetrate&lt;/li&gt;
&lt;li&gt;Original link, &lt;a href=&#34;https://www.tetrate.io/blog/the-apdex-score-for-measuring-service-mesh-health/&#34;&gt;Tetrate.io blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;asking-how-are-you-is-more-profound-than-what-are-your-symptoms&#34;&gt;Asking &lt;code&gt;How are you&lt;/code&gt; is more profound than &lt;code&gt;What are your symptoms&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;&lt;img src=&#34;intro_image.png&#34; alt=&#34;alt_text&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;background&#34;&gt;&lt;strong&gt;Background&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Recently I visited my preferred doctor. Whenever I visit, the doctor greets me with a series of light questions: How’s your day? How about the week before? Any recent trips? Did I break my cycling record? How’s your workout regimen? _Finally _he asks, “Do you have any problems?&amp;quot;  On those visits when I didn&amp;rsquo;t feel ok, I would say something like, &amp;ldquo;&lt;em&gt;I&amp;rsquo;m feeling dull this week, and I&amp;rsquo;m feeling more tired towards noon….&amp;rdquo;&lt;/em&gt; It&amp;rsquo;s at this point that he takes out his stethoscope, his pulse oximeter, and blood pressure apparatus. Then, if he feels he needs a more in-depth insight, he starts listing out specific tests to be made.&lt;/p&gt;
&lt;p&gt;When I asked him if the first part of the discussion was just an ice-breaker, he said, &amp;ldquo;&lt;em&gt;That&amp;rsquo;s the essential part. It helps me find out how you feel, rather than what your symptoms are.&amp;rdquo;&lt;/em&gt; So, despite appearances, our opening chat about life helped him structure subsequent questions on symptoms, investigations and test results.&lt;/p&gt;
&lt;p&gt;On the way back, I couldn&amp;rsquo;t stop asking myself, &lt;em&gt;&amp;ldquo;Shouldn&amp;rsquo;t we be managing our mesh this way, too?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If I strike parallels between my own health check and a  health check, “tests” would be log analysis, “investigations” would be tracing, and “symptoms” would be the traditional RED (Rate, Errors and Duration) metrics. That leaves the “essential part,” which is what we are talking about here: the &lt;em&gt;Wellness Factor&lt;/em&gt;, primarily the health of our mesh.&lt;/p&gt;
&lt;h3 id=&#34;health-in-the-context-of-service-mesh&#34;&gt;&lt;strong&gt;Health in the context of service mesh&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;We can measure the performance of any observed service through RED metrics.  RED metrics offer immense value in understanding the performance, reliability, and throughput of every service. Compelling visualizations of these metrics across the mesh make monitoring the entire mesh standardized and scalable. Also, setting alerts based on thresholds for each of these metrics helps to detect anomalies as and when they arise.&lt;/p&gt;
&lt;p&gt;To establish the context of any service and observe them, it&amp;rsquo;s ideal to visualize the mesh as a topology.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;mesh-1.png&#34; alt=&#34;alt_text&#34;&gt;&lt;/p&gt;
&lt;p&gt;A topology visualization of the mesh not only allows for picking any service and watching its metrics, but also gives vital information about service dependencies and the potential impact of a given service on the mesh.&lt;/p&gt;
&lt;p&gt;While RED metrics of each service offer tremendous insights, the user is more concerned with the overall responsiveness of the mesh rather than each of these services in isolation.&lt;/p&gt;
&lt;p&gt;To describe the performance of any service, right from submitting the request to receiving a completed http response, we’d be measuring the user&amp;rsquo;s perception of responsiveness. This measure of response time compared with a set threshold is called Apdex. This Apdex is an indicator of the health of a service in the mesh.&lt;/p&gt;
&lt;h3 id=&#34;apdex&#34;&gt;&lt;strong&gt;Apdex&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Apdex is a measure of response time considered against a set threshold**.  **It is the ratio of satisfactory response times and unsatisfactory response times to total response times.&lt;/p&gt;
&lt;p&gt;Apdex is an industry standard to measure the satisfaction of users based on the response time of applications and services. It measures how satisfied your users are with your services, as traditional metrics such as average response time could get skewed quickly.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Satisfactory response time&lt;/em&gt; indicates the number of times when the roundtrip response time of a particular service was less than this threshold. &lt;em&gt;Unsatisfactory response time&lt;/em&gt; while meaning the opposite, is further categorized as &lt;em&gt;Tolerating&lt;/em&gt; and &lt;em&gt;Frustrating&lt;/em&gt;. &lt;em&gt;Tolerating&lt;/em&gt; accommodates any performance that is up to four times the threshold, and anything over that or any errors encountered is considered &lt;em&gt;Frustrating&lt;/em&gt;. The threshold mentioned here is an ideal roundtrip performance that we expect from any service. We could even start with an organization-wide limit of say, 500ms.&lt;/p&gt;
&lt;p&gt;The Apdex score is a ratio of satisfied and tolerating requests to the total requests made.&lt;/p&gt;
&lt;p&gt;Each &lt;em&gt;satisfied request&lt;/em&gt; counts as one request, while each &lt;em&gt;tolerating request&lt;/em&gt; counts as half a  &lt;em&gt;satisfied request&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;An Apdex score takes values from 0 to 1, with 0 being the worst possible score indicating that users were always frustrated, and ‘1’ as the best possible score (100% of response times were Satisfactory).&lt;/p&gt;
&lt;p&gt;A percentage representation of this score also serves as the Health Indicator of the service.&lt;/p&gt;
&lt;h2 id=&#34;the-math&#34;&gt;&lt;strong&gt;The Math&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;The actual computation of this Apdex score is achieved through the following formula.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;		            SatisfiedCount +  ( ToleratingCount / 2 )

Apdex Score  =  ------------------------------------------------------

                                TotalSamples
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A percentage representation of this score is known as the Health Indicator of a service.&lt;/p&gt;
&lt;h3 id=&#34;example-computation&#34;&gt;&lt;strong&gt;Example Computation&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;During a 2-minute period, a host handles 200 requests.&lt;/p&gt;
&lt;p&gt;The Apdex threshold T = 0.5 seconds (500ms).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;170 of the requests were handled within 500ms, so they are classified as Satisfied.&lt;/li&gt;
&lt;li&gt;20 of the requests were handled between 500ms and 2 seconds (2000 ms), so they are classified as Tolerating.&lt;/li&gt;
&lt;li&gt;The remaining 10 were not handled properly or took longer than 2 seconds, so they are classified as Frustrated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The resulting Apdex score is 0.9:  (170 + (20/2))/200 = 0.9.&lt;/p&gt;
&lt;h3 id=&#34;the-next-level&#34;&gt;&lt;strong&gt;The next level&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;At the next level, we can attempt to improve our topology visualization by coloring nodes based on their health. Also, we can include health as a part of the information we show when the user taps on a service.&lt;/p&gt;
&lt;p&gt;Apdex specifications recommend the following Apdex Quality Ratings by classifying Apdex Score as Excellent (0.94 - 1.00), Good (0.85 - 0.93), Fair (0.70 - 0.84), Poor (0.50 - 0.69)  and Unacceptable (0.00 - 0.49).&lt;/p&gt;
&lt;p&gt;To visualize this, let’s look at our topology using traffic light colors, marking our  nodes as  Healthy,  At-Risk and  Unhealthy, where &lt;strong&gt;Unhealthy&lt;/strong&gt; indicates health that falls below 80%. A rate between 80% and 95% indicates &lt;strong&gt;At-Risk&lt;/strong&gt;, and health at 95% and above is termed &lt;strong&gt;Healthy&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Let’s incorporate this coloring into our topology visualization and take its usability to the next level. If implemented, we will be looking at something like this.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;mesh-2.png&#34; alt=&#34;alt_text&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;moving-further&#34;&gt;&lt;strong&gt;Moving further&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Apdex provides tremendous visibility into customer satisfaction on the responsiveness of our services. Even more, by extending the implementation to the edges calling this service we get further insight into the health of the mesh itself.&lt;/p&gt;
&lt;p&gt;Two services with similar Apdex scores offer the same customer satisfaction to the customer. However, the size of traffic that flows into the service can be of immense help in prioritizing between services to address. A service with higher traffic flow is an indication that this experience is impacting a significant number of users on the mesh.&lt;/p&gt;
&lt;p&gt;While health relates to a service, we can also analyze the interactions between two services and calculate the health of the interaction. This health calculation of every interaction on the mesh helps us establish a critical path, based on the health of all interactions in the entire topology.&lt;/p&gt;
&lt;p&gt;In a big mesh, showing traffic as yet another number will make it more challenging to visualize and monitor. We can, with a bit of creativity, improve the entire visualization by rendering the edges that connect services with different thickness depending on the throughput of the service.&lt;/p&gt;
&lt;p&gt;An unhealthy service participating in a high throughput transaction could lead to excessive consumption of resources. On the other hand, this visualization also offers a great tip to maximize investment in tuning services.&lt;/p&gt;
&lt;p&gt;Tuning service that is a part of a high throughput transaction offers exponential benefits when compared to tuning an occasionally used service.&lt;/p&gt;
&lt;p&gt;If we look at implementing such a visualization, which includes the health of interactions and throughput of such interactions, we would be looking at something like below :&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;mesh-4.png&#34; alt=&#34;alt_text&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;the-day-is-not-far&#34;&gt;&lt;strong&gt;The day is not far&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;These capabilities are already available to users &lt;strong&gt;today&lt;/strong&gt; as one of the UI features of Tetrate’s service mesh platform, using the highly configurable and performant observability and performance management framework: Apache SkyWalking (&lt;a href=&#34;https://skywalking.apache.org/&#34;&gt;https://skywalking.apache.org&lt;/a&gt;), which monitors traffic across the mesh, aggregates RED metrics for both services and their interactions, continuously computes and monitors health of the services, and enables users to configure alerts and notifications when services cross specific thresholds, thereby having a comprehensive health visibility of the mesh.&lt;/p&gt;
&lt;p&gt;With such tremendous visibility into our mesh performance, the day is not far when we at our NOC (Network Operations Center) for the mesh have this topology as our HUD (Heads Up Display).&lt;/p&gt;
&lt;p&gt;This HUD, with the insights and patterns gathered over time, would predict situations and proactively prompt us on potential focus areas to improve customer satisfaction.&lt;/p&gt;
&lt;p&gt;The visualization with rich historical data can also empower the Network Engineers to go back in time and look at the performance of the mesh on a similar day in the past.&lt;/p&gt;
&lt;p&gt;An earnest implementation of such a visualization would be something like below :&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;mesh-5.png&#34; alt=&#34;alt_text&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;to-conclude&#34;&gt;&lt;strong&gt;To conclude&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;With all the discussion so far, the health of a mesh is more about how our users feel, and what we can proactively do as service providers to sustain, if not enhance, the experience of our users.&lt;/p&gt;
&lt;p&gt;As the world advances toward personalized medicine, we&amp;rsquo;re not far from a day when my doctor will text me: &amp;ldquo;How about feasting yourself with ice cream today and take the Gray Butte Trail to Mount Shasta!&amp;rdquo; Likewise, we can do more for our customers by having better insight into their overall wellness.&lt;/p&gt;
&lt;p&gt;Tetrate’s approach to “service mesh health” is not only to offer management, monitoring and support but to make infrastructure healthy from the start to reduce the probability of incidents.  Powered by the Istio, Envoy, and SkyWalking, Tetrate&amp;rsquo;s solutions enable consistent end-to-end observability, runtime security, and traffic management for any workload in any environment.&lt;/p&gt;
&lt;p&gt;Our customers deserve healthy systems! Please do share your thoughts on making service mesh an exciting and robust experience for our customers.&lt;/p&gt;
&lt;h3 id=&#34;references&#34;&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Apdex&#34;&gt;https://en.wikipedia.org/wiki/Apdex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.apdex.org/overview.html&#34;&gt;https://www.apdex.org/overview.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.apdex.org/index.php/specifications/&#34;&gt;https://www.apdex.org/index.php/specifications/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://skywalking.apache.org/&#34;&gt;https://skywalking.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Blog: SkyWalking v6 is Service Mesh ready</title>
      <link>/blog/2018-12-12-skywalking-service-mesh-ready/</link>
      <pubDate>Wed, 05 Dec 2018 00:00:00 +0000</pubDate>
      <guid>/blog/2018-12-12-skywalking-service-mesh-ready/</guid>
      <description>
        
        
        &lt;p&gt;Original link, &lt;a href=&#34;https://www.tetrate.io/blog/apache-skywalking-v6/&#34;&gt;Tetrate.io blog&lt;/a&gt;&lt;/p&gt;
&lt;h1 id=&#34;context&#34;&gt;Context&lt;/h1&gt;
&lt;p&gt;The integration of SkyWalking and Istio Service Mesh yields an essential open-source tool for resolving the chaos created by the proliferation of siloed, cloud-based services.&lt;/p&gt;
&lt;p&gt;Apache SkyWalking is an open, modern performance management tool for distributed services, designed especially for microservices, cloud native and container-based (Docker, K8s, Mesos) architectures. We at Tetrate believe it is going to be an important project for understanding the performance of microservices. The recently released v6 integrates with Istio Service Mesh and focuses on metrics and tracing. It natively understands the most common language runtimes (Java, .Net, and NodeJS). With its new core code, SkyWalking v6 also supports Istrio telemetry data formats, providing consistent analysis, persistence, and visualization.&lt;/p&gt;
&lt;p&gt;SkyWalking has evolved into an Observability Analysis Platform that enables observation and monitoring of hundreds of services all at once. It promises solutions for some of the trickiest problems faced by system administrators using complex arrays of abundant services: Identifying why and where a request is slow, distinguishing normal from deviant system performance, comparing apples-to-apples metrics across apps regardless of programming language, and attaining a complete and meaningful view of performance.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;0081Kckwly1gl2ctge1g5j31pc0s8h04.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;h1 id=&#34;skywalking-history&#34;&gt;SkyWalking History&lt;/h1&gt;
&lt;p&gt;Launched in China by Wu Sheng in 2015, SkyWalking started as just a distributed tracing system, like Zipkin, but with auto instrumentation from a Java agent. This enabled JVM users to see distributed traces without any change to their source code. In the last two years, it has been used for research and production by more than &lt;a href=&#34;https://github.com/apache/incubator-skywalking/blob/master/docs/powered-by.md&#34;&gt;50 companies&lt;/a&gt;. With its expanded capabilities, we expect to see it adopted more globally.&lt;/p&gt;
&lt;h1 id=&#34;whats-new&#34;&gt;What&amp;rsquo;s new&lt;/h1&gt;
&lt;h2 id=&#34;service-mesh-integration&#34;&gt;Service Mesh Integration&lt;/h2&gt;
&lt;p&gt;Istio has picked up a lot of steam as the framework of choice for distributed services. Based on all the interest in the Istio project, and community feedback, some SkyWalking (P)PMC members decided to integrate with Istio Service Mesh to move SkyWalking to a higher level.&lt;/p&gt;
&lt;p&gt;So now you can use Skywalking to get metrics and understand the topology of your applications. This works not just for Java, .NET and Node using our language agents, but also for microservices running under the Istio service mesh. You can get a full topology of both kinds of applications.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;0081Kckwly1gl2cjmhi3uj31h80m5jwn.jpg&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;observability-analysis-platform&#34;&gt;Observability analysis platform&lt;/h2&gt;
&lt;p&gt;With its roots in tracing, SkyWalking is now transitioning into an open-standards based &lt;strong&gt;Observability Analysis Platform&lt;/strong&gt;, which means the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It can accept different kinds and formats of telemetry data from mesh like Istio telemetry.&lt;/li&gt;
&lt;li&gt;Its agents support various popular software technologies and frameworks like Tomcat, Spring, Kafka. The whole supported framework list is &lt;a href=&#34;https://github.com/apache/incubator-skywalking/blob/master/docs/en/setup/service-agent/java-agent/Supported-list.md&#34;&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;It can accept data from other compliant sources like Zipkin-formatted traces reported from Zipkin, Jaeger, or OpenCensus clients.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;0081Kckwly1gl2cqo4yctj31ok0s07hh.jpg&#34; alt=&#34;img&#34;&gt;&lt;/p&gt;
&lt;p&gt;SkyWalking is logically split into four parts: Probes, Platform Backend, Storage and UI:&lt;/p&gt;
&lt;p&gt;There are two kinds of &lt;strong&gt;probes&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Language agents or SDKs following SkyWalking across-thread propagation formats and trace formats, run in the user’s application process.&lt;/li&gt;
&lt;li&gt;The Istio mixer adaptor, which collects telemetry from the Service Mesh.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The platform &lt;strong&gt;backend&lt;/strong&gt; provides gRPC and RESTful HTTP endpoints for all SkyWalking-supported trace and metric telemetry data. For example, you can stream these metrics into an analysis system.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Storage&lt;/strong&gt; supports multiple implementations such as ElasticSearch, H2 (alpha), MySQL, and Apache ShardingSphere for MySQL Cluster. TiDB will be supported in next release.&lt;/p&gt;
&lt;p&gt;SkyWalking’s built-in &lt;strong&gt;UI&lt;/strong&gt; with a GraphQL endpoint for data allows intuitive, customizable integration.&lt;/p&gt;
&lt;p&gt;Some examples of SkyWalking’s UI:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Observe a Spring app using the SkyWalking JVM-agent&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;0081Kckwly1gl2ckeyyxlj31h70lvdjf.jpg&#34; alt=&#34;Topology&#34;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Observe on Istio without any agent, no matter what langugage the service is written in&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;0081Kckwly1gl2ckwr65mj31h80m5jwn.jpg&#34; alt=&#34;Topology&#34;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;See fine-grained metrics like request/Call per Minute, P99/95/90/75/50 latency, avg response time, heatmap&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;0081Kckwly1gl2cmxcrdqj31gz0qmdja.jpg&#34; alt=&#34;Dashboard&#34;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Service dependencies and metrics&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;0081Kckwly1gl2cngbu84j31h00oxadw.jpg&#34; alt=&#34;Service&#34;&gt;&lt;/p&gt;
&lt;h1 id=&#34;service-focused&#34;&gt;Service Focused&lt;/h1&gt;
&lt;p&gt;At Tetrate, we are focused on discovery, reliability, and security of your running services.
This is why we are embracing Skywalking, which makes service performance observable.&lt;/p&gt;
&lt;p&gt;Behind this admittedly cool UI, the aggregation logic is very easy to understand, making it easy to customize SkyWalking in its Observability Analysis Language (OAL) script.&lt;/p&gt;
&lt;p&gt;We’ll post more about OAL for developers looking to customize SkyWalking, and you can read the official &lt;a href=&#34;https://github.com/apache/incubator-skywalking/blob/master/docs/en/concepts-and-designs/oal.md&#34;&gt;OAL introduction&lt;/a&gt; document.&lt;/p&gt;
&lt;p&gt;Scripts are based on three core concepts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Service&lt;/strong&gt; represents a group of workloads that provide the same behaviours for incoming requests. You can define the service name whether you are using instrument agents or SDKs. Otherwise, SkyWalking uses the name you defined in the underlying platform, such as Istio.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Service Instance&lt;/strong&gt; Each workload in the Service group is called an instance. Like &lt;em&gt;Pods&lt;/em&gt; in Kubernetes, it doesn&amp;rsquo;t need  to be a single OS process. If you are using an instrument agent, an instance does map to one OS process.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Endpoint&lt;/strong&gt; is a path in a certain service that handles incoming requests, such as HTTP paths or a gRPC service + method. Mesh telemetry and trace data are formatted as source objects (aka scope). These are the input for the aggregation, with the script describing how to aggregate, including input, conditions, and the resulting metric name.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&#34;core-features&#34;&gt;Core Features&lt;/h1&gt;
&lt;p&gt;The other core features in SkyWalking v6 are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Service, service instance, endpoint metrics analysis.&lt;/li&gt;
&lt;li&gt;Consistent visualization in Service Mesh and no mesh.&lt;/li&gt;
&lt;li&gt;Topology discovery, Service dependency analysis.&lt;/li&gt;
&lt;li&gt;Distributed tracing.&lt;/li&gt;
&lt;li&gt;Slow services and endpoints detected.&lt;/li&gt;
&lt;li&gt;Alarms.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of course, SkyWalking has some more upgrades from v5, such as:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;ElasticSearch 6 as storage is supported.&lt;/li&gt;
&lt;li&gt;H2 storage implementor is back.&lt;/li&gt;
&lt;li&gt;Kubernetes cluster management is provided. You don’t need Zookeeper to keep the backend running in cluster mode.&lt;/li&gt;
&lt;li&gt;Totally new alarm core. Easier configuration.&lt;/li&gt;
&lt;li&gt;More cloud native style.&lt;/li&gt;
&lt;li&gt;MySQL will be supported in the next release.&lt;/li&gt;
&lt;/ol&gt;
&lt;h1 id=&#34;please-test-and-provide-feedback&#34;&gt;Please: Test and Provide Feedback!&lt;/h1&gt;
&lt;p&gt;We would love everyone to try to test our new version. You can find everything you need in our &lt;a href=&#34;https://github.com/apache/incubator-skywalking&#34;&gt;Apache repository&lt;/a&gt;,read the &lt;a href=&#34;https://github.com/apache/incubator-skywalking/blob/master/docs/README.md&#34;&gt;document&lt;/a&gt; for further details. You can contact the project team through the following channels:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Submit an issue on &lt;a href=&#34;https://github.com/apache/incubator-skywalking/issues/new&#34;&gt;GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Mailing list: &lt;a href=&#34;mailto:dev@skywalking.apache.org&#34;&gt;dev@skywalking.apache.org&lt;/a&gt; . Send to &lt;a href=&#34;mailto:dev-subscribe@kywalking.apache.org&#34;&gt;dev-subscribe@kywalking.apache.org&lt;/a&gt; to subscribe the mail list.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://gitter.im/OpenSkywalking/Lobby&#34;&gt;Gitter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://twitter.com/ASFSkyWalking&#34;&gt;Project twitter&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Oh, and one last thing! If you like our project, don&amp;rsquo;t forget to &lt;a href=&#34;https://github.com/apache/incubator-skywalking&#34;&gt;give us a star on GitHub&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
