
Core Animation and Metal: The Rendering Pipeline


March 30, 2026

On Android, the rendering pipeline goes: Impeller → OpenGL ES or Vulkan → GPU driver → SurfaceFlinger → Hardware Composer → display (Android series, Post 8). iOS has an analogous pipeline, but the components are different and, in some ways, more tightly integrated.

The key players on iOS: Metal (Apple's GPU API, equivalent to Vulkan), Core Animation (the system compositor, roughly equivalent to SurfaceFlinger), and CADisplayLink (the VSync mechanism, equivalent to Android's Choreographer). Impeller targets Metal directly, which gives Flutter a clean, modern GPU interface on iOS.

Metal: the GPU API

Metal is Apple's low-level GPU programming interface, introduced in 2014. It replaced OpenGL ES on Apple platforms and provides direct control over GPU operations: command buffer creation, resource management, shader compilation, and synchronisation.

Impeller's Metal backend is Flutter's primary rendering path on iOS. When Impeller renders a frame, it:

  1. Creates a command buffer. A Metal command buffer contains a sequence of GPU operations (draw calls, compute dispatches, resource copies) that will be submitted to the GPU as a batch.
  2. Encodes render passes. Each render pass specifies a target texture (the surface Impeller is drawing to), a set of draw calls, and the associated state (shaders, textures, blend modes). A typical Flutter frame might have one main render pass plus additional passes for effects like blur or shadow.
  3. Submits the command buffer. The command buffer is sent to the GPU via Metal's command queue. The GPU executes the commands asynchronously — Impeller doesn't wait for the GPU to finish. It immediately starts preparing the next frame's commands on the CPU.
  4. Presents the drawable. When the GPU finishes rendering, the completed texture is presented to the display system via MTKView's (or CAMetalLayer's) presentation mechanism.
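The four steps above can be sketched in Swift against the raw Metal API. This is a minimal illustration, not Impeller's actual code; `device`, `queue`, `metalLayer`, and `pipeline` are assumed to have been created at startup.

```swift
import Metal
import QuartzCore

// Sketch of one frame: create → encode → submit → present.
// All setup objects are assumed to exist already (created once at startup).
func renderFrame(queue: MTLCommandQueue,
                 metalLayer: CAMetalLayer,
                 pipeline: MTLRenderPipelineState) {
    guard let drawable = metalLayer.nextDrawable() else { return }

    // 1. Create a command buffer — the batch of GPU work for this frame.
    guard let commandBuffer = queue.makeCommandBuffer() else { return }

    // 2. Encode a render pass targeting the drawable's texture.
    let passDescriptor = MTLRenderPassDescriptor()
    passDescriptor.colorAttachments[0].texture = drawable.texture
    passDescriptor.colorAttachments[0].loadAction = .clear
    passDescriptor.colorAttachments[0].storeAction = .store
    guard let encoder = commandBuffer.makeRenderCommandEncoder(
        descriptor: passDescriptor) else { return }
    encoder.setRenderPipelineState(pipeline)
    // ... encoder.drawPrimitives(...) for each batch of draw calls ...
    encoder.endEncoding()

    // 4. Schedule presentation for when the GPU finishes rendering.
    commandBuffer.present(drawable)

    // 3. Submit. The GPU executes asynchronously; the CPU returns
    //    immediately and is free to start preparing the next frame.
    commandBuffer.commit()
}
```

Note that presentation is scheduled before `commit()`: `present(drawable)` registers a completion action on the command buffer, so the drawable is handed to Core Animation only after the GPU signals it is done.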

The Metal backend is the reason Flutter rendering is particularly smooth on iOS. Metal is well-documented, consistently implemented (Apple controls both the API and the GPU hardware), and optimised for Apple Silicon's unified memory architecture.

Unified memory: no copies

Apple Silicon uses unified memory — the CPU and GPU share the same physical memory. This is fundamentally different from discrete GPU architectures (common in desktop PCs) and even from some mobile chips where CPU and GPU have separate memory pools.

On a unified memory system, when Impeller allocates a Metal texture, the texture data exists in physical RAM that both the CPU and GPU can access directly. There's no "upload to GPU memory" step. When Impeller writes vertex data to a buffer and then submits a draw call, the GPU reads the same physical pages the CPU wrote to — no copy.

This has practical performance implications:

  • Image decoding is fast because the decoded pixel data doesn't need to be copied from CPU-accessible memory to GPU-accessible memory. The decoded data is already where the GPU can read it.
  • Dynamic content (updating textures per-frame) has minimal overhead because there's no upload latency.
  • Memory accounting is simpler — there's no separate "GPU memory" category. Metal allocations are the same as regular memory allocations, just tagged for GPU access.
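The no-copy path can be made concrete with a shared-storage Metal buffer. A sketch, with hypothetical vertex data (not Impeller's code):

```swift
import Metal

// On Apple Silicon, a .storageModeShared buffer lives in memory that
// both CPU and GPU address directly — the CPU writes the same physical
// pages the GPU later reads, with no staging buffer or upload step.
let device = MTLCreateSystemDefaultDevice()!
let vertices: [Float] = [0, 1, -1, -1, 1, -1]  // one triangle (x, y pairs)
let byteCount = vertices.count * MemoryLayout<Float>.stride

let buffer = device.makeBuffer(
    length: byteCount,
    options: .storageModeShared)!

// CPU write: a plain memcpy into the shared allocation.
buffer.contents().copyMemory(from: vertices, byteCount: byteCount)

// When a draw call later binds `buffer`, the GPU reads these same pages.
// On a discrete-GPU architecture this would require an explicit blit
// from host memory into VRAM; here there is no copy at all.
```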

The downside: GPU memory competes with CPU memory for the same physical RAM. On Android, GPU memory is often separate (or partially separate) from CPU memory, so a GPU-heavy app doesn't necessarily pressure the CPU memory budget. On iOS, it's all one pool, so Impeller's textures and render targets directly reduce the memory available for the Dart heap and everything else.

Core Animation: the compositor

Core Animation is iOS's compositing engine. It's the equivalent of Android's SurfaceFlinger — the system component that combines multiple visual layers into the final image displayed on screen.

Every visual element on iOS is backed by a CALayer. Your app's UIView hierarchy corresponds to a CALayer hierarchy. The Flutter engine's FlutterView contains a CAMetalLayer — a CALayer subclass that provides Metal-rendered content.
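How a Metal-rendered surface plugs into that CALayer hierarchy can be sketched with a UIView whose backing layer is a CAMetalLayer. The class name is illustrative; this mirrors the idea behind FlutterView, not its actual implementation:

```swift
import UIKit
import Metal

// A UIView backed by a CAMetalLayer instead of a plain CALayer.
final class MetalBackedView: UIView {
    // UIKit asks this class property for the backing-layer type.
    override class var layerClass: AnyClass { CAMetalLayer.self }

    override init(frame: CGRect) {
        super.init(frame: frame)
        let metalLayer = layer as! CAMetalLayer
        metalLayer.device = MTLCreateSystemDefaultDevice()
        metalLayer.pixelFormat = .bgra8Unorm
        // framebufferOnly lets Core Animation use the drawable for
        // direct scanout rather than treating it as a sampleable texture.
        metalLayer.framebufferOnly = true
    }

    required init?(coder: NSCoder) { fatalError("init(coder:) not supported") }
}
```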

Core Animation composites layers in a render server process (backboardd), which runs at a higher priority than any app. The composition happens like this:

Your Flutter app's CAMetalLayer  ─┐
Status bar CALayer               ─┼─→ Core Animation render server
Home indicator CALayer           ─┤   (in backboardd process)
Notification CALayers (if any)   ─┘              │
                                                 ▼
                                        Composited framebuffer
                                                 │
                                                 ▼
                                        Display hardware

Core Animation can composite layers using the GPU (GPU composition) or using the display controller hardware directly (direct scanout, equivalent to Android's Hardware Composer). Simple layer stacks — a full-screen app with a status bar — are typically composited in hardware. Complex stacks — overlapping translucent layers, 3D transforms — fall back to GPU composition.

For a typical Flutter app, the composition is simple: one CAMetalLayer covering most of the screen, with system chrome (status bar, home indicator) on top. This is usually handled by direct scanout — no GPU composition overhead.

The frame lifecycle on iOS

Tracing a single frame from Dart to display:

CADisplayLink fires (VSync signal)
  │
  ▼
[Dart UI thread]
  ├── Animation ticks
  ├── Build phase (widget tree)
  ├── Layout phase (render tree)
  ├── Paint phase (layer tree → display lists)
  │
  ▼ Layer tree handed to raster thread
  │
[Raster thread]
  ├── Impeller receives layer tree
  ├── Creates Metal command buffer
  ├── For each render pass:
  │     ├── Bind Metal render pipeline state (shaders)
  │     ├── Bind textures (images, gradients)
  │     ├── Encode draw calls (vertices, indices)
  │     └── End render pass
  ├── Submit command buffer to Metal command queue
  │     → GPU begins executing asynchronously
  │
  ▼ CPU is free for next frame
  │
[GPU - Apple Silicon]
  ├── Execute vertex shaders (transform geometry)
  ├── Rasterize triangles
  ├── Execute fragment shaders (color, effects)
  ├── Write pixels to CAMetalLayer's drawable texture
  │
  ▼ GPU signals completion
  │
[CAMetalLayer]
  ├── Present drawable to Core Animation
  │
  ▼
[Core Animation render server (backboardd)]
  ├── Composite all visible layers
  ├── Submit to display controller
  │
  ▼
[Display hardware]
  └── Scan out framebuffer → pixels → photons

The total pipeline latency is typically 2-3 frames (33-50ms at 60Hz) from Dart code execution to photons. As on Android, this latency is constant and invisible to the user — what matters is consistency, not absolute latency.

CADisplayLink: the VSync mechanism

CADisplayLink is the iOS equivalent of Android's Choreographer. It's a timer that fires synchronised with the display's refresh rate.

The Flutter engine creates a CADisplayLink and adds it to the main thread's run loop. On each fire, the engine:

  1. Signals the Dart UI thread to begin the next frame
  2. The Dart UI thread runs the frame pipeline (build, layout, paint)
  3. The raster thread renders the frame
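The hookup described above can be sketched as a small scheduler that owns a CADisplayLink. `FrameScheduler` and `beginFrame` are illustrative names, not engine code:

```swift
import UIKit

// Sketch of the engine's VSync hookup: a CADisplayLink on the main
// run loop that kicks off each frame.
final class FrameScheduler {
    private var displayLink: CADisplayLink?

    func start() {
        let link = CADisplayLink(target: self, selector: #selector(onVSync))
        // .common keeps the link firing during scrolling/tracking modes.
        link.add(to: .main, forMode: .common)
        displayLink = link
    }

    @objc private func onVSync(_ link: CADisplayLink) {
        // targetTimestamp is the deadline: the frame must be submitted
        // before this time to make the upcoming refresh.
        beginFrame(deadline: link.targetTimestamp)
    }

    func stop() {
        // Invalidating removes the link from the run loop entirely —
        // no more fires, which is how an idle app reaches near-zero
        // frame-callback CPU usage.
        displayLink?.invalidate()
        displayLink = nil
    }

    private func beginFrame(deadline: CFTimeInterval) {
        // Signal the Dart UI thread: build → layout → paint,
        // then hand the layer tree to the raster thread.
    }
}
```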

On ProMotion displays (iPhone 13 Pro and later), the refresh rate is variable — 1Hz to 120Hz. CADisplayLink adapts automatically:

  • During animation, it fires at up to 120Hz (8.33ms per frame)
  • During static content, it fires at lower rates (30Hz, 24Hz, or even 10Hz)
  • When idle, the engine invalidates the CADisplayLink entirely — no fires, no CPU usage

Flutter's engine manages this adaptation by setting CADisplayLink.preferredFrameRateRange:

swift
displayLink.preferredFrameRateRange = CAFrameRateRange(
    minimum: 80,
    maximum: 120,
    preferred: 120
)

During animation, the engine requests the highest rate. When the animation ends and no more frames are scheduled, the engine invalidates the display link. This is why an idle Flutter app on a ProMotion display uses almost no CPU and the display can drop to a very low refresh rate.

Impeller on Metal: what makes it fast

Impeller's Metal backend takes advantage of several Metal-specific features:

Precompiled shaders. All of Impeller's shaders are compiled to Metal IR (Intermediate Representation) at Flutter build time, then converted to device-specific GPU machine code ahead of use by Metal's compilation pipeline. No runtime shader compilation. This eliminates the "shader jank" that plagued Skia's OpenGL backend (Android series, Post 8). Shader jank was less severe on iOS even under Skia, because Metal compiles shaders faster than Android's OpenGL drivers do, but Impeller reduces it to zero.

Persistent Metal pipeline state objects. Metal requires explicit pipeline state objects (PSOs) that define the complete GPU state for a draw call — shaders, blend mode, pixel format, multisampling. Creating PSOs is expensive (hundreds of microseconds). Impeller creates all its PSOs at startup and reuses them across frames. No per-frame state object creation.
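The build-once, reuse-forever PSO strategy can be sketched as a startup cache. The shader function names here are hypothetical; this is not Impeller's code:

```swift
import Metal

// Sketch: create every pipeline state object once at startup,
// then do cheap dictionary lookups per frame.
final class PipelineCache {
    private var cache: [String: MTLRenderPipelineState] = [:]

    init(device: MTLDevice, library: MTLLibrary) throws {
        // The expensive step: compiling a PSO takes hundreds of
        // microseconds, so it must never happen mid-frame.
        let descriptor = MTLRenderPipelineDescriptor()
        descriptor.vertexFunction = library.makeFunction(name: "solid_fill_vertex")
        descriptor.fragmentFunction = library.makeFunction(name: "solid_fill_fragment")
        descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
        cache["solid_fill"] = try device.makeRenderPipelineState(descriptor: descriptor)
        // ... repeat for every shader / blend mode / pixel format combination ...
    }

    // Per-frame cost: a hash lookup, not a pipeline compile.
    func pipeline(named name: String) -> MTLRenderPipelineState? {
        cache[name]
    }
}
```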

Efficient command encoding. Impeller batches draw calls aggressively, minimising the number of render encoder state changes (texture binds, shader switches) per frame. State changes are the most expensive part of GPU rendering — each one may flush the GPU pipeline. Fewer state changes means more efficient GPU utilisation.

Shared resource heaps. Impeller uses Metal's resource heaps to allocate textures and buffers from a pre-allocated pool, avoiding per-allocation overhead. This is particularly effective for transient resources (render targets for blur passes, intermediate buffers) that are created and destroyed every frame.
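Heap-based allocation of a transient render target might look like the following sketch. Sizes and formats are illustrative; in practice the heap would be created once at startup and sized for the worst-case frame:

```swift
import Metal

// Sketch: sub-allocate a transient render target from a pre-created
// MTLHeap instead of asking the device for a fresh allocation per frame.
func makeTransientTarget(device: MTLDevice) -> MTLTexture? {
    let textureDesc = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm, width: 1024, height: 1024, mipmapped: false)
    textureDesc.usage = [.renderTarget, .shaderRead]
    textureDesc.storageMode = .private  // must match the heap's storage mode

    // Ask Metal how much heap space this texture needs.
    let sizeAndAlign = device.heapTextureSizeAndAlign(descriptor: textureDesc)

    // Startup cost: one heap allocation (done once in a real renderer).
    let heapDesc = MTLHeapDescriptor()
    heapDesc.size = sizeAndAlign.size
    heapDesc.storageMode = .private
    guard let heap = device.makeHeap(descriptor: heapDesc) else { return nil }

    // Per-frame cost: a sub-allocation from the heap — no device round trip.
    return heap.makeTexture(descriptor: textureDesc)
}
```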

Triple buffering on iOS

Like Android, iOS uses multiple buffers to decouple CPU rendering from display refresh:

Frame N:   [CPU renders] → [GPU executes] → [Display shows]
Frame N+1:                  [CPU renders] → [GPU executes] → [Display shows]
Frame N+2:                                   [CPU renders] → ...

CAMetalLayer provides 2 or 3 drawables (GPU-renderable textures) in rotation. While one drawable is being displayed and another is being composited by Core Animation, the app can render into a third.

The number of drawables is configurable (maximumDrawableCount), but the Flutter engine uses the default (typically 3 for triple buffering). Triple buffering absorbs timing variance — if a frame takes slightly longer than one VSync interval, the pipeline doesn't stall because there's a buffer to absorb the delay.
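Configuring the drawable pool is a one-line layer property. A sketch, showing where the back-pressure comes from:

```swift
import QuartzCore

// Sketch: the drawable pool size controls the buffering depth.
// 3 (the default) gives triple buffering; 2 trades the variance
// absorption described above for one frame less latency.
let layer = CAMetalLayer()
layer.maximumDrawableCount = 3

// nextDrawable() blocks when all drawables are in flight (one being
// displayed, one being composited, one still rendering) — that blocking
// wait is the pipeline's natural back-pressure mechanism.
```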

Where jank comes from on iOS

The sources of jank are similar to Android but with iOS-specific characteristics:

First-frame overhead after resume. When an app resumes from suspension, the first frame may take longer because:

  • The CADisplayLink needs to be re-established
  • Metal textures that were purged during suspension need to be recreated
  • The Dart VM's code pages may need to be re-loaded from disk (they were clean pages that could be evicted)

Thermal throttling. iPhones aggressively throttle CPU and GPU performance under thermal load. A sustained 120Hz animation on a warm iPhone 15 Pro might be throttled to 80Hz or even 60Hz. The Flutter engine doesn't drop frames in this case — the system reduces the VSync rate, and the engine adapts. But transitions between rates can cause a brief visual inconsistency.

Core Animation commit overhead. When the CAMetalLayer presents a drawable, Core Animation packages the layer tree and sends it to the render server process. This "commit" step takes CPU time on the main thread. If the layer tree is complex (many sublayers from plugins — camera preview, maps, platform views), the commit overhead can be significant.

Platform view compositing. When a Flutter app uses platform views (native iOS views embedded in the Flutter hierarchy — Google Maps, WebView, camera preview), Core Animation has to composite these with the Flutter surface. This forces GPU composition and adds latency. It's the most common source of jank in Flutter iOS apps that use platform views.

Debugging the pipeline

Xcode GPU Frame Capture. Capture a single frame and inspect every Metal command: draw calls, shader execution time, texture memory, pipeline state. This shows exactly what Impeller submitted to the GPU and how long each operation took.

Instruments — Metal System Trace. Shows the complete GPU timeline: when command buffers were submitted, when the GPU started and finished them, and how they aligned with VSync. This is the iOS equivalent of Android's Perfetto for GPU analysis.

Instruments — Core Animation. Shows layer compositing: which layers were composited, whether GPU or hardware composition was used, and the frame times from Core Animation's perspective.

Flutter DevTools. Shows the Dart-side frame timing (UI thread and raster thread), which tells you whether the bottleneck is CPU-side (build, layout, paint) or GPU-side (raster thread waiting for GPU completion).

`CAMetalLayer.presentsWithTransaction`. Setting this to true synchronises Metal rendering with Core Animation transactions, which can help diagnose visual artifacts but adds latency. Useful for debugging, not for production.

iOS vs Android rendering

| Aspect | Android | iOS |
|--------|---------|-----|
| GPU API | Vulkan / OpenGL ES | Metal |
| GPU hardware | Varies (Adreno, Mali, PowerVR) | Apple GPU (one vendor) |
| Compositor | SurfaceFlinger | Core Animation (backboardd) |
| Hardware composition | HWC (Hardware Composer) | Direct scanout |
| VSync mechanism | Choreographer | CADisplayLink |
| Variable refresh | LTPO (1-120Hz) | ProMotion (1-120Hz) |
| GPU memory | Sometimes separate pool | Unified with CPU memory |
| Shader compilation | Impeller precompiles | Impeller precompiles |

The biggest practical difference is consistency. iOS has one GPU vendor, one driver, one Metal implementation. Impeller's Metal backend is tested against exactly one target. Android has multiple GPU vendors with different driver behaviours, and Impeller's Vulkan/OpenGL backends must accommodate all of them. This is why Flutter rendering on iOS is generally smoother — not because the hardware is necessarily faster, but because the software stack is more predictable.

The next post covers the sandbox, code signing, and the Secure Enclave — the security mechanisms that protect your app and the user's data.

This is Post 7 of the iOS Under the Surface series. Previous: Mach Messages and XPC. Next: The Sandbox, Code Signing, and the Secure Enclave.

Related Topics

ios rendering pipeline flutter, core animation flutter, metal flutter, impeller metal ios, ios compositor flutter, CADisplayLink flutter, ios gpu rendering flutter
