You have a trained model. Maybe it classifies images — "is this a cat or a dog?" Maybe it detects objects — "there's a person at coordinates (x, y, w, h)." Maybe it does text classification, pose estimation, or style transfer. The model works on your laptop. Now it needs to run on a phone, offline, at 30fps.
TensorFlow Lite is Google's runtime for on-device ML inference. It takes a trained model (.tflite file), loads it into memory, and runs inference on the device's CPU, GPU, or neural accelerator. The C API gives you direct control over the inference pipeline — no Java/Kotlin/Swift intermediaries.
Naming note: Google rebranded TensorFlow Lite to LiteRT (short for Lite Runtime) in late 2024. The runtime, C API, and .tflite file format are unchanged — only the branding moved. You'll see both names in documentation and packages for a while. Everything in this post applies to both.
Why FFI for TFLite?
The tflite_flutter package exists and wraps TFLite via FFI internally. For many projects, it's the right starting point. But if you need:
- Custom delegate configuration (GPU, NNAPI, Core ML)
- Zero-copy input from camera frames
- Control over tensor memory allocation
- Integration with other native processing (OpenCV + TFLite pipeline)
...then understanding the underlying FFI integration matters.
Getting the TFLite runtime
Android
TFLite ships as a prebuilt AAR. Add it to your Android build:
// android/app/build.gradle
dependencies {
// Check Maven Central for the current version — the runtime moves
// forward with regular releases.
implementation 'org.tensorflow:tensorflow-lite:2.14.0'
}
But for FFI, you need the C library (libtensorflowlite_c.so). The easiest source is the tflite_flutter package — it bundles the native libraries. Or download from TensorFlow's releases:
android/app/src/main/jniLibs/
├── arm64-v8a/
│ └── libtensorflowlite_c.so
└── x86_64/
    └── libtensorflowlite_c.so
iOS
Download the TensorFlowLiteC framework from CocoaPods or build from source:
# ios/Podfile
# Pin to a current version from CocoaPods trunk.
pod 'TensorFlowLiteC', '~> 2.14.0'
Or use the tflite_flutter package, which handles this.
The C API surface
TFLite's C API follows a clear lifecycle:
TfLiteModel (load .tflite file)
→ TfLiteInterpreter (create interpreter from model)
→ TfLiteInterpreterAllocateTensors (allocate input/output memory)
→ Copy data into input tensor
→ TfLiteInterpreterInvoke (run inference)
→ Read data from output tensor
→ TfLiteInterpreterDelete (cleanup)
→ TfLiteModelDelete (cleanup)
Dart FFI bindings
import 'dart:ffi';
import 'dart:io';
import 'package:ffi/ffi.dart';
// Android ships the C library as a separate .so. On iOS the
// TensorFlowLiteC framework is statically linked into the app binary,
// so symbols resolve through the process itself.
final DynamicLibrary _tflite = Platform.isAndroid
    ? DynamicLibrary.open('libtensorflowlite_c.so')
    : DynamicLibrary.process();
// Create model from file
typedef _TfLiteModelCreateFromFileC = Pointer<Void> Function(Pointer<Utf8>);
typedef _TfLiteModelCreateFromFileDart = Pointer<Void> Function(Pointer<Utf8>);
final tfLiteModelCreateFromFile = _tflite.lookupFunction<
_TfLiteModelCreateFromFileC,
_TfLiteModelCreateFromFileDart>('TfLiteModelCreateFromFile');
// Create interpreter options
typedef _CreateOptionsC = Pointer<Void> Function();
typedef _CreateOptionsDart = Pointer<Void> Function();
final tfLiteInterpreterOptionsCreate = _tflite.lookupFunction<
_CreateOptionsC, _CreateOptionsDart>('TfLiteInterpreterOptionsCreate');
// Set number of threads
typedef _SetNumThreadsC = Void Function(Pointer<Void>, Int32);
typedef _SetNumThreadsDart = void Function(Pointer<Void>, int);
final tfLiteInterpreterOptionsSetNumThreads = _tflite.lookupFunction<
_SetNumThreadsC,
_SetNumThreadsDart>('TfLiteInterpreterOptionsSetNumThreads');
// Create interpreter
typedef _CreateInterpreterC = Pointer<Void> Function(
Pointer<Void>, Pointer<Void>);
typedef _CreateInterpreterDart = Pointer<Void> Function(
Pointer<Void>, Pointer<Void>);
final tfLiteInterpreterCreate = _tflite.lookupFunction<
_CreateInterpreterC,
_CreateInterpreterDart>('TfLiteInterpreterCreate');
// Allocate tensors
typedef _AllocateTensorsC = Int32 Function(Pointer<Void>);
typedef _AllocateTensorsDart = int Function(Pointer<Void>);
final tfLiteInterpreterAllocateTensors = _tflite.lookupFunction<
_AllocateTensorsC,
_AllocateTensorsDart>('TfLiteInterpreterAllocateTensors');
// Invoke: same C signature as AllocateTensors (interpreter in,
// TfLiteStatus out), so the typedefs are reused.
final tfLiteInterpreterInvoke = _tflite.lookupFunction<
    _AllocateTensorsC,
    _AllocateTensorsDart>('TfLiteInterpreterInvoke');
// Get input/output tensor count and tensors
typedef _GetTensorCountC = Int32 Function(Pointer<Void>);
typedef _GetTensorCountDart = int Function(Pointer<Void>);
final tfLiteInterpreterGetInputTensorCount = _tflite.lookupFunction<
_GetTensorCountC,
_GetTensorCountDart>('TfLiteInterpreterGetInputTensorCount');
typedef _GetTensorC = Pointer<Void> Function(Pointer<Void>, Int32);
typedef _GetTensorDart = Pointer<Void> Function(Pointer<Void>, int);
final tfLiteInterpreterGetInputTensor = _tflite.lookupFunction<
_GetTensorC, _GetTensorDart>('TfLiteInterpreterGetInputTensor');
final tfLiteInterpreterGetOutputTensor = _tflite.lookupFunction<
_GetTensorC, _GetTensorDart>('TfLiteInterpreterGetOutputTensor');
// Copy data to/from tensors
typedef _CopyFromBufferC = Int32 Function(
Pointer<Void>, Pointer<Void>, IntPtr);
typedef _CopyFromBufferDart = int Function(
Pointer<Void>, Pointer<Void>, int);
final tfLiteTensorCopyFromBuffer = _tflite.lookupFunction<
_CopyFromBufferC,
_CopyFromBufferDart>('TfLiteTensorCopyFromBuffer');
final tfLiteTensorCopyToBuffer = _tflite.lookupFunction<
_CopyFromBufferC,
_CopyFromBufferDart>('TfLiteTensorCopyToBuffer');
// Cleanup
typedef _DeleteC = Void Function(Pointer<Void>);
typedef _DeleteDart = void Function(Pointer<Void>);
final tfLiteInterpreterDelete = _tflite.lookupFunction<
_DeleteC, _DeleteDart>('TfLiteInterpreterDelete');
final tfLiteModelDelete = _tflite.lookupFunction<
_DeleteC, _DeleteDart>('TfLiteModelDelete');
final tfLiteInterpreterOptionsDelete = _tflite.lookupFunction<
    _DeleteC, _DeleteDart>('TfLiteInterpreterOptionsDelete');
A clean wrapper
class TFLiteModel {
late final Pointer<Void> _model;
late final Pointer<Void> _interpreter;
late final Pointer<Void> _options;
TFLiteModel._(this._model, this._interpreter, this._options);
/// Load a .tflite model from a file path.
static TFLiteModel load(String modelPath, {int numThreads = 2}) {
final pathPtr = modelPath.toNativeUtf8();
final model = tfLiteModelCreateFromFile(pathPtr);
    malloc.free(pathPtr); // toNativeUtf8() allocates with malloc by default
if (model == nullptr) {
throw Exception('Failed to load TFLite model from: $modelPath');
}
final options = tfLiteInterpreterOptionsCreate();
tfLiteInterpreterOptionsSetNumThreads(options, numThreads);
final interpreter = tfLiteInterpreterCreate(model, options);
if (interpreter == nullptr) {
tfLiteModelDelete(model);
tfLiteInterpreterOptionsDelete(options);
throw Exception('Failed to create TFLite interpreter');
}
final status = tfLiteInterpreterAllocateTensors(interpreter);
if (status != 0) {
tfLiteInterpreterDelete(interpreter);
tfLiteModelDelete(model);
tfLiteInterpreterOptionsDelete(options);
throw Exception('Failed to allocate tensors (status: $status)');
}
return TFLiteModel._(model, interpreter, options);
}
/// Run inference. Input is a Float32List matching the model's input shape.
/// Returns the output as a Float32List.
Float32List run(Float32List input) {
final inputTensor = tfLiteInterpreterGetInputTensor(_interpreter, 0);
final outputTensor = tfLiteInterpreterGetOutputTensor(_interpreter, 0);
// Copy input data to the tensor
final inputPtr = calloc<Float>(input.length);
try {
inputPtr.asTypedList(input.length).setAll(0, input);
tfLiteTensorCopyFromBuffer(
inputTensor, inputPtr.cast(), input.length * 4, // 4 bytes per float
);
// Run inference
final status = tfLiteInterpreterInvoke(_interpreter);
if (status != 0) {
throw Exception('Inference failed (status: $status)');
}
// Read output — for simplicity, assuming a fixed output size
// In production, read the output tensor's shape dynamically
final outputSize = 1001; // e.g., ImageNet classes
final outputPtr = calloc<Float>(outputSize);
try {
tfLiteTensorCopyToBuffer(
outputTensor, outputPtr.cast(), outputSize * 4,
);
return Float32List.fromList(outputPtr.asTypedList(outputSize));
} finally {
calloc.free(outputPtr);
}
} finally {
calloc.free(inputPtr);
}
}
void dispose() {
tfLiteInterpreterDelete(_interpreter);
tfLiteModelDelete(_model);
tfLiteInterpreterOptionsDelete(_options);
}
}
Image classification example
class ImageClassifier {
final TFLiteModel _model;
final List<String> _labels;
final int _inputSize; // e.g., 224 for MobileNet
ImageClassifier._(this._model, this._labels, this._inputSize);
static Future<ImageClassifier> load() async {
// Copy model from assets to temp dir
final modelFile = await _copyAssetToFile('assets/mobilenet_v2.tflite');
final labelsText = await rootBundle.loadString('assets/labels.txt');
final labels = labelsText.split('\n').where((l) => l.isNotEmpty).toList();
final model = TFLiteModel.load(modelFile.path, numThreads: 4);
return ImageClassifier._(model, labels, 224);
}
/// Classify an image. Returns top-N results with label and confidence.
List<ClassificationResult> classify(Uint8List rgbaBytes, int width, int height) {
// Preprocess: resize to model input size, normalize to [0, 1]
final input = _preprocess(rgbaBytes, width, height);
// Run inference
final output = _model.run(input);
// Find top 5 results
final indexed = output.asMap().entries.toList()
..sort((a, b) => b.value.compareTo(a.value));
return indexed.take(5).map((e) => ClassificationResult(
label: e.key < _labels.length ? _labels[e.key] : 'Unknown',
confidence: e.value,
)).toList();
}
Float32List _preprocess(Uint8List rgba, int width, int height) {
// Simple nearest-neighbor resize + normalize
final input = Float32List(_inputSize * _inputSize * 3); // RGB, no alpha
final scaleX = width / _inputSize;
final scaleY = height / _inputSize;
for (int y = 0; y < _inputSize; y++) {
for (int x = 0; x < _inputSize; x++) {
final srcX = (x * scaleX).floor();
final srcY = (y * scaleY).floor();
final srcIdx = (srcY * width + srcX) * 4; // RGBA
final dstIdx = (y * _inputSize + x) * 3; // RGB
input[dstIdx + 0] = rgba[srcIdx + 0] / 255.0; // R
input[dstIdx + 1] = rgba[srcIdx + 1] / 255.0; // G
input[dstIdx + 2] = rgba[srcIdx + 2] / 255.0; // B
}
}
return input;
}
void dispose() => _model.dispose();
}
class ClassificationResult {
final String label;
final double confidence;
ClassificationResult({required this.label, required this.confidence});
}Common errors
"Failed to load model" — model file not found
Cause: Flutter asset paths aren't filesystem paths. TFLite needs a real file path.
Fix: Copy the .tflite file from assets to the temporary directory:
static Future<File> _copyAssetToFile(String assetPath) async {
final data = await rootBundle.load(assetPath);
final dir = await getTemporaryDirectory();
final file = File('${dir.path}/${assetPath.split('/').last}');
await file.writeAsBytes(data.buffer.asUint8List());
return file;
}
Inference returns garbage — wrong input preprocessing
Cause: The model expects inputs normalized to a specific range (e.g., [-1, 1] or [0, 1]), in a specific channel order (RGB vs BGR), at a specific size. If your preprocessing doesn't match what the model was trained with, the output is meaningless.
Fix: Check the model's documentation for the expected input format. MobileNet V2 expects 224x224 RGB normalized to [0, 1]. EfficientNet expects [-1, 1]. SSD MobileNet expects [0, 255] (uint8). The preprocessing must match exactly.
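The difference between the two float ranges is one line of preprocessing. A minimal sketch, reusing the per-pixel assignment from the _preprocess loop above (variable names match that loop):

```dart
// [0, 1] normalization (MobileNet V2 float models):
input[dstIdx] = rgba[srcIdx] / 255.0;

// [-1, 1] normalization (e.g., EfficientNet float models):
input[dstIdx] = rgba[srcIdx] / 127.5 - 1.0;
```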
App crashes on inference — input tensor size mismatch
Cause: The input data size doesn't match the tensor's expected size. If the model expects 224x224x3 floats (150,528 floats = 602,112 bytes) and you pass a different size, TFLite may crash or return an error.
Fix: Read the input tensor's shape and data type before copying data. The number of bytes must match exactly.
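The C API exposes tensor geometry directly via TfLiteTensorByteSize, TfLiteTensorNumDims, and TfLiteTensorDim. A sketch of the extra bindings, following the same pattern as the ones above (the Dart names and the checkInputSize helper are my own):

```dart
// size_t TfLiteTensorByteSize(const TfLiteTensor*)
typedef _ByteSizeC = IntPtr Function(Pointer<Void>);
typedef _ByteSizeDart = int Function(Pointer<Void>);
final tfLiteTensorByteSize = _tflite.lookupFunction<
    _ByteSizeC, _ByteSizeDart>('TfLiteTensorByteSize');

// int32_t TfLiteTensorNumDims(const TfLiteTensor*)
typedef _NumDimsC = Int32 Function(Pointer<Void>);
typedef _NumDimsDart = int Function(Pointer<Void>);
final tfLiteTensorNumDims = _tflite.lookupFunction<
    _NumDimsC, _NumDimsDart>('TfLiteTensorNumDims');

// int32_t TfLiteTensorDim(const TfLiteTensor*, int32_t dim_index)
typedef _DimC = Int32 Function(Pointer<Void>, Int32);
typedef _DimDart = Pointer<Void> Function(Pointer<Void>, int);
final tfLiteTensorDim = _tflite
    .lookupFunction<_DimC, int Function(Pointer<Void>, int)>('TfLiteTensorDim');

// Validate before copying: the byte count must match exactly.
void checkInputSize(Pointer<Void> tensor, int floatCount) {
  final expected = tfLiteTensorByteSize(tensor);
  if (floatCount * 4 != expected) {
    throw ArgumentError(
        'Input is ${floatCount * 4} bytes, tensor expects $expected');
  }
}
```

The same bindings replace the hardcoded outputSize = 1001 in the wrapper's run(): size the output buffer from tfLiteTensorByteSize(outputTensor) instead of a constant.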
Inference is too slow (100ms+ per frame)
Cause: Running on CPU with a single thread, or using a model that's too large for mobile.
Fix:
- Increase threads: tfLiteInterpreterOptionsSetNumThreads(options, 4)
- Use GPU delegate (Android: TfLiteGpuDelegateV2Create, iOS: Metal delegate)
- Use a mobile-optimized model (MobileNet, EfficientNet-Lite, not ResNet-50)
- Quantize the model (int8 instead of float32 — 2-4x faster on most mobile CPUs)
- Run on a background isolate to avoid UI jank
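As a sketch of the delegate path on Android: TfLiteGpuDelegateV2Create and TfLiteInterpreterOptionsAddDelegate come from TFLite's C headers, but the library name here is an assumption; check what your build actually bundles.

```dart
// TfLiteDelegate* TfLiteGpuDelegateV2Create(const TfLiteGpuDelegateOptionsV2*)
// Passing nullptr requests default delegate options.
typedef _GpuCreateC = Pointer<Void> Function(Pointer<Void>);
typedef _GpuCreateDart = Pointer<Void> Function(Pointer<Void>);

// void TfLiteInterpreterOptionsAddDelegate(TfLiteInterpreterOptions*,
//                                          TfLiteDelegate*)
typedef _AddDelegateC = Void Function(Pointer<Void>, Pointer<Void>);
typedef _AddDelegateDart = void Function(Pointer<Void>, Pointer<Void>);

void enableGpu(Pointer<Void> options) {
  // Assumed library name; the GPU delegate ships separately from the
  // core runtime and must be packaged in jniLibs alongside it.
  final gpuLib = DynamicLibrary.open('libtensorflowlite_gpu_delegate.so');
  final createDelegate = gpuLib.lookupFunction<
      _GpuCreateC, _GpuCreateDart>('TfLiteGpuDelegateV2Create');
  final addDelegate = _tflite.lookupFunction<
      _AddDelegateC, _AddDelegateDart>('TfLiteInterpreterOptionsAddDelegate');

  final delegate = createDelegate(nullptr);
  addDelegate(options, delegate);
  // The delegate must outlive the interpreter; release it with
  // TfLiteGpuDelegateV2Delete after TfLiteInterpreterDelete.
}
```

Call enableGpu(options) before tfLiteInterpreterCreate; delegates attached after interpreter creation have no effect.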
Model works on Android but crashes on iOS (or vice versa)
Cause: Different TFLite versions, or the model uses an op that's not supported on one platform. TFLite's op coverage varies slightly between platforms and versions.
Fix: Use the same TFLite version on both platforms. Check the model's ops against TFLite's compatibility list. If an op is missing, use Flex delegate (includes select TensorFlow ops at the cost of larger binary) or convert the model to use only built-in ops.
Memory grows over time during repeated inference
Cause: You're allocating native buffers (calloc) for each inference call and not freeing them, or you're creating new interpreters without disposing old ones.
Fix: Allocate input/output buffers once and reuse them. Create the interpreter once, run inference many times, dispose once. The try/finally pattern from the FFI foundations series applies here.
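A minimal sketch of the reuse pattern: allocate the native buffers once at startup, hand out typed-list views per frame, and free everything in dispose (the class and its sizing are illustrative, not part of the TFLite API):

```dart
class ReusableBuffers {
  // Allocated once, sized to the model's fixed input/output tensors.
  final Pointer<Float> input;
  final Pointer<Float> output;
  final int inputFloats;
  final int outputFloats;

  ReusableBuffers(this.inputFloats, this.outputFloats)
      : input = calloc<Float>(inputFloats),
        output = calloc<Float>(outputFloats);

  // Views over the native memory: no per-frame allocation, no copies.
  Float32List get inputView => input.asTypedList(inputFloats);
  Float32List get outputView => output.asTypedList(outputFloats);

  void dispose() {
    calloc.free(input);
    calloc.free(output);
  }
}
```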
This is Post 15 of the FFI series. Next: PDF Rendering With PDFium.