You have a trained model. Maybe it classifies images — "is this a cat or a dog?" Maybe it detects objects — "there's a person at coordinates (x, y, w, h)." Maybe it does text classification, pose estimation, or style transfer. The model works on your laptop. Now it needs to run on a phone, offline, at 30fps.
TensorFlow Lite is Google's runtime for on-device ML inference. It takes a trained model (.tflite file), loads it into memory, and runs inference on the device's CPU, GPU, or neural accelerator. The C API gives you direct control over the inference pipeline — no Java/Kotlin/Swift intermediaries.
Naming note: Google rebranded TensorFlow Lite to LiteRT (short for Lite Runtime) in late 2024. The runtime, C API, and .tflite file format are unchanged — only the branding moved. You'll see both names in documentation and packages for a while. Everything in this post applies to both.
Why FFI for TFLite?
The tflite_flutter package exists and wraps TFLite via FFI internally. For many projects, it's the right starting point. But if you need:
- Custom delegate configuration (GPU, NNAPI, Core ML)
- Zero-copy input from camera frames
- Control over tensor memory allocation
- Integration with other native processing (OpenCV + TFLite pipeline)
...then understanding the underlying FFI integration matters.
Getting the TFLite runtime
Android
TFLite ships as a prebuilt AAR. Add it to your Android build:
// android/app/build.gradle
dependencies {
// Check Maven Central for the current version — the runtime moves
// forward with regular releases.
implementation 'org.tensorflow:tensorflow-lite:2.14.0'
}
But for FFI, you need the C library (libtensorflowlite_c.so). The easiest source is the tflite_flutter package — it bundles the native libraries. Or download from TensorFlow's releases:
android/app/src/main/jniLibs/
├── arm64-v8a/
│ └── libtensorflowlite_c.so
└── x86_64/
    └── libtensorflowlite_c.so
iOS
Download the TensorFlowLiteC framework from CocoaPods or build from source:
# ios/Podfile
# Pin to a current version from CocoaPods trunk.
pod 'TensorFlowLiteC', '~> 2.14.0'
Or use the tflite_flutter package, which handles this.
The C API surface
TFLite's C API follows a clear lifecycle:
TfLiteModel (load .tflite file)
→ TfLiteInterpreter (create interpreter from model)
→ TfLiteInterpreterAllocateTensors (allocate input/output memory)
→ Copy data into input tensor
→ TfLiteInterpreterInvoke (run inference)
→ Read data from output tensor
→ TfLiteInterpreterDelete (cleanup)
→ TfLiteModelDelete (cleanup)
Dart FFI bindings
import 'dart:ffi';
import 'dart:io';
import 'package:ffi/ffi.dart';
// Android ships the C library as a separate .so. On iOS the
// TensorFlowLiteC framework is statically linked into the app binary,
// so symbols resolve through the process itself.
final DynamicLibrary _tflite = Platform.isAndroid
    ? DynamicLibrary.open('libtensorflowlite_c.so')
    : DynamicLibrary.process();
// Create model from file
typedef _TfLiteModelCreateFromFileC = Pointer<Void> Function(Pointer<Utf8>);
typedef _TfLiteModelCreateFromFileDart = Pointer<Void> Function(Pointer<Utf8>);
final tfLiteModelCreateFromFile = _tflite.lookupFunction<
_TfLiteModelCreateFromFileC,
_TfLiteModelCreateFromFileDart>('TfLiteModelCreateFromFile');
// Create interpreter options
typedef _CreateOptionsC = Pointer<Void> Function();
typedef _CreateOptionsDart = Pointer<Void> Function();
final tfLiteInterpreterOptionsCreate = _tflite.lookupFunction<
_CreateOptionsC, _CreateOptionsDart>('TfLiteInterpreterOptionsCreate');
// Set number of threads
typedef _SetNumThreadsC = Void Function(Pointer<Void>, Int32);
typedef _SetNumThreadsDart = void Function(Pointer<Void>, int);
final tfLiteInterpreterOptionsSetNumThreads = _tflite.lookupFunction<
_SetNumThreadsC,
_SetNumThreadsDart>('TfLiteInterpreterOptionsSetNumThreads');
// Create interpreter
typedef _CreateInterpreterC = Pointer<Void> Function(
Pointer<Void>, Pointer<Void>);
typedef _CreateInterpreterDart = Pointer<Void> Function(
Pointer<Void>, Pointer<Void>);
final tfLiteInterpreterCreate = _tflite.lookupFunction<
_CreateInterpreterC,
_CreateInterpreterDart>('TfLiteInterpreterCreate');
// Allocate tensors
typedef _AllocateTensorsC = Int32 Function(Pointer<Void>);
typedef _AllocateTensorsDart = int Function(Pointer<Void>);
final tfLiteInterpreterAllocateTensors = _tflite.lookupFunction<
_AllocateTensorsC,
_AllocateTensorsDart>('TfLiteInterpreterAllocateTensors');
// Invoke: same C signature as AllocateTensors (interpreter in,
// TfLiteStatus out), so the typedefs are reused.
final tfLiteInterpreterInvoke = _tflite.lookupFunction<
    _AllocateTensorsC,
    _AllocateTensorsDart>('TfLiteInterpreterInvoke');
// Get input/output tensor count and tensors
typedef _GetTensorCountC = Int32 Function(Pointer<Void>);
typedef _GetTensorCountDart = int Function(Pointer<Void>);
final tfLiteInterpreterGetInputTensorCount = _tflite.lookupFunction<
_GetTensorCountC,
_GetTensorCountDart>('TfLiteInterpreterGetInputTensorCount');
typedef _GetTensorC = Pointer<Void> Function(Pointer<Void>, Int32);
typedef _GetTensorDart = Pointer<Void> Function(Pointer<Void>, int);
final tfLiteInterpreterGetInputTensor = _tflite.lookupFunction<
_GetTensorC, _GetTensorDart>('TfLiteInterpreterGetInputTensor');
final tfLiteInterpreterGetOutputTensor = _tflite.lookupFunction<
_GetTensorC, _GetTensorDart>('TfLiteInterpreterGetOutputTensor');
// Copy data to/from tensors
typedef _CopyFromBufferC = Int32 Function(
Pointer<Void>, Pointer<Void>, IntPtr);
typedef _CopyFromBufferDart = int Function(
Pointer<Void>, Pointer<Void>, int);
final tfLiteTensorCopyFromBuffer = _tflite.lookupFunction<
_CopyFromBufferC,
_CopyFromBufferDart>('TfLiteTensorCopyFromBuffer');
final tfLiteTensorCopyToBuffer = _tflite.lookupFunction<
_CopyFromBufferC,
_CopyFromBufferDart>('TfLiteTensorCopyToBuffer');
// Cleanup
typedef _DeleteC = Void Function(Pointer<Void>);
typedef _DeleteDart = void Function(Pointer<Void>);
final tfLiteInterpreterDelete = _tflite.lookupFunction<
_DeleteC, _DeleteDart>('TfLiteInterpreterDelete');
final tfLiteModelDelete = _tflite.lookupFunction<
_DeleteC, _DeleteDart>('TfLiteModelDelete');
final tfLiteInterpreterOptionsDelete = _tflite.lookupFunction<
    _DeleteC, _DeleteDart>('TfLiteInterpreterOptionsDelete');
A clean wrapper
class TFLiteModel {
late final Pointer<Void> _model;
late final Pointer<Void> _interpreter;
late final Pointer<Void> _options;
TFLiteModel._(this._model, this._interpreter, this._options);
/// Load a .tflite model from a file path.
static TFLiteModel load(String modelPath, {int numThreads = 2}) {
final pathPtr = modelPath.toNativeUtf8();
final model = tfLiteModelCreateFromFile(pathPtr);
    malloc.free(pathPtr); // toNativeUtf8() allocates with malloc by default
if (model == nullptr) {
throw Exception('Failed to load TFLite model from: $modelPath');
}
final options = tfLiteInterpreterOptionsCreate();
tfLiteInterpreterOptionsSetNumThreads(options, numThreads);
final interpreter = tfLiteInterpreterCreate(model, options);
if (interpreter == nullptr) {
tfLiteModelDelete(model);
tfLiteInterpreterOptionsDelete(options);
throw Exception('Failed to create TFLite interpreter');
}
final status = tfLiteInterpreterAllocateTensors(interpreter);
if (status != 0) {
tfLiteInterpreterDelete(interpreter);
tfLiteModelDelete(model);
tfLiteInterpreterOptionsDelete(options);
throw Exception('Failed to allocate tensors (status: $status)');
}
return TFLiteModel._(model, interpreter, options);
}
/// Run inference. Input is a Float32List matching the model's input shape.
/// Returns the output as a Float32List.
Float32List run(Float32List input) {
final inputTensor = tfLiteInterpreterGetInputTensor(_interpreter, 0);
final outputTensor = tfLiteInterpreterGetOutputTensor(_interpreter, 0);
// Copy input data to the tensor
final inputPtr = calloc<Float>(input.length);
try {
inputPtr.asTypedList(input.length).setAll(0, input);
tfLiteTensorCopyFromBuffer(
inputTensor, inputPtr.cast(), input.length * 4, // 4 bytes per float
);
// Run inference
final status = tfLiteInterpreterInvoke(_interpreter);
if (status != 0) {
throw Exception('Inference failed (status: $status)');
}
// Read output — for simplicity, assuming a fixed output size
// In production, read the output tensor's shape dynamically
final outputSize = 1001; // e.g., ImageNet classes
final outputPtr = calloc<Float>(outputSize);
try {
tfLiteTensorCopyToBuffer(
outputTensor, outputPtr.cast(), outputSize * 4,
);
return Float32List.fromList(outputPtr.asTypedList(outputSize));
} finally {
calloc.free(outputPtr);
}
} finally {
calloc.free(inputPtr);
}
}
void dispose() {
tfLiteInterpreterDelete(_interpreter);
tfLiteModelDelete(_model);
tfLiteInterpreterOptionsDelete(_options);
}
}
Image classification example
class ImageClassifier {
final TFLiteModel _model;
final List<String> _labels;
final int _inputSize; // e.g., 224 for MobileNet
ImageClassifier._(this._model, this._labels, this._inputSize);
static Future<ImageClassifier> load() async {
// Copy model from assets to temp dir
final modelFile = await _copyAssetToFile('assets/mobilenet_v2.tflite');
final labelsText = await rootBundle.loadString('assets/labels.txt');
final labels = labelsText.split('\n').where((l) => l.isNotEmpty).toList();
final model = TFLiteModel.load(modelFile.path, numThreads: 4);
return ImageClassifier._(model, labels, 224);
}
/// Classify an image. Returns top-N results with label and confidence.
List<ClassificationResult> classify(Uint8List rgbaBytes, int width, int height) {
// Preprocess: resize to model input size, normalize to [0, 1]
final input = _preprocess(rgbaBytes, width, height);
// Run inference
final output = _model.run(input);
// Find top 5 results
final indexed = output.asMap().entries.toList()
..sort((a, b) => b.value.compareTo(a.value));
return indexed.take(5).map((e) => ClassificationResult(
label: e.key < _labels.length ? _labels[e.key] : 'Unknown',
confidence: e.value,
)).toList();
}
Float32List _preprocess(Uint8List rgba, int width, int height) {
// Simple nearest-neighbor resize + normalize
final input = Float32List(_inputSize * _inputSize * 3); // RGB, no alpha
final scaleX = width / _inputSize;
final scaleY = height / _inputSize;
for (int y = 0; y < _inputSize; y++) {
for (int x = 0; x < _inputSize; x++) {
final srcX = (x * scaleX).floor();
final srcY = (y * scaleY).floor();
final srcIdx = (srcY * width + srcX) * 4; // RGBA
final dstIdx = (y * _inputSize + x) * 3; // RGB
input[dstIdx + 0] = rgba[srcIdx + 0] / 255.0; // R
input[dstIdx + 1] = rgba[srcIdx + 1] / 255.0; // G
input[dstIdx + 2] = rgba[srcIdx + 2] / 255.0; // B
}
}
return input;
}
void dispose() => _model.dispose();
}
class ClassificationResult {
final String label;
final double confidence;
ClassificationResult({required this.label, required this.confidence});
}Common errors
"Failed to load model" — model file not found
Cause: Flutter asset paths aren't filesystem paths. TFLite needs a real file path.
Fix: Copy the .tflite file from assets to the temporary directory:
static Future<File> _copyAssetToFile(String assetPath) async {
final data = await rootBundle.load(assetPath);
final dir = await getTemporaryDirectory();
final file = File('${dir.path}/${assetPath.split('/').last}');
await file.writeAsBytes(data.buffer.asUint8List());
return file;
}
Inference returns garbage — wrong input preprocessing
Cause: The model expects inputs normalized to a specific range (e.g., [-1, 1] or [0, 1]), in a specific channel order (RGB vs BGR), at a specific size. If your preprocessing doesn't match what the model was trained with, the output is meaningless.
Fix: Check the model's documentation for the expected input format. MobileNet V2 expects 224x224 RGB normalized to [0, 1]. EfficientNet expects [-1, 1]. SSD MobileNet expects [0, 255] (uint8). The preprocessing must match exactly.
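The difference between the two float ranges is one line of preprocessing. A minimal sketch, reusing the per-pixel assignment from the _preprocess loop above (variable names match that loop):

```dart
// [0, 1] normalization (MobileNet V2 float models):
input[dstIdx] = rgba[srcIdx] / 255.0;

// [-1, 1] normalization (e.g., EfficientNet float models):
input[dstIdx] = rgba[srcIdx] / 127.5 - 1.0;
```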
App crashes on inference — input tensor size mismatch
Cause: The input data size doesn't match the tensor's expected size. If the model expects 224x224x3 floats (150,528 floats = 602,112 bytes) and you pass a different size, TFLite may crash or return an error.
Fix: Read the input tensor's shape and data type before copying data. The number of bytes must match exactly.
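The C API exposes tensor geometry directly via TfLiteTensorByteSize, TfLiteTensorNumDims, and TfLiteTensorDim. A sketch of the extra bindings, following the same pattern as the ones above (the Dart names and the checkInputSize helper are my own):

```dart
// size_t TfLiteTensorByteSize(const TfLiteTensor*)
typedef _ByteSizeC = IntPtr Function(Pointer<Void>);
typedef _ByteSizeDart = int Function(Pointer<Void>);
final tfLiteTensorByteSize = _tflite.lookupFunction<
    _ByteSizeC, _ByteSizeDart>('TfLiteTensorByteSize');

// int32_t TfLiteTensorNumDims(const TfLiteTensor*)
typedef _NumDimsC = Int32 Function(Pointer<Void>);
typedef _NumDimsDart = int Function(Pointer<Void>);
final tfLiteTensorNumDims = _tflite.lookupFunction<
    _NumDimsC, _NumDimsDart>('TfLiteTensorNumDims');

// int32_t TfLiteTensorDim(const TfLiteTensor*, int32_t dim_index)
typedef _DimC = Int32 Function(Pointer<Void>, Int32);
typedef _DimDart = Pointer<Void> Function(Pointer<Void>, int);
final tfLiteTensorDim = _tflite
    .lookupFunction<_DimC, int Function(Pointer<Void>, int)>('TfLiteTensorDim');

// Validate before copying: the byte count must match exactly.
void checkInputSize(Pointer<Void> tensor, int floatCount) {
  final expected = tfLiteTensorByteSize(tensor);
  if (floatCount * 4 != expected) {
    throw ArgumentError(
        'Input is ${floatCount * 4} bytes, tensor expects $expected');
  }
}
```

The same bindings replace the hardcoded outputSize = 1001 in the wrapper's run(): size the output buffer from tfLiteTensorByteSize(outputTensor) instead of a constant.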
Inference is too slow (100ms+ per frame)
Cause: Running on CPU with a single thread, or using a model that's too large for mobile.
Fix:
- Increase threads: tfLiteInterpreterOptionsSetNumThreads(options, 4)
- Use GPU delegate (Android: TfLiteGpuDelegateV2Create, iOS: Metal delegate)
- Use a mobile-optimized model (MobileNet, EfficientNet-Lite, not ResNet-50)
- Quantize the model (int8 instead of float32 — 2-4x faster on most mobile CPUs)
- Run on a background isolate to avoid UI jank
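As a sketch of the delegate path on Android: TfLiteGpuDelegateV2Create and TfLiteInterpreterOptionsAddDelegate come from TFLite's C headers, but the library name here is an assumption; check what your build actually bundles.

```dart
// TfLiteDelegate* TfLiteGpuDelegateV2Create(const TfLiteGpuDelegateOptionsV2*)
// Passing nullptr requests default delegate options.
typedef _GpuCreateC = Pointer<Void> Function(Pointer<Void>);
typedef _GpuCreateDart = Pointer<Void> Function(Pointer<Void>);

// void TfLiteInterpreterOptionsAddDelegate(TfLiteInterpreterOptions*,
//                                          TfLiteDelegate*)
typedef _AddDelegateC = Void Function(Pointer<Void>, Pointer<Void>);
typedef _AddDelegateDart = void Function(Pointer<Void>, Pointer<Void>);

void enableGpu(Pointer<Void> options) {
  // Assumed library name; the GPU delegate ships separately from the
  // core runtime and must be packaged in jniLibs alongside it.
  final gpuLib = DynamicLibrary.open('libtensorflowlite_gpu_delegate.so');
  final createDelegate = gpuLib.lookupFunction<
      _GpuCreateC, _GpuCreateDart>('TfLiteGpuDelegateV2Create');
  final addDelegate = _tflite.lookupFunction<
      _AddDelegateC, _AddDelegateDart>('TfLiteInterpreterOptionsAddDelegate');

  final delegate = createDelegate(nullptr);
  addDelegate(options, delegate);
  // The delegate must outlive the interpreter; release it with
  // TfLiteGpuDelegateV2Delete after TfLiteInterpreterDelete.
}
```

Call enableGpu(options) before tfLiteInterpreterCreate; delegates attached after interpreter creation have no effect.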
Model works on Android but crashes on iOS (or vice versa)
Cause: Different TFLite versions, or the model uses an op that's not supported on one platform. TFLite's op coverage varies slightly between platforms and versions.
Fix: Use the same TFLite version on both platforms. Check the model's ops against TFLite's compatibility list. If an op is missing, use Flex delegate (includes select TensorFlow ops at the cost of larger binary) or convert the model to use only built-in ops.
Memory grows over time during repeated inference
Cause: You're allocating native buffers (calloc) for each inference call and not freeing them, or you're creating new interpreters without disposing old ones.
Fix: Allocate input/output buffers once and reuse them. Create the interpreter once, run inference many times, dispose once. The try/finally pattern from the FFI foundations series applies here.
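A minimal sketch of the reuse pattern: allocate the native buffers once at startup, hand out typed-list views per frame, and free everything in dispose (the class and its sizing are illustrative, not part of the TFLite API):

```dart
class ReusableBuffers {
  // Allocated once, sized to the model's fixed input/output tensors.
  final Pointer<Float> input;
  final Pointer<Float> output;
  final int inputFloats;
  final int outputFloats;

  ReusableBuffers(this.inputFloats, this.outputFloats)
      : input = calloc<Float>(inputFloats),
        output = calloc<Float>(outputFloats);

  // Views over the native memory: no per-frame allocation, no copies.
  Float32List get inputView => input.asTypedList(inputFloats);
  Float32List get outputView => output.asTypedList(outputFloats);

  void dispose() {
    calloc.free(input);
    calloc.free(output);
  }
}
```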
This is Post 15 of the FFI series. Next: PDF Rendering With PDFium.