
React Native ExecuTorch v0.8.0 – A Library Milestone

Mateusz Kopciński · Apr 3, 2026 · 5 min read
If you’ve been following the on-device AI space, you know the trend – models are getting smaller, hardware is getting faster, and more is possible directly on-device.

With React Native ExecuTorch v0.8.0, we’re pushing that boundary further. This is our biggest release so far, packed with major improvements and new capabilities. Let’s take a closer look at what’s inside! 

Computer Vision Meets the Camera

This release turns your phone’s camera into an AI-powered sensor. Every computer vision hook now exposes a runOnFrame worklet that plugs directly into VisionCamera v5, meaning you can run segmentation, detection, or classification on live camera frames with zero extra plumbing.
const model = useObjectDetection({ model: SSDLITE_320_MOBILENET_V3_LARGE });
const [detections, setDetections] = useState<Detection[]>([]);

const frameOutput = useFrameOutput({
  pixelFormat: 'rgb',
  dropFramesWhileBusy: true,
  onFrame: useCallback(
    (frame: Frame) => {
      'worklet';
      try {
        const isFrontCamera = false; // using the back camera
        const result = model.runOnFrame(frame, isFrontCamera, 0.5);
        if (result) {
          // Hop back to the React Native thread to update state.
          scheduleOnRN(setDetections, result);
        }
      } finally {
        frame.dispose();
      }
    },
    [model]
  ),
});
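Detections come back in the frame’s coordinate space, so to draw overlay boxes you typically scale them to your preview view. Here’s a minimal sketch of that mapping – note that the `Det`/`Box` shapes below are our own illustration and the library’s actual `Detection` fields may be named differently:

```typescript
// Hypothetical detection shape for illustration; the library's actual
// Detection type may differ in field names.
interface Box { x1: number; y1: number; x2: number; y2: number }
interface Det { bbox: Box; label: string; score: number }

// Scale normalized [0, 1] boxes to view pixels, mirroring horizontally
// when the front camera is used (front-camera frames are usually flipped).
function toViewBoxes(
  dets: Det[],
  viewWidth: number,
  viewHeight: number,
  mirror = false
): Box[] {
  return dets.map(({ bbox }) => {
    const x1 = mirror ? 1 - bbox.x2 : bbox.x1;
    const x2 = mirror ? 1 - bbox.x1 : bbox.x2;
    return {
      x1: x1 * viewWidth,
      y1: bbox.y1 * viewHeight,
      x2: x2 * viewWidth,
      y2: bbox.y2 * viewHeight,
    };
  });
}
```

You would call this with the detections received in `setDetections` and the measured size of your camera preview component.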
We’ve added a full suite of models to go with it:
  • Instance segmentation lands with support for YOLO (from nano to extra-large) and RF-DETR, giving you per-pixel object masks in real time.
  • Object detection picks up the same model families: YOLO and RF-DETR.
  • Semantic segmentation now supports DeepLab V3, LRASPP, FCN.
  • A dedicated Selfie Segmentation model. With this you can implement your own background blurring, virtual backgrounds or whatever else you can think of!
  • Finally – for on-device efficiency, we’ve shipped quantized variants of CLIP, Style Transfer, EfficientNetV2, and SSDLite.
And if those models aren’t enough for you, we’ve added fromCustomModel, so you can easily integrate your own!
const MyLabels = { BACKGROUND: 0, FOREGROUND: 1 } as const;
const segmentation = await SemanticSegmentationModule.fromCustomModel(
 'https://example.com/custom_model.pte',
 {
   labelMap: MyLabels,
   preprocessorConfig: {
     normMean: [0.485, 0.456, 0.406],
     normStd: [0.229, 0.224, 0.225],
   },
 }
);
const result = await segmentation.forward(imageUri);
result.ARGMAX; // Int32Array
Train something specific to your domain, export it to ExecuTorch, and plug it right in.
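For context on those preprocessorConfig numbers: they describe the standard per-channel normalization applied to each pixel before inference. A rough sketch of what that computation looks like (our own illustration, not the library’s internal implementation):

```typescript
// Per-channel ImageNet-style normalization: value in [0, 255] is
// scaled to [0, 1], centered by the mean, then divided by the std.
function normalizePixel(
  rgb: [number, number, number],
  mean: number[],
  std: number[]
): number[] {
  return rgb.map((v, c) => (v / 255 - mean[c]) / std[c]);
}

const normMean = [0.485, 0.456, 0.406];
const normStd = [0.229, 0.224, 0.225];
// A mid-gray pixel lands near zero on the red channel after normalization.
const normalized = normalizePixel([128, 128, 128], normMean, normStd);
```

If your custom model was trained with different statistics, pass those instead – mismatched normalization is one of the most common causes of bad predictions from an otherwise correct export.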

Vision Language Models on Device

This is one we’re particularly excited about. useLLM now supports multimodal input – you can pass images alongside text messages.
const llm = useLLM({
 modelSource: LLM.LFM2_VL_1_6B_QUANTIZED,
});
llm.sendMessage('What do you see in this image?', {
 images: [imageUri],
});

The first supported model is LFM2-VL 1.6B, quantized and running entirely on-device. For use cases like accessibility or document understanding, this is a meaningful step – you get multimodal reasoning at the edge with a single hook.

Kokoro TTS – Streaming and Phoneme Control

Text-to-Speech can now stream directly from a running LLM. As the model generates tokens, you can pipe the expanding text into TTS for real-time speech synthesis – no waiting for the full response to finish.
const tts = useTextToSpeech({ model: TTS.KOKORO });
// As LLM generates text incrementally:
await tts.streamInsert('Hello, here is');
await tts.streamInsert('the latest');
await tts.streamStop();
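When wiring this to an LLM token callback, speech usually sounds better if you forward text at natural boundaries rather than one token at a time. Here’s a small sketch of such a buffer – streamInsert/streamStop come from the release, but the clause-buffering itself is our own illustration:

```typescript
// Buffers incremental LLM text and emits complete clauses (ending in
// sentence punctuation) so TTS receives natural-sounding chunks.
function makeClauseBuffer(emit: (chunk: string) => void) {
  let buffer = '';
  return {
    push(delta: string) {
      buffer += delta;
      // Flush everything up to and including the last sentence-ending mark.
      const match = buffer.match(/^[\s\S]*[.!?]/);
      if (match) {
        emit(match[0]);
        buffer = buffer.slice(match[0].length);
      }
    },
    flush() {
      if (buffer.trim()) emit(buffer);
      buffer = '';
    },
  };
}
```

You would call push from the LLM’s token callback, passing each emitted chunk to tts.streamInsert, then call flush followed by tts.streamStop() once generation finishes.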

Kokoro TTS also picks up a new forwardFromPhonemes / streamFromPhonemes API, letting you bypass the built-in grapheme-to-phoneme pipeline and supply your own IPA strings – useful if you need fine-grained control over pronunciation.

Whisper Just Got Faster

Whisper is now up to 3x faster. That alone would be enough, but transcribe and stream now also return TranscriptionResult objects with word-level timestamps, making it straightforward to build features like subtitle sync or searchable audio.
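To make the subtitle case concrete, here’s a sketch that turns word-level timestamps into SRT cues – the `Word` shape below is an assumption for illustration, and the actual TranscriptionResult field names may differ:

```typescript
// Hypothetical word-timestamp shape; the library's TranscriptionResult
// fields may be named differently.
interface Word { word: string; start: number; end: number }

// Format seconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTime(sec: number): string {
  const ms = Math.round(sec * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const frac = ms % 1000;
  const pad = (n: number, w = 2) => String(n).padStart(w, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(frac, 3)}`;
}

// Group words into fixed-size cues and render one numbered SRT block per cue.
function toSrt(words: Word[], wordsPerCue = 8): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const group = words.slice(i, i + wordsPerCue);
    cues.push(
      `${cues.length + 1}\n` +
        `${srtTime(group[0].start)} --> ${srtTime(group[group.length - 1].end)}\n` +
        `${group.map((w) => w.word).join(' ')}\n`
    );
  }
  return cues.join('\n');
}
```

Splitting on pauses between words (a gap in the timestamps) instead of a fixed word count would give more natural cue boundaries, but the fixed-size version keeps the idea visible.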

Better Developer Experience

This has been one of the most requested features: v0.8.0 officially supports bare React Native projects. No more Expo requirements.
 
The resource fetching layer has been refactored into modular adapters – install react-native-executorch-expo-resource-fetcher or react-native-executorch-bare-resource-fetcher depending on your project type, then initialize it before using any hooks.
import { initExecutorch } from 'react-native-executorch';
import { resourceFetcher } from 'react-native-executorch-bare-resource-fetcher';
initExecutorch(resourceFetcher);
Alongside this, all modules now expose a uniform factory API via fromModelName and fromCustomModel static methods, replacing the old new + load pattern. It’s a cleaner, more predictable surface across the board.

Breaking Changes

v0.8.0 includes several breaking changes. Here’s what to update:
  • Initialization is now required – call initExecutorch with an explicit adapter before using any hook.
  • Factory methods replace constructors – use Module.fromModelName or Module.fromCustomModel instead of new + load.
  • ImageSegmentation → SemanticSegmentation – update imports and hook names to useSemanticSegmentation.
  • Return types have changed – Classification.forward now returns a type-safe record of label names to scores, and semantic segmentation returns Record<'ARGMAX', Int32Array> & Record<K, Float32Array>.
  • Speech-to-Text – transcribe returns TranscriptionResult instead of raw strings, and stream is now an async generator.
  • TTS streaming uses a new API – the callbacks pattern is replaced by streamInsert()/streamStop() methods.
  • LLM context management – contextWindowLength is replaced by contextStrategy.

For full details, check out our Release Notes.

What’s Next

v0.8.0 takes on-device AI in React Native a step forward – from real-time camera processing to multimodal LLMs to faster speech. Looking ahead, we plan to keep expanding our portfolio of supported models, shipping new features, and optimizing the library.

We’d love for you to try it out, break things, and tell us what you think. Check out the documentation, star us on GitHub, and come say hi in our community!