Building an AI-Powered Note-Taking App in React Native — Part 2: Image Semantic Search

Jakub Mroz · Nov 13, 2025 · 7 min read

In the previous part of this series, we added semantic text search to our React Native note-taking app, letting users find notes by meaning, not just keywords.

Now, we’ll take it a step further. In this part, we’ll extend that capability to images — so you can find images in your notes using text queries or even other images.

Project overview

We’ll continue building on the same Expo note-taking app from Part 1. If you want to follow along, start from the “text-semantic-search” branch in this repository.

The project has the following structure:

app/
  _layout.tsx           # App navigation
  index.tsx             # App entry point
  notes.tsx             # Notes list screen
  note/
    [id].tsx            # Note editor screen

services/
  notesService.ts       # Handles note creation, updates, and deletion
  storage/
    notes.ts            # Manages local data storage (via AsyncStorage)
  vectorStores/
    textVectorStore.ts  # Text embeddings + vector store

types/
  note.ts               # Type definitions for Note objects

constants/
  theme.ts              # App theme configuration

We’ll add multimodal semantic search functionality — powered by on-device AI models — without needing any backend.

What is multimodal semantic search?

An embedding is a numerical representation of content — a dense vector that captures the semantic meaning of text, images, or other data.

  • For text embeddings, words or sentences with similar meaning have similar vectors.
  • For image embeddings, visually or conceptually similar images produce similar vectors.
  • Multimodal models like CLIP map both images and text into the same space — so “dog playing fetch” and a photo of a dog fetching a ball end up close together in the embedding space.

This shared understanding allows semantic search across different data types: a key step in building intelligent AI-powered note apps.
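Under the hood, “similar vectors” usually means a high cosine similarity. Here’s a quick illustrative sketch — not part of the app code, and using toy 3-dimensional vectors instead of real CLIP embeddings — of how two embeddings are compared:

```typescript
// Cosine similarity: 1 means same direction (very similar),
// values near 0 mean unrelated content.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings": the photo and the text query point the same way.
const dogPhoto = [0.9, 0.1, 0.0];
const dogQuery = [0.8, 0.2, 0.0]; // e.g. "dog playing fetch"
const invoiceNote = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(dogPhoto, dogQuery) > cosineSimilarity(dogPhoto, invoiceNote)); // true
```

Real CLIP vectors have hundreds of dimensions, but the comparison works exactly the same way — which is what lets a text query and an image land close together in the shared space.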

Image embedding model

We’ll use CLIP (text + image encoders). It maps images and text into the same vector space, enabling cross-modal retrieval. The model is compact and performs well on-device.

As in Part 1, we’ll use React Native ExecuTorch to run the model and React Native RAG to search for similar results.

Packages

To enable multimodal semantic search in React Native, install the packages used throughout this part:

npm install react-native-executorch @react-native-rag/executorch @react-native-rag/op-sqlite

Integration steps

1. Create the image vector store

We’ll set up a local vector database for image embeddings using CLIP’s image and text encoders. This lets you run image-to-image and text-to-image search entirely on-device.

Import the CLIP model and vector store connectors:

// services/vectorStores/imageVectorStore.ts

import { ExecuTorchEmbeddings } from "@react-native-rag/executorch";
import { OPSQLiteVectorStore } from "@react-native-rag/op-sqlite";
import {
  CLIP_VIT_BASE_PATCH32_TEXT,
  ImageEmbeddingsModule,
} from "react-native-executorch";

Initialize the CLIP image embedding model.

// services/vectorStores/imageVectorStore.ts

const imageEmbeddings = new ImageEmbeddingsModule();
export { imageEmbeddings };

Create a persistent image vector store for embeddings.

// services/vectorStores/imageVectorStore.ts

export const imageVectorStore = new OPSQLiteVectorStore({
  name: "notes_image_vector_store",
  embeddings: new ExecuTorchEmbeddings(CLIP_VIT_BASE_PATCH32_TEXT),
});

2. Index notes on create, update and delete

Each time a note changes, we update both text and image embeddings so semantic results stay accurate.

Import the text and image vector store utilities (plus expo-file-system, used later when deleting a note’s images):

// services/notesService.ts

import * as FileSystem from "expo-file-system";
import {
  textSplitter,
  noteToString,
  textVectorStore,
} from "@/services/vectorStores/textVectorStore";
import {
  imageEmbeddings,
  imageVectorStore,
} from "@/services/vectorStores/imageVectorStore";

This function saves the note locally, splits its text into smaller chunks for embedding, adds those text embeddings to the text vector store, and then processes each image to generate and store image embeddings for multimodal search.

// services/notesService.ts

async function createNote(
  title: string,
  content: string,
  imageUris: string[],
): Promise<Note> {
  const note = await storageCreateNote({ title, content, imageUris });
  const chunks = await textSplitter.splitText(noteToString(note));
  for (const chunk of chunks) {
    await textVectorStore.add({
      document: chunk,
      metadata: { noteId: note.id },
    });
  }
  for (const uri of imageUris) {
    const embedding = Array.from(await imageEmbeddings.forward(uri));
    await imageVectorStore.add({
      embedding,
      metadata: { imageUri: uri, noteId: note.id },
    });
  }
  return note;
}

Update a note by removing old embeddings, re-splitting text, and re-indexing both text and image embeddings:

// services/notesService.ts

async function updateNote(
  noteId: string,
  data: { title: string; content: string; imageUris: string[] },
): Promise<void> {
  await storageUpdateNote(noteId, data);

  await textVectorStore.delete({
    predicate: (r) => r.metadata?.noteId === noteId,
  });
  await imageVectorStore.delete({
    predicate: (r) => r.metadata?.noteId === noteId,
  });

  const chunks = await textSplitter.splitText(noteToString(data));
  for (const chunk of chunks) {
    await textVectorStore.add({ document: chunk, metadata: { noteId } });
  }

  for (const uri of data.imageUris) {
    const embedding = Array.from(await imageEmbeddings.forward(uri));
    await imageVectorStore.add({
      embedding,
      metadata: { imageUri: uri, noteId },
    });
  }
}

Delete a note and its associated embeddings:

// services/notesService.ts

async function deleteNote(noteId: string): Promise<void> {
  await FileSystem.deleteAsync(
    FileSystem.documentDirectory + `notes/${noteId}`,
    { idempotent: true },
  );
  await storageDeleteNote(noteId);
  await textVectorStore.delete({
    predicate: (r) => r.metadata?.noteId === noteId,
  });
  await imageVectorStore.delete({
    predicate: (r) => r.metadata?.noteId === noteId,
  });
}

Next, it’s time for the helper function:

// services/notesService.ts

function buildSimilarityResults(
  results: { similarity: number; metadata?: { noteId?: string } }[],
  notes: Note[],
): Note[] {
  const noteIdToMaxSimilarity = new Map<string, number>();
  for (const r of results) {
    const noteId = r.metadata?.noteId;
    if (noteId) {
      const current = noteIdToMaxSimilarity.get(noteId) ?? -Infinity;
      noteIdToMaxSimilarity.set(noteId, Math.max(current, r.similarity));
    }
  }
  return notes
    .filter((n) => noteIdToMaxSimilarity.has(n.id))
    .map((n) => ({ ...n, similarity: noteIdToMaxSimilarity.get(n.id)! }))
    .sort((a, b) => b.similarity - a.similarity);
}

Search for images using text queries:

// services/notesService.ts

async function searchImagesByText(
  query: string,
  notes: Note[],
  n: number = 3,
): Promise<Note[]> {
  const results = await imageVectorStore.query({ queryText: query.trim() });
  return buildSimilarityResults(results, notes).slice(0, n);
}

Search for similar images using another image:

// services/notesService.ts

async function searchByImageUri(
  imageUri: string,
  notes: Note[],
  n: number = 3,
): Promise<Note[]> {
  const imageEmbedding = Array.from(await imageEmbeddings.forward(imageUri));
  const results = await imageVectorStore.query({
    queryEmbedding: imageEmbedding,
  });
  return buildSimilarityResults(results, notes).slice(0, n);
}

3. Load both vector stores at app start

We initialize both the text and image vector stores so search is ready as soon as the app launches.

Import the image vector store:

// app/index.tsx

import {
  imageEmbeddings,
  imageVectorStore,
} from "@/services/vectorStores/imageVectorStore";
import { CLIP_VIT_BASE_PATCH32_IMAGE } from "react-native-executorch";

Initialize vector stores and show a loading indicator until ready:

// app/index.tsx

export default function Index() {
  const [isLoaded, setIsLoaded] = useState(false);

  useEffect(() => {
    (async () => {
      try {
        await textVectorStore.load();
        await imageVectorStore.load();
        await imageEmbeddings.load(CLIP_VIT_BASE_PATCH32_IMAGE);
        setIsLoaded(true);
      } catch (e) {
        console.error("Vector stores failed to load", e);
      }
    })();
  }, []);

  return isLoaded ? <Notes /> : <ActivityIndicator />;
}

4. Usage

You can now call the semantic search methods from anywhere in your app. Each function returns an array of notes, sorted by semantic similarity to the query:

// app/notes.tsx

try {
  const results = await notesService.searchImagesByText(query, notes);
} catch (e) {
  console.error("Failed to search by text", e);
}

try {
  const results = await notesService.searchByImageUri(imageUri, notes);
} catch (e) {
  console.error("Failed to search by image", e);
}

Results

Your AI note-taking app now understands both language and visuals. That’s the power of multimodal semantic search — retrieving content by meaning instead of exact matches.

Check out our GitHub and try it in your project.

What’s coming next to our AI note-taking app?

Stay tuned, because in Part 3, we’ll add on-device Retrieval-Augmented Generation (RAG) to answer questions about your notes — still fully local.

We are Software Mansion — multimedia experts, AI explorers, React Native core contributors, community builders, and software development consultants.

We can help you build your next dream product — hire us.
