The Technology Behind Nexus Pen: ESP32, BLE, and Real-Time AI

Logan Holby   March 28, 2026   9 min read

Building an AI-powered smart pen sounds impressive. But building one that fits in your hand, runs on battery for hours, delivers real-time AI responses, and plays crisp audio through a speaker no bigger than a shirt button — that's an engineering problem most people underestimate.

Here's an inside look at the hardware and software stack that makes Nexus Pen work.

The Brain: ESP32 Microcontroller

At the heart of Nexus Pen is an ESP32 — a dual-core 240MHz microcontroller from Espressif Systems. The ESP32 is a remarkable chip: it includes built-in Wi-Fi, Bluetooth Classic, and Bluetooth Low Energy (BLE) in a package smaller than a postage stamp.

We chose the ESP32 for three reasons:

  • Power efficiency — BLE mode draws far less current than Wi-Fi, extending battery life during typical use
  • Processing headroom — Dual-core architecture lets one core handle audio while the other manages display and UI
  • Peripheral support — Native I2S audio interface sends digital audio straight to the amplifier, so no separate audio codec or standalone DAC stage is needed

The firmware running on the ESP32 is written in C++ using the Arduino framework, managing BLE connections, the OLED display driver, microphone capture, and real-time audio playback simultaneously.

The Ears: MEMS Microphone

Voice capture uses an I2S MEMS microphone, a digital microphone that interfaces directly with the ESP32's I2S bus. Because the signal is digitized inside the microphone, it avoids the noise that a traditional electret microphone and external ADC would pick up along an analog signal path.

When you press the talk button, the firmware captures audio at 16kHz mono, applies a simple noise gate to trim silence from the start and end, then packages the audio as raw PCM data for BLE transmission.
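
The trimming step can be sketched in a few lines. This is an illustration of the idea in Python rather than the firmware's C++, and the 20 ms frame size and the RMS threshold are assumptions chosen for the example, not values from the Nexus Pen firmware.

```python
import array
import math

SAMPLE_RATE_HZ = 16_000   # capture rate stated in the article
FRAME_MS = 20             # assumed analysis window
GATE_THRESHOLD = 500      # assumed RMS threshold on a signed 16-bit scale


def frame_rms(frame):
    """Root-mean-square level of one frame of signed 16-bit samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def trim_silence(samples, threshold=GATE_THRESHOLD):
    """Drop leading and trailing frames whose RMS is below the threshold."""
    frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 320 samples per frame
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    loud = [i for i, frame in enumerate(frames) if frame_rms(frame) >= threshold]
    if not loud:
        return array.array("h")  # nothing rose above the gate
    start = loud[0] * frame_len
    end = min((loud[-1] + 1) * frame_len, len(samples))
    return samples[start:end]


# One loud frame surrounded by silence.
silence = array.array("h", [0] * 640)
speech = array.array("h", [4000, -4000] * 160)
trimmed = trim_silence(silence + speech + silence)
pcm_bytes = trimmed.tobytes()  # raw PCM ready to be chunked for transmission
```

Only the edges are trimmed; pauses in the middle of a question are left alone, which matches the "start and end" behavior described above.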

The Connection: Bluetooth Low Energy

Getting audio data from the pen to your phone quickly and reliably is one of the hardest parts of the system. BLE throughput in practice tops out at roughly 100KB/s on a well-tuned link and is often lower, depending on connection interval, MTU, and the phone's radio stack. Our audio data needs to transfer fast enough that the backend can begin processing before you've finished asking your question.

We solved this with a burst transfer protocol: audio is buffered in chunks and transmitted at 24KB/s during active recording, well within BLE headroom. The mobile app receives the chunks in real time and begins streaming them to the backend the moment recording starts — so by the time you stop talking, the AI has often already begun generating a response.
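
To make the chunking concrete, here is a hedged sketch of how buffered audio could be split into BLE notifications and checked on the receiving side. The 4-byte header layout and the negotiated MTU of 247 are assumptions for illustration; the actual Nexus Pen packet format is not described in this post.

```python
import struct

ATT_MTU = 247                       # assumed negotiated MTU
ATT_HEADER = 3                      # opcode + attribute handle in a notification
SEQ_HEADER = struct.Struct("<HH")   # hypothetical header: sequence number, payload length
MAX_AUDIO = ATT_MTU - ATT_HEADER - SEQ_HEADER.size  # 240 audio bytes per packet


def packetize(pcm: bytes, max_audio: int = MAX_AUDIO):
    """Yield notification payloads: a 4-byte header followed by audio bytes."""
    for seq, offset in enumerate(range(0, len(pcm), max_audio)):
        body = pcm[offset:offset + max_audio]
        yield SEQ_HEADER.pack(seq & 0xFFFF, len(body)) + body


def reassemble(packets):
    """Rebuild the audio stream, raising if a packet is missing or out of order."""
    out = bytearray()
    for expected, packet in enumerate(packets):
        seq, length = SEQ_HEADER.unpack_from(packet)
        if seq != expected & 0xFFFF:
            raise ValueError(f"expected packet {expected & 0xFFFF}, got {seq}")
        out += packet[SEQ_HEADER.size:SEQ_HEADER.size + length]
    return bytes(out)


def notifications_per_second(byte_rate: int, max_audio: int = MAX_AUDIO) -> float:
    """How many notifications per second a given audio byte rate requires."""
    return byte_rate / max_audio


pcm = bytes(range(256)) * 4          # 1024 bytes of stand-in audio
packets = list(packetize(pcm))
restored = reassemble(packets)
rate = notifications_per_second(24_000)
```

Under these assumptions, the 24KB/s figure above works out to 100 notifications per second, and the sequence number lets the app detect a dropped chunk instead of silently splicing audio together.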

The Intelligence: FastAPI Backend + LLM

Nexus Pen's Donna AI runs on a cloud backend built with FastAPI (Python). The pipeline works like this:

  1. Audio arrives from the mobile app
  2. Speech-to-text transcribes your question
  3. The transcription is sent to an LLM with a mode-specific system prompt (Answer Now, Learn, Research, Creative, or Language)
  4. The LLM streams its response back to the backend
  5. Text-to-speech converts the response to 24kHz audio using HD TTS
  6. Audio is encoded as G.711 μ-law (8 bits per sample) and streamed back to the mobile app
  7. The app forwards audio to the pen via BLE for playback
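
Step 6 is worth a closer look because it halves the audio size: μ-law companding squeezes each 16-bit sample into one byte. G.711 itself is specified for 8kHz telephony, so applying the same companding curve to 24kHz audio, as the pipeline above does, is a reuse of the encoding rather than standard G.711. The sketch below is a plain-Python version of the well-known μ-law algorithm, written for illustration and not taken from the Nexus Pen backend.

```python
BIAS = 0x84    # 132, added before finding the segment
CLIP = 32635   # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit sample to an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8          # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte: int) -> int:
    """Expand one mu-law byte back to a signed 16-bit sample."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if byte & 0x80 else magnitude


def encode_ulaw(samples) -> bytes:
    """Encode an iterable of signed 16-bit samples, one byte per sample."""
    return bytes(linear_to_ulaw(s) for s in samples)


one_second = encode_ulaw([0] * 24_000)   # one second of 24kHz silence
```

At one byte per sample, 24kHz audio comes to 24,000 bytes per second, compared with 48,000 for 16-bit PCM. The cost is quantization error that grows with loudness, which is usually acceptable for speech.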

The result is that Donna starts speaking before she's finished generating her full response — progressive streaming audio that feels as responsive as a real conversation.
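
One common way to get this effect is to cut the LLM's token stream into sentences and hand each sentence to text-to-speech as soon as it is complete. The sketch below shows that technique in isolation. Whether Donna's backend splits on sentences, and the helper name `sentences_from_tokens`, are assumptions for illustration.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the token stream finishes them."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


# Tokens arrive in arbitrary fragments, as they would from a streaming LLM.
stream = ["Hel", "lo the", "re. How", " are you", "? Fine"]
spoken = list(sentences_from_tokens(stream))
```

Each yielded sentence can go to TTS while the model is still generating the next one. A rule this simple will also split after abbreviations such as "Dr.", so a production splitter would need more care.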

The Voice: 24kHz HD Audio Pipeline

Audio quality in a pen-sized speaker is a genuine challenge. The speaker we use — the Dayton Audio CE32A-8, a 1.25-inch 8-ohm full-range driver — is exceptional for its size, but it requires clean amplification.

The MAX98357A class-D amplifier handles this. It accepts I2S digital audio directly from the ESP32, so the signal stays digital until the amplifier's output stage and there is no analog line-level path to pick up noise. The result is clean, warm audio at a 24kHz sample rate, with enough fidelity that Donna's voice sounds natural and clear rather than robotic or compressed.

We implemented runtime volume control stored in ESP32 NVS (non-volatile storage) so your preferred volume persists between sessions. Voice commands like "Donna, lower volume" adjust it hands-free.
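
As a rough illustration of the logic, the sketch below maps a spoken command to a clamped volume level and applies that level as a software gain on 16-bit samples. It is written in Python for readability. The 0 to 10 scale, the recognized phrases, and the choice to scale samples in software are all assumptions, and on the pen the resulting integer is the kind of value that would be written to NVS.

```python
VOLUME_MIN, VOLUME_MAX, VOLUME_STEP = 0, 10, 1   # assumed volume scale


def apply_command(volume: int, transcript: str) -> int:
    """Return the new volume after a voice command, clamped to the valid range."""
    text = transcript.lower()
    if "lower volume" in text or "volume down" in text:
        volume -= VOLUME_STEP
    elif "raise volume" in text or "volume up" in text:
        volume += VOLUME_STEP
    return max(VOLUME_MIN, min(VOLUME_MAX, volume))


def scale_samples(samples, volume: int):
    """Scale signed 16-bit samples by volume / VOLUME_MAX, clamped to int16."""
    return [max(-32768, min(32767, s * volume // VOLUME_MAX)) for s in samples]


volume = apply_command(5, "Donna, lower volume")
quieter = scale_samples([1000, -1000, 32767], volume)
```

Clamping in `apply_command` means repeated "lower volume" requests stop at zero instead of wrapping around, and unrecognized phrases leave the stored level unchanged.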

The Display: 1.3-Inch OLED

The OLED display is driven over I2C, a two-wire bus whose data (SDA) and clock (SCL) lines connect the ESP32 to the display controller. At 128x64 pixels, the display is small but sharp: white pixels on a true black background, with high contrast in most lighting conditions.

The display shows streaming text from Donna's response, scrolling in real time as audio plays. It also shows volume level, battery status, and mode indicators — making it a genuine information surface, not just a status LED.
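
The numbers behind a 128x64 monochrome panel are small enough to work out by hand. The sketch below does that arithmetic and shows one simple way to keep the newest lines of a growing response on screen. The 6x8 pixel character cell is an assumption, as is the idea of re-wrapping the full text on every update.

```python
import textwrap

DISPLAY_W, DISPLAY_H = 128, 64                   # resolution stated in the article
CHAR_W, CHAR_H = 6, 8                            # assumed fixed-width font cell
COLS = DISPLAY_W // CHAR_W                       # 21 characters per line
ROWS = DISPLAY_H // CHAR_H                       # 8 lines of text
FRAMEBUFFER_BYTES = DISPLAY_W * DISPLAY_H // 8   # 1024 bytes at 1 bit per pixel


def visible_lines(text_so_far: str, rows: int = ROWS, cols: int = COLS):
    """Word-wrap the response received so far and keep only the newest lines."""
    lines = textwrap.wrap(text_so_far, width=cols)
    return lines[-rows:]


response = "word " * 60               # stand-in for streamed response text
screen = visible_lines(response)
with_status_bar = visible_lines(response, rows=ROWS - 1)  # leave one row for icons
```

Passing a smaller `rows` value, as in the last line, is one way to leave space for the volume, battery, and mode indicators mentioned above.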

Why This Architecture Matters

Every component in Nexus Pen was chosen to serve the core promise: instant AI answers without picking up your phone. The ESP32 keeps power consumption manageable. BLE burst transfer keeps latency low. The FastAPI streaming backend keeps responses progressive. The MAX98357A keeps audio clean.

None of it is magic — it's careful systems design applied to a form factor nobody had tried before. And that's what makes Nexus Pen the most technically capable AI writing tool available today.

Interested in the engineering? Nexus Pen is available now.

Order Nexus Pen — $119