Equipment guide · sensor body + brain

Build the body. Then give it a brain.

JARVIS is intentionally split in two: a Raspberry Pi 5 sensor body for presence, and a local GPU brain for cognition, memory, voice, learning, and governance.

Purchase list

Buy the exact body and brain equipment.

These are outbound Amazon links for the hardware stack. The brain machine should be treated as an offline AI processor, not a Windows desktop.

For the Senses

Raspberry Pi sensor body

The physical JARVIS presence: camera, touch display, mic, speaker, Pi 5, and Hailo acceleration.

Vision input

Arducam 16MP Autofocus Camera Module

IMX519 16MP autofocus camera with ABS case for Raspberry Pi models.

View on Amazon

Sensor computer

CanaKit Raspberry Pi 5 16GB Starter Kit PRO

16GB Pi 5 kit, 128GB storage edition, for the always-on senses node.

View on Amazon

Presence display

FREENOVE 5 Inch Touchscreen Monitor

800x480 IPS capacitive touchscreen over MIPI DSI for the Pi display surface.

View on Amazon

Voice output

Portable Mini Sound Bar

Compact stereo speaker with enhanced bass for local TTS playback.

View on Amazon

Audio input

USB Gooseneck Microphone

360-degree adjustable USB mic with mute button, LED indicator, and noise cancellation.

View on Amazon

Vision acceleration

Raspberry Pi AI HAT+ 2

Hailo-10H accelerator with 8GB on-board RAM and 40 TOPS class AI capability for Pi 5.

View on Amazon

For the Brain

Offline AI workstation parts

The brain machine runs cognition, memory, models, governance, and self-improvement locally.

CPU backbone

AMD Ryzen 9 desktop CPU

High-thread-count Ryzen 9 class processor for the brain and CPU-resident model lanes.

View on Amazon

Premium GPU brain

NVIDIA GeForce RTX 4080 16GB

Strong baseline GPU for local STT, TTS, LLM residency, vision support, and fast iteration.

View on Amazon

More power

ASUS ROG Astral GeForce RTX 5090 BTF OC

32GB GDDR7 option for heavier local model residency and a bigger extreme-tier brain.

View on Amazon

Complete brain option

Prebuilt workstation path

A ready-to-go tower option if you want the brain hardware assembled first, then reinstalled with Linux.

Prebuilt brain machine

CLX Horus Gaming PC

Ryzen 9 9950X3D, RTX 5080, 96GB DDR5, 4TB NVMe class build. Install Pop!_OS over Windows before using it as the brain.

View on Amazon

Brain OS requirement

Install Pop!_OS on the PC used for the brain, replacing Windows. Do not run the brain on a Windows install. Treat that machine as the offline local AI processor for JARVIS.

Sensor body

Pi 5, camera, mic, speaker, display, and Hailo acceleration.

The Pi is the senses. It does not own memory, personality, policy, self-improvement, or the LLM. It captures the room and streams events to the brain.

Thin sensor node

Raspberry Pi 5

Runs the senses layer: camera capture, Hailo vision inference, audio capture/playback, WebSocket transport, and the particle display.

Vision accelerator

Hailo-10H AI HAT+ 2

Runs YOLOv8s person detection, SCRFD face detection, and YOLOv8s-Pose locally on the Pi so the brain receives structured perception.
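As an illustration of what "structured perception" could look like on the wire, here is a minimal sketch of a detection event serialized to JSON before it is sent to the brain. The class names, field names, and the normalized box layout are all hypothetical; the guide does not specify the actual message schema.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Detection:
    """One detected object. Field names are illustrative, not the real schema."""
    label: str
    confidence: float
    box: tuple  # assumed (x, y, width, height), normalized to 0..1


@dataclass
class PerceptionEvent:
    """A batch of detections from one frame, ready to stream to the brain."""
    source: str
    detections: list = field(default_factory=list)
    timestamp: float = field(default_factory=time.time)

    def to_json(self):
        # asdict() recurses into the nested Detection dataclasses.
        return json.dumps(asdict(self))


event = PerceptionEvent(
    source="person_detector",
    detections=[Detection("person", 0.91, (0.1, 0.2, 0.3, 0.4))],
)
message = event.to_json()
```

The point of the sketch is the division of labor: the Pi sends compact labels and boxes rather than raw frames, and the brain decides what they mean.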

Vision input

Pi camera

Feeds Picamera2 for person detection, pose, facial expression, face crops, and scene summaries.

Audio input

USB microphone

Captures 44.1kHz audio, resamples to 16kHz int16, and streams raw PCM to the brain over local WebSocket.
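The capture path above can be sketched as follows. This is a minimal, standard-library-only illustration that uses plain linear interpolation; a real pipeline would more likely use a proper anti-aliased resampler. The function name and the assumption that input samples are mono floats in [-1.0, 1.0] are hypothetical, not taken from the JARVIS code.

```python
from array import array


def resample_to_16k_int16(samples, src_rate=44_100, dst_rate=16_000):
    """Linearly resample mono float samples in [-1.0, 1.0] to int16 PCM bytes.

    Assumption: linear interpolation without a low-pass filter, which is
    enough to show the data flow but will alias on real audio.
    """
    if not samples:
        return b""
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = array("h")  # signed 16-bit, native byte order
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        value = samples[lo] * (1.0 - frac) + samples[hi] * frac
        # Clamp, then scale to the signed 16-bit range.
        value = max(-1.0, min(1.0, value))
        out.append(int(round(value * 32767)))
    return out.tobytes()


# 10 ms of silence at 44.1 kHz becomes 160 samples (320 bytes) at 16 kHz.
pcm = resample_to_16k_int16([0.0] * 441)
```

The returned bytes are what would be written to the WebSocket as a binary frame; at 16 kHz int16 mono that is 32,000 bytes per second of audio.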

Voice output

Speaker

Plays back brain-synthesized TTS audio. The Pi does not run the language or speech intelligence.

Presence surface

7 inch display

Runs the JARVIS particle visualizer in kiosk mode and reflects bounded system state, not private dashboard internals.

Brain equipment

The GPU tier decides model residency, latency, and how much can stay awake.

The brain auto-detects NVIDIA VRAM, CPU threads, and RAM at startup. It then chooses LLM size, STT model, TTS device, vision availability, model keep-alive, and whether ancillary ML should live on CPU or GPU.
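A startup probe along these lines could gather the three inputs. This is a sketch under stated assumptions, not the JARVIS implementation: the function names are hypothetical, the probe shells out to `nvidia-smi`, it reads RAM through `os.sysconf` (Linux), and it takes the largest GPU when several are present.

```python
import os
import shutil
import subprocess


def parse_vram_mib(nvidia_smi_output):
    """Return the largest per-GPU memory.total value (MiB) from CSV output."""
    values = [
        int(line.strip())
        for line in nvidia_smi_output.splitlines()
        if line.strip().isdigit()
    ]
    return max(values, default=0)


def detect_hardware():
    """Probe VRAM (GB), logical CPU threads, and system RAM (GB)."""
    vram_mib = 0
    if shutil.which("nvidia-smi"):
        try:
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=memory.total",
                 "--format=csv,noheader,nounits"],
                capture_output=True, text=True, timeout=10, check=True,
            )
            vram_mib = parse_vram_mib(result.stdout)
        except (subprocess.SubprocessError, OSError):
            vram_mib = 0  # treat a failing driver query as "no usable GPU"
    threads = os.cpu_count() or 1
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        ram_bytes = 0  # unknown on this platform
    return {
        "vram_gb": vram_mib / 1024,
        "cpu_threads": threads,
        "ram_gb": ram_bytes / 1024**3,
    }


hardware = detect_hardware()
```

Keeping the parser separate from the subprocess call makes the probe easy to test on machines without an NVIDIA GPU.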

Hardware tiers

GPU VRAM selects the brain profile.

The brain auto-detects GPU VRAM at startup and selects model sizes, compute types, and memory strategy from seven tiers. Local-first guarantee: all core capabilities run entirely on local hardware.

| Tier | VRAM | LLM | Fast | Vision | STT | TTS | Keep-alive |
|---|---|---|---|---|---|---|---|
| minimal | <4 GB | qwen3:1.7b | qwen3:1.7b | disabled | tiny/int8 | none | 5m |
| low | 4-6 GB | qwen3:4b | qwen3:1.7b | disabled | small/int8 | none | 5m |
| medium | 6-8 GB | qwen3:8b | qwen3:4b | disabled | medium/int8_fp16 | kokoro_cpu | 5m |
| high | 8-12 GB | qwen3:8b | qwen3:4b | qwen2.5vl:7b | large-v3-turbo | kokoro_cpu | 10m |
| premium | 12-16.5 GB | qwen3:8b | qwen3:8b | qwen2.5vl:7b | large-v3/int8_fp16 | kokoro_gpu | 30m |
| ultra | 16.5-24.5 GB | qwen3:14b | qwen3:8b | qwen2.5vl:7b | large-v3/float16 | kokoro_gpu | always |
| extreme | 24.5 GB+ | qwen3:32b | qwen3:14b | qwen2.5vl:7b | large-v3/float16 | kokoro_gpu | always |

Ultra+ tiers pin models in VRAM permanently, eliminating cold-start latency. Premium uses 30m keep-alive. CPU-resident coding LLM runs separately and never touches GPU VRAM.
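The tier lookup can be expressed as a short threshold walk. The sketch below copies the VRAM bounds and keep-alive values from the tier table; the function and constant names are hypothetical, and treating each range's lower bound as inclusive is an assumption, since the table does not say which tier owns an exact boundary value.

```python
# (tier name, exclusive upper VRAM bound in GB), in ascending order.
GPU_TIERS = [
    ("minimal", 4.0),
    ("low", 6.0),
    ("medium", 8.0),
    ("high", 12.0),
    ("premium", 16.5),
    ("ultra", 24.5),
    ("extreme", float("inf")),
]

# Keep-alive per tier, as listed in the table ("always" means pinned in VRAM).
KEEP_ALIVE = {
    "minimal": "5m",
    "low": "5m",
    "medium": "5m",
    "high": "10m",
    "premium": "30m",
    "ultra": "always",
    "extreme": "always",
}


def select_gpu_tier(vram_gb):
    """Return the first tier whose upper bound exceeds the detected VRAM."""
    for name, upper in GPU_TIERS:
        if vram_gb < upper:
            return name
    return "extreme"  # unreachable with an infinite last bound; kept as a guard


tier = select_gpu_tier(16.0)   # a 16 GB card lands in "premium"
keep_alive = KEEP_ALIVE[tier]  # "30m"
```

Under these bounds a 16 GB card such as the RTX 4080 maps to premium, and a 32 GB card maps to extreme.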

Self-improvement coder — RAM tiers

The Qwen3-Coder-Next model is selected by system RAM, independent of GPU tier. It runs purely on CPU through llama-server and never contends with VRAM.

| System RAM | GGUF Quant | Model Size | Quality | Headroom |
|---|---|---|---|---|
| 56GB+ | UD-Q4_K_XL | ~46GB | Best | ~10GB+ for OS/JARVIS |
| 48-55GB | UD-IQ4_XS | ~38GB | Good | ~10GB+ for OS/JARVIS |
| 32-47GB | UD-IQ2_M | ~25GB | Acceptable | ~7GB+ for OS/JARVIS |
| <32GB | Disabled | would OOM | Do not force-enable | Not enough RAM |
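The RAM-based choice reduces to three thresholds. In this sketch the function name is hypothetical, and the table's integer ranges are read as lower-bound thresholds (so 55.5GB still counts as the 48-55GB row), which is an assumption.

```python
def select_coder_quant(ram_gb):
    """Pick the coder GGUF quant from system RAM, or None when disabled."""
    if ram_gb >= 56:
        return "UD-Q4_K_XL"  # ~46GB model
    if ram_gb >= 48:
        return "UD-IQ4_XS"   # ~38GB model
    if ram_gb >= 32:
        return "UD-IQ2_M"    # ~25GB model
    return None              # below 32GB the coder stays off to avoid OOM


quant = select_coder_quant(96)  # "UD-Q4_K_XL"
```

Returning `None` rather than the smallest quant mirrors the table's "Do not force-enable" guidance for machines under 32GB.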

CPU tiers

Strong and beast CPUs can offload ancillary ML from the GPU, freeing VRAM for STT, LLM residency, TTS, and vision.

| CPU Tier | Requirement | Typical Hardware | Effect |
|---|---|---|---|
| weak | <4 threads | SBCs / cheap VPS | Minimal CPU headroom |
| standard | 4-7 threads | Laptop i5 / older desktop | GPU carries ancillary ML when VRAM allows |
| strong | 8-15 threads + 8GB RAM | Desktop i7 / Ryzen 7 | Offloads emotion, speaker ID, embeddings, hemispheres to CPU |
| beast | 16+ threads + 16GB RAM | Ryzen 9 / Threadripper / Xeon | Best partner for premium+ GPUs and coder workflows |
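CPU classification needs both thread count and RAM. The sketch below is one plausible reading of the table; the function name is hypothetical, and the fallback behavior (a machine with enough threads but too little RAM drops to the next tier down) is an assumption the table does not spell out.

```python
def select_cpu_tier(threads, ram_gb):
    """Classify the CPU from logical thread count and system RAM in GB."""
    if threads >= 16 and ram_gb >= 16:
        return "beast"
    if threads >= 8 and ram_gb >= 8:
        return "strong"
    if threads >= 4:
        return "standard"
    return "weak"


cpu_tier = select_cpu_tier(32, 96)  # "beast"
# Strong and beast are the tiers that take ancillary ML off the GPU.
offload_ancillary_ml = cpu_tier in ("strong", "beast")
```

Checking the tiers from strongest to weakest keeps the fallback implicit: a 16-thread machine with only 8GB of RAM fails the beast test and is caught by the strong test.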

Recommended serious build

Pi 5 sensor body + premium GPU brain.

Premium tier is the sweet spot: qwen3:8b warm, large-v3 STT, GPU TTS, speaker ID, emotion, embeddings, policy, memory, and governance without pretending a bigger LLM is free.