Perception.
Synthesized.

A fully autonomous visual navigation system. Powered by custom YOLOv11 segmentation models, robust Vision-Language Models (VLMs), and real-time haptic translation.

VLM: ACTIVE
YOLOv11-SEG: 120 FPS
HAPTIC: LINKED
> Booting sequence...

System Architecture

Information flows seamlessly from photon to haptic response in under 20 milliseconds.

[Architecture diagram: VIDEO array → VisionAssist edge processor → ML Model → LLM → User, with continuous feedback]
SEGMENTATION

YOLOv11 Custom

Proprietary segmentation model trained on high-contrast urban navigation datasets. Extracts pixel-perfect object boundaries instantly with TensorRT acceleration.
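The boundary-extraction step can be illustrated in miniature. The production pipeline runs a TensorRT-accelerated YOLOv11 segmentation model; the plain-Python routine below is only a sketch of the final operation, tracing the outline of an already-computed binary mask:

```python
# Illustrative only: extract the boundary cells of a binary object mask,
# standing in for the per-object masks a segmentation model would emit.

def mask_boundary(mask):
    """Return (row, col) cells of `mask` that touch background."""
    h, w = len(mask), len(mask[0])
    boundary = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            # A foreground cell is on the boundary if any 4-neighbour
            # is background or lies outside the frame.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < h and 0 <= nc < w) or not mask[nr][nc]:
                    boundary.append((r, c))
                    break
    return boundary

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_boundary(mask))  # -> [(1, 1), (1, 2), (2, 1), (2, 2)]
```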

SEMANTICS

Vision-Language Synthesis

Multi-modal AI contextualizes the environment. It doesn't just see a "chair"; it understands "an obstacle blocking the immediate path."
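The "chair" → "obstacle blocking the immediate path" step can be pictured with a toy rule-based function. The real system uses a Vision-Language Model; the thresholds and phrasing below are assumptions, shown only to illustrate the kind of statement the VLM produces:

```python
# Hypothetical contextualization step: raw detection -> navigational phrase.

def contextualize(label, x_center, frame_width, distance_m):
    """Turn a detection into a situational statement (illustrative rules)."""
    third = frame_width / 3
    if x_center < third:
        side = "to your left"
    elif x_center > 2 * third:
        side = "to your right"
    else:
        side = "blocking the immediate path"
    # Anything inside 2 m is treated as an obstacle rather than an object.
    urgency = "obstacle" if distance_m < 2.0 else "object"
    return f"{urgency} ({label}) {side}, about {distance_m:.0f} m away"

print(contextualize("chair", 320, 640, 1.2))
# -> "obstacle (chair) blocking the immediate path, about 1 m away"
```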

TELEMETRY

Spatial Haptics

Translates 3D spatial data into high-fidelity haptic waveforms. Feel the proximity, speed, and dimensions of objects around you via wearable matrices.
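One way to picture the translation from spatial data to waveform: proximity drives amplitude, closing speed drives pulse rate. The ranges and constants below are assumptions for the sketch, not Occulant's actual actuator protocol:

```python
# Illustrative mapping from spatial readings to haptic drive parameters.

def haptic_params(distance_m, closing_speed_mps, max_range_m=5.0):
    """Closer objects -> stronger amplitude; faster approach -> faster pulses."""
    d = min(max(distance_m, 0.0), max_range_m)
    amplitude = 1.0 - d / max_range_m                    # 0 (far) .. 1 (touching)
    pulse_hz = 2.0 + 6.0 * max(closing_speed_mps, 0.0)   # 2 Hz resting baseline
    return round(amplitude, 2), round(pulse_hz, 1)

print(haptic_params(1.0, 0.5))  # -> (0.8, 5.0): near, approaching object
print(haptic_params(5.0, 0.0))  # -> (0.0, 2.0): edge of range, static
```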

Continuous Feedback Loop

At the center of Occulant's architecture is the real-time feedback loop. A stereo VIDEO array feeds continuous optical data to our local VisionAssist edge processor. This low-power module manages raw signal denoising before handing it over to the cloud or local ML Model.
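The denoising hand-off above can be reduced to a per-element exponential moving average. The real VisionAssist module is a dedicated low-power edge processor; the class below, and its `alpha` smoothing factor, are purely illustrative:

```python
# Minimal sketch of the raw-signal denoising stage: new frames are blended
# into a running average so transient sensor noise decays away.

class EMADenoiser:
    def __init__(self, alpha=0.5):
        self.alpha = alpha   # weight of the newest frame
        self.state = None    # running average, same shape as a frame

    def update(self, frame):
        if self.state is None:
            self.state = list(frame)
        else:
            self.state = [self.alpha * x + (1 - self.alpha) * s
                          for x, s in zip(frame, self.state)]
        return self.state

den = EMADenoiser(alpha=0.5)
den.update([100, 100])       # first frame initialises the state
print(den.update([0, 200]))  # -> [50.0, 150.0]
```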

Multi-Stage Perception

Unlike standard detection pipelines, our approach utilizes dual-stage validation. A lightweight LLM runs asynchronously alongside the core detection engine, correcting semantic misclassifications and providing a cohesive situational context (e.g., recognizing that a collection of segmented metal parts is actually a bicycle).
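The bicycle example can be sketched as fragment fusion: detections whose regions overlap are merged into one object. A real second stage would use the LLM's scene context; the greedy box-union rule here is an assumption chosen only to make the idea concrete:

```python
# Toy second-stage correction: overlapping box fragments are fused into
# one composite object (segmented metal parts -> "bicycle").

def overlaps(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def merge_fragments(boxes):
    """Greedily union overlapping (x1, y1, x2, y2) boxes."""
    merged = []
    for box in boxes:
        for i, m in enumerate(merged):
            if overlaps(box, m):
                merged[i] = (min(box[0], m[0]), min(box[1], m[1]),
                             max(box[2], m[2]), max(box[3], m[3]))
                break
        else:
            merged.append(box)
    return merged

parts = [(0, 0, 4, 4), (3, 1, 8, 5), (20, 20, 22, 22)]
print(merge_fragments(parts))  # -> [(0, 0, 8, 5), (20, 20, 22, 22)]
```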

The Final Mile: Interaction

Data from the models must be synthesized predictably. Through spatial sonification and directed haptics, the User node receives continuous feedback without sensory overload. The user issues requests directly to VisionAssist (via voice or peripheral triggers), instantly recalibrating the network's priority vectors.
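"Recalibrating the priority vectors" can be pictured as reweighting object classes named in the user's request. The class names, boost factor, and normalisation below are illustrative assumptions, not the actual control scheme:

```python
# Hypothetical priority recalibration: classes mentioned in a user request
# are boosted, then the weights are renormalised to sum to 1.

def recalibrate(priorities, query_terms, boost=3.0):
    raised = {cls: w * (boost if cls in query_terms else 1.0)
              for cls, w in priorities.items()}
    total = sum(raised.values())
    return {cls: round(w / total, 3) for cls, w in raised.items()}

base = {"pedestrian": 0.5, "vehicle": 0.3, "crosswalk": 0.2}
print(recalibrate(base, {"crosswalk"}))  # crosswalk becomes the top priority
```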

Haptics & Segmentation

Real-time semantic segmentation directs our high-fidelity haptic feedback system, perfectly isolating dynamic obstacles.

Bridging Perception to Sensation

The visual world is hopelessly dense. Occulant extracts the salient features of any scene in real-time, masking out irrelevant background noise and focusing computing power entirely on navigational hazards.

In the adjacent demo, the YOLOv11 Segmentation module precisely masks a pedestrian. The Vision-Language Model interprets the mask as an immediate trajectory threat and formulates actionable language. That textual vector is instantly converted into a physical map: a rapid pulsing sequence localized on the user's left haptic actuator array, alerting the user to slow down.
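The final language-to-haptics translation in that walkthrough can be sketched as: the threat's screen position selects the actuator side, its urgency selects the pulse pattern. The channel names and millisecond patterns are assumptions for illustration:

```python
# Illustrative cue selection: position picks the actuator array,
# urgency picks the pulse pattern (durations in milliseconds).

def to_haptic_cue(x_center, frame_width, urgent):
    channel = "left" if x_center < frame_width / 2 else "right"
    # An urgent cue gets a rapid triple pulse; a routine one a single tap.
    pattern_ms = [80, 40, 80, 40, 80] if urgent else [120]
    return channel, pattern_ms

# Pedestrian masked on the left third of the frame, flagged urgent:
print(to_haptic_cue(150, 640, urgent=True))  # -> ('left', [80, 40, 80, 40, 80])
```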

No screens. No delay. Pure intuition.

SEGMENTATION: ACTIVE
SPATIAL MAP: LINKED
> Keep heading straight.
> Slow down, person ahead.

Live Demo

Experience BlindAssit navigation assistance with real-time YOLO11 detection.

🚀 Quick Start

1. Clone the Repository

   git clone https://github.com/manthanabc/BlindAssit.git

2. Install Dependencies

   cd BlindAssit && pip install -r requirements.txt

3. Set OpenRouter API Key (Optional)

   export OPENROUTER_API_KEY="your-key"

4. Start the Server

   ./start.sh

5. Open in Browser

   http://localhost:5000

✨ Features

  • 🎥 Real-time camera feed
  • 🚶 Sidewalk/obstacle detection
  • 🔊 Voice navigation guidance
  • 🤖 OpenRouter AI queries
  • 📳 Vibration API feedback
  • 🌙 Dark monochrome UI

🛠️ Tech Stack

Flask + YOLO11 + PyTorch + OpenCV + gTTS + OpenRouter AI + JavaScript + Vibration API

BlindAssit in action - Real-time sidewalk navigation