AI-Powered Broadcast Intelligence

MAIA (Media AI Agent)

MAIA is Geminisoft's integrated AI engine suite, purpose-built for broadcasters who need to turn massive media libraries into searchable, reusable, revenue-generating assets. From real-time speech recognition to scene-level video understanding, MAIA eliminates manual bottlenecks across your entire media workflow.

1 On-Premises AI Video Language Model

MAIA Video

Scene Change Detection

AI detects scene transitions and segments video from frame to shot to scene, generating structured metadata at every level.
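For readers curious how frame-to-shot segmentation can work in principle, here is a minimal sketch. It is not MAIA's actual method: it assumes grayscale frames given as flat lists of 0-255 values and uses a simple histogram-difference test, and the names `histogram`, `detect_cuts`, and `to_shots` are hypothetical.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized intensity histogram of a grayscale frame (values 0-255)."""
    counts = [0] * bins
    for px in frame:
        counts[min(px * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return indices of frames that start a new shot."""
    cuts: List[int] = []
    prev = None
    for i, frame in enumerate(frames):
        hist = histogram(frame)
        if prev is not None:
            # Half the L1 distance lies in [0, 1]; 1 means no overlap at all.
            diff = 0.5 * sum(abs(a - b) for a, b in zip(hist, prev))
            if diff > threshold:
                cuts.append(i)
        prev = hist
    return cuts


def to_shots(n_frames: int, cuts: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn cut indices into inclusive (first_frame, last_frame) shot ranges."""
    if n_frames == 0:
        return []
    starts = [0] + list(cuts)
    ends = list(cuts) + [n_frames]
    return [(s, e - 1) for s, e in zip(starts, ends)]


# Toy input: three dark frames, two bright frames, one dark frame.
dark, bright = [10] * 16, [240] * 16
frames = [dark, dark, dark, bright, bright, dark]
cuts = detect_cuts(frames)
shots = to_shots(len(frames), cuts)
print(cuts, shots)
```

A production system would typically add gradual-transition handling (fades, dissolves) and then group shots into scenes using visual or semantic similarity.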

Your archive has thousands of hours of footage. MAIA Video watches all of it — so your team doesn't have to.

MAIA Video is an on-premises AI Video Language Model that uses advanced computer vision to analyze every frame, shot, and scene in your media library. It automatically recognizes objects, people, locations, actions, and contextual relationships — then generates rich, searchable metadata at the scene level. No more manual tagging. No more lost footage buried in storage.

With built-in natural language processing (NLP), your editors can search the way they think: type "anchor reporting on wildfire with aerial footage" and MAIA Video finds the exact scenes across your entire archive. Backed by a vector database of multimodal embeddings, it transforms simple storage into an intelligent, AI-searchable media archive.
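The idea behind embedding-based search can be illustrated with a small sketch. Everything here is an assumption for illustration: the three-dimensional vectors are hand-made stand-ins for real multimodal embeddings, the scene identifiers are invented, and `cosine` and `search` are hypothetical helpers rather than MAIA APIs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query_vec: Sequence[float],
           index: Dict[str, Sequence[float]],
           top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank indexed scenes by similarity to the query embedding."""
    scored = [(cosine(query_vec, vec), scene_id) for scene_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(scene_id, round(score, 3)) for score, scene_id in scored[:top_k]]


# Hypothetical embeddings; the axes loosely mean (anchor, wildfire, sports).
scene_index = {
    "clip01@00:01:12": [0.9, 0.8, 0.1],
    "clip02@00:14:03": [0.1, 0.9, 0.0],
    "clip03@00:02:40": [0.0, 0.1, 0.9],
}
query = [1.0, 1.0, 0.0]  # stand-in for an embedded "anchor reporting on wildfire"
results = search(query, scene_index)
print(results)
```

In practice the query text and the video scenes are embedded by the same model, and a vector database replaces the linear scan with an approximate nearest-neighbor index.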

Object Detection

Identifies 116+ object classes including people, vehicles, animals, and backgrounds with panoptic segmentation and auto-tagging.

Video Summary & Scene Describing

Generative AI automatically produces human-readable descriptions for each scene, ready to use as preview notes or editorial references.

Natural Language Search

Search your archive in plain language. MAIA integrates face, object, STT, and scene data for intent-aware, precision retrieval.

🔒 On-Premises deployment — your content never leaves your facility. MAIA Video runs locally with full cloud-hybrid flexibility when needed.

2 STT (Speech-to-Text)

MAIA Speech

Every word spoken on air becomes searchable text — automatically, accurately, and in real time.

MAIA Speech is a multi-engine STT Hub that converts spoken audio into timecode-synchronized text across your entire content library. Rather than locking you into a single vendor, MAIA Speech integrates the industry's leading engines — letting you choose the optimal balance of language support, accuracy, security, and cost for each project.

Speaker diarization automatically identifies who said what, linking each voice segment to the timeline. Editors can search by keyword within transcripts and jump to the exact edit point instantly. AI-powered subtitle editing, auto-summarization, and downloadable caption files are built right in — turning hours of manual transcription into a one-click workflow.
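As a rough illustration of what timecode-synchronized, speaker-labeled text makes possible, the sketch below searches transcript segments by keyword and renders them as SubRip (SRT) captions. The `Segment` structure, the speaker labels, and the helper names are assumptions for this example, not MAIA Speech's actual data model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float   # seconds from the start of the clip
    end: float
    speaker: str   # label assigned by diarization
    text: str


def timecode(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def find_keyword(segments: List[Segment], keyword: str) -> List[Tuple[str, str]]:
    """Return (start timecode, speaker) for each segment containing the keyword."""
    kw = keyword.lower()
    return [(timecode(seg.start), seg.speaker)
            for seg in segments if kw in seg.text.lower()]


def to_srt(segments: List[Segment]) -> str:
    """Render segments as a SubRip caption file body."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{timecode(seg.start)} --> {timecode(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    return "\n".join(blocks)


segments = [
    Segment(0.0, 4.2, "SPEAKER_1", "Good evening, here are tonight's headlines."),
    Segment(4.2, 9.75, "SPEAKER_2", "The wildfire has spread overnight."),
]
hits = find_keyword(segments, "wildfire")
srt = to_srt(segments)
print(hits)
print(srt)
```

Each hit carries the timecode an editor would jump to, and the same segments feed the downloadable caption file.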

Supported STT Engines

Google Speech-to-Text

Amazon Transcribe

Naver Clova

OpenAI Whisper

Daglo

Deployment Options

Cloud-based SaaS with pay-per-use pricing for maximum flexibility — or fully on-premises with Whisper (custom-tuned by Geminisoft) for complete data sovereignty. Hybrid configurations available.

🎁 Face DB included at no extra charge — MAIA Speech comes bundled with face recognition database capabilities, adding visual intelligence to your audio analysis pipeline.

3 Face Recognition

MAIA Face

Find every appearance of any on-screen talent — across your entire archive — in seconds, not hours.

MAIA Face uses deep learning-based facial recognition to automatically detect, extract, and identify every person appearing in your video content. Faces are stored as feature vectors in a dedicated database, enabling instant retrieval by name or by uploading a reference photo. The system builds a growing, expandable talent database — new individuals can be registered at any time, and previously unidentified faces can be retroactively linked.

Each identified person is mapped to a visual timeline showing every appearance with precise timecodes. Editors can jump directly to any scene featuring a specific individual, eliminating the tedious process of manually scrubbing through footage. Whether you're assembling a highlight reel, verifying broadcast compliance, or locating archival appearances — MAIA Face delivers results instantly.
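One way to picture the person timeline: per-frame face identifications are merged into continuous appearance spans. The sketch below assumes detections arrive as (person, timestamp) pairs and that a gap longer than `max_gap` seconds starts a new span; both the input shape and the `build_timeline` helper are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_timeline(detections: Iterable[Tuple[str, float]],
                   max_gap: float = 1.0) -> Dict[str, List[Tuple[float, float]]]:
    """Merge per-frame detections into (start, end) appearance spans per person."""
    timeline: Dict[str, List[List[float]]] = {}
    for person, t in sorted(detections, key=lambda d: (d[0], d[1])):
        spans = timeline.setdefault(person, [])
        if spans and t - spans[-1][1] <= max_gap:
            spans[-1][1] = t          # extend the current appearance
        else:
            spans.append([t, t])      # start a new appearance
    return {p: [(s, e) for s, e in spans] for p, spans in timeline.items()}


detections = [
    ("Anchor A", 0.0), ("Anchor A", 0.5), ("Anchor A", 1.0),
    ("Reporter B", 1.0), ("Reporter B", 1.5),
    ("Anchor A", 10.0), ("Anchor A", 10.5),
]
timeline = build_timeline(detections)
print(timeline)
```

Each span's start time is the point a player would seek to when an editor clicks a timeline entry.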

Auto Face Extraction

AI scans video content, detects faces using landmark analysis, and clusters identical individuals, even across different clips and angles.

Image-Based Search

Upload any photo and MAIA instantly finds every matching appearance across your media library, no name or tag required.

Person Timeline Index

Visual timeline shows all appearances per individual with shot-level segments. Click any entry to jump and play instantly.

4 OCR Detection

MAIA Character

Every lower third, every chyron, every on-screen graphic — captured, indexed, and searchable automatically.

MAIA Character applies AI-powered OCR to extract text from subtitles, CG overlays, lower thirds, and any on-screen graphics embedded in your video content. The system operates in two modes: full-frame analysis that scans the entire image for text regions, and precision zone mode that lets operators define specific areas for targeted extraction — ideal for recurring graphic templates.

An advanced preprocessing pipeline — including tilt correction, noise removal, binarization, and contrast enhancement — ensures high recognition accuracy even with complex backgrounds, stylized fonts, and varying color schemes. Multi-language support means MAIA Character works across global content libraries without reconfiguration.
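Two of the preprocessing steps named above, contrast enhancement and binarization, can be sketched in a few lines. This is an illustrative stand-in, not MAIA Character's pipeline: it assumes a grayscale image flattened to a list of 0-255 integers, uses linear contrast stretching, and picks the threshold with Otsu's method. All function names are hypothetical.

```python
from typing import List, Sequence


def stretch_contrast(pixels: Sequence[int]) -> List[int]:
    """Linearly rescale intensities so they span the full 0-255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)
    return [(p - lo) * 255 // (hi - lo) for p in pixels]


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels: Sequence[int]) -> List[int]:
    """Stretch contrast, then map pixels to 0 (background) or 255 (text)."""
    stretched = stretch_contrast(pixels)
    t = otsu_threshold(stretched)
    return [255 if p > t else 0 for p in stretched]


# Low-contrast toy row: dim background (about 60) with brighter text (about 120).
row = [60, 62, 65, 61, 120, 125, 118, 63]
binary = binarize(row)
print(binary)
```

Tilt correction and noise removal would normally run before this step; they are omitted here to keep the example short.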

Full Analysis Mode

Automatically detects every text region in the frame, no manual setup needed. Ideal for batch processing large archives where on-screen text positions vary across programs.

Zone Selection Mode

Operators define specific regions of interest with drag-and-drop precision. Only text within designated zones is extracted. Perfect for recurring show formats with fixed CG templates.

🌐 Multi-language & multi-font — stable performance across diverse fonts, color variations, and noise conditions. Works seamlessly with Latin, Chinese, Japanese, Korean, and other character sets.

5 AI-Powered Live Prompting

MAIA Prompter

The prompter that listens. AI matches the anchor's voice to the script in real time — scrolling automatically, hands-free.

MAIA Prompter uses live Speech-to-Text technology to analyze the anchor's voice in real time during broadcast. As the presenter speaks, AI converts their audio to text, matches it against the loaded script, and automatically scrolls the prompter at precisely the right pace — no operator intervention required.
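The matching step described above can be sketched as a pointer that advances through the script as recognized words arrive. This is a simplified illustration under stated assumptions: matching is exact after normalization, a small lookahead window tolerates skipped or misrecognized words, and the `ScriptTracker` class is hypothetical rather than part of MAIA Prompter.

```python
import re
from typing import List


def normalize(word: str) -> str:
    """Lowercase a word and strip punctuation."""
    return re.sub(r"[^\w]", "", word.lower())


class ScriptTracker:
    """Tracks how far into the script the presenter has read."""

    def __init__(self, script: str, lookahead: int = 5) -> None:
        self.words: List[str] = [w for w in map(normalize, script.split()) if w]
        self.position = 0            # index of the next expected script word
        self.lookahead = lookahead   # how many words ahead we search for a match

    def feed(self, recognized: str) -> int:
        """Consume a chunk of STT output and return the updated position."""
        for raw in recognized.split():
            word = normalize(raw)
            if not word:
                continue
            window = self.words[self.position:self.position + self.lookahead]
            if word in window:
                # Jump past the matched word; unmatched STT words are ignored.
                self.position += window.index(word) + 1
        return self.position

    def progress(self) -> float:
        """Fraction of the script already read, usable as a scroll target."""
        return self.position / len(self.words) if self.words else 0.0


tracker = ScriptTracker(
    "Good evening. Wildfires continue to spread across the northern hills tonight."
)
first = tracker.feed("good evening")
second = tracker.feed("wild fires continue to spread")  # "wildfires" misrecognized
print(first, second, round(tracker.progress(), 2))
```

The second call shows the window recovering from a recognition error: "wild" and "fires" match nothing, but "continue" is found one word ahead and the pointer catches up.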

This eliminates the need for a dedicated prompter operator, reduces human error during live broadcasts, and lets anchors deliver news naturally without worrying about scroll speed. The system supports multiple STT engines including Google, NAVER, and Whisper Live (on-premises), giving you the flexibility to choose the best engine for your language and security requirements.

Real-Time Voice Matching

AI continuously analyzes the anchor's speech and matches it against the script — auto-scrolling at the presenter's natural pace.

No Operator Needed

Fully automated prompting eliminates the dedicated operator role, reducing crew costs and removing a single point of failure from live broadcasts.

On-Premises or Cloud

Deploy with Whisper Live for fully on-premises operation, or use cloud engines for maximum accuracy. Your infrastructure, your choice.