
Live API

Connect a native MCP client to the Gemini Live API. Real-time voice AI for the Model Context Protocol on iOS and Android devices.

Gemini Voice Integration

systemprompt connects to Google's Gemini Live API via HTTP streaming for real-time voice interactions. This powerful integration enables natural conversations with AI while leveraging your configured MCP tools.

Overview

The Live API connection transforms systemprompt into a sophisticated voice assistant that can understand context, execute tools, and provide intelligent responses. The streaming connection ensures low latency and real-time communication.

Key Benefits

  • Natural Conversations: Speak naturally without rigid commands
  • Context Awareness: AI remembers conversation context
  • Tool Integration: Seamlessly uses your 20 selected tools
  • Real-time Response: Instant feedback and processing
  • Continuous Listening: Extended voice sessions supported

Technical Architecture

Streaming Connection

The Live API uses HTTP streaming for real-time communication:

https://generativelanguage.googleapis.com/v1/models/gemini-live:streamGenerateContent

Connection Flow

  1. Establish streaming connection
  2. Authenticate with API key
  3. Configure audio parameters
  4. Begin streaming audio
  5. Receive AI responses
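
A minimal sketch of the flow above, assuming the endpoint accepts a POST with the API key passed as a key query parameter and a chunked JSON body. The app performs these steps internally, so the exact headers and parameter names may differ:

import java.net.HttpURLConnection
import java.net.URL

// Hypothetical sketch only: the app opens and manages this connection
// internally; the query-parameter auth and chunked streaming are assumptions.
fun openLiveApiConnection(apiKey: String): HttpURLConnection {
    val endpoint = "https://generativelanguage.googleapis.com/v1/models/" +
        "gemini-live:streamGenerateContent?key=$apiKey"
    val conn = URL(endpoint).openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.doOutput = true                        // audio is streamed up
    conn.doInput = true                         // responses are read back
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setChunkedStreamingMode(0)             // send the body as it is produced
    return conn
}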

Audio Streaming

Outgoing Audio

  • Format: 16-bit PCM
  • Sample Rate: 16kHz
  • Channels: Mono
  • Encoding: Base64
  • Chunk Size: Optimized for mobile
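
To illustrate these parameters, here is a hedged Android sketch that captures one chunk of 16 kHz, 16-bit mono PCM and base64-encodes it. The buffer size and the surrounding streaming loop are assumptions; the app chooses its own chunk sizes:

import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import java.util.Base64

// Hypothetical sketch: capture one chunk of 16 kHz, 16-bit mono PCM and
// base64-encode it. Requires the RECORD_AUDIO permission; java.util.Base64
// needs API 26+.
fun captureAudioChunk(): String {
    val sampleRate = 16000
    val bufferSize = AudioRecord.getMinBufferSize(
        sampleRate,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT
    )
    val recorder = AudioRecord(
        MediaRecorder.AudioSource.MIC,
        sampleRate,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufferSize
    )
    val pcm = ByteArray(bufferSize)
    recorder.startRecording()
    val read = recorder.read(pcm, 0, pcm.size)  // blocking read of raw PCM bytes
    recorder.stop()
    recorder.release()
    return Base64.getEncoder().encodeToString(pcm.copyOf(read))
}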

Incoming Responses

  • Text transcriptions
  • Tool execution requests
  • Audio responses (future)
  • Status updates
  • Error messages
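
One way to model these incoming message types in client code. The names and shapes below are illustrative assumptions, not the actual wire format:

// Hypothetical model of the messages the client may receive on the stream.
sealed class LiveApiEvent {
    data class Transcription(val text: String) : LiveApiEvent()
    data class ToolCall(val toolName: String, val arguments: Map<String, Any?>) : LiveApiEvent()
    data class StatusUpdate(val status: String) : LiveApiEvent()
    data class Error(val code: Int, val message: String) : LiveApiEvent()
}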

Setting Up the Connection

Prerequisites

Before connecting to Live API:

  1. Active Subscription: systemprompt Pro required
  2. Internet Connection: Stable WiFi recommended
  3. Microphone Access: Permission granted
  4. Selected Tools: 20 tools configured
  5. API Availability: Service must be accessible

Initial Configuration

The connection is established automatically, but you can verify it:

  1. Check Settings → Advanced
  2. View "Live API Status"
  3. Verify "Connected"
  4. Test with voice command

Connection Parameters

systemprompt configures optimal parameters:

{
  "model": "gemini-live",
  "audio_config": {
    "sample_rate": 16000,
    "encoding": "PCM_16BIT",
    "channels": 1
  },
  "generation_config": {
    "temperature": 0.7,
    "candidate_count": 1
  },
  "safety_settings": "BLOCK_NONE",
  "tools": "[your 20 selected tools]"
}
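
The app assembles and sends this configuration itself. As a rough sketch, the payload could be written as the first message on the connection opened above; the message framing here is an assumption:

import org.json.JSONArray
import org.json.JSONObject

// Hypothetical sketch: write the session configuration as the first message
// on the connection from openLiveApiConnection() above.
fun sendSessionConfig(conn: java.net.HttpURLConnection, toolDeclarations: List<JSONObject>) {
    val config = JSONObject()
        .put("model", "gemini-live")
        .put("audio_config", JSONObject()
            .put("sample_rate", 16000)
            .put("encoding", "PCM_16BIT")
            .put("channels", 1))
        .put("generation_config", JSONObject()
            .put("temperature", 0.7)
            .put("candidate_count", 1))
        .put("tools", JSONArray(toolDeclarations))   // your 20 selected MCP tools
    conn.outputStream.write(config.toString().toByteArray())
    conn.outputStream.flush()
}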

Voice Interaction Flow

Starting a Session

  1. Tap microphone in Conversation screen
  2. Streaming connects automatically
  3. Audio streaming begins
  4. Speak your request
  5. AI processes in real-time

During Interaction

The Live API handles:

  • Speech recognition: Converts voice to text
  • Intent understanding: Determines what you want
  • Tool selection: Chooses appropriate MCP tools
  • Parameter extraction: Gets values from speech
  • Execution coordination: Runs tools as needed
  • Response generation: Creates natural replies

Session Management

Automatic Handling

  • Connection maintained during use
  • Idle timeout after inactivity
  • Automatic reconnection
  • State preservation
  • Error recovery
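
A minimal sketch of the idle-timeout idea, using a plain Timer. The timeout length and the app's actual reconnection logic are unknown; the values here are placeholders:

import java.util.Timer
import kotlin.concurrent.timerTask

// Hypothetical idle-timeout watchdog: close the stream after a period of
// silence and let the normal reconnection path reopen it on the next tap.
class IdleWatchdog(private val onIdle: () -> Unit, private val idleMillis: Long = 60_000) {
    private var timer: Timer? = null

    fun recordActivity() {
        timer?.cancel()
        timer = Timer().apply {
            schedule(timerTask { onIdle() }, idleMillis)
        }
    }
}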

Manual Control

  • Tap to start or stop a session
  • Long press for extended listening
  • Swipe to cancel the current request
  • Adjust session behavior in Settings

Tool Integration

How Tools Work with Live API

The AI seamlessly integrates your 20 selected tools:

  1. Understanding Intent

    You: "Check my pull requests"
    AI: Recognizes need for GitHub tool
  2. Tool Selection

    AI: Selects "List Pull Requests" tool
    AI: Extracts any parameters needed
  3. Execution

    System: Executes MCP tool
    System: Returns results to AI
  4. Natural Response

    AI: "You have 3 open pull requests..."

Tool Availability

The Live API can only access your selected 20 tools:

  • Importance of curation: Choose wisely
  • Profile switching: Change tool sets
  • Context awareness: AI knows available tools
  • Graceful handling: Clear message if tool unavailable

Advanced Features

Continuous Conversation

Unlike traditional voice assistants:

  • Context retention: Remembers previous exchanges
  • Follow-up questions: Natural progression
  • Clarification: Asks when unclear
  • Multi-turn workflows: Complex operations

Intelligent Processing

The AI provides:

Smart Interpretation

  • Understands variations
  • Handles ambiguity
  • Suggests alternatives
  • Corrects mistakes

Proactive Assistance

  • Suggests next steps
  • Offers related info
  • Prevents errors
  • Optimizes workflow

Error Handling

Robust error management:

  • Network issues: Automatic retry
  • API limits: Graceful degradation
  • Tool failures: Alternative suggestions
  • Unclear speech: Requests clarification
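
As an example of the network-retry behavior, a simple exponential-backoff sketch; the attempt count and delays are placeholders, not the app's actual policy:

// Hypothetical retry helper: retry a failed connection attempt with
// exponential backoff. Limits and delays are placeholders.
fun <T> withRetry(maxAttempts: Int = 3, initialDelayMillis: Long = 500, block: () -> T): T {
    var delay = initialDelayMillis
    repeat(maxAttempts - 1) {
        try {
            return block()
        } catch (e: java.io.IOException) {
            Thread.sleep(delay)               // back off before the next attempt
            delay *= 2
        }
    }
    return block()                            // final attempt propagates any failure
}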

Performance Optimization

Network Requirements

For best performance:

  • WiFi recommended: Lower latency
  • Mobile data supported: 4G/5G connections work
  • Bandwidth: ~50 kbps sustained
  • Latency: Less than 200ms optimal
  • Stability: Consistent connection

Audio Quality

Optimize voice input:

  • Quiet environment: Reduce background noise
  • Clear speech: Normal pace and volume
  • Proper distance: 6-12 inches from device
  • Avoid interruptions: Complete thoughts

Response Time

Factors affecting speed:

  • Network latency: Primary factor
  • Tool complexity: Simple tools faster
  • Request clarity: Clear requests process faster
  • Server load: Peak times may be slower

Privacy & Security

Data Handling

Your privacy is protected:

Audio Processing

  • Streaming only during use
  • No persistent recording
  • Encrypted transmission
  • No local storage

Conversation Data

  • Processed by Google AI
  • Not stored permanently
  • Used only for response
  • No training on your data

Security Measures

  • TLS encryption: All communication
  • Authentication: API key required
  • Access control: Your tools only
  • Audit logging: Track usage
  • Data isolation: Per-user separation

Troubleshooting

Common Issues

"Connection failed"

  • Check internet connection
  • Verify subscription active
  • Restart app
  • Check service status

"Poor recognition"

  • Reduce background noise
  • Speak more clearly
  • Check microphone
  • Move to quiet area

"Slow responses"

  • Check network speed
  • Try WiFi instead of mobile
  • Reduce concurrent apps
  • Contact support

Debug Information

Access diagnostic data:

  1. Settings → Advanced
  2. "Live API Diagnostics"
  3. View connection stats
  4. Export debug logs
  5. Share with support

Best Practices

Effective Communication

  1. Be specific: Include relevant details
  2. One request: Avoid multiple tasks at once
  3. Use context: Reference previous messages
  4. Natural language: No need for keywords
  5. Confirm actions: For critical operations

Optimal Usage

  • Prepare mentally: Know what you want
  • Speak completely: Finish thoughts
  • Listen fully: Let AI complete responses
  • Iterate naturally: Build on responses
  • Learn patterns: What works best

Common Patterns

Information Gathering

"What errors occurred overnight?"
"Show me details for the payment error"
"How many users were affected?"

Action Execution

"Create an issue for this bug"
"Merge pull request 456"
"Deploy to staging environment"

Complex Workflows

"Check if PR 123 passed tests, and if so, merge it"
"Find all critical errors and create issues for them"
"Review my PRs and summarize the feedback"

Future Enhancements

Planned Features

  • Multi-language support: Beyond English
  • Voice responses: Audio feedback
  • Custom wake words: Hands-free activation
  • Offline capability: Basic functions without internet
  • Advanced context: Longer conversation memory

Connect to the future of voice-controlled development!
