
Live API

Connect a native MCP client to the Gemini Live API. Real-time voice AI for the Model Context Protocol on iOS and Android devices.

Gemini Voice Integration

systemprompt connects to Google's Gemini Live API via HTTP streaming for real-time voice interactions. This powerful integration enables natural conversations with AI while leveraging your configured MCP tools.

Overview

The Live API connection transforms systemprompt into a sophisticated voice assistant that can understand context, execute tools, and provide intelligent responses. The streaming connection ensures low latency and real-time communication.

Key Benefits

  • Natural Conversations: Speak naturally without rigid commands
  • Context Awareness: AI remembers conversation context
  • Tool Integration: Seamlessly uses your 20 selected tools
  • Real-time Response: Instant feedback and processing
  • Continuous Listening: Extended voice sessions supported

Technical Architecture

Streaming Connection

The Live API uses HTTP streaming for real-time communication:

https://generativelanguage.googleapis.com/v1/models/gemini-live:streamGenerateContent

Connection Flow

  1. Establish streaming connection
  2. Authenticate with API key
  3. Configure audio parameters
  4. Begin streaming audio
  5. Receive AI responses
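
A minimal sketch of the flow above, assuming the endpoint accepts a POST with the API key passed as a key query parameter and a chunked JSON body. The app performs these steps internally, so the exact headers and parameter names may differ:

import java.net.HttpURLConnection
import java.net.URL

// Hypothetical sketch only: the app opens and manages this connection
// internally; the query-parameter auth and chunked streaming are assumptions.
fun openLiveApiConnection(apiKey: String): HttpURLConnection {
    val endpoint = "https://generativelanguage.googleapis.com/v1/models/" +
        "gemini-live:streamGenerateContent?key=$apiKey"
    val conn = URL(endpoint).openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.doOutput = true                        // audio is streamed up
    conn.doInput = true                         // responses are read back
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setChunkedStreamingMode(0)             // send the body as it is produced
    return conn
}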

Audio Streaming

Outgoing Audio

  • Format: 16-bit PCM
  • Sample Rate: 16kHz
  • Channels: Mono
  • Encoding: Base64
  • Chunk Size: Optimized for mobile
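
To illustrate these parameters, here is a hedged Android sketch that captures one chunk of 16 kHz, 16-bit mono PCM and base64-encodes it. The buffer size and the surrounding streaming loop are assumptions; the app chooses its own chunk sizes:

import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import java.util.Base64

// Hypothetical sketch: capture one chunk of 16 kHz, 16-bit mono PCM and
// base64-encode it. Requires the RECORD_AUDIO permission; java.util.Base64
// needs API 26+.
fun captureAudioChunk(): String {
    val sampleRate = 16000
    val bufferSize = AudioRecord.getMinBufferSize(
        sampleRate,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT
    )
    val recorder = AudioRecord(
        MediaRecorder.AudioSource.MIC,
        sampleRate,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufferSize
    )
    val pcm = ByteArray(bufferSize)
    recorder.startRecording()
    val read = recorder.read(pcm, 0, pcm.size)  // blocking read of raw PCM bytes
    recorder.stop()
    recorder.release()
    return Base64.getEncoder().encodeToString(pcm.copyOf(read))
}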

Incoming Responses

  • Text transcriptions
  • Tool execution requests
  • Audio responses (future)
  • Status updates
  • Error messages
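
One way to model these incoming message types in client code. The names and shapes below are illustrative assumptions, not the actual wire format:

// Hypothetical model of the messages the client may receive on the stream.
sealed class LiveApiEvent {
    data class Transcription(val text: String) : LiveApiEvent()
    data class ToolCall(val toolName: String, val arguments: Map<String, Any?>) : LiveApiEvent()
    data class StatusUpdate(val status: String) : LiveApiEvent()
    data class Error(val code: Int, val message: String) : LiveApiEvent()
}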

Setting Up the Connection

Prerequisites

Before connecting to Live API:

  1. Active Subscription: systemprompt Pro required
  2. Internet Connection: Stable WiFi recommended
  3. Microphone Access: Permission granted
  4. Selected Tools: 20 tools configured
  5. API Availability: Service must be accessible

Initial Configuration

The connection is established automatically, but you can verify it:

  1. Check Settings → Advanced
  2. View "Live API Status"
  3. Verify "Connected"
  4. Test with voice command

Connection Parameters

systemprompt configures optimal parameters:

{
  "model": "gemini-live",
  "audio_config": {
    "sample_rate": 16000,
    "encoding": "PCM_16BIT",
    "channels": 1
  },
  "generation_config": {
    "temperature": 0.7,
    "candidate_count": 1
  },
  "safety_settings": "BLOCK_NONE",
  "tools": "[your 20 selected tools]"
}
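
The app assembles and sends this configuration itself. As a rough sketch, the payload could be written as the first message on the connection opened above; the message framing here is an assumption:

import org.json.JSONArray
import org.json.JSONObject

// Hypothetical sketch: write the session configuration as the first message
// on the connection from openLiveApiConnection() above.
fun sendSessionConfig(conn: java.net.HttpURLConnection, toolDeclarations: List<JSONObject>) {
    val config = JSONObject()
        .put("model", "gemini-live")
        .put("audio_config", JSONObject()
            .put("sample_rate", 16000)
            .put("encoding", "PCM_16BIT")
            .put("channels", 1))
        .put("generation_config", JSONObject()
            .put("temperature", 0.7)
            .put("candidate_count", 1))
        .put("tools", JSONArray(toolDeclarations))   // your 20 selected MCP tools
    conn.outputStream.write(config.toString().toByteArray())
    conn.outputStream.flush()
}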

Voice Interaction Flow

Starting a Session

  1. Tap microphone in Conversation screen
  2. Streaming connects automatically
  3. Audio streaming begins
  4. Speak your request
  5. AI processes in real-time

During Interaction

The Live API handles:

  • Speech recognition: Converts voice to text
  • Intent understanding: Determines what you want
  • Tool selection: Chooses appropriate MCP tools
  • Parameter extraction: Gets values from speech
  • Execution coordination: Runs tools as needed
  • Response generation: Creates natural replies

Session Management

Automatic Handling

  • Connection maintained during use
  • Idle timeout after inactivity
  • Automatic reconnection
  • State preservation
  • Error recovery
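
A minimal sketch of the idle-timeout idea, using a plain Timer. The timeout length and the app's actual reconnection logic are unknown; the values here are placeholders:

import java.util.Timer
import kotlin.concurrent.timerTask

// Hypothetical idle-timeout watchdog: close the stream after a period of
// silence and let the normal reconnection path reopen it on the next tap.
class IdleWatchdog(private val onIdle: () -> Unit, private val idleMillis: Long = 60_000) {
    private var timer: Timer? = null

    fun recordActivity() {
        timer?.cancel()
        timer = Timer().apply {
            schedule(timerTask { onIdle() }, idleMillis)
        }
    }
}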

Manual Control

  • Tap to start or stop a session
  • Long press for extended listening
  • Swipe to cancel the current request
  • Adjust session behavior in Settings

Tool Integration

How Tools Work with Live API

The AI seamlessly integrates your 20 selected tools:

  1. Understanding Intent

    You: "Check my pull requests"
    AI: Recognizes need for GitHub tool
  2. Tool Selection

    AI: Selects "List Pull Requests" tool
    AI: Extracts any parameters needed
  3. Execution

    System: Executes MCP tool
    System: Returns results to AI
  4. Natural Response

    AI: "You have 3 open pull requests..."

Tool Availability

The Live API can only access your selected 20 tools:

  • Importance of curation: Choose wisely
  • Profile switching: Change tool sets
  • Context awareness: AI knows available tools
  • Graceful handling: Clear message if tool unavailable

Advanced Features

Continuous Conversation

Unlike traditional voice assistants:

  • Context retention: Remembers previous exchanges
  • Follow-up questions: Natural progression
  • Clarification: Asks when unclear
  • Multi-turn workflows: Complex operations

Intelligent Processing

The AI provides:

Smart Interpretation

  • Understands variations
  • Handles ambiguity
  • Suggests alternatives
  • Corrects mistakes

Proactive Assistance

  • Suggests next steps
  • Offers related info
  • Prevents errors
  • Optimizes workflow

Error Handling

Robust error management:

  • Network issues: Automatic retry
  • API limits: Graceful degradation
  • Tool failures: Alternative suggestions
  • Unclear speech: Requests clarification
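
As an example of the network-retry behavior, a simple exponential-backoff sketch; the attempt count and delays are placeholders, not the app's actual policy:

// Hypothetical retry helper: retry a failed connection attempt with
// exponential backoff. Limits and delays are placeholders.
fun <T> withRetry(maxAttempts: Int = 3, initialDelayMillis: Long = 500, block: () -> T): T {
    var delay = initialDelayMillis
    repeat(maxAttempts - 1) {
        try {
            return block()
        } catch (e: java.io.IOException) {
            Thread.sleep(delay)               // back off before the next attempt
            delay *= 2
        }
    }
    return block()                            // final attempt propagates any failure
}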

Performance Optimization

Network Requirements

For best performance:

  • WiFi recommended: Lower latency
  • Mobile data supported: 4G/5G connections work
  • Bandwidth: ~50 kbps sustained
  • Latency: Less than 200ms optimal
  • Stability: Consistent connection

Audio Quality

Optimize voice input:

  • Quiet environment: Reduce background noise
  • Clear speech: Normal pace and volume
  • Proper distance: 6-12 inches from device
  • Avoid interruptions: Complete thoughts

Response Time

Factors affecting speed:

  • Network latency: Primary factor
  • Tool complexity: Simple tools faster
  • Request clarity: Clear requests process faster
  • Server load: Peak times may be slower

Privacy & Security

Data Handling

Your privacy is protected:

Audio Processing

  • Streaming only during use
  • No persistent recording
  • Encrypted transmission
  • No local storage

Conversation Data

  • Processed by Google AI
  • Not stored permanently
  • Used only for response
  • No training on your data

Security Measures

  • TLS encryption: All communication
  • Authentication: API key required
  • Access control: Your tools only
  • Audit logging: Track usage
  • Data isolation: Per-user separation

Troubleshooting

Common Issues

"Connection failed"

  • Check internet connection
  • Verify subscription active
  • Restart app
  • Check service status

"Poor recognition"

  • Reduce background noise
  • Speak more clearly
  • Check microphone
  • Move to quiet area

"Slow responses"

  • Check network speed
  • Try WiFi instead of mobile
  • Reduce concurrent apps
  • Contact support

Debug Information

Access diagnostic data:

  1. Settings → Advanced
  2. "Live API Diagnostics"
  3. View connection stats
  4. Export debug logs
  5. Share with support

Best Practices

Effective Communication

  1. Be specific: Include relevant details
  2. One request: Avoid multiple tasks at once
  3. Use context: Reference previous messages
  4. Natural language: No need for keywords
  5. Confirm actions: For critical operations

Optimal Usage

  • Prepare mentally: Know what you want
  • Speak completely: Finish thoughts
  • Listen fully: Let AI complete responses
  • Iterate naturally: Build on responses
  • Learn patterns: What works best

Common Patterns

Information Gathering

"What errors occurred overnight?"
"Show me details for the payment error"
"How many users were affected?"

Action Execution

"Create an issue for this bug"
"Merge pull request 456"
"Deploy to staging environment"

Complex Workflows

"Check if PR 123 passed tests, and if so, merge it"
"Find all critical errors and create issues for them"
"Review my PRs and summarize the feedback"

Future Enhancements

Planned Features

  • Multi-language support: Beyond English
  • Voice responses: Audio feedback
  • Custom wake words: Hands-free activation
  • Offline capability: Basic functions without internet
  • Advanced context: Longer conversation memory

Connect to the future of voice-controlled development!
