Live API
Connect systemprompt's native MCP client to the Gemini Live API for real-time voice AI with Model Context Protocol tools on iOS and Android.
Gemini Voice Integration
systemprompt connects to Google's Gemini Live API via HTTP streaming for real-time voice interactions. The integration lets you hold natural conversations with the AI while it uses your configured MCP tools.
Overview
The Live API connection turns systemprompt into a voice assistant that can follow conversational context, execute tools, and respond intelligently. The streaming connection keeps latency low, so communication happens in real time.
Key Benefits
- Natural Conversations: Speak naturally without rigid commands
- Context Awareness: AI remembers conversation context
- Tool Integration: Seamlessly uses your 20 selected tools
- Real-time Response: Low-latency feedback and processing
- Continuous Listening: Extended voice sessions supported
Technical Architecture
Streaming Connection
The Live API uses HTTP streaming for real-time communication:
Connection Flow
- Establish streaming connection
- Authenticate with API key
- Configure audio parameters
- Begin streaming audio
- Receive AI responses
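The connection flow is strictly ordered: each step depends on the one before it. A minimal Python sketch of that ordering as a state machine follows. The `ConnState` names and the `ConnectionFlow` class are hypothetical and are not systemprompt's internal code.

```python
from enum import Enum, auto


class ConnState(Enum):
    """Hypothetical states mirroring the connection flow steps."""
    DISCONNECTED = auto()
    CONNECTED = auto()       # streaming connection established
    AUTHENTICATED = auto()   # API key accepted
    CONFIGURED = auto()      # audio parameters sent
    STREAMING = auto()       # audio flowing, AI responses arriving


# Each state may only advance to the next one, in order.
_NEXT = {
    ConnState.DISCONNECTED: ConnState.CONNECTED,
    ConnState.CONNECTED: ConnState.AUTHENTICATED,
    ConnState.AUTHENTICATED: ConnState.CONFIGURED,
    ConnState.CONFIGURED: ConnState.STREAMING,
}


class ConnectionFlow:
    def __init__(self) -> None:
        self.state = ConnState.DISCONNECTED

    def advance(self, target: ConnState) -> None:
        """Move to `target`, rejecting any out-of-order step."""
        expected = _NEXT.get(self.state)
        if target is not expected:
            raise RuntimeError(f"cannot go from {self.state.name} to {target.name}")
        self.state = target

    def reset(self) -> None:
        """Drop back to the start, e.g. after a network error."""
        self.state = ConnState.DISCONNECTED
```

After a dropped connection, a client would call `reset()` and walk the steps again, which is one way automatic reconnection could be structured.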
Audio Streaming
Outgoing Audio
- Format: 16-bit PCM
- Sample Rate: 16kHz
- Channels: Mono
- Encoding: Base64
- Chunk Size: Optimized for mobile
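The outgoing format can be illustrated with a short Python sketch. It converts floating-point microphone samples to 16-bit PCM, splits the result into chunks, and Base64-encodes each chunk. The function names are hypothetical. Little-endian byte order is an assumption, though it is the usual PCM convention. The 100 ms chunk length is an illustrative value, since the page only says chunk size is optimized for mobile.

```python
import base64
import struct

SAMPLE_RATE_HZ = 16_000   # 16 kHz, mono
BYTES_PER_SAMPLE = 2      # 16-bit PCM


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range input
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)


def base64_chunks(pcm: bytes, chunk_ms: int = 100) -> list[str]:
    """Split PCM into fixed-duration chunks and Base64-encode each one.

    chunk_ms is an illustrative value, not systemprompt's actual chunk size.
    """
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    return [
        base64.b64encode(pcm[i:i + chunk_bytes]).decode("ascii")
        for i in range(0, len(pcm), chunk_bytes)
    ]
```

With a 100 ms chunk, each chunk holds 3,200 bytes of PCM and encodes to 4,268 Base64 characters.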
Incoming Responses
- Text transcriptions
- Tool execution requests
- Audio responses (future)
- Status updates
- Error messages
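The client has to route each incoming message to the right handler. The sketch below assumes a hypothetical message shape with a `type` field. The field names and handler functions are illustrative only and do not reflect the Live API's real wire format.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], str]


def on_transcript(msg: dict[str, Any]) -> str:
    return f"text: {msg['text']}"


def on_tool_call(msg: dict[str, Any]) -> str:
    return f"run tool {msg['name']}"


def on_status(msg: dict[str, Any]) -> str:
    return f"status: {msg['state']}"


def on_error(msg: dict[str, Any]) -> str:
    return f"error: {msg['message']}"


# Hypothetical 'type' values for the response kinds listed above.
HANDLERS: dict[str, Handler] = {
    "transcript": on_transcript,
    "tool_call": on_tool_call,
    "status": on_status,
    "error": on_error,
}


def dispatch(msg: dict[str, Any]) -> str:
    """Route one incoming message by its (hypothetical) 'type' field."""
    handler = HANDLERS.get(msg.get("type", ""))
    if handler is None:
        # Unknown types, such as future audio responses, are ignored.
        return "ignored"
    return handler(msg)
```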
Setting Up the Connection
Prerequisites
Before connecting to Live API:
- Active Subscription: systemprompt Pro required
- Internet Connection: Stable WiFi recommended
- Microphone Access: Permission granted
- Selected Tools: 20 tools configured
- API Availability: Service must be accessible
Initial Configuration
The connection is automatic, but you can verify it:
- Open Settings → Advanced
- Find "Live API Status"
- Confirm it shows "Connected"
- Test with a voice command
Connection Parameters
systemprompt configures optimal connection parameters automatically.
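The audio parameters listed under Outgoing Audio can be gathered into one configuration object, which also makes the resulting bitrate easy to compute. The `AudioConfig` class and its field names are hypothetical. Only the values come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConfig:
    """Outgoing audio format; field names are hypothetical."""
    sample_rate_hz: int = 16_000
    bits_per_sample: int = 16
    channels: int = 1
    encoding: str = "base64"

    def raw_bitrate_bps(self) -> int:
        """Uncompressed PCM bitrate in bits per second."""
        return self.sample_rate_hz * self.bits_per_sample * self.channels

    def encoded_bitrate_bps(self) -> float:
        """Bitrate after Base64, which emits 4 bytes for every 3 input bytes."""
        return self.raw_bitrate_bps() * 4 / 3
```

For the default values this gives 256,000 bits per second of raw audio, or about 341 kbps after Base64 encoding, before any per-message framing overhead.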
Voice Interaction Flow
Starting a Session
- Tap the microphone in the Conversation screen
- The streaming connection opens automatically
- Audio streaming begins
- Speak your request
- The AI processes it in real time
During Interaction
The Live API handles:
- Speech recognition: Converts voice to text
- Intent understanding: Determines what you want
- Tool selection: Chooses appropriate MCP tools
- Parameter extraction: Gets values from speech
- Execution coordination: Runs tools as needed
- Response generation: Creates natural replies
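These steps form a per-turn pipeline. In systemprompt, the Live API performs speech recognition, intent understanding, tool selection, and parameter extraction. The Python sketch below substitutes a single keyword rule for the model so that the path from transcript to tool call to reply is visible. It assumes the transcript has already been recognized. The `get_weather` tool and all function names are hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical MCP tool, keyed by name. Real tools come from your MCP servers.
TOOLS: dict[str, Callable[..., str]] = {
    "get_weather": lambda city: f"Sunny in {city}",
}


def select_tool(transcript: str) -> Optional[tuple[str, dict[str, str]]]:
    """Stand-in for the model's intent understanding, tool selection,
    and parameter extraction: one keyword rule."""
    match = re.search(r"weather in (\w+)", transcript, re.IGNORECASE)
    if match:
        return "get_weather", {"city": match.group(1)}
    return None


def handle_turn(transcript: str) -> str:
    """Run one turn: pick a tool, execute it, and phrase a reply."""
    choice = select_tool(transcript)
    if choice is None:
        return "Could you clarify what you'd like me to do?"  # ask when unclear
    name, args = choice
    result = TOOLS[name](**args)      # execution coordination
    return f"Here you go: {result}."  # response generation
```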
Session Management
Automatic Handling
- Connection maintained during use
- Idle timeout after inactivity
- Automatic reconnection
- State preservation
- Error recovery
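An idle timeout can be implemented by recording the time of the last activity and closing the session once a threshold passes. This sketch takes an injectable clock so it can be tested. The `VoiceSession` class and the 60-second default are assumptions, not systemprompt's actual implementation or setting.

```python
import time
from typing import Callable


class VoiceSession:
    """Tracks activity and closes the session after an idle period."""

    def __init__(self, idle_timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout_s = idle_timeout_s  # assumed default, not documented
        self._clock = clock
        self._last_activity = clock()
        self.open = True

    def touch(self) -> None:
        """Record activity (audio sent or a response received)."""
        self._last_activity = self._clock()

    def check_idle(self) -> bool:
        """Close the session if idle too long; return True if it is closed."""
        if self.open and self._clock() - self._last_activity >= self.idle_timeout_s:
            self.open = False
        return not self.open
```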
Manual Control
- Tap to start or stop
- Long press for an extended session
- Swipe to cancel
- Adjust behavior in Settings
Tool Integration
How Tools Work with Live API
The AI uses your 20 selected tools in four stages:
- Understanding Intent
- Tool Selection
- Execution
- Natural Response
Tool Availability
The Live API can only access your selected 20 tools:
- Curation matters: Choose your tools wisely
- Profile switching: Change tool sets
- Context awareness: AI knows available tools
- Graceful handling: Clear message if tool unavailable
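Restricting the AI to your selected tools amounts to an allowlist check before any tool call. The sketch below models a profile as a set of tool names, assuming 20 is an upper limit, and returns a clear message for tools outside the profile. `ToolProfile` and its message text are hypothetical.

```python
MAX_TOOLS = 20  # assumed to be an upper limit on selected tools


class ToolProfile:
    """A named set of at most MAX_TOOLS selected tools."""

    def __init__(self, name: str, tools: list[str]) -> None:
        if len(tools) > MAX_TOOLS:
            raise ValueError(
                f"profile {name!r} has {len(tools)} tools; limit is {MAX_TOOLS}")
        self.name = name
        self.tools = frozenset(tools)

    def resolve(self, requested: str) -> str:
        """Allow a tool call only if the tool is in this profile."""
        if requested in self.tools:
            return f"calling {requested}"
        # Graceful handling: explain why the request can't be served.
        return (f"'{requested}' isn't in your {self.name} profile. "
                "Switch profiles or add it to your selected tools.")
```

Switching profiles then means swapping one `ToolProfile` for another, which changes the set of tools the AI can reach.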
Advanced Features
Continuous Conversation
Unlike traditional voice assistants:
- Context retention: Remembers previous exchanges
- Follow-up questions: Natural progression
- Clarification: Asks when unclear
- Multi-turn workflows: Complex operations
Intelligent Processing
The AI provides:
Smart Interpretation
- Understands variations
- Handles ambiguity
- Suggests alternatives
- Corrects mistakes
Proactive Assistance
- Suggests next steps
- Offers related info
- Helps prevent errors
- Optimizes workflow
Error Handling
Robust error management:
- Network issues: Automatic retry
- API limits: Graceful degradation
- Tool failures: Alternative suggestions
- Unclear speech: Requests clarification
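Automatic retry after network issues is commonly implemented with capped exponential backoff and random jitter. The sketch below assumes that technique. The function name, attempt count, and delays are illustrative, not systemprompt's actual retry policy, and only `ConnectionError` is treated as retryable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying on ConnectionError with capped exponential backoff.

    The delay cap doubles each attempt (0.5 s, 1 s, 2 s, ...) up to
    max_delay_s; the actual wait is drawn uniformly from [0, cap] ("full jitter").
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the user
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

Jitter spreads reconnection attempts out, so many clients recovering from the same outage do not all retry at the same instant.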
Performance Optimization
Network Requirements
For best performance:
- WiFi recommended: Lower latency
- Mobile data (4G/5G): Supported
- Bandwidth: About 256 kbps upstream for raw audio while streaming (16 kHz × 16-bit mono PCM), roughly 341 kbps after Base64 encoding
- Latency: Under 200 ms for best results
- Stability: Consistent connection
Audio Quality
Optimize voice input:
- Quiet environment: Reduce background noise
- Clear speech: Normal pace and volume
- Proper distance: 6-12 inches from device
- Avoid interruptions: Complete thoughts
Response Time
Factors affecting speed:
- Network latency: Primary factor
- Tool complexity: Simple tools faster
- Request clarity: Clear requests process faster
- Server load: Peak times may be slower
Privacy & Security
Data Handling
Your privacy is protected:
Audio Processing
- Streaming only during use
- No persistent recording
- Encrypted transmission
- No local storage
Conversation Data
- Processed by Google AI
- Not stored permanently
- Used only for response
- No training on your data
Security Measures
- TLS encryption: All communication
- Authentication: API key required
- Access control: Your tools only
- Audit logging: Track usage
- Data isolation: Per-user separation
Troubleshooting
Common Issues
"Connection failed"
- Check internet connection
- Verify your subscription is active
- Restart app
- Check service status
"Poor recognition"
- Reduce background noise
- Speak more clearly
- Check microphone
- Move to quiet area
"Slow responses"
- Check network speed
- Try WiFi instead of mobile
- Close other apps running in the background
- Contact support
Debug Information
Access diagnostic data:
- Settings → Advanced
- Open "Live API Diagnostics"
- View connection stats
- Export debug logs
- Share with support
Best Practices
Effective Communication
- Be specific: Include relevant details
- One request: Avoid multiple tasks at once
- Use context: Reference previous messages
- Natural language: No need for keywords
- Confirm actions: For critical operations
Optimal Usage
- Prepare mentally: Know what you want
- Speak completely: Finish thoughts
- Listen fully: Let AI complete responses
- Iterate naturally: Build on responses
- Learn patterns: Notice what works best
Common Patterns
- Information Gathering
- Action Execution
- Complex Workflows
Future Enhancements
Planned Features
- Multi-language support: Beyond English
- Voice responses: Audio feedback
- Custom wake words: Hands-free activation
- Offline capability: Basic functions without internet
- Advanced context: Longer conversation memory
Connect to the future of voice-controlled development!