WAIA, the Windows AI Assistant
(A completely cloud-free version that runs the AI models locally is in development.)
A voice-controlled AI Assistant for Microsoft Windows
Speech recognition now detects longer sentences. Groq quota draining has been prevented. Chat history has been added, so real conversations are now possible.
Groq speech detection no longer reacts to background noise that contains no speech. Azure speech recognition has been disabled until its quota draining has been fixed.
Please note that WAIA is just a client that unifies different AI services. The program itself does not provide any AI capabilities.
Windows control is done through plugins, so you will need plugins for it.
Only use plugins from the WAIA GitHub repository. Plugins from other sources could contain malware.
The URL of the repository is https://github.com/decipher2k/Windows-AI-Assistant.
Features:
- Seamlessly integrates with ChatGPT and Groq
- Advanced voice recognition powered by Groq
- Voice-controlled interaction with Windows
- Plugin system
- Program starter using key sentences
- Webhooks using key sentences for integration with IFTTT (home automation etc.) - https://ifttt.com/ (untested)
Download:
https://github.com/decipher2k/Windows-AI-Assistant/releases
Usage
After you start the application, a tray icon is added. Double-click it to configure the settings.
Set up the API keys and other information using the "Settings" buttons.
A green dot on the tray icon means that speech has been recorded.
A blue dot means that the recorded text has been sent to the chat AI.
To stop the voice output, right-click the tray icon and click "Cancel".
How to always show the tray icon:
https://www.lifewire.com/show-or-hide-icons-in-system-tray-in-windows-10-5115219
Chat History:
The chat history contains the last 3 messages, so you can hold a real conversation with the AI.
Keyword:
You can set a custom keyword for starting speech recognition. The default is "Computer", so you can say "Computer, who was John F. Kennedy" to get information about John F. Kennedy.
Keyword Detection:
There is now a keyword detection using Windows Speech Recognition. It can be useful in a noisy environment, for example while watching TV or listening to music, because it prevents the speech recognition quota from being drained.
When keyword detection is enabled, the keyword is fixed to "Computer" and cannot be changed. Reliability differs between systems.
Recognition quality can be improved by training:
https://www.tenforums.com/tutorials/120674-add-delete-change-speech-recognition-profiles-windows-10-a.html
To open the Control Panel in Windows 11, press the Windows key and type "control panel".
Windows Sound Recording Level:
If voice recognition triggers too often when you are not speaking, or if no speech is detected at all, try adjusting the microphone recording level in the Windows settings.
Suggested Services:
I have had good experiences with the following setup. All suggested services are available for free.
Voice recognition: Groq.
Chat AI: Groq is the fastest one.
Voice Output: Windows Speech until Google Cloud AI has been implemented.
Program Starter
The program starter can be configured using the "Commands" button.
The first column defines whether to use speech recognition or the chat AI to start the program.
Speech recognition will listen for the exact sentence.
Using the chat AI allows you to vary the sentence. The program automatically prefixes the sentence with "if the user asks for". Thus, the sentence "starting windows explorer" lets you say "start windows explorer", "run windows explorer", and so on.
Chat AI commands have not been implemented yet.
The second column defines the sentence that the program uses to recognize the command.
The third column is the program file that should be started.
The fourth column allows you to set command parameters.
Webhooks
Webhooks can be configured using the "Commands" button.
They can be used to raise events in web applications such as IFTTT, which in turn can control home automation systems and similar services.
The first column defines whether to use speech recognition or the chat AI to execute the webhook.
Speech recognition will listen for the exact sentence.
Using the chat AI allows you to vary the sentence. The program automatically prefixes the sentence with “if the user asks for”. Thus, the sentence “turning on the light” lets you say “turn on the light”, “switch on the light”, and so on.
Chat AI commands have not been implemented yet.
The second column defines the sentence that the program uses to recognize the command.
The third column is the URL of the webhook.
The fourth column defines whether to use HTTP POST or HTTP GET. For most webhooks, this will be HTTP GET.
The fifth column defines the parameters for the webhook.
For GET requests, these parameters are appended to the URL. For example, “?light=on” results in “https://example.com/webhook?light=on”.
For POST requests, these parameters define the data sent in the request body, for example JSON data.
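The GET and POST handling described above can be sketched roughly as follows. This is only an illustration, not WAIA's actual code: the class and method names are hypothetical, and the JSON content type for POST bodies is an assumption.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper, not part of WAIA: builds the HTTP request for one webhook row.
public static class WebhookSketch
{
    public static HttpRequestMessage BuildRequest(string url, string method, string parameters)
    {
        if (string.Equals(method, "GET", StringComparison.OrdinalIgnoreCase))
        {
            // GET: the parameter string (e.g. "?light=on") is appended to the URL.
            return new HttpRequestMessage(HttpMethod.Get, url + parameters);
        }

        if (string.Equals(method, "POST", StringComparison.OrdinalIgnoreCase))
        {
            // POST: the parameter string becomes the request body.
            // Assumption: the body is JSON; the real program may use another content type.
            return new HttpRequestMessage(HttpMethod.Post, url)
            {
                Content = new StringContent(parameters, Encoding.UTF8, "application/json")
            };
        }

        throw new ArgumentException("Only GET and POST are described for webhooks.", nameof(method));
    }
}
```

A request built this way could then be sent with `HttpClient.SendAsync`. Keeping the request construction separate from the sending makes it easy to check the resulting URL or body before anything goes over the network.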
Plugins
Plugins can be configured using the "Commands" button.
The media player plugin is included in the release of the program.
The first column defines whether to use speech recognition or the chat AI to start the plugin.
Speech recognition will listen for the exact sentence.
Using the chat AI allows you to vary the sentence. The program automatically prefixes the sentence with "if the user asks for". Thus, the sentence "playing media" lets you say "play media", "play the song", and so on.
Chat AI commands have not been implemented yet.
The second column defines the sentence that the program uses to recognize the command.
The third column defines the name of the plugin DLL.
The remaining columns are parameters for the plugin. They differ from plugin to plugin. Please read the plugin's manual for more information.
Only use plugins from this GitHub repository. Plugins from other sources may contain malware.
The [TEXT] variable
Whenever you enter the token [TEXT] in a parameter in the commands section, the token is replaced with the text that was spoken after the command.
For example, saying "Create a note: Shopping" with the key sentence "Create a note: [TEXT]" passes the word "Shopping" in place of the [TEXT] token to a plugin, a webhook, or a program.
This will only work with Speech Recognition commands, not with Chat AI ones.
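The substitution can be pictured with the following minimal sketch. It is not WAIA's implementation: the class and method names are hypothetical, and it assumes that [TEXT] comes at the end of the key sentence and that the spoken sentence matches the part before the token literally (ignoring case).

```csharp
using System;

// Hypothetical illustration of the [TEXT] token, not WAIA's actual code.
public static class TextTokenSketch
{
    private const string Token = "[TEXT]";

    // Tries to capture the words spoken after the command part of the key sentence.
    public static bool TryExtractText(string keySentence, string spoken, out string captured)
    {
        captured = "";
        int tokenIndex = keySentence.IndexOf(Token, StringComparison.Ordinal);
        if (tokenIndex < 0)
        {
            return false; // the key sentence does not use [TEXT]
        }

        // Everything before the token is the fixed command, e.g. "Create a note:".
        string prefix = keySentence.Substring(0, tokenIndex).TrimEnd();
        if (!spoken.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return false; // the spoken sentence is a different command
        }

        captured = spoken.Substring(prefix.Length).Trim();
        return true;
    }

    // Replaces every [TEXT] token in a command parameter with the captured words.
    public static string Expand(string parameter, string captured)
    {
        return parameter.Replace(Token, captured);
    }
}
```

With the key sentence "Create a note: [TEXT]" and the spoken input "Create a note: Shopping", `TryExtractText` captures "Shopping", and `Expand` inserts it wherever a parameter contains [TEXT].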
Speech recognition
Groq Speech Recognition:
Groq can be found at https://groq.com
The API keys can be created at https://console.groq.com/keys
AI Chat
ChatGPT:
https://medium.com/latinxinai/how-to-get-api-key-for-chat-gpt-3-5-or-4-0-fce40b35aa00
You will need ChatGPT API credits, not ChatGPT Plus!
Groq LLM API:
Groq can be found at https://groq.com
The API keys can be created at https://console.groq.com/keys
Speech Synthesis
Elevenlabs:
Log in to your Elevenlabs account.
In the top-right corner, click your profile icon > Profile.
Next to the API Key field, click the eye icon to view and copy your API key, and store it in a safe place.
Please note: The "voice" field refers to the name of the voice, not to its ID.
Windows Speech Synthesis:
Average quality. You may need to select a voice that matches your language in the settings.
Costs
Speech Recognition - one of the following:
- Groq (available for free, usage limits, fast)
AI Chat - one of the following:
- ChatGPT (about $10/month)
- Groq LLM API (available for free, usage limits, fast)
Speech output - one of the following:
- Microsoft Windows Speech (free, average quality)
- Elevenlabs (about $10-$20/month, good quality)
Please note that prices are dependent on actual usage and may vary.
Writing a plugin
To write a plugin, add "WAIA Plugin.dll" to a new Visual Studio 2022 .NET 8.0 class library project, implement the interface IWAIAPlugin, and provide the following method:
public String RunPlugin(String text, String[] parameters);
The parameter "String text" is the spoken input.
The parameter "String[] parameters" allows you to pass parameters to the plugin.
The return value of the function will be sent to the speech synthesis engine.
If the third element of "String[] parameters" is "AI", the return value of the function will be sent to the chat AI instead.
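A minimal plugin along these lines might look like the sketch below. The class name EchoPlugin and its behavior are hypothetical. The interface declaration is only a stand-in so that the example compiles on its own; in a real plugin project the interface comes from "WAIA Plugin.dll", and its namespace may differ.

```csharp
using System;

// Stand-in so this sketch compiles by itself. In a real plugin, remove this
// declaration and use the IWAIAPlugin interface provided by "WAIA Plugin.dll".
public interface IWAIAPlugin
{
    String RunPlugin(String text, String[] parameters);
}

// Hypothetical example plugin that repeats the spoken input.
public class EchoPlugin : IWAIAPlugin
{
    public String RunPlugin(String text, String[] parameters)
    {
        // Assumption: 'parameters' carries the values configured for this plugin
        // in the "Commands" dialog. Here the first one is an optional prefix.
        string prefix = (parameters != null && parameters.Length > 0)
            ? parameters[0]
            : "You said:";

        // The returned string goes to the speech synthesis engine, or to the
        // chat AI if the third parameter is "AI".
        return prefix + " " + text;
    }
}
```

Building the class library produces a DLL whose name is then entered in the third column of the plugin configuration.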
Troubleshooting
If speech is not detected, try the following:
- Adjust the microphone level in Windows
- Enable online speech recognition in the Windows settings
- Speak slowly and clearly
Planned Features
- More Plugins for Windows
- Dictation
- Sound Volume
- Maximizing/Minimizing/Closing windows
- Alt+Tab
- Shutdown/Restart
- Macros
- Claude
- Microsoft Chat AI
- Microsoft Azure Cortex Speech Recognition
- Google Speech Synthesis
- Google AI