Text-to-Mic is an open-source, free text-to-speech and speech-to-text-to-speech (TTS and STTTS) to-microphone tool that turns typed text into speech audio with AI and then plays that audio to your speakers, headset, or microphone feed.
Here is a video example of how it looks when running on Windows:
This is perfect to enable you to speak in online video meetings using text-to-speech AI. It can also manipulate text with AI in real-time which has lots of practical uses, such as tidying up speech or live translation. (See download links below).
Text-to-Mic uses the OpenAI text-to-speech engine, which surpasses the standard text-to-speech tools available on Windows and Mac. This app is available to use for free.
- Seamless Text-to-Speech-to-Microphone (or speakers) Conversion:
Utilizes OpenAI's API to convert text into natural-sounding speech in real-time. - Multiple Voices:
Choose from a variety of OpenAI voices to find the tone that best suits your presentation or meeting style. Supported voices: Alloy, Echo, Fable, Onyx, Nova, Shimmer (Listen to samples). - Dual Output Capability:
Outputs audio simultaneously to both headphones and a virtual microphone, ensuring you can monitor and share your presentation effectively. - STTTS - Speech-to-text-to-speech capabilities.
Record your voice, even if you are struggling to speak, which saves as text, which you can then immediately playback over the selected audio feeds. - Hotkeys for Quick Access
Trigger speech recording, conversion and playback using hotkeys (like ctrl+shift+0) to make using Text-to-mic feel more natural, quick and seamless. - Automatic ChatGPT AI text Manipulation
This allows you to automatically translate what you've typed or recorded into another language, or automatically manipulate the input text in some desired way, speeding up the communications process
Watch the video above to see the power of the AI-enabled Text-to-Mic in action!
If you like this tool, we also have a free speech-to-copy-edited-text desktop app which you might be interested in, which runs in the background and allows for rapid conversion of spoken word to AI transcribed and copy edited text, pasted directly into your active application.
Download
For Windows
- Download v1.2.0 for Windows (38MB EXE) Latest
- Download v1.2.0 for Windows (38MB ZIP) Latest
- Download v1.0.8 for Windows (38MB EXE)
- Download v1.0.8 for Windows (38MB ZIP)
- Download v1.0.7 for Windows (38MB EXE)
- Download v1.0.7 for Windows (38MB ZIP)
- Download v1.0.6 for Windows (29MB EXE)
- Download v1.0.6 for Windows (29MB ZIP)
- Download v1.0.5 for Windows (29MB EXE)
- Download v1.0.5 for Windows (29MB ZIP)
- Download v1.0.4 for Windows (29MB EXE)
- Download v1.0.4 for Windows (29MB ZIP)
- Download v1.0.3 for Windows (29MB EXE)
- Download v1.0.3 for Windows (29MB ZIP)
For Mac
Text to Mic is Open Source! View the source code on GitHub.
You will need to download, extract, and then run the .app file
Getting Started
- Install VB-Cable
Install VB-Cable from https://vb-audio.com/Cable/ if you haven't already. This tool creates a virtual microphone on your Windows computer or Mac. Once installed, you can trigger audio to play through this virtual cable. - Add an OpenAI API Key
Open the Text-to-Mic app by Scorchsoft and input your OpenAPI key (Tutorial video on setting up an API Key).
If you don't yet have an API key, visit platform.openai.com, sign up for a free account, set up billing and add some credit, generate an API Key, and copy that key into text-to-mic.
(It's not that expensive but OpenAI will bill you for text-to-speech generation - see pricing, see the text-to-speech and speech-to-text pricing, as well as GPT models if you enable AI manipulation) - Set voice
Select your preferred voice for speech synthesis in the app UI. - Choose playback devices
Choose a playback device. I recommend selecting your headphones as one device and the virtual microphone (usually labelled "Cable Input (VB-Audio)") as the other. - Set Microphone to Cable Input VB-Audio in an online meeting
When you join a meeting on platforms like Teams, Zoom, or Google Meets, select the Cable Input audio channel in the meeting tool's settings. This will play back any audio submitted via the tool when you hit play. However, please be aware that your own microphone will not function simultaneously. You will need to switch back if you need to speak.
Example of virtual microphone selection in Google Meet: - Type
Enter the text you want to convert to speech in the provided text area. - Play
Click 'Play Audio' to listen to the spoken version of your text. This replays the previously generated audio clip to prevent unnecessary use of your OpenAI API Key. - Repeat what you said last
Use the 'Play Last Audio' button to replay the last generated speech output. - Housekeeping
You can change the API key at any time under the 'Settings' menu. - Experiment with AI manipulation
Play with the settings in "Settings > ChatGPT Manipulation" to automatically use AI to translate, change, or enhance recorded or spoken words. Useful for expanding on paraphrased content to increase the speed you can communicate, or reduce vocal strain.
If you go to "Settings > ChatGPT Manipulation" then you can turn this on and pick which model to use.
If enabled (both enabled and "auto apply to recorded transcript"), this will run your transcript through AI with the desired prompt each time you record your voice and convert it to text.
If you've enabled but not turned on auto apply, then you can manually trigger this action to any text you've input into "text to Read" via the context menu "Input > Apply AI manipulation to text input". This will only work if you've turned it on and added your API key
You can use a hotkey combination to trigger recording and playing of recorded text quickly. By default, the hotkeys are "ctrl+shift+0" to start the recording, then press it again to stop, transcribe, and submit. "Ctrl+shift+9" stops the recording without playing it. "Ctrl+shift+8" replays the last transcribed or written text.
"Settings > Hotkey Settings" allows you to customise the hotkey combinations used to trigger the above actions.
Click the presets button at the bottom of the app to open the presets area. You can then click a preset to automatically add it to the "Text to Read" section or double-click it to immediately play it back.
Once loaded for the first time, presets are stored in "config/presets.json." This means that if you close the app, you can edit them and add categories, etc., via Notepad. If you do this, please make sure you don't break or invalidate the JSON structure.
You can also edit presets from within the app, but this is limited to saving new presets to an existing category, favouriting presets, and deleting them. Any other edits must be completed by editing the JSON file.
You can add a new preset by writing it into the "Text to Read" area, then at the top right of the area, select the category you wish to add it to, and hit save.
Practical Applications
- Education: Teachers can use Text-to-mic to provide clear, consistent instruction in virtual classrooms.
- Business Meetings: Professionals who require voice rest can use this tool to communicate effectively in meetings without straining their voices.
- Accessibility: Helps those with speech impairments communicate clearly and effectively in online meetings.
- Translation: Translate your voice to another language and then immediately play as AI generated voice to a virtual mic feed
- Expand paraphrasing: Talk or type in shorthand and have AI automatically convert it to longer form, and then speak that longer form version.
We created Text-to-Mic originally because a member of our team lost their voice, and we needed a simple solution to allow them to use text-to-speech (TTS) to speak with colleagues naturally, as this is much more engaging than typing in a parallel chat channel, which can often be overlooked.
If you enjoy using Text to Mic, you might also appreciate partnering with Scorchsoft on other technology projects. We specialise in developing technically complex web and mobile.
Frequently Asked Questions
How can I find or set up my OpenAI API Key?
You must sign up for an account and create a key in their developer's area. It sounds complex, but it's fairly straightforward; Here is a tutorial video.
What is the difference between the GPT models in AI manipulation settings?
This setting determines which AI 'model' is used to manipulate input or recorded text based on the provided prompt. Think of it as picking which AI brain to use.
- GPT 4o Mini is cheaper per word to manipulate text and is faster but less intelligent than GPT4.
- GPT 4o is a more powerful AI and is more likely to be able to deal with complex instructions, but it costs more per word to run and is a littler slower.
We recommend trying 4o-mini first due to its speed benefits and switching to GPT4 should you find you want it to perform certain AI manipulations better.
What is the "Prompt" in the AI manipulation settings?
The prompt is the set of instructions you want the AI to use when manipulating your input or output text. The AI reads the instructions you've set in the prompt, and applies them to any converted text. Here are some example promps:
- "Convert from English to Spanish"
- "Expand paraphrased utterances to fully formed sentences."
- "If I ask a question, reply to that question followed with a potential answer."
- "Edit my input. You are a clown at an amusement park; convert to speak as this persona."
- "Edit my input. You are a character in a computer game with a dark sense of humour. Convert text to speak as this persona. Remain concise"
- "Copy edit my input. My mood today: upbeat, focused. Match this tone".
We recommend trying different prompts and making up your own too. You can also write much longer prompts than the above examples should you want it to do something very specific. Remember to switch from GPT 3 to GPT 4 if your prompt is particularly complex or requires more accuracy. If the response doesn't manipulate what you've said, and replies to it, then add something like "Copy edit my input" or "Transform my input" to the prompt and this should fix that.
Remember AI can "hallucinate" false information and give wrong answers, so make sure to evaluate responses before considering them to be true.
I have ideas for new features or custom extensions that would benefit my business. Can you help me with that?
If you notice a bug or small quality-of-life enhancement, please let us know, and we will consider implementing it in the tool for free.
We can also accommodate more substantial enhancements, such as custom extensions for business; Though please be aware these are likely to carry a development charge. Please contact us to let us know what you have in mind.
Changelog
- v1.2.0 - Added presets (stored text to re-play), plus quality of life improvements.
- v1.0.8 - Added settings to remap hotkeys, changed .env file location to /config
- v1.0.7 - Added support for hotkeys (ctrl+shift+0; ctrl+shift+9; ctrl+shift+8)
- v1.0.6 - Fix audio channel sample rate mismatch issues
- v1.0.5 - Adds ChatGPT manipulations functionality to auto-manipulate input text
- v1.0.4 - Adds input device selection option
- v1.0.3 - Fixes the record button and styles better
- v1.0.2 - Added mac support, plus record voice button (But the app crashes if audio over around 3-seconds)
- v1.0.1 - First working version of the app
Terms of Use, Disclaimer, and Licence Information
Text to Mic is provided "as is" and on an "as available" basis, without any warranties of any kind, either express or implied. Scorchsoft Ltd expressly disclaims all warranties, whether express, implied, statutory, or otherwise, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, and non-infringement. We do not warrant that the software will function uninterrupted, that it is error-free, or that any errors or defects will be corrected.
Limitation of Liability
In no event will Scorchsoft Ltd be liable for any indirect, incidental, special, consequential, or punitive damages resulting from or related to your use or inability to use Text to Mic, including but not limited to damages for loss of profits, goodwill, use, data, or other intangible losses, even if Scorchsoft Ltd has been advised of the possibility of such damages.
Use at Your Own Risk
By using Text to Mic, you acknowledge and agree that you assume full responsibility for your use of the software, and that any information you send or receive during your use of the software may not be secure and may be intercepted or later acquired by unauthorized parties. Use of Text to Mic is at your sole risk.
License Agreement