Text-to-Mic: Free AI Text-to-Speech-to-Microphone Tool (TTS & STTTS App for Windows and Mac)

By Andrew Ward May 01, 2024

Text-to-Mic is an open-source, free text-to-speech and speech-to-text-to-speech (TTS and STTTS) to-microphone tool that turns typed text into speech audio with AI and then plays that audio to your speakers, headset, or microphone feed.

Here is a video example of how it looks when running on Windows:

(Or watch on YouTube)

This is perfect to enable you to speak in online video meetings using text-to-speech AI. It can also manipulate text with AI in real-time which has lots of practical uses, such as tidying up speech or live translation. (See download links below).

Text-to-Mic uses the OpenAI text-to-speech engine, which surpasses the standard text-to-speech tools available on Windows and Mac. This app is available to use for free.

Seamless Text-to-Speech-to-Microphone (or speakers) Conversion:
Utilizes OpenAI's API to convert text into natural-sounding speech in real-time.
Multiple Voices:
Choose from a variety of OpenAI voices to find the tone that best suits your presentation or meeting style. Supported voices: Alloy, Echo, Fable, Onyx, Nova, Shimmer (Listen to samples).
Customisable Tones:
Take control of not just the accent of the voice, but the tone and the way that it speaks too. Text to Mic comes pre-loaded with some tones that you might enjoy, plus includes the ability for you to add your own.
Dual Output Capability:
Outputs audio simultaneously to both headphones and a virtual microphone, ensuring you can monitor and share your presentation effectively.
STTTS - Speech-to-text-to-speech capabilities.
Record your voice, even if you are struggling to speak, which saves as text, which you can then immediately playback over the selected audio feeds.
Hotkeys for Quick Access
Trigger speech recording, conversion and playback using hotkeys (like ctrl+shift+0) to make using Text-to-mic feel more natural, quick and seamless.
Automatic AI Copyediting
This allows you to automatically tidy up, manipulate, or translate what you've typed or recorded into another language, or automatically manipulate the input text in some desired way, speeding up the communications process

Watch the video above to see the power of the AI-enabled Text-to-Mic in action!

If you like this tool, we also have a free speech-to-copy-edited-text desktop app which you might be interested in, which runs in the background and allows for rapid conversion of spoken word to AI transcribed and copy edited text, pasted directly into your active application.

Download

Virus scanners on windows can give false positives for this app given how it uses your mic and copy and paste. If you'd like to review and compile the source code yourself then you can access it here on github.

For Windows

For Mac

Download v1.0.5 for Mac (28MB ZIP) (Latest)
Download v1.0.2 for Mac (28MB ZIP)

Text to Mic is Open Source! View the source code on GitHub.

You will need to download, extract, and then run the .app file

Getting Started

Install VB-Cable
Install VB-Cable from https://vb-audio.com/Cable/ if you haven't already. This tool creates a virtual microphone on your Windows computer or Mac. Once installed, you can trigger audio to play through this virtual cable.
Add an OpenAI API Key
Open the Text-to-Mic app by Scorchsoft and input your OpenAPI key (Tutorial video on setting up an API Key).
If you don't yet have an API key, visit platform.openai.com, sign up for a free account, set up billing and add some credit, generate an API Key, and copy that key into text-to-mic.
(It's not that expensive but OpenAI will bill you for text-to-speech generation - see pricing, see the text-to-speech and speech-to-text pricing, as well as GPT models if you enable AI manipulation)
Set voice
Select your preferred voice for speech synthesis in the app UI.
Choose playback devices
Choose a playback device. I recommend selecting your headphones as one device and the virtual microphone (usually labelled "Cable Input (VB-Audio)") as the other.
Set Microphone to Cable Input VB-Audio in an online meeting
When you join a meeting on platforms like Teams, Zoom, or Google Meets, select the Cable Input audio channel in the meeting tool's settings. This will play back any audio submitted via the tool when you hit play. However, please be aware that your own microphone will not function simultaneously. You will need to switch back if you need to speak.

Example of virtual microphone selection in Google Meet:
Type
Enter the text you want to convert to speech in the provided text area.
Play
Click 'Play Audio' to listen to the spoken version of your text. This replays the previously generated audio clip to prevent unnecessary use of your OpenAI API Key.
Repeat what you said last
Use the 'Play Last Audio' button to replay the last generated speech output.
Housekeeping
You can change the API key at any time under the 'Settings' menu.
Experiment with AI manipulation
Play with the settings in "Settings > ChatGPT Manipulation" to automatically use AI to translate, change, or enhance recorded or spoken words. Useful for expanding on paraphrased content to increase the speed you can communicate, or reduce vocal strain.

What Users Say

what niko said about our app

If you find Text-to-Mic useful, then please consider leaving a review. Reviews like this one are not only lovely to read, but really help us to build credibility and trust with our customers.

Advanced Usage

1. ChatGPT AI Manipulation

If you go to "Settings > ChatGPT Manipulation" then you can turn this on and pick which model to use.

If enabled (both enabled and "auto apply to recorded transcript"), this will run your transcript through AI with the desired prompt each time you record your voice and convert it to text.

If you've enabled but not turned on auto apply, then you can manually trigger this action to any text you've input into "text to Read" via the context menu "Input > Apply AI manipulation to text input". This will only work if you've turned it on and added your API key

2. Hotkeys

You can use a hotkey combination to trigger recording and playing of recorded text quickly. By default, the hotkeys are "ctrl+shift+0" to start the recording, then press it again to stop, transcribe, and submit. "Ctrl+shift+9" stops the recording without playing it. "Ctrl+shift+8" replays the last transcribed or written text.

"Settings > Hotkey Settings" allows you to customise the hotkey combinations used to trigger the above actions.

3. Presets

Click the presets button at the bottom of the app to open the presets area. You can then click a preset to automatically add it to the "Text to Read" section or double-click it to immediately play it back.

example presets area

Once loaded for the first time, presets are stored in "config/presets.json." This means that if you close the app, you can edit them and add categories, etc., via Notepad. If you do this, please make sure you don't break or invalidate the JSON structure.

You can also edit presets from within the app, but this is limited to saving new presets to an existing category, favouriting presets, and deleting them. Any other edits must be completed by editing the JSON file.

You can add a new preset by writing it into the "Text to Read" area, then at the top right of the area, select the category you wish to add it to, and hit save.

Practical Applications

Education: Teachers can use Text-to-mic to provide clear, consistent instruction in virtual classrooms.
Business Meetings: Professionals who require voice rest can use this tool to communicate effectively in meetings without straining their voices.
Accessibility: Helps those with speech impairments communicate clearly and effectively in online meetings.
Translation: Translate your voice to another language and then immediately play as AI generated voice to a virtual mic feed
Expand paraphrasing: Talk or type in shorthand and have AI automatically convert it to longer form, and then speak that longer form version.

We created Text-to-Mic originally because a member of our team lost their voice, and we needed a simple solution to allow them to use text-to-speech (TTS) to speak with colleagues naturally, as this is much more engaging than typing in a parallel chat channel, which can often be overlooked.

If you enjoy using Text to Mic, you might also appreciate partnering with Scorchsoft on other technology projects. We specialise in developing technically complex web and mobile.

Screenshots

Main UI:

v1.0.5 screenshot of text to mic app

Tone of voice presets manager:

presets manager

AI text manipulation settings:

v1.0.5 screenshot of chatgpt manipulation settings

Frequently Asked Questions

Do I need a ChatGPT Subscription to use this?

No, you do not need an OpenAI subscription to use this tool. However, you do need to set up an OpenAI key, which will charge you based on usage. The costs aren't too high for moderate use, but if you decide to use it, keep an eye on your charges for the first few days to ensure you're comfortable with the fees.

I don't want to sign up for a key that charges me, can I still use the app?

Yes, there is a simplified version that only converts text to speech using the system's built-in text-to-speech capabilities. While these system voices are not as good or advanced, they can serve as a useful fallback and are a cost-effective option. When you load the app for the first time, if you choose not to add your API key, the application will still open, and you will be able to use it. Here is a screenshot showing how this displays.

Please note that in the non-API key-enabled version, you will only have access to text-to-speech, not speech-to-text.

How can I find or set up my OpenAI API Key?

You must sign up for an account and create a key in their developer's area. It sounds complex, but it's fairly straightforward; Here is a tutorial video.

What is the difference between the GPT models in AI manipulation settings?

This setting determines which AI 'model' is used to manipulate input or recorded text based on the provided prompt. Think of it as picking which AI brain to use.

GPT 4o Mini is cheaper per word to manipulate text and is faster but less intelligent than GPT4.
GPT 4o is a more powerful AI and is more likely to be able to deal with complex instructions, but it costs more per word to run and is a littler slower.

We recommend trying 4o-mini first due to its speed benefits and switching to GPT4 should you find you want it to perform certain AI manipulations better.

What is the "Prompt" in the AI manipulation settings?

The prompt is the set of instructions you want the AI to use when manipulating your input or output text. The AI reads the instructions you've set in the prompt, and applies them to any converted text. Here are some example promps:

"Convert from English to Spanish"
"Expand paraphrased utterances to fully formed sentences."
"If I ask a question, reply to that question followed with a potential answer."
"Edit my input. You are a clown at an amusement park; convert to speak as this persona."
"Edit my input. You are a character in a computer game with a dark sense of humour. Convert text to speak as this persona. Remain concise"
"Copy edit my input. My mood today: upbeat, focused. Match this tone".

We recommend trying different prompts and making up your own too. You can also write much longer prompts than the above examples should you want it to do something very specific. Remember to switch from GPT 3 to GPT 4 if your prompt is particularly complex or requires more accuracy. If the response doesn't manipulate what you've said, and replies to it, then add something like "Copy edit my input" or "Transform my input" to the prompt and this should fix that.

Remember AI can "hallucinate" false information and give wrong answers, so make sure to evaluate responses before considering them to be true.

I have ideas for new features or custom extensions that would benefit my business. Can you help me with that?

If you notice a bug or small quality-of-life enhancement, please let us know, and we will consider implementing it in the tool for free.

We can also accommodate more substantial enhancements, such as custom extensions for business; Though please be aware these are likely to carry a development charge. Please contact us to let us know what you have in mind.

Changelog

v1.4.1 - The app now works without an API key but only supports system voices and text-to-speech (no speech-to-text or other AI capabilities). System voices are also available as an option, which is useful if there are internet or connectivity issues.
v1.4.0 - Lots of UI and UX improvements, added latest version checking, app now remembers previous input and output device selection.
v1.3.5 - More voices added, improve presets to scale, improve keyboard shortcuts to add cancel operation, allow banner hiding.
v1.3.0 - Ability to change tone of voice. Tone of voice preset manager. Updated UI look and feel
v1.2.0 - Added presets (stored text to re-play), plus quality of life improvements.
v1.0.8 - Added settings to remap hotkeys, changed .env file location to /config
v1.0.7 - Added support for hotkeys (ctrl+shift+0; ctrl+shift+9; ctrl+shift+8)
v1.0.6 - Fix audio channel sample rate mismatch issues
v1.0.5 - Adds ChatGPT manipulations functionality to auto-manipulate input text
v1.0.4 - Adds input device selection option
v1.0.3 - Fixes the record button and styles better
v1.0.2 - Added mac support, plus record voice button (But the app crashes if audio over around 3-seconds)
v1.0.1 - First working version of the app

Terms of Use, Disclaimer, and Licence Information

Text to Mic is provided "as is" and on an "as available" basis, without any warranties of any kind, either express or implied. Scorchsoft Ltd expressly disclaims all warranties, whether express, implied, statutory, or otherwise, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, and non-infringement. We do not warrant that the software will function uninterrupted, that it is error-free, or that any errors or defects will be corrected.

Limitation of Liability

In no event will Scorchsoft Ltd be liable for any indirect, incidental, special, consequential, or punitive damages resulting from or related to your use or inability to use Text to Mic, including but not limited to damages for loss of profits, goodwill, use, data, or other intangible losses, even if Scorchsoft Ltd has been advised of the possibility of such damages.

Use at Your Own Risk

By using Text to Mic, you acknowledge and agree that you assume full responsibility for your use of the software, and that any information you send or receive during your use of the software may not be secure and may be intercepted or later acquired by unauthorized parties. Use of Text to Mic is at your sole risk.

License Agreement

Scorchsoft Text to Mic

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see https://www.gnu.org/licenses/.

The names "Scorchsoft" and "Scorchsoft Ltd." and the associated logos are trademarks of Scorchsoft Ltd.

You may use these names solely for the purpose of providing attribution, as required by the LGPL licence,

and not in any way that implies an endorsement or affiliation with Scorchsoft Ltd. without explicit written permission.

DISCLAIMER: This software is provided "as-is," and any use of this software is at your own risk. For more information, see the LICENSE.md file included with this project.

Please read the full licence agreement and terms of use here before downloading or using Text To Mic (Additional terms apply as described in the LICENSE.md file).

Need help building your tech ideas?

Scorchsoft are expert app and portal developers in the UK. 15 years experience.

Learn More Contact us

Need help building your tech ideas?

Scorchoft are expert app and portal developers in the UK.
Over a decade of experience.

Learn more Contact us

We Make
Mobile Apps, Portals, SaaS, & Progressive Web Apps

All Case Studies

Discover How Scorchsoft Can Help

We would love to hear about your project. Please contact us, and share your goals; we'll respond with our thoughts and a rough cost estimate.

Scorchsoft is a UK-based team of web and mobile app developers and designers. We operate in-house from Birmingham, and our offices are located in the heart of the Jewellery Quarter.

About Scorchsoft Contact Us

We can deliver your innovative, technically complex project, using the latest web and mobile application development technologies.

Scorchsoft develops online portals, applications, web apps, and mobile app projects. With over fifteen years experience working with hundreds of small, medium, and large enterprises, in a diverse range of sectors, we'd love to discover how we can apply our expertise to your project.

Our Capabilities Our Work Get a Free Quote