Text-to-Mic is an open-source, free text-to-speech and speech-to-text-to-speech (TTS and STTTS) to-microphone tool that turns typed text into speech audio with AI and then plays that audio to your speakers, headset, or microphone feed.

Here is a video example of how it looks when running on Windows: 

This is perfect to enable you to speak in online video meetings using text-to-speech AI. It can also manipulate text with AI in real-time which has lots of practical uses, such as tidying up speech or live translation. (See download links below).

Text-to-Mic uses the OpenAI text-to-speech engine, which surpasses the standard text-to-speech tools available on Windows and Mac. This app is available to use for free.

  • Seamless Text-to-Speech-to-Microphone (or speakers) Conversion:
    Utilizes OpenAI's API to convert text into natural-sounding speech in real-time.
  • Multiple Voices:
    Choose from a variety of OpenAI voices to find the tone that best suits your presentation or meeting style. Supported voices: Alloy, Echo, Fable, Onyx, Nova, Shimmer (Listen to samples).
  • Dual Output Capability:
    Outputs audio simultaneously to both headphones and a virtual microphone, ensuring you can monitor and share your presentation effectively.
  • STTTS - Speech-to-text-to-speech capabilities.
    Record your voice, even if you are struggling to speak, which saves as text, which you can then immediately playback over the selected audio feeds.
  • Hotkeys for Quick Access
    Trigger speech recording, conversion and playback using hotkeys (like ctrl+shift+0) to make using Text-to-mic feel more natural, quick and seamless.
  • Automatic ChatGPT AI text Manipulation
    This allows you to automatically translate what you've typed or recorded into another language, or automatically manipulate the input text in some desired way, speeding up the communications process

Watch the video above to see the power of the AI-enabled Text-to-Mic in action!

If you like this tool, we also have a free speech-to-copy-edited-text desktop app which you might be interested in, which runs in the background and allows for rapid conversion of spoken word to AI transcribed and copy edited text, pasted directly into your active application.

Download

For Windows

For Mac

Text to Mic is Open Source! View the source code on GitHub.

You will need to download, extract, and then run the .app file

Getting Started

  1. Install VB-Cable
    Install VB-Cable from https://vb-audio.com/Cable/ if you haven't already. This tool creates a virtual microphone on your Windows computer or Mac. Once installed, you can trigger audio to play through this virtual cable.
  2. Add an OpenAI API Key
    Open the Text-to-Mic app by Scorchsoft and input your OpenAPI key (Tutorial video on setting up an API Key).
    If you don't yet have an API key, visit platform.openai.com, sign up for a free account, set up billing and add some credit, generate an API Key, and copy that key into text-to-mic.
    (It's not that expensive but OpenAI will bill you for text-to-speech generation - see pricing, see the text-to-speech and speech-to-text pricing, as well as GPT models if you enable AI manipulation)
  3. Set voice
    Select your preferred voice for speech synthesis in the app UI.
  4. Choose playback devices
    Choose a playback device. I recommend selecting your headphones as one device and the virtual microphone (usually labelled "Cable Input (VB-Audio)") as the other.
  5. Set Microphone to Cable Input VB-Audio in an online meeting
    When you join a meeting on platforms like Teams, Zoom, or Google Meets, select the Cable Input audio channel in the meeting tool's settings. This will play back any audio submitted via the tool when you hit play. However, please be aware that your own microphone will not function simultaneously. You will need to switch back if you need to speak.

    Example of virtual microphone selection in Google Meet:
    example virtual mic selection in google meet

  6. Type
    Enter the text you want to convert to speech in the provided text area.
  7. Play
    Click 'Play Audio' to listen to the spoken version of your text. This replays the previously generated audio clip to prevent unnecessary use of your OpenAI API Key.
  8. Repeat what you said last
    Use the 'Play Last Audio' button to replay the last generated speech output.
  9. Housekeeping
    You can change the API key at any time under the 'Settings' menu.
  10. Experiment with AI manipulation
    Play with the settings in "Settings > ChatGPT Manipulation" to automatically use AI to translate, change, or enhance recorded or spoken words. Useful for expanding on paraphrased content to increase the speed you can communicate, or reduce vocal strain.

Advanced Usage

1. ChatGPT AI Manipulation

If you go to "Settings > ChatGPT Manipulation" then you can turn this on and pick which model to use.

ChatGPT Manipulation

If enabled (both enabled and "auto apply to recorded transcript"), this will run your transcript through AI with the desired prompt each time you record your voice and convert it to text.

If you've enabled but not turned on auto apply, then you can manually trigger this action to any text you've input into "text to Read" via the context menu "Input > Apply AI manipulation to text input". This will only work if you've turned it on and added your API key

2. Hotkeys

You can use a hotkey combination to trigger recording and playing of recorded text quickly. By default, the hotkeys are "ctrl+shift+0" to start the recording, then press it again to stop, transcribe, and submit. "Ctrl+shift+9" stops the recording without playing it. "Ctrl+shift+8" replays the last transcribed or written text.

"Settings > Hotkey Settings" allows you to customise the hotkey combinations used to trigger the above actions.

3. Presets

Click the presets button at the bottom of the app to open the presets area. You can then click a preset to automatically add it to the "Text to Read" section or double-click it to immediately play it back.

example presets area

Once loaded for the first time, presets are stored in "config/presets.json." This means that if you close the app, you can edit them and add categories, etc., via Notepad. If you do this, please make sure you don't break or invalidate the JSON structure.

You can also edit presets from within the app, but this is limited to saving new presets to an existing category, favouriting presets, and deleting them. Any other edits must be completed by editing the JSON file.

You can add a new preset by writing it into the "Text to Read" area, then at the top right of the area, select the category you wish to add it to, and hit save.

Practical Applications

  • Education: Teachers can use Text-to-mic to provide clear, consistent instruction in virtual classrooms.
  • Business Meetings: Professionals who require voice rest can use this tool to communicate effectively in meetings without straining their voices.
  • Accessibility: Helps those with speech impairments communicate clearly and effectively in online meetings.
  • Translation: Translate your voice to another language and then immediately play as AI generated voice to a virtual mic feed
  • Expand paraphrasing: Talk or type in shorthand and have AI automatically convert it to longer form, and then speak that longer form version.

 v1.0.5 screenshot of text to mic app

 v1.0.5 screenshot of chatgpt manipulation settings

We created Text-to-Mic originally because a member of our team lost their voice, and we needed a simple solution to allow them to use text-to-speech (TTS) to speak with colleagues naturally, as this is much more engaging than typing in a parallel chat channel, which can often be overlooked.

If you enjoy using Text to Mic, you might also appreciate partnering with Scorchsoft on other technology projects. We specialise in developing technically complex web and mobile.

Frequently Asked Questions

How can I find or set up my OpenAI API Key?

You must sign up for an account and create a key in their developer's area. It sounds complex, but it's fairly straightforward; Here is a tutorial video.

What is the difference between the GPT models in AI manipulation settings?

This setting determines which AI 'model' is used to manipulate input or recorded text based on the provided prompt. Think of it as picking which AI brain to use.

  • GPT 4o Mini is cheaper per word to manipulate text and is faster but less intelligent than GPT4.
  • GPT 4o is a more powerful AI and is more likely to be able to deal with complex instructions, but it costs more per word to run and is a littler slower.

We recommend trying 4o-mini first due to its speed benefits and switching to GPT4 should you find you want it to perform certain AI manipulations better.

What is the "Prompt" in the AI manipulation settings?

The prompt is the set of instructions you want the AI to use when manipulating your input or output text. The AI reads the instructions you've set in the prompt, and applies them to any converted text. Here are some example promps:

  • "Convert from English to Spanish"
  • "Expand paraphrased utterances to fully formed sentences."
  • "If I ask a question, reply to that question followed with a potential answer."
  • "Edit my input. You are a clown at an amusement park; convert to speak as this persona."
  • "Edit my input. You are a character in a computer game with a dark sense of humour. Convert text to speak as this persona. Remain concise"
  • "Copy edit my input. My mood today: upbeat, focused. Match this tone".

We recommend trying different prompts and making up your own too. You can also write much longer prompts than the above examples should you want it to do something very specific. Remember to switch from GPT 3 to GPT 4 if your prompt is particularly complex or requires more accuracy. If the response doesn't manipulate what you've said, and replies to it, then add something like "Copy edit my input" or "Transform my input" to the prompt and this should fix that.

Remember AI can "hallucinate" false information and give wrong answers, so make sure to evaluate responses before considering them to be true.

I have ideas for new features or custom extensions that would benefit my business. Can you help me with that?

If you notice a bug or small quality-of-life enhancement, please let us know, and we will consider implementing it in the tool for free.

We can also accommodate more substantial enhancements, such as custom extensions for business; Though please be aware these are likely to carry a development charge. Please contact us to let us know what you have in mind.

Changelog

  • v1.2.0 - Added presets (stored text to re-play), plus quality of life improvements.
  • v1.0.8 - Added settings to remap hotkeys, changed .env file location to /config
  • v1.0.7 - Added support for hotkeys (ctrl+shift+0; ctrl+shift+9; ctrl+shift+8)
  • v1.0.6 - Fix audio channel sample rate mismatch issues
  • v1.0.5 - Adds ChatGPT manipulations functionality to auto-manipulate input text
  • v1.0.4 - Adds input device selection option
  • v1.0.3 - Fixes the record button and styles better
  • v1.0.2 - Added mac support, plus record voice button (But the app crashes if audio over around 3-seconds)
  • v1.0.1 - First working version of the app

Terms of Use, Disclaimer, and Licence Information

Text to Mic is provided "as is" and on an "as available" basis, without any warranties of any kind, either express or implied. Scorchsoft Ltd expressly disclaims all warranties, whether express, implied, statutory, or otherwise, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, and non-infringement. We do not warrant that the software will function uninterrupted, that it is error-free, or that any errors or defects will be corrected.

Limitation of Liability

In no event will Scorchsoft Ltd be liable for any indirect, incidental, special, consequential, or punitive damages resulting from or related to your use or inability to use Text to Mic, including but not limited to damages for loss of profits, goodwill, use, data, or other intangible losses, even if Scorchsoft Ltd has been advised of the possibility of such damages.

Use at Your Own Risk

By using Text to Mic, you acknowledge and agree that you assume full responsibility for your use of the software, and that any information you send or receive during your use of the software may not be secure and may be intercepted or later acquired by unauthorized parties. Use of Text to Mic is at your sole risk.

License Agreement

Scorchsoft Text to Mic
Copyright (C) 2024 Scorchsoft Ltd.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see https://www.gnu.org/licenses/.

The names "Scorchsoft" and "Scorchsoft Ltd." and the associated logos are trademarks of Scorchsoft Ltd.
You may use these names solely for the purpose of providing attribution, as required by the LGPL licence,
and not in any way that implies an endorsement or affiliation with Scorchsoft Ltd. without explicit written permission.

DISCLAIMER: This software is provided "as-is," and any use of this software is at your own risk. For more information, see the LICENSE.md file included with this project.
 
Please read the full licence agreement and terms of use here before downloading or using Text To Mic (Additional terms apply as described in the LICENSE.md file).