In recent weeks, ChatGPT has taken the Internet by storm. Every day, millions of people around the world use the service for their own purposes, and the web is filling up with articles that teach creative and clever ways to use it. But OpenAI’s launch of ChatGPT has had another side effect: more attention to the field of artificial intelligence as a whole. After experimenting with ChatGPT ourselves, that attention made us wonder whether it could be combined with other artificial intelligence services to perform a specific task.
After a little searching and trying out different services, we came across Microsoft Speech Studio. This service lets you train an artificial intelligence model to imitate your voice and then use that voice for tasks such as converting text to speech or voice chat with customers.
So in today’s post we will show you how to register for and use Speech Studio, so that you can build an audio service with the voice of your choice.
Basic Preparations for Creating a Voice with Artificial Intelligence
Like Microsoft’s other cloud and artificial intelligence services, Speech Studio is offered through Azure. Unfortunately, Azure services (even the free ones) are currently not available to users in Iran. In addition, creating an account requires an international credit card, which not everyone can obtain. For these reasons, we registered through an intermediary service. A simple Internet search turns up many intermediaries that will create an Azure account for you.
When you register, Microsoft gives you $200 in credit to spend on its services. For learning and personal use this $200 is enough, but if you need more credit, intermediary companies can top up your account.
After you choose an intermediary and order an Azure account, you will receive an email with your account information. Note that you should not log in to this service from an Iranian IP address. Although the approval of new laws aimed at lifting the sanctions on cloud services and Kurso messaging gives Iranian users some hope, until then use an IP address outside Iran to be safe.
Sign in to your Speech Studio account
After receiving your account information, go to the following address and sign in.
After signing in, you will see the following page:
On this page, you can read the information you need to use the service and listen to sample voices produced by Speech Studio. Microsoft currently offers two plans for training the artificial intelligence:
- Lite plan
- Pro plan
Differences between the Pro and Lite plans
The Lite plan lets you give the artificial intelligence up to 50 samples, while the Pro plan accepts up to 2,000 sound samples. The following table summarizes the differences between the two plans:
|Lite plan|Pro plan|
|---|---|
|Accepts between 20 and 50 samples|Accepts between 300 and 2,000 samples|
|Training takes less than an hour|Training takes between 20 and 40 hours|
|Different speech styles cannot be specified|Different speech styles can be specified (for example, sad or happy)|
|Supports 13 languages|Supports 50 languages|
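The sample ranges in the table can be turned into a quick check before you start recording. The following is a minimal Python sketch that assumes only the limits quoted above (20 to 50 samples for Lite, 300 to 2,000 for Pro). The names `PLAN_LIMITS` and `suggest_plan` are hypothetical helpers for illustration and are not part of any Microsoft API.

```python
from typing import Optional

# Sample-count limits as quoted in the table above (assumed; they may change).
PLAN_LIMITS = {
    "Lite": (20, 50),
    "Pro": (300, 2000),
}


def suggest_plan(sample_count: int) -> Optional[str]:
    """Return the plan whose sample range contains sample_count, or None."""
    for plan, (low, high) in PLAN_LIMITS.items():
        if low <= sample_count <= high:
            return plan
    return None  # the count fits neither plan's range


print(suggest_plan(35))    # falls inside the Lite range
print(suggest_plan(1200))  # falls inside the Pro range
print(suggest_plan(100))   # fits neither range
```

A count such as 100 fits neither plan, so you would either trim the set to 50 samples for Lite or keep recording until you reach the Pro minimum.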
Another important point is that to use the Pro plan, you must first fill out Microsoft’s agreement on the ethical use of artificial intelligence. The agreement asks for personal information, including your work email and address, and at the end you must pledge to use the service only for work and personal purposes that do not conflict with public security.
The Pro plan is suited to large companies and organizations or professional content producers, and its sample sounds should be recorded with high-quality studio microphones.
In the Lite plan, by contrast, you do not need to fill out the agreement to train the artificial intelligence (although you still need to sign it to deploy and use the service in your own business), and you can record your voice with any type of microphone.
Click Speech Studio at the top of the page, then scroll to the bottom of the next page and select Try Lite project.
After you click the Try Lite project button, you are redirected to the project creation page. Click the project creation option at the top of the page to open the project window.
In this window, select your account and then select the project resource. If you do not already have a resource, you will be prompted to create a speech services resource. Once the resource exists, your project is created and you can add samples.
To add the samples, you read a series of predetermined sentences into the microphone. After reading each sentence, once you confirm the recording (the tick next to the audio file turns green), you can move on to the next sentence.
Once the samples are ready, you can move on to the next step.
In the Train AI tab, click the training option. A message appears indicating that training has started, along with the estimated time. As mentioned above, training on the Lite plan takes less than an hour.
When training is complete, the results page is displayed. This page shows several texts, each with a button to play an audio file. When you click a button, the text is read in your own voice as imitated by the artificial intelligence. The sound and accent quality in the Lite plan is very good and sufficient for most purposes. In the Lite plan, you are charged $0.80 for each hour of audio produced. Deployment costs, however, are calculated separately and deducted from your account based on usage.
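To get a feel for what the per-hour fee means in practice, here is a small Python sketch of the arithmetic. It assumes only the two figures quoted in this post ($0.80 per hour of generated audio and the $200 sign-up credit) and ignores the separately billed deployment costs, whose rate is not given here. The function names are hypothetical.

```python
from decimal import Decimal

# Figures quoted in this post (assumed; check current pricing before relying on them).
RATE_PER_AUDIO_HOUR = Decimal("0.80")  # USD per hour of generated audio, Lite plan
SIGNUP_CREDIT = Decimal("200")         # USD credit granted at registration


def synthesis_cost(audio_hours: Decimal) -> Decimal:
    """Cost in USD of generating the given number of audio hours."""
    return audio_hours * RATE_PER_AUDIO_HOUR


def hours_covered(credit: Decimal) -> Decimal:
    """Hours of generated audio a given credit pays for (deployment excluded)."""
    return credit / RATE_PER_AUDIO_HOUR


print(f"10 hours of audio cost ${synthesis_cost(Decimal('10')):.2f}")
print(f"${SIGNUP_CREDIT} covers {hours_covered(SIGNUP_CREDIT):.0f} hours of audio")
```

Under these figures, the $200 credit would pay for 250 hours of generated audio if nothing else were billed. `Decimal` is used instead of floating point so that currency amounts stay exact.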
Why this service?
On its own or combined with other services, this service can reveal surprising untapped potential for businesses. For example, by pairing a personalized voice service with a chatbot, you can convert the chatbot’s replies into audio and interact with users by voice. Content producers can also use the service to publish their content in different languages in their own voice, expanding their audience. What other uses can you think of for this service?
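As one illustration of the chatbot scenario above, the following Python sketch wraps a chatbot reply in SSML (Speech Synthesis Markup Language), the XML format that text-to-speech services commonly accept, and points it at a custom voice. This is a sketch under assumptions, not a complete integration: the helper `build_ssml` and the voice name `MyLiteVoice` are hypothetical, and the step that actually sends the SSML to the speech service is deliberately left out.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(reply_text: str, voice_name: str, lang: str = "en-US") -> str:
    """Wrap plain chatbot text in a minimal SSML document for one voice."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < inside the reply text.
        f"<voice name={quoteattr(voice_name)}>{escape(reply_text)}</voice>"
        "</speak>"
    )


# "MyLiteVoice" is a placeholder for whatever name your trained voice has.
ssml = build_ssml("Your order has shipped & will arrive on Friday.", "MyLiteVoice")
print(ssml)
```

Escaping the reply before embedding it matters because chatbot output is free text, and a stray `&` or `<` would otherwise produce invalid XML.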