Speaking in song

New singalong software brings sweet melody to any cacophonous cry.


“We want to use our technology to help the average person sing well,” the researcher says.

Sergey Nivens

Whether you give it your best effort or your worst, voice synthesis software developed at Singapore’s Agency for Science, Technology and Research (A*STAR) will make you sound like the melodious singer you’ve always wanted to be. Called I2R Speech2Singing, this software is the first to deliver high-quality singing automatically, while still preserving the original character of your natural voice.

“Many people like singing but they lack the skills to do so,” says Minghui Dong, the project leader at A*STAR’s Institute for Infocomm Research (I2R). “We want to use our technology to help the average person sing well.”

Speech consists of three key elements: content, prosody and timbre. Content is conveyed using words; prosody, or melody in the case of singing, is expressed through rhythm and pitch; and timbre is the distinctive quality that makes a banjo sound different from a trumpet and one singer’s voice different from another’s. I2R Speech2Singing works by polishing melody while retaining the original content and timbre of a sound.

Existing melody-correction technologies either align off-tune sounds to the closest note on the musical scale or shift them to the exact note in the original score. The former works well for professional singers who may be only slightly out of tune, but it cannot help those who are singing drastically off-key or simply reading out loud. The latter is better at correcting discordant tunes but ignores many other aspects of melody, such as vibrato and vowel stretching.
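
To see why closest-note correction falls short for badly off-key voices, consider a minimal Python sketch of the idea. It is an illustration only, not the method of any particular product: it assumes twelve-tone equal temperament tuned to A4 = 440 Hz, and the function name `nearest_note_hz` is hypothetical.

```python
import math

A4_HZ = 440.0  # assumed tuning reference (not stated in the article)


def nearest_note_hz(freq_hz: float) -> float:
    """Snap a frequency to the closest equal-tempered semitone."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Distance from A4 in semitones, which may be fractional.
    semitones = 12 * math.log2(freq_hz / A4_HZ)
    # Round to the nearest whole semitone and convert back to hertz.
    return A4_HZ * 2 ** (round(semitones) / 12)


# A voice slightly sharp of A4 is pulled back onto A4.
print(nearest_note_hz(450.0))

# A voice far below A4 snaps to a different note (about 293.7 Hz, D4),
# even if the score called for A4.
print(round(nearest_note_hz(300.0), 1))
```

The second call shows the limitation: the closest note is not the intended one when the input is far off, so the result is in tune but still the wrong melody.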

I2R Speech2Singing uses recordings by professional singers as templates to correct the melody of a singing voice or to convert a speaking voice into a singing one. The software detects the timing of each phonetic sound using speech recognition technology and then stretches or compresses the duration of the signal using voice conversion technology to match the rhythm to that of a professional singer. A speech synthesizer then combines the time-corrected voice with pitch data and background music to produce a beautiful solo.
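
The rhythm-matching step can be pictured as working out, for each phonetic sound, how much the user's segment must be stretched or compressed to fit the professional singer's timing. The Python sketch below is an assumption-laden illustration rather than I2R's implementation: it supposes both recordings have already been segmented into the same sequence of sounds, and the `(label, start, end)` layout and the function name `stretch_ratios` are hypothetical.

```python
from typing import List, Tuple

# (label, start time in seconds, end time in seconds); hypothetical layout
Segment = Tuple[str, float, float]


def stretch_ratios(user: List[Segment],
                   template: List[Segment]) -> List[Tuple[str, float]]:
    """Return, per sound, the factor by which the user's duration must change.

    A ratio above 1 means the sound is stretched; below 1, compressed.
    """
    if len(user) != len(template):
        raise ValueError("both recordings must contain the same sounds")
    ratios = []
    for (u_label, u_start, u_end), (t_label, t_start, t_end) in zip(user, template):
        if u_label != t_label:
            raise ValueError(f"sound mismatch: {u_label!r} vs {t_label!r}")
        u_dur = u_end - u_start
        if u_dur <= 0:
            raise ValueError("segments must have positive duration")
        ratios.append((u_label, (t_end - t_start) / u_dur))
    return ratios


# Spoken "la": a short vowel. Sung "la": the vowel is held four times longer.
spoken = [("l", 0.0, 0.25), ("a", 0.25, 0.5)]
sung = [("l", 0.0, 0.25), ("a", 0.25, 1.25)]
print(stretch_ratios(spoken, sung))
```

In this toy example the consonant keeps its length while the vowel is stretched, which is the kind of vowel lengthening that distinguishes singing from speech. Applying the ratios to the audio and imposing the template's pitch would be separate steps, not shown here.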

“When we compared the output with other currently available applications, we realized that our software generated a much better voice quality,” says Dr Dong.

Singaporeans were first introduced to the software in 2013 through “Sing for Singapore”, part of the official mobile app of National Day Parade 2013. And in 2014, I2R Speech2Singing won the award for best Show & Tell contribution at INTERSPEECH, a major global venue for research on the science and technology of speech communication.

Dr Dong and his team are now developing a way to add songs to the software quickly, so that large-scale song databases can be built easily.

For further information contact:

Dr Minghui Dong
Lab Head, Voice Analysis and Synthesis Lab
Institute for Infocomm Research
Agency for Science, Technology and Research, Singapore
E-mail: [email protected]