As humans, we're pretty good at tuning our hearing in to certain targets, like a specific person in a crowded room. But computers can take this much further.
So far, in fact, that a new piece of artificial intelligence has been taught to watch people playing music, then completely separate the sound of their instruments on demand, letting you hear one and not the other.
Developed by the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology, the software was trained by being shown over 60 hours of footage of musical performances. Called 'PixelPlayer', the AI understands where in the frame the instrument is, what it sounds like, and how to separate that sound.
Sit two people next to each other, one playing guitar and the other playing flute, shoot a video, show it to the AI, and the sound of each instrument can be separated and played independently just by clicking on it.
What makes this AI different from previous attempts to separate sound is that it uses video as well as audio to identify the instrument's sound and isolate it. Beyond just listening, the AI analyzes each pixel of the video and creates a set of components which represent the sound from each pixel. The AI surprised researchers with how well it could achieve this.
Hang Zhao, lead author for the project, told MIT News: "We expected a best-case scenario where we could recognize which instruments make which kinds of sounds. We were surprised that we could actually spatially locate the instruments at the pixel level. Being able to do that opens up a lot of possibilities, like being able to edit the audio of individual instruments by a single click on the video."
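To make the idea of a sound "from each pixel" more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the researchers' code: it assumes the audio has already been broken into K component spectrograms and that each pixel carries K weights, and it uses random numbers in place of trained networks. The names `pixel_mask` and `separate_at_pixel` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def pixel_mask(pixel_features, audio_components):
    """Blend K audio components using one pixel's K weights into a soft mask.

    pixel_features:   shape (K,)       assumed per-pixel weights
    audio_components: shape (K, F, T)  assumed component spectrograms
    Returns a mask of shape (F, T) with values in (0, 1).
    """
    weighted = np.tensordot(pixel_features, audio_components, axes=1)
    return sigmoid(weighted)

def separate_at_pixel(mixture_mag, visual_features, audio_components, row, col):
    """Keep the part of the mixture spectrogram attributed to the clicked pixel."""
    mask = pixel_mask(visual_features[row, col], audio_components)
    return mask * mixture_mag

# Random stand-ins for what trained networks would produce (assumption).
rng = np.random.default_rng(0)
K, F, T, H, W = 4, 8, 10, 3, 3
mixture_mag = np.abs(rng.normal(size=(F, T)))   # magnitude spectrogram of the mix
audio_components = rng.normal(size=(K, F, T))   # K audio components
visual_features = rng.normal(size=(H, W, K))    # K weights for every pixel

# "Clicking" on pixel (1, 2) yields that pixel's share of the sound.
clicked = separate_at_pixel(mixture_mag, visual_features, audio_components, 1, 2)
```

Because the mask only takes values between 0 and 1, the separated spectrogram can never be louder than the original mix at any point in time or frequency.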
Interestingly, because of how the AI uses 'self-supervised' deep learning to understand what it is seeing and hearing, the researchers admit they don't explicitly understand every aspect of how it learns which instrument makes which sound.
Researchers behind the AI say its ability to change the volume of individual instruments means it could, in the future, be used to help engineers improve the sound quality of old concert footage.
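Once each instrument has its own track, changing its volume is simple arithmetic. The sketch below assumes the separation step has already produced one audio array per instrument; two sine waves stand in for real recordings, and the `remix` helper is a hypothetical name used only for illustration.

```python
import numpy as np

def remix(tracks, gains):
    """Scale each separated track by its gain (default 1.0) and sum the results."""
    first = next(iter(tracks.values()))
    mixed = np.zeros_like(first, dtype=float)
    for name, samples in tracks.items():
        mixed += gains.get(name, 1.0) * samples
    return mixed

# Stand-in "separated" tracks: one second of two pure tones (assumption).
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
tracks = {
    "guitar": np.sin(2 * np.pi * 220 * t),
    "flute": np.sin(2 * np.pi * 880 * t),
}

# Turn the guitar down and the flute up.
louder_flute = remix(tracks, {"guitar": 0.5, "flute": 1.5})
```

Setting a gain to zero removes that instrument entirely, while leaving every gain at 1.0 reproduces the original mix.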
The AI has been taught to correctly identify the sounds of more than 20 commonly played instruments, and this number could increase with more training, researchers say. However, in its current state it would struggle to tell apart two instruments from the same class, such as two similar-sounding types of guitar.