I'm skeptical. Unless you tell a computer what jazz is and program it ahead of time, there is no way it will know what jazz is. In fact, most people who think they play jazz don't know what jazz is.
From what I understand, it takes "jazz" (in your example) as input from the text description. Then it matches clips from AudioSet to that text.
"AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos." In other words, those clips have already been described by people before the computer ever sees them.
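To make the point concrete, here is a toy sketch of matching a text prompt against human-labeled clips. It assumes the crudest possible approach: counting words shared between the prompt and each clip's labels. Actual systems typically learn text and audio embeddings instead of counting words, so `Clip`, `tokenize`, and `match_clips` are hypothetical names for illustration only. Either way, the "knowledge" of what jazz is comes from the labels people attached to the clips.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    labels: set[str]  # hypothetical: human-assigned class names, AudioSet-style


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,!?\"'").lower() for w in text.split() if w.strip(".,!?\"'")}


def match_clips(prompt: str, clips: list[Clip]) -> list[tuple[str, int]]:
    """Rank clips by how many prompt words appear in their labels (toy assumption)."""
    words = tokenize(prompt)
    scored = []
    for clip in clips:
        label_words: set[str] = set()
        for label in clip.labels:
            label_words |= tokenize(label)
        overlap = len(words & label_words)
        if overlap:
            scored.append((clip.clip_id, overlap))
    # Highest overlap first; ties broken by clip id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored
```

With clips labeled {"Jazz", "Saxophone"} and {"Jazz", "Piano"}, the prompt "smooth jazz with piano" ranks the piano clip first. The program never learns what jazz *is*; it only finds the word someone else attached to the audio.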