The Google keynote, "General-Purpose Sound Event Recognition" (scheduled for Day 1 of the Workshop), will be presented by Shawn Hershey from Google Research. Due to unforeseen circumstances, the original presenter, Dan Ellis, is not available for the keynote. We apologize for the inconvenience.
Inspired by the success of general-purpose object recognition in images, we have been working on automatic, real-time systems for recognizing sound events regardless of domain. Our goal is a system that can tag or describe an arbitrary soundtrack, such as one found on a media-sharing site like YouTube, using terms that make sense to a human. I will cover the process of defining this task, our deep learning approach, our efforts to collect training data, and our current results. I will also discuss some of the factors important for accurate models, along with some ideas about how to get the best return on an investment in manual labeling.
Shawn Hershey is a software engineer at Google Research, working in the Machine Hearing Group on machine learning for speech and audio processing. He is currently working on soundtrack classification and audio event detection. Before Google, he was the first software engineer at Lyric Semiconductor, building tools to aid the development of hardware accelerators for AI. On the side, Shawn travels the world teaching Lindy Hop and blues dancing and playing in swing and blues bands. Long ago, Shawn graduated from the University of Rochester with a BA in Computer Science and half of a degree from the Eastman School of Music.