Visualizing Audio Pipelines with Streamlit

When working with image data, practitioners often use augmentations. Augmentations are techniques that artificially and randomly alter the data to increase diversity. Applying such transformations to the training data makes the model more robust.

For image data, common choices are rotating, resizing, or blurring. The effects of these transformations are easy to see and comprehend. Even several stacked augmentations can be grasped quickly, as the feature image shows.

Augmentations are not restricted to images, though they are especially popular there. For audio data, there are similar ways to modify the training data. The downside is that you cannot take in the effects of several augmentations at a glance; you have to listen to each transformed sample individually.

Instead of listening to all samples one after another, you can visualize the audio data. By visualizing the alterations, you shift the examination to the image world. Now a single glance can tell you more than listening to ten samples in a row. This is helpful when you are prototyping for a new task and want to quickly check how your dataset changes.

To create a simple application that does this, you thankfully no longer have to dive deep into GUI programming. Instead, you can use tools like Streamlit, which turns plain Python data scripts into interactive web apps. Combined with the audiomentations package, this lets you build a visualizer in two or three afternoons.

Let’s see how such an application can look:

Overview of the example application. You can find the code here and try it live here.

In the left-hand sidebar, you build your augmentation pipeline. Select some transformations, for example adding noise (AddGaussianNoise) and masking frequencies (FrequencyMask). Then either upload one of your own audio files or select one of the provided samples. After clicking apply, you can examine the effects in the centre.

The leftmost column contains the spectrogram representation of the file. This visualization shows how much power each frequency carries at each time step. The middle column includes the waveform, and the rightmost column shows an audio player.
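
To make the spectrogram idea concrete, here is a minimal plain-Python sketch of how such a representation can be computed: the signal is cut into short overlapping frames, and the power per frequency bin is calculated for each frame. This is not the code the application uses (a real app would presumably rely on an audio library); the function name, frame size, hop size, and test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return power[frame][bin] for a list of float samples (naive STFT)."""
    # A Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        powers = []
        # For a real-valued signal, bins 0..frame_size/2 carry all information.
        for k in range(frame_size // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            powers.append(abs(coeff) ** 2)
        frames.append(powers)
    return frames


# 0.1 seconds of a 1 kHz sine sampled at 8 kHz (illustrative values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]
spec = spectrogram(tone)

# The strongest bin of the first frame, converted back to a frequency in Hz.
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = loudest_bin * sample_rate / 64
```

Each row of spec is one time step and each column one frequency bin, which is exactly the grid a spectrogram image colours in. For the pure tone above, the strongest bin sits at the tone's frequency.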

The sidebar is created by calling the following code:

We first create checkboxes for all possible augmentations and visually divide this section by placing a horizontal bar below. The next part adds the file selection capabilities, and the last button starts the visualization process.
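
The description above can be sketched roughly as follows. This is not the original code of the application but a minimal reconstruction under assumptions: the function name build_sidebar, the labels, and the returned dictionary are hypothetical. The sidebar object is passed in as a parameter (in a real script it would be st.sidebar) so that the function can be imported and tested without Streamlit installed.

```python
def build_sidebar(sidebar, augmentation_names, sample_files):
    """Render the sidebar controls and return the user's choices.

    `sidebar` is expected to be `st.sidebar` (after `import streamlit as st`).
    """
    # One checkbox per available augmentation; keep the ticked ones in order.
    selected = [name for name in augmentation_names
                if sidebar.checkbox(name)]

    # Horizontal rule that visually separates the two sidebar sections.
    sidebar.markdown("---")

    # Either upload a file or fall back to one of the provided samples.
    uploaded = sidebar.file_uploader("Upload an audio file")
    sample = sidebar.selectbox("Or pick a sample", sample_files)

    # The button returns True only on the rerun triggered by the click.
    apply_clicked = sidebar.button("Apply")

    return {
        "augmentations": selected,
        "audio": uploaded if uploaded is not None else sample,
        "apply": apply_clicked,
    }


# Inside a Streamlit script (hypothetical names and files):
#   import streamlit as st
#   choices = build_sidebar(
#       st.sidebar, ["AddGaussianNoise", "FrequencyMask"], ["sample.wav"])
```

Because Streamlit reruns the whole script on every interaction, the returned dictionary always reflects the current state of the widgets.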

The main layout is created by this method:

We initially plot the unaltered audio file and apply the individual transformations of our pipeline afterwards. The output of augmentation A is the input to augmentation B. To distinguish between the individual changes, we add a small heading, and that’s it.
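
The chaining logic described above can be illustrated with a small sketch. It is not the application's code: the two augmentation functions are hypothetical stand-ins written in plain Python instead of audiomentations transforms, and run_pipeline is an assumed helper name. The point is only to show how each step receives the output of the previous one while every intermediate result stays available for plotting.

```python
import random


def add_gaussian_noise(samples, rng, std=0.01):
    """Hypothetical stand-in for a noise augmentation."""
    return [s + rng.gauss(0.0, std) for s in samples]


def apply_gain(samples, rng, factor=0.5):
    """Hypothetical stand-in for a volume augmentation."""
    return [s * factor for s in samples]


def run_pipeline(samples, steps, seed=0):
    """Yield (heading, audio) for the original and after every step."""
    rng = random.Random(seed)
    # The unaltered audio comes first.
    yield "Original", samples
    current = samples
    for step in steps:
        # The output of the previous step is the input to this one.
        current = step(current, rng)
        yield step.__name__, current


stages = list(run_pipeline([0.0, 0.5, -0.5, 1.0],
                           [add_gaussian_noise, apply_gain]))
headings = [heading for heading, _ in stages]
```

In the application, each yielded pair would become one row of the layout: the heading on top, followed by the spectrogram, waveform, and audio player for that stage.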

Additional code handles the background work, but nothing particularly exciting happens there. If you want to dive into it anyway, go through the lines here. To see the complete application in action, you have two options:

First, you can try it live in your browser here.

Second, you can clone the repository, install the required packages, and then run streamlit run visualize_transformation.py on your command line. Streamlit then serves your Python script locally at http://localhost:8501/.