Visualizing a Modern History of Music Using Wikipedia Data

Music & Dance


Pulp Fiction Chubby Checkers Dance Scene

“I do believe Marsellus Wallace, my husband – your boss – told you to take me out and do whatever I want. Now I wanna dance, I wanna win, I want that trophy, so dance good.”

Pulp Fiction is my favourite movie of all time. An amazing story, cast, performances and director make it my go-to movie for the “have you seen?” ice breaker. Possibly the most iconic scene from the movie is the one with Uma Thurman and John Travolta twisting it up on the Jack Rabbit Slim’s floor to Chubby Checker. While this may be Uma Thurman’s most iconic dance scene, John Travolta seems to be an unstoppable, slicked-back, jet-black-haired force of nature when it comes to dancing through the musical ages: Michael, Hairspray, Pulp Fiction, Grease, Saturday Night Fever, and Urban Cowboy all feature Travolta lighting up (sometimes literally) the dance floor. While a lot of people may get a chuckle out of some of his disco dance floor scenes, we can ask ourselves whether all “cool” dance movies are destined for the same fate. Will we be cringing at Bring It On and Stomp the Yard as time goes on? Probably not as quickly as with disco, if we take a look at the music these scenes are dancing to.

The Viz

Music Genre Popularity Over the Years

By analysing Billboard’s year-end top 100 songs, we can quantify modern movements in music by the share of genres represented among each year’s most popular songs. We can see disco’s short, dozen-year glory and the more sustained emergence of rap/hip-hop since the ’90s. Interestingly, the disco era corresponds to some of the lowest genre shares for pop music, with history repeating itself in the mid-2000s, when rap/hip-hop was at its peak. Soft and hard rock fared even worse than disco, dying out completely by 2005, whereas disco’s somewhat small influence continues to limp on, with Madonna’s revival in 2005 and Bruno Mars and Maroon 5 getting songs stuck in our heads post-2010.

How was this Plot Made?

Wikipedia is a gold mine of lists, lists of lists, and even lists of lists of lists. One of these lists of lists happens to be Billboard’s Hot 100 songs, which allows us to browse Wikipedia’s data pretty easily. Even better, after a quick look at the URLs we can simply generate the address of each page we want to scrape. We begin by loading the necessary modules and parameters for our program.
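A minimal sketch of that URL generation might look like the following. The URL pattern and the year range are my assumptions for illustration (the original parameters are not shown here), so check them against the live pages before relying on them; `build_year_urls` is a hypothetical helper name.

```python
# ASSUMPTION: the year-end list pages follow this naming pattern.
# Older years may use different page titles, so verify before scraping.
BASE_URL = "https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_{year}"


def build_year_urls(first_year, last_year):
    """Return {year: url} for every year in the inclusive range."""
    return {
        year: BASE_URL.format(year=year)
        for year in range(first_year, last_year + 1)
    }


# ASSUMPTION: 1960 to 2015 is only an illustrative range.
year_urls = build_year_urls(1960, 2015)
print(year_urls[1994])
```

Keeping the pattern in a single constant means that only one line changes if the page titles turn out to differ.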

This is followed by a script that exploits the patterns we find in the URLs and tries to extract all possible links from the list pages.
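One way to sketch the link extraction with only the standard library is shown below. It is not the original script: `WikiLinkParser` is a hypothetical class, the sample HTML is made up, and the rule "keep article links found inside table cells" is my assumption about what counts as a useful link.

```python
from html.parser import HTMLParser


class WikiLinkParser(HTMLParser):
    """Collect article links that appear inside table cells."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
        elif tag == "a" and self.in_cell:
            href = dict(attrs).get("href") or ""
            # Crude filter: skip namespaced pages such as Help: or File:.
            # It also drops the rare article whose title contains a colon.
            if href.startswith("/wiki/") and ":" not in href:
                self.links.append("https://en.wikipedia.org" + href)

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False


# Made-up snippet standing in for a downloaded list page.
sample_html = """
<table class="wikitable">
<tr><th>No.</th><th>Title</th><th>Artist(s)</th></tr>
<tr><td>1</td><td><a href="/wiki/Song_A">Song A</a></td>
<td><a href="/wiki/Artist_A">Artist A</a></td></tr>
</table>
<a href="/wiki/Help:Contents">Help</a>
"""

parser = WikiLinkParser()
parser.feed(sample_html)
print(parser.links)
```

Both song and artist links are kept here; filtering them down to songs only would need the column position as well.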

With all the links stored in a dataframe, we can load each of the Wikipedia pages to extract information from the info table at the top right corner of each page. Unfortunately this can get especially brutal, since these info tables have variable lengths, different HTML nesting and incomplete data (how have people not categorized Gorillaz’s music?!).

Music box differences example

To deal with this, we find the table object in our code and save it as a string to be loaded later for analysis. The advantage of this is twofold: it allows us to gather all the necessary information in one run, and it gives us the power of simply looking for keywords in music genres to help sort through the zoo of user-defined definitions.
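As a rough illustration of saving the table as a string, the sketch below cuts the first info table out of a page's HTML. It assumes the table carries the CSS class `infobox`, and it counts nested `<table>` tags so that an inner table does not end the match early. `extract_infobox` is a hypothetical name, and a proper HTML parser would be more robust than this regex approach.

```python
import re


def extract_infobox(page_html):
    """Return the first infobox table as an HTML string, or None if absent."""
    # ASSUMPTION: the info table is marked with class="infobox ...".
    start = re.search(r'<table[^>]*class="[^"]*\binfobox\b', page_html)
    if start is None:
        return None
    offset = start.start()
    depth = 0
    # Walk over every table open/close tag and stop when the outer one closes.
    for token in re.finditer(r"<table\b|</table>", page_html[offset:]):
        depth += 1 if token.group().startswith("<table") else -1
        if depth == 0:
            return page_html[offset:offset + token.end()]
    return None  # unbalanced markup


# Made-up page fragment with a nested table inside the infobox.
page_html = (
    '<div><table class="infobox vevent"><tr><th>Genre</th><td>'
    "<table><tr><td>Disco</td></tr></table></td></tr></table>"
    '<table class="wikitable"><tr><td>other</td></tr></table></div>'
)
infobox = extract_infobox(page_html)
print(infobox)
```

Storing the returned string next to each link means the pages only have to be downloaded once, however many times the genre parsing is reworked.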

Now, with all the HTML strings, we can begin analysing them. I extract a list of keywords wherever a music genre can be found and put them into a column composed of what I ended up calling ‘dirty lists’: lists filled with typos, non-uniform proper nouns, references, etc.
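To give a feel for what a dirty list looks like, here is a small sketch that pulls the raw genre entries out of a saved info table string. The row layout (a header cell reading "Genre" followed by one data cell) is an assumption, and `dirty_genre_list` is a hypothetical helper. Nothing is cleaned on purpose, so reference markers such as `[1]` survive.

```python
import re


def dirty_genre_list(infobox_html):
    """Return the raw, uncleaned genre entries from an infobox HTML string."""
    # ASSUMPTION: the genre row is a <th> reading "Genre" (optionally linked)
    # followed by a single <td> holding the entries.
    row = re.search(
        r"<th[^>]*>(?:\s*<a[^>]*>)?\s*Genres?\s*(?:</a>\s*)?</th>"
        r"\s*<td[^>]*>(.*?)</td>",
        infobox_html,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if row is None:
        return []  # some pages have no genre row at all
    # Treat list items and line breaks as separators, like commas.
    cell = re.sub(r"</li>|<br\s*/?>", ",", row.group(1), flags=re.IGNORECASE)
    cell = re.sub(r"<[^>]+>", "", cell)  # strip remaining tags, keep the text
    return [entry.strip() for entry in cell.split(",") if entry.strip()]


# Made-up info table in the spirit of the real ones.
sample_infobox = (
    '<table class="infobox"><tr><th scope="row">'
    '<a href="/wiki/Music_genre">Genre</a></th><td><ul>'
    '<li><a href="/wiki/Disco">Disco</a><sup>[1]</sup></li>'
    "<li>pop rock</li></ul></td></tr></table>"
)
print(dirty_genre_list(sample_infobox))
```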

Finally we create flag columns of each music genre a song contains to make plotting songs easier.
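The flagging step could be sketched as below. The keyword table is hypothetical (the real genre groupings are not listed here), and plain dictionaries stand in for dataframe rows to keep the example free of dependencies.

```python
import re

# ASSUMPTION: illustrative genre groups and keywords, not the original ones.
GENRE_KEYWORDS = {
    "disco": ["disco"],
    "rock": ["rock"],
    "pop": ["pop"],
    "rap_hiphop": ["rap", "hip hop"],
}


def genre_flags(dirty_list):
    """Map each genre group to 1 if any of its keywords occurs, else 0."""
    # Lowercase and unify hyphens so "Hip-Hop" and "hip hop" look the same.
    text = " ".join(dirty_list).lower().replace("-", " ")
    return {
        genre: int(any(
            re.search(r"\b" + re.escape(keyword) + r"\b", text)
            for keyword in keywords
        ))
        for genre, keywords in GENRE_KEYWORDS.items()
    }


print(genre_flags(["Disco[1]", "pop rock"]))
print(genre_flags(["Hip-Hop", "Trap"]))
```

The word boundaries matter: without them "trap" would count as "rap". Matching whole keywords inside messy strings is what lets the typo-ridden lists stay uncleaned.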

The last step is plotting them, after tweaking some parameters.
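Before anything is drawn, the flags have to be turned into yearly shares. The sketch below shows one possible definition, which is an assumption on my part: a genre's share is its number of flagged songs divided by the total number of flags that year, so the shares of a year sum to 1. `yearly_genre_shares` is a hypothetical helper and the three songs are made up.

```python
from collections import defaultdict


def yearly_genre_shares(songs):
    """songs: iterable of (year, flags) pairs, where flags is {genre: 0 or 1}."""
    counts = defaultdict(lambda: defaultdict(int))
    for year, flags in songs:
        for genre, flag in flags.items():
            counts[year][genre] += flag
    shares = {}
    for year, genre_counts in counts.items():
        total = sum(genre_counts.values())
        # ASSUMPTION: normalise by the total number of flags in the year.
        shares[year] = {
            genre: (count / total if total else 0.0)
            for genre, count in genre_counts.items()
        }
    return shares


# Made-up flags for three songs from one year.
songs = [
    (1978, {"disco": 1, "pop": 0}),
    (1978, {"disco": 1, "pop": 0}),
    (1978, {"disco": 1, "pop": 1}),
]
shares = yearly_genre_shares(songs)
print(shares[1978])
```

A table of this shape, one row per year and one column per genre, is what a stacked plot of genre popularity needs as input.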

The final output: a rough Seaborn plot.

The raw data is exported to LaTeX to create a slightly more polished look.

What Did You Think?

Enjoy the plot? Inspired to do your own analysis? Have a question about the code? Sad I didn’t use your favourite song as the music box info comparison picture? Please tweet at me on Twitter, like the Facebook page or leave a comment below. Receiving feedback is one of the most rewarding parts of creating visualizations, and I would love it if you’d open a dialogue or share!
