Exploratory Data Visualization

I tried to find an interesting way to show data from a dataset of popular baby names from 2001 to 2009. Recently, I spent some time learning about the importance of data visualization and the key features that make it most accessible. A graph or other visual representation of data can be a good way to communicate the themes and important details of a dataset. These visual representations are often easier to take in, and they can spread information more easily than asking the average person to sort through a dense set of numbers.

I worked with the website RAWGraphs to make a line graph of the data. I put the year on the x-axis and the “count”, or the number of babies given a particular name in that year, on the y-axis. I have attached an image of the graph below:

[Figure: Popular Baby Names 2001-2009. Line graph showing the most popular baby names from 2001 to 2009.]

I experimented with various features of the graph to make it easier to read and understand. First, I changed the height and width of the overall graph, making both dimensions larger (around 1000 each). This was one way I tried to counteract the messiness of the graph: the extra space made it easier to distinguish the lines from each other. Second, I used the color wheel option to give every line a different color so the lines would stand apart more clearly. This was somewhat time-consuming because there were so many names, and only half of them were assigned a color automatically. Third, I increased the diameter of the dots that represent individual data points to emphasize the vertical comparison between names in a given year. I felt this added a separate dimension to the visualization: it shows not only each name's trend over the years but also which names were most popular in each year.
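
The per-year comparison that the larger dots are meant to bring out (which names were most popular in a given year) can also be computed directly. The following is a minimal Python sketch, assuming the data is a list of records with hypothetical `name`, `year`, and `count` fields; the names and counts below are invented placeholders, not values from the actual dataset.

```python
from collections import defaultdict

# Hypothetical record format: one row per (name, year) with a "count" field.
# These values are invented for illustration; they are not the real dataset.
ROWS = [
    {"name": "Name A", "year": 2001, "count": 120},
    {"name": "Name B", "year": 2001, "count": 95},
    {"name": "Name C", "year": 2001, "count": 140},
    {"name": "Name A", "year": 2002, "count": 110},
    {"name": "Name B", "year": 2002, "count": 130},
    {"name": "Name C", "year": 2002, "count": 100},
]


def rank_per_year(rows):
    """Return {year: [names ordered from highest to lowest count]}."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append((row["count"], row["name"]))
    ranking = {}
    for year, pairs in sorted(by_year.items()):
        # Sort by count descending; break ties alphabetically by name.
        pairs.sort(key=lambda p: (-p[0], p[1]))
        ranking[year] = [name for _, name in pairs]
    return ranking


if __name__ == "__main__":
    for year, names in rank_per_year(ROWS).items():
        print(year, names)
```

Reading one year's list from left to right corresponds to reading that year's dots from top to bottom on the graph.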

I made those three changes to make the data clear and understandable. While the graph works well in some ways, it has some flaws I want to note. In future projects, I want to figure out how to limit the data to just a few names so that the graph is less crowded. In my digital humanities class, a guest speaker explained the importance of clear, directed visualizations. She showed us an example of a graph in which the most important trend line was bolded to make it stand out. That kind of emphasis is lost in my graph because of the high volume of names. I looked for a way to show only 3-5 names at a time but did not find one. Another side effect of the crowding is that the name labels can be confusing and may appear to describe the wrong line. Some colors also look similar, since there were only so many distinct colors I could assign.
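
One way to show only a few names is to filter the data before loading it into the graphing tool. The following is a minimal Python sketch of that idea, not a RAWGraphs feature: it assumes the data is a list of records with hypothetical `name`, `year`, and `count` fields, keeps the k names with the highest total count across all years, and writes the result as CSV text. The names and counts are invented placeholders, not values from the actual dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical record format; values are invented for illustration only.
ROWS = [
    {"name": "Name A", "year": 2001, "count": 120},
    {"name": "Name B", "year": 2001, "count": 95},
    {"name": "Name C", "year": 2001, "count": 140},
    {"name": "Name D", "year": 2001, "count": 30},
    {"name": "Name A", "year": 2002, "count": 110},
    {"name": "Name B", "year": 2002, "count": 130},
    {"name": "Name C", "year": 2002, "count": 100},
    {"name": "Name D", "year": 2002, "count": 40},
]


def top_names(rows, k):
    """Return the k names with the largest total count across all years."""
    totals = Counter()
    for row in rows:
        totals[row["name"]] += row["count"]
    # most_common orders names by total count, highest first.
    return [name for name, _ in totals.most_common(k)]


def filter_rows(rows, names):
    """Keep only the rows whose name is in `names`."""
    keep = set(names)
    return [row for row in rows if row["name"] in keep]


def to_csv(rows):
    """Serialize rows as CSV text with a name,year,count header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "year", "count"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(filter_rows(ROWS, top_names(ROWS, 3))))
```

Saving that output to a file would give a smaller CSV that could be loaded into the tool in place of the full dataset. Ranking by total count is only one possible choice; picking names by hand, or by their peak count in a single year, would only change the selection step.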

Overall, it was interesting for me to learn how to create a data visualization like this one using a new tool. I am looking forward to learning more about this through other people’s exploration of similar tools!

2 thoughts on “Exploratory Data Visualization”

  1. This was a nice way to visualize the popularity of each name. I think it was a good idea to change the height and width to make the graph easier to understand. Changing the color of each name also made it a lot easier to understand.

  2. I think a line graph is an appropriate way to present the dataset, and I really like that you changed the diameter of the dots so that readers can make the vertical comparison easily. However, I think it would be better to set the curve type to linear. Since the collected data don’t show how the counts change within a given year, the curved lines might falsely imply changes between the yearly data points.
