Progress
At this point in the process, we have contacted the Registrar’s office to collect data on the majors at Carleton over time. They redirected us to an online database containing this information for several schools, and from that website we gathered the data for the majors at Carleton College. After collecting the data, we cleaned it, removing information we did not need. From this point forward, we can format the data so that it can be used by Flourish. Using Flourish, we will then create a bar chart race of the top 5 most popular majors over time. One thing worth mentioning: some roles were switched among group members to reflect personal preferences and individual strengths and weaknesses.
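As an illustration of the formatting step, here is a minimal Python sketch that reshapes per-year totals into a wide table, with one row per major and one column per year. This is the layout we understand Flourish's bar chart race template to expect, though that is an assumption worth checking against the template's sample data. The majors, counts, and function names (`to_wide`, `top_n`) are made up for the example and are not our actual data or workflow.

```python
import csv
import io

# Hypothetical totals keyed by (year, major); the numbers are placeholders,
# not values taken from IPEDS.
totals = {
    (2020, "Biology"): 53,
    (2020, "Computer Science"): 65,
    (2021, "Biology"): 50,
    (2021, "Computer Science"): 70,
    (2021, "Economics"): 45,
}

def to_wide(totals):
    """Return (header, rows): one row per major, one column per year."""
    years = sorted({year for year, _ in totals})
    majors = sorted({major for _, major in totals})
    header = ["Major"] + [str(y) for y in years]
    # A major with no graduates recorded in a given year gets 0.
    rows = [[m] + [totals.get((y, m), 0) for y in years] for m in majors]
    return header, rows

def top_n(totals, year, n=5):
    """Names of the n majors with the most graduates in one year."""
    in_year = [(count, major) for (y, major), count in totals.items() if y == year]
    # Sort by count descending, breaking ties alphabetically by major name.
    ranked = sorted(in_year, key=lambda pair: (-pair[0], pair[1]))
    return [major for _, major in ranked[:n]]

header, rows = to_wide(totals)

# Write the wide table as CSV text, ready to save and upload.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

The `top_n` helper is only a sanity check on which majors should lead in a given year; the chart tool itself would normally control how many bars are shown.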
Citations
U.S. Department of Education. (2003). The Integrated Postsecondary Education Data System. Retrieved February 25, 2023, from https://nces.ed.gov/ipeds/datacenter/InstitutionList.aspx?goToReportId=1
Problems and Solutions
What we have struggled with most is finding the specific data for what we intended to research. We were under the impression that the Registrar’s office had data about the most popular majors at Carleton over the years. While they do, this information isn’t publicly available, so we weren’t able to use it for our project. Instead, we were directed to the Integrated Postsecondary Education Data System (IPEDS). The database is public and therefore free for us to use, but it contains so much information that it proved difficult to find exactly what we were looking for. At one point we believed we would have to either shift the direction of the project to work with the data we could find, or reach back out to the Registrar’s office for more direction. In the end, we found the data we were looking for within the database, and then downloaded and cleaned it.
Tools and Techniques
We have decided to use Google Sheets to store and organize our data and Flourish, an online animated chart builder, to visualize it. We cleaned the dataset by hand in Google Sheets and made some slight modifications to it. Each row of the dataset represents one major in a particular year, and some majors have two rows, one for students who graduated with it as their primary major and one for those who graduated with it as their secondary major. Each major has a count of men and a count of women who graduated with it, so we used that information to make a total count of graduates per major per year. The data still isn’t perfect: the years 2001-2004 don’t separate first and second majors, so that is something we will have to take into consideration when creating our racing bar chart.
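The totaling step could also be scripted instead of being done by hand. The following is a minimal Python sketch of that idea, not the process we actually used (ours was manual, in Google Sheets). The column names (`year`, `major`, `major_rank`, `men`, `women`) and all of the counts are hypothetical and chosen only to match the row layout described above.

```python
from collections import defaultdict

# Hypothetical rows: one row per major per year, plus a second row when the
# major was also earned as a second major. Column names and counts are
# illustrative only and are not taken from IPEDS.
rows = [
    {"year": 2020, "major": "Biology", "major_rank": 1, "men": 20, "women": 30},
    {"year": 2020, "major": "Biology", "major_rank": 2, "men": 1, "women": 2},
    {"year": 2020, "major": "Computer Science", "major_rank": 1, "men": 40, "women": 25},
    {"year": 2021, "major": "Biology", "major_rank": 1, "men": 22, "women": 28},
]

def total_graduates(rows):
    """Sum men + women over every row for each (year, major) pair."""
    totals = defaultdict(int)
    for row in rows:
        # First-major and second-major rows land on the same key,
        # so they are combined into a single total.
        totals[(row["year"], row["major"])] += row["men"] + row["women"]
    return dict(totals)

totals = total_graduates(rows)
for (year, major), count in sorted(totals.items()):
    print(year, major, count)
```

Combining the first-major and second-major rows is only one possible choice. Whether the combined totals are comparable to the 2001-2004 rows depends on what those undivided rows include, which this sketch does not settle.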
Deliverables
As of now, we have sorted and cleaned our data, which shows the total number of graduates in each major per class year from 2001 to 2021. There is also a count for each major by gender, which we are considering including as a comparison in our graphs. With the cleaned data, it shouldn’t be long before we have our graphs ready. They are still in progress, and our project is still on track. The remaining work is underway and is being adjusted to fit the needs of the project.
Personal Messages
Cuong Chi Tran
This week, I worked with Daniel to clean up the gathered data in a Google Sheet. Later on, we will further refine the data’s organization, but for now, all the needed data has been gathered. Additionally, as one of the more outspoken members of the group, I have been acting as its coordinator, for example by helping to organize who writes which portion of this week’s update.
Adam Mahabir
Throughout the project, I have helped Cuong oversee and divide the tasks for the group. I have also been in contact with the members of the group regarding their progress on their specific tasks. The project has been going well so far, but I am more excited for when we start working on the data visualizations.
Rowen Hinrichs
As of right now, I’m still responsible for creating the visualization of our data. That much hasn’t changed, even considering some other job shuffling that has been done. Now that our data is organized and ready to work with, I’m excited to create the racing bar chart and see these data really come to life.
Kyra Landry
As we planned, I was able to contact the Registrar’s office and find out where to find data about the popular majors at Carleton. The database had much more information than I had anticipated and was quite overwhelming for me. Fortunately, Daniel was able to find the data we were looking for.
Daniel Estrada
Due to a change in circumstances, I took care of cleaning the data. Because the data was coming directly from the website, I couldn’t easily convert it to a CSV and clean it in OpenRefine. Instead, I manually cleaned the data in Google Sheets by copying and pasting the information and getting rid of data we didn’t need.
Nice update! Looks like it’s going well despite some switch-ups and added hoops to jump through that required repositioning. I’m really excited to see the final project – a bar chart race of majors seems really fun! Plus, it will be interesting to see how our major lineup today differs from the past. Good luck with the next steps!
Thank you! Despite the hiccups we’re still on a good track. We’re getting closer and closer to putting it all together, but now we’re running into some other problems involving sorting the data one more time. Still, I’m looking forward to putting it all together and seeing the fruits of our labor.
This project sounds so exciting! I can definitely relate to the issue of having an overwhelming amount of data to clean and sort through because I have experienced that in other projects, so I feel your pain. But, it sounds like you all managed to refine your data and are on your way to making your data visualization so that’s awesome! I’m looking forward to seeing the final product.
Manually cleaning the data using Excel must’ve been pretty tedious and annoying at times, especially given the large data set, so kudos to you guys. I created a game in my A&I that intended to help students choose a major by moving around the board, collecting points, and looking at the different classes in each major, so it’s cool to see you guys doing a project related to majors! The racing bar chart idea for the visualization also sounds interesting, can’t wait to see the final results!
Manually cleaning the data can be done easily if you find shortcuts that can be applied to all data points. However, the data set that we worked with only had data from 2001 to 2021, so I think manual cleaning might not be the best option for larger data sets.
Seems like you are making fine progress here! I am excited to see your final bar race, which would be so interesting. By the way, would you care to share this dataset with us? We are mapping graduate schools attended by Carleton graduates, but our own data is kind of incomplete.
This project will be cool to see, especially since it will require a lot of data wrangling to fit the needed format in Flourish. What made you all decide to use Google Sheets? What were the features most helpful on Google Sheets?