Carletonian Corpus Final Project Update

What have you done so far, who have you talked to, what have you gathered, and what have you built?
- We have downloaded a sampling of front pages of the Carletonian from 1883 onward. We’ve reached out to reference librarian Sarah Calhoun and are hoping to use her expertise to find a streamlined way to extract the text from the PDFs for use in a tool like Voyant.

What issues have you run into?
- To get the first pages of the issues we needed, I had to manually open each issue and download its first page, which was a tedious process, but it worked.

Have they forced you to change your initial plan?
- The main change to the initial plan was in the methodology of collecting first pages; we decided to use three front pages per year from 1883 to 2019, since 1883 was the first full year of publication.
- Additionally, we have found a few tools that may help with our analysis, but some are paid products.

Do you have a proposed solution or do you need help formulating one?
- Some of the paid or subscription-based products may be accessible through a free trial period.

What applications/languages/frameworks have you selected and how are you going to implement them?
- We know we are going to use OpenRefine to clean our data, and Voyant for our text analysis, but the main task right now is finding a way to get the text off the PDFs and into a list of words with their corresponding dates.
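A minimal sketch of the "list of words with their corresponding dates" step, assuming the page text has already been copied or exported out of the OCR’d PDFs as plain text. The function names, the date string format, the sample sentence, and the two-column CSV layout are all our own illustrative assumptions, not something OpenRefine or Voyant requires.

```python
import csv
import io
import re

# Assumption: a "word" is a run of letters, optionally with one apostrophe part.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def words_with_date(text, date):
    """Return a list of (date, word) pairs for one front page."""
    # Rejoin words that were split by a hyphen at the end of a line.
    # Caveat: this also merges genuinely hyphenated words broken across lines.
    joined = re.sub(r"-\s*\n\s*", "", text)
    return [(date, word) for word in WORD_RE.findall(joined.lower())]


def write_rows(rows, out):
    """Write (date, word) pairs as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["date", "word"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder text and date, not taken from a real issue.
    sample = "The Carle-\ntonian was pub-\nlished weekly."
    buffer = io.StringIO()
    write_rows(words_with_date(sample, "1883-01-01"), buffer)
    print(buffer.getvalue())
```

The resulting CSV has one row per word, so it can be opened in a tool such as OpenRefine for cleaning. Numbers and punctuation are dropped by the word pattern, which may or may not suit the analysis we end up doing.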

Our project is progressing a bit slowly, but we are still on track to have our deliverables done on time.

Personal messages:

Becca: Most of what I’ve done so far has involved this side of the project — that is, documentation. We all collaborate on the points that we want to go into our blog posts, and then I’ve usually been writing them up into a final product to post. I also recently met with Kristin Partlo at the Libe helpdesk to clarify the citation for an entire collection within an archive — that was info that I wasn’t able to find in our Gould Guides or elsewhere, and she was able to give me some really helpful guidance about citing the archives (particularly this collection, which she clarified is part of the archive’s digital collections but not the digital archive).

Julia: Given my role as “datamaster”, I took the first steps to acquire the data we need for the project. I downloaded each front page we needed and uploaded them to our team’s Google Drive. The next step is to figure out how to extract the text from these pages so I can begin cleaning the data. The pages have been OCR’d, so the text is highlightable in the PDFs, which makes the process easier. We have a few ideas for how we might extract the data.

Angie: I wrote an email to Sarah Calhoun in order to get assistance with the textual analysis side of the project. I have also investigated the Carletonian archives in an attempt to create a possible list of search terms we could examine the frequency of. Additionally, as webmaster, I’ve been reviewing previous Hacking the Humanities final projects and exploring different ways we could transform our project into an aesthetically pleasing website.

Maddy: Given my role as data analyst, I haven’t had much to do so far because our data isn’t in an analyzable state yet. However, I have been getting up to speed on the goals and details of our project because I was gone when our team charter and project proposal were made. Better understanding the end goal of our project has helped me think about which kinds of data visualization tools might be best suited to telling a meaningful story through our data, so pondering that has been useful and productive.

Tags: archives, carletonian corpus, Text Analysis

5 thoughts on “Carletonian Corpus Final Project Update”

Having to manually download each front page one by one is really tedious; I hope you all can move past that stage soon. I also have a question: Do the Carletonian’s front pages have a summary of the whole paper? If you are only taking the front pages, some common words used in later pages might be missed.

It sounds like you guys are on a good track for your project. I like how you have most of the tools already picked out and know what purpose each will serve; it’s definitely important to know how and where you want to do each part of the project. I also like how you each have a specific role and have been working within it. Overall, keep up the good work!

I like how you have laid out the answers to the questions in a very organized manner, which makes it easy to follow along with your process. Having a good plan for how you wish to execute this project, and being flexible about what you can accomplish, as you have shown, is awesome.

Hey Julia, I appreciate you and your team’s flexibility to change your initial plan into something more realistic. That is definitely something that has been difficult for me in the past so I can respect the effort it took to do that. It sounds like you’re not the only group whose project is off to a somewhat bumpy start so don’t worry about it! Good luck on the rest of your project!

I can empathize with how much data you had to collect and the tedious process that went into getting it, let alone making it palatable. Good job being realistic and pivoting to something that’s actually reasonably possible. My group has also had to recenter our project and jump through some hoops to get something that looks somewhat decent in time for our presentation, let alone for the final product.
