Stringr: An Essential R Package for String Manipulation

Introduction

The stringr package is a popular package in R for working with strings. It provides a consistent and easy-to-use set of functions for manipulating and analyzing text data. Here are some of the key features of the stringr package:

  1. Consistent syntax: All functions in stringr follow a consistent syntax that makes it easy to remember how to use them. Most functions start with str_ followed by the action you want to perform, such as str_extract for extracting a pattern from a string, or str_replace for replacing a pattern in a string.
  2. Text manipulation functions: The package includes a wide range of functions for manipulating text data, such as str_trim for removing whitespace from the beginning and end of a string, str_pad for adding padding to a string, and str_sub for extracting a substring from a string.
  3. Regular expression support: stringr provides full support for regular expressions, which are a powerful way to search for and manipulate text patterns. Many of the functions in stringr take a regular expression as their pattern argument, and the package also provides pattern helpers such as regex() and fixed() that control how a pattern is matched in more advanced text manipulation tasks.
  4. Integration with other R packages: stringr integrates well with other popular R packages, such as dplyr and tidyr, making it easy to work with text data in a larger data analysis pipeline.
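
As a rough illustration of the functions named in the list above, here is a minimal sketch. The input string is made up for this example, and the variable names are only placeholders.

```r
library(stringr)

# Made-up input string, used only for illustration
x <- "  Order #42: apple  "

trimmed  <- str_trim(x)                       # remove leading and trailing whitespace
number   <- str_extract(x, "[0-9]+")          # first run of digits (a regular expression)
replaced <- str_replace(x, "apple", "pear")   # replace the first match of a pattern
first5   <- str_sub(trimmed, 1, 5)            # characters 1 through 5
padded   <- str_pad("7", width = 3, pad = "0")  # pad on the left to width 3

# regex() changes how the pattern is matched, here ignoring case
has_apple <- str_detect("APPLE", regex("apple", ignore_case = TRUE))

print(c(trimmed, number, replaced, first5, padded))
print(has_apple)
```

Every call takes the string as its first argument, which is what makes the functions easy to chain together in a pipeline.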

Overall, the stringr package is an essential tool for anyone working with text data in R. Its consistent syntax, wide range of text manipulation functions, and support for regular expressions make it a powerful and flexible package for analyzing and manipulating text data. This is exactly why I believe it is extremely useful for the digital arts and humanities. Many DGAH projects require data cleaning before they can visualize text analysis results, alumni locations, and much more, and stringr can handle the manipulation of the often messy strings involved. I believe stringr is more valuable than tools like OpenRefine because knowing the code (or following the package cheat sheet) lets users specify exactly which manipulation to perform, rather than having to search for the right string operation in OpenRefine’s mainly point-and-click interface.

Tutorial

1. Install RStudio on your computer and open it. Alternatively, if you are a Carleton student, you can use the version of R hosted on Carleton’s Maize server.

2. Create a new R Markdown file to begin your R work.

screenshot of r code

3. Get your data set into R. Either import a CSV file or create the data set yourself by using data.frame and manually entering your table. Then run your code chunks to load the data set into R.

screenshot of r code
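
For readers who cannot see the screenshot, this step might look roughly like the sketch below. The column names, values, and CSV file name are assumptions made up for illustration and are not the data from the original screenshot.

```r
# Hypothetical example data: fruit names with stray characters attached
fruits <- data.frame(
  Fruit = c("apple-123", "mango_7", "grape--42", "lemon#9"),
  Count = c(3, 5, 2, 8),
  stringsAsFactors = FALSE
)

# To import a CSV file instead (the file name is a placeholder):
# fruits <- read.csv("fruits.csv", stringsAsFactors = FALSE)

# Check that the data set was created as expected
str(fruits)
```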

4. Now we have a data set that can be used for cleaning. Here is an example of data cleaning using stringr. We want to clean the Fruit names so they are more uniform and readable. Specifically, we want to remove the unnecessary characters like numbers, hyphens, etc.

screenshot of r code

5. To accomplish this goal, we use some basic cleaning commands from stringr. Specifically, we will use str_sub to build a clean list of all of the fruits in our data set. Since each fruit name is 5 letters long, with the unnecessary characters coming after it, we can simply extract the first 5 characters. We are left with a simple, uniform, clean list of the fruit names.

screenshot of r code
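
The str_sub step described above can be sketched as follows. The data frame is a made-up stand-in for the one in the screenshot, and the new column names are placeholders. The second approach, using str_extract with a regular expression, is an alternative not shown in the original tutorial; it may be useful when the names are not all the same length.

```r
library(stringr)

# Hypothetical messy data, standing in for the data set in the screenshot
fruits <- data.frame(
  Fruit = c("apple-123", "mango_7", "grape--42", "lemon#9"),
  stringsAsFactors = FALSE
)

# Keep characters 1 through 5 of every name.
# This only works because every fruit name here is exactly 5 letters long.
fruits$Fruit_clean <- str_sub(fruits$Fruit, 1, 5)
print(fruits$Fruit_clean)

# Alternative (an assumption, not the tutorial's method): keep the leading
# run of letters, which does not depend on the length of the name.
fruits$Fruit_regex <- str_extract(fruits$Fruit, "^[A-Za-z]+")
longer_name <- str_extract("banana-55", "^[A-Za-z]+")
print(fruits$Fruit_regex)
print(longer_name)
```

On this data both columns hold the same values, but only the regular expression version would keep all of "banana".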

You can use stringr to accomplish many more things. Below is a useful cheat sheet that gives a sense of stringr’s potential and its most useful commands. Also, for additional guidance on using stringr for data cleaning, here are 2 videos that are useful (1 and 2).

Screenshot of stringr cheatsheet

2 thoughts on “Stringr: An Essential R Package for String Manipulation”

  1. Really nice tutorial, Chris! The steps were easy to follow, and the screenshots help a lot as well! They were well selected and placed. I feel like you were able to explain a fairly complex topic very well! It seems like a very good way to clean data in R, and I appreciated the added cheat sheet as well!

  2. This is a super helpful tutorial! I really like that you began your tutorial with a summary of the benefits of using Stringr. Your screenshots are really effective, and I especially appreciate the circling and arrows that you drew on them. It makes it much faster and easier to understand what steps you are referencing! Nice addition with the Stringr cheat sheet, too. Now I’m intrigued and want to try out Stringr.
