Week 9 Blog – Tutorial

I’m providing a tutorial on the ggplot2 package for R, which I use in RStudio. The package is easy to use and very helpful: you give it your data and tell it how to map variables to aesthetics (such as the x-axis, the y-axis, or color), and it takes care of the details and automatically creates the graphic. Histograms, bar graphs, and box plots are among the most common types of data visualization in statistical analysis, so this tool should be useful for many DH projects that involve statistical analysis.
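To make the "data + mapping" idea concrete, here is a minimal sketch. It uses the built-in mtcars data set purely for illustration. The data frame, the aes() mapping, and the geom are the three pieces you supply. ggplot2 works out the scales, axes, and gridlines on its own.

```r
library(ggplot2)

# Three core pieces: a data frame (mtcars ships with R),
# an aesthetic mapping from columns to visual properties,
# and a geom that decides how each row is drawn
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# Printing the object draws the plot; axes and scales are computed automatically
print(p)
```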

Step 1. Load the package

Loading the package into your RStudio session is very simple. Once ggplot2 is installed (install.packages("ggplot2"), which you only need to run once), you only need one command: library(ggplot2).
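A common pattern, sketched below, installs ggplot2 only when it is missing and then loads it. requireNamespace() checks for the package without attaching it, so this snippet is safe to re-run at the top of a script.

```r
# Install ggplot2 only if it is not already available on this machine
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package so its functions can be used in this session
library(ggplot2)
```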

Step 2. Import your data

R allows you to import your data in a few ways. If the data set comes bundled with an R package, loading that package with library() gives you access to it. If not, you can use the read_csv() function from the readr package (part of the tidyverse) with the file name to load your data.
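Here is a sketch of both routes. To keep it self-contained, it writes a tiny CSV to a temporary file. With your own data you would pass your file's path instead; "salaries.csv" in the comment is a hypothetical name. The mpg data set illustrates data that ships with a package, in this case ggplot2 itself.

```r
library(readr)
library(ggplot2)

# Route 1: read a CSV file. A temporary file stands in for your own,
# e.g. read_csv("salaries.csv") (hypothetical file name)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("yearID,salary", "1985,100000", "2015,2000000"), csv_path)
salaries <- read_csv(csv_path, show_col_types = FALSE)

# Route 2: data bundled with a package is available once it is loaded;
# 'mpg' is a data set included with ggplot2
head(mpg)
```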

Step 3. Filter the data

R can be extremely useful for cleaning data and pulling out the parts you want to visualize, but you don’t always need this step. The filter() function keeps the rows where your conditions are true, and pull() extracts a single column from a data frame as a vector; both come from the dplyr package. For example, my original data set contains the salaries of Major League Baseball players from 1985 to 2015. To compare the average salaries in 1985 and 2015, I can use filter() to keep only the rows for each of those years and pull() to extract the salary values.
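The comparison described above might look roughly like this. The numbers are toy values, not real MLB salaries. The column names yearID and salary are assumptions about how such a data set is laid out, so rename them to match your own file.

```r
library(dplyr)

# Toy data for illustration only; yearID and salary are assumed column names
salaries <- data.frame(
  yearID = c(1985, 1985, 2000, 2015, 2015),
  salary = c(100000, 300000, 500000, 2000000, 4000000)
)

# filter() keeps rows where the condition is TRUE;
# pull() returns the salary column as a plain numeric vector
salary_1985 <- salaries %>% filter(yearID == 1985) %>% pull(salary)
salary_2015 <- salaries %>% filter(yearID == 2015) %>% pull(salary)

# Compare the two yearly averages
mean(salary_1985)
mean(salary_2015)
```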

If the CSV file has empty cells or NA entries, you can remove those rows either with filter() combined with is.na() or with the drop_na() function from the tidyr package.
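Both options are sketched below on a small hypothetical data frame with one missing salary. drop_na() with a column name only checks that column. Called with no column names, it drops rows that have an NA anywhere.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with one missing salary value
players <- data.frame(
  player = c("A", "B", "C"),
  salary = c(500000, NA, 750000)
)

# Option 1: filter() keeps only rows where salary is not NA
complete_filter <- players %>% filter(!is.na(salary))

# Option 2: drop_na() removes rows with NA in the named column
complete_drop <- players %>% drop_na(salary)
```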

Step 4. Choose the type of plot

Once your data is ready to plot, choose the type of plot that best illustrates it, depending on your variables and the purpose of your visualization. Each plot command needs your data frame, the variables to plot, and the aesthetic mappings (for example, which variable goes on the x-axis).
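As an illustration, here are two common plot types built from the built-in mtcars data. This is a sketch, not the exact plots from this post. A histogram shows the distribution of one continuous variable. A box plot compares a continuous variable across groups; factor() makes ggplot2 treat cylinder count as categorical.

```r
library(ggplot2)

# Histogram: distribution of fuel efficiency, grouped into 10 bins
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Box plot: fuel efficiency for each cylinder count (4, 6, 8)
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()
```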

Step 5. Customize the plot

Lastly, you can customize your plots with additional commands. The most basic customizations are adding titles, axis labels, and best-fit lines. The labs() function sets the title and axis labels, and geom_smooth() adds a best-fit line or curve (for example, geom_smooth(method = "lm") draws a linear regression line). You can also choose the colors on your graphs, for example with the color or fill arguments of a geom.
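This sketch combines these customizations on the built-in faithful data set (Old Faithful geyser eruptions). It uses geom_smooth() with method = "lm" for the best-fit line. The colors and label text are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_point(color = "steelblue") +             # custom point color
  geom_smooth(method = "lm", formula = y ~ x,   # linear best-fit line
              color = "darkred") +
  labs(
    title = "Old Faithful eruptions",
    x = "Waiting time to next eruption (min)",
    y = "Eruption duration (min)"
  )
```

Passing formula = y ~ x explicitly just makes the linear fit visible in the code. It also stops geom_smooth() from printing a message about which formula it chose.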

Further Sources

There are many R and RStudio tutorials on the web. I have linked two detailed ones on the ggplot2 package below; check them out if you want to learn more about this package.

https://bookdown.org/kochiuyu/Technical-Analysis-with-R/ggplot2-package.html

https://r-graph-gallery.com/ggplot2-package.html
