Learn about R programming skills that can help you clean, manipulate, and analyze your data effectively. Explore which technical skills you might already possess and how to build new ones.
![[Featured Image] A programmer sits at a computer and uses R programming skills while a colleague stands at a computer desk in the background.](https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/https://images.ctfassets.net/wp1lcwdav1p1/6AyxBNxpc6Ii2aPpSeOgYs/f8dfa0bd03c64cef1664103376402c13/GettyImages-2015316371.jpg?w=1500&h=680&q=60&fit=fill&f=faces&fm=jpg&fl=progressive&auto=format%2Ccompress&dpr=1&w=1000)
R is one of the most popular statistical programming languages worldwide thanks to its intuitive development environment and extensive library of built-in and community-contributed packages. To take advantage of all R has to offer, developing a few key areas of expertise, including both high-level and technical skills, can help you stand out in your field and derive the most useful insights from your data.
When you program in R, you can choose many routes for data cleaning and analysis depending on your data types and technical expertise. However, having a few core competencies can help you understand the bigger picture of your workflow and how to effectively work with your information. In general, you’ll need to understand the type of data you have, how to clean it and prepare it for analysis, and how to choose the appropriate statistical model.
To effectively program in R, it helps to understand different data structures so you can choose the right functions and formats. This also enables you to tailor your data to different formats depending on your analytical end goal. Key data structures to know include:
Vectors: Vectors are ordered collections of the same element type, such as numbers or characters.
Matrices and arrays: Matrices and arrays represent multidimensional data. You’ll often use these for mathematical computations.
Data frames: Data frames represent data in rows and columns, similar to how spreadsheet tables hold data. While a data frame can contain multiple types of data, every value within a single column must be of the same type.
Lists: Lists can be used instead of vectors if you need to hold multiple types of data simultaneously.
Factors: If you’re working with categorical variables, you can use factors to store values such as characters or words as a fixed set of categories, called levels.
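To make the structures above concrete, here is a minimal sketch in base R. All object names and values (`temps`, `patients`, `record`, and so on) are invented for illustration.

```r
# Vector: ordered values that all share one type
temps <- c(21.5, 23.1, 19.8)

# Matrix: two-dimensional data of a single type
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: columns can differ in type, but each column holds one type
patients <- data.frame(
  id = 1:3,
  group = c("control", "treated", "treated"),
  stringsAsFactors = FALSE
)

# List: elements can differ in both type and length
record <- list(name = "sample_a", values = temps, passed = TRUE)

# Factor: a categorical variable stored as a fixed set of levels
group_f <- factor(patients$group, levels = c("control", "treated"))

str(patients)    # inspect the column types of the data frame
levels(group_f)  # list the categories of the factor
```

Calling `str()` on any object is a quick way to check which structure and types you are working with before choosing a function.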
When you have large amounts of data, being able to compile, sort, and manage your information is important to understand what it's telling you and make decisions based on accurate insights. A few functions and packages to help with data manipulation and cleaning in R include:
dplyr is a package that helps you reshape and summarize your data easily. Typically, you’ll use functions in the dplyr package to split your data into groups, apply a function that calculates a metric of your choice, and combine the results into a concise, easy-to-read table. Functions you might call on include:
mutate(): Create new columns by modifying existing columns.
group_by(): Group data by certain characteristics to perform joint operations.
select(): Pick certain variables or columns to work with.
left_join(), right_join(), full_join(), inner_join(): Merge data by matching in different ways.
filter(): See a subset of data that matches certain conditions.
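The functions listed above are often chained together with the pipe operator. The sketch below assumes the dplyr package is installed. The `sales` and `stores` tables are hypothetical data invented for illustration, and `summarise()` (not listed above) is the dplyr function used here for the "apply a function" step.

```r
library(dplyr)

# Hypothetical data, invented for illustration
sales <- data.frame(
  store_id = c(1, 1, 2, 2, 3),
  units    = c(10, 15, 8, 12, 20),
  price    = c(2.5, 2.5, 3.0, 3.0, 1.5)
)
stores <- data.frame(
  store_id = c(1, 2, 3),
  region   = c("north", "south", "north")
)

summary_tbl <- sales %>%
  mutate(revenue = units * price) %>%      # create a new column from existing ones
  filter(units >= 10) %>%                  # keep only rows that match a condition
  left_join(stores, by = "store_id") %>%   # merge in the region for each store
  select(region, revenue) %>%              # keep only the columns needed
  group_by(region) %>%                     # split the data by region
  summarise(total_revenue = sum(revenue), .groups = "drop")  # one row per region

summary_tbl
```

Each step takes a data frame and returns a data frame, which is what lets the calls read top to bottom as a single workflow.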
tidyr is a package that simplifies cleaning your data with built-in functions for reshaping it. Once your data is organized, you can write simple code that combines analytical steps. Some functions to explore include:
gather(): Convert wide data to a longer (tall) format by collapsing several columns into key-value pairs.
spread(): Convert tall data to a wider format by spreading key-value pairs across several columns.
separate(): Divide a single column into several columns.
unite(): Combine several columns into a single column.
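As a rough sketch of how these reshaping functions fit together, the example below takes a small table from wide to long and back again. It assumes the tidyr package is installed, and the `wide` data frame is invented for illustration. Note that the tidyr function for combining columns is `unite()`, and that recent tidyr versions recommend `pivot_longer()` and `pivot_wider()` as successors to `gather()` and `spread()`, although the older functions remain available.

```r
library(tidyr)

# Hypothetical measurements, one column per visit (wide format)
wide <- data.frame(
  subject = c("a", "b"),
  visit_1 = c(5.1, 6.3),
  visit_2 = c(5.8, 6.0)
)

# Wide to long: one row per subject and visit
long <- gather(wide, key = "visit", value = "score", visit_1, visit_2)

# Split "visit_1" into two columns: "label" and "visit_number"
long_split <- separate(long, visit, into = c("label", "visit_number"), sep = "_")

# Combine the two columns back into a single "visit" column
long_united <- unite(long_split, "visit", label, visit_number, sep = "_")

# Long to wide: back to one column per visit
wide_again <- spread(long_united, key = visit, value = score)

wide_again
```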
Because of the built-in functionalities of R, many researchers and analysts choose this language for statistical modeling. To take advantage of this, you’ll need a basic understanding of data and statistical skills such as descriptive statistics, inferential statistics, and (depending on your data) time-series analysis. You can use R to perform common statistical tests such as t-tests, chi-square tests, ANOVAs, regressions, and more.
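The tests mentioned above are available in base R without extra packages. The following sketch uses the built-in `mtcars` data set. The choice of variables is only for illustration and is not a recommendation for how to analyze that data.

```r
# Descriptive statistics for fuel efficiency
summary(mtcars$mpg)

# Two-sample t-test: does mpg differ by transmission type (am: 0 or 1)?
t_res <- t.test(mpg ~ am, data = mtcars)

# Chi-square test of independence between cylinder count and transmission.
# R may warn that the approximation is unreliable because some cells are small.
chi_res <- chisq.test(table(mtcars$cyl, mtcars$am))

# One-way ANOVA: does mean mpg differ across cylinder groups?
aov_res <- aov(mpg ~ factor(cyl), data = mtcars)

# Linear regression: mpg as a function of weight
lm_res <- lm(mpg ~ wt, data = mtcars)

t_res$p.value     # p-value of the t-test
summary(aov_res)  # ANOVA table
coef(lm_res)      # intercept and slope of the regression
```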
Once you understand the bigger picture, more refined technical skills can help you effectively complete your data-driven tasks. Skills that often come in handy when working with R include:
Depending on your field, mastering key packages can help you streamline data management and analysis processes and allow you to perform at a higher level. You can explore thousands of packages and functions in R to find what works for you. Some you might use for more general purposes, such as visualizations and cleaning, while others are more domain-specific. Consider the following domain-specific packages and when you might use them.
Forecast: This is useful for analyzing and predicting time-series data. For example, you might forecast your monthly sales for the upcoming quarter.
Shiny: You might use Shiny to build an interactive web application. For example, you could use this package to build a dashboard that allows users to filter graphs or visuals by different variables.
Caret: You might use this package if you work with model training for regression and classification problems and want to assess performance metrics. For example, you could develop a prediction model that forecasts housing price trends based on different house features.
Phyloseq: This package can be useful for working with microbiome data. For example, you can compare the relative abundances of bacteria in different populations based on environmental exposures.
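As one illustration of the packages above, here is a minimal sketch of the kind of filterable dashboard described for Shiny. It assumes the shiny package is installed and uses the built-in `mtcars` data set. The input and output IDs (`"cyl"`, `"scatter"`) are arbitrary names chosen for this example.

```r
library(shiny)

# User interface: a dropdown filter and a plot area
ui <- fluidPage(
  titlePanel("mtcars explorer"),
  selectInput("cyl", "Cylinders:", choices = sort(unique(mtcars$cyl))),
  plotOutput("scatter")
)

# Server logic: redraw the plot whenever the dropdown changes
server <- function(input, output, session) {
  output$scatter <- renderPlot({
    # selectInput returns a character string, so convert before comparing
    subset_df <- mtcars[mtcars$cyl == as.numeric(input$cyl), ]
    plot(subset_df$wt, subset_df$mpg,
         xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment in an interactive session to launch the dashboard
```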
RStudio is an integrated development environment (IDE) that allows you to more easily monitor your code development and find errors. This environment shows you your variables, lets you look at your data sets, expands visualizations, flags certain errors as you write, provides debugging tools, and highlights different parts of your code so you can follow the logic. Learning to use RStudio allows you to streamline your workflow and more effectively manage complex projects.
Once you have your results, communicating your findings is an essential next step toward acting on your insights. Data visualizations help you present information to non-technical audiences clearly and succinctly. To master data visualization in R, you can explore packages like ggplot2, which has built-in functions for various charts and graphs.
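A ggplot2 chart is built by adding layers to a base plot. The sketch below assumes the ggplot2 package is installed and uses the built-in `mtcars` data set. The variables and labels are illustrative choices only.

```r
library(ggplot2)

# Scatter plot of weight against fuel efficiency, coloured by cylinder count
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(
    title  = "Fuel efficiency by vehicle weight",
    x      = "Weight (1000 lbs)",
    y      = "Miles per gallon",
    colour = "Cylinders"
  ) +
  theme_minimal()

print(p)  # draw the plot
```

Clear axis labels and a descriptive title, set here with `labs()`, do much of the work of making a chart readable for a non-technical audience.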
Writing information in your file that allows other professionals to understand and reproduce your code is essential to ensure your work is valid. For example, if you and another scientist analyze the same data set and get vastly different results, you’ll want to be able to pinpoint why that is and what the correct finding is.
Your data may not be entirely accurate. If you have findings based on a small data set, another professional may want to reproduce your analysis using a larger one to see if the findings remain true. This is especially important in medical and scientific fields. Documenting your code also helps inform other team members of what you’re doing and why, which can save time and open discussions related to methods, workflow, and areas of improvement.
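One way to put this into practice is sketched below, using only base R: fix the random seed, document what each function expects and returns, and record the software versions used. The `bootstrap_mean()` function is a hypothetical helper written for this example. The `#'` comment style follows the roxygen2 convention, but here the lines act as ordinary comments.

```r
# Fix the random seed so the resampling below is repeatable
set.seed(42)

#' Bootstrap a 95% interval for the mean of a numeric vector.
#'
#' @param x      Numeric vector of observations.
#' @param n_boot Number of bootstrap resamples to draw.
#' @return Named numeric vector with the 2.5% and 97.5% quantiles.
bootstrap_mean <- function(x, n_boot = 1000) {
  stopifnot(is.numeric(x), length(x) > 0)
  # Resample with replacement and take the mean of each resample
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  quantile(boot_means, probs = c(0.025, 0.975))
}

ci <- bootstrap_mean(mtcars$mpg)
ci

# Record the R version and loaded packages alongside the results
sessionInfo()
```

A colleague who runs this script with the same seed and comparable software versions should be able to reproduce the interval, which makes any discrepancy easier to trace.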
You can gain computer programming skills, including R programming, through study and practice. While it’s important to learn the language syntax and different methodologies you can use to explore your data, putting what you learn into practice makes it stick. Some ways you can learn more about R and how to use it include:
Take online courses: Online courses help you learn to code at your own pace. You can access a structured environment to help you build relevant foundational knowledge before practicing your skills.
Engage with online communities: R is used worldwide, meaning you can find many online communities such as the RStudio Community, Stack Overflow, and GitHub, where people post their code, ask questions, and create an environment where you can learn from one another.
Work with practice projects: By trying out your skills with real-world data, you can solidify your understanding of new concepts, identify areas of difficulty, and build your way to more complex problems.
Understanding your data, deciding on the right statistical tests, choosing the most applicable R packages, and presenting your results can help you benefit most from R. On Coursera, you can continue building skills related to computer programming fundamentals with the Dynamic Programming, Greedy Algorithms introductory course. For a more comprehensive overview, consider building on this course by completing the Master of Science in Computer Science from UC Boulder.
Editorial Team