Why org?
Managing research projects and data analyses can be challenging when dealing with:
- Inconsistent project structures across different analyses
- Mixed requirements for code (version control), results (sharing), and data (security)
- Collaboration difficulties when team members use different folder structures
- Version tracking for research submissions and revisions
- Cross-platform compatibility issues with file paths
The org package solves these problems by providing a standardized framework for organizing R projects with clear separation of concerns and consistent structure across all your analyses.
Installation
# Install from CRAN
install.packages("org")
# Or install development version from GitHub
# devtools::install_github("csids/org")
Quick Start
Here’s how to get started with your first org project:
library(org)
# 1. Initialize your project structure
org::initialize_project(
  env = .GlobalEnv,
  home = "my_analysis",
  results = "my_results"
)
# 2. Access project paths
org::project$home # Your code location
org::project$results_today # Today's results folder
# 3. Use org functions in your analysis
org::path("data", "file.csv") # Cross-platform paths
org::ls_files("R") # List R files
Concept
The concept behind org is straightforward. Most analyses have three main sections:
- Code: Analysis scripts and functions
- Results: Output files and figures
- Data: Input data files
Each section has unique requirements:
Code Requirements
- Must be version controlled
- Should be publicly accessible
- Needs a single analysis pipeline documenting all steps
- Should be organized into modular functions
Project Structure
Core Components
1. org::initialize_project
This is the main function that sets up your project structure. It takes 2+ arguments and saves folder locations in org::project for use throughout your analysis:
- home: Location of Run.R and the R/ folder (accessible via org::project$home)
- results: Results folder that creates date-based subfolders (accessible via org::project$results_today)
- ...: Additional folders as needed (e.g., data_raw, data_clean)
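The idea of keeping folder locations in one shared environment, with a date-stamped results subfolder, can be sketched in base R. This is only an illustration of the concept, not the package's implementation; `demo_project` and `init_demo_project()` are hypothetical names, and the `YYYY-MM-DD` folder format is taken from the directory listings in this document.

```r
# Hypothetical stand-in for org::project: an environment holding folder locations.
demo_project <- new.env()

# Hypothetical helper: record the folders and create today's results subfolder.
init_demo_project <- function(home, results, ...) {
  extra <- list(...)  # additional named folders, e.g. data_raw = "..."
  demo_project$home <- home
  demo_project$results <- results
  # Date-based subfolder, e.g. <results>/2019-01-01
  demo_project$results_today <- file.path(results, format(Sys.Date(), "%Y-%m-%d"))
  dir.create(demo_project$results_today, recursive = TRUE, showWarnings = FALSE)
  for (name in names(extra)) {
    assign(name, extra[[name]], envir = demo_project)
  }
  invisible(demo_project)
}

# Usage with temporary folders so the sketch is safe to run anywhere.
base_dir <- tempfile("org_demo_")
init_demo_project(
  home = file.path(base_dir, "code"),
  results = file.path(base_dir, "results"),
  data_raw = file.path(base_dir, "data_raw")
)
demo_project$results_today
```

Because every analysis function reads its locations from the same environment, none of them needs to hardcode a path.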
2. Run.R
This is your main analysis script that orchestrates the entire workflow:
- Data cleaning
- Analysis
- Result generation
All code sections should be encapsulated in functions in the R/ folder. You should not have multiple main files, as this creates confusion when returning to your code later. However, you can have versioned files (e.g., Run_v01.R, Run_v02.R) where later versions supersede earlier ones.
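To make the "functions live in R/, Run.R orchestrates" pattern concrete, here is a minimal base R sketch that loads every script in an R/ folder into an environment. The helper `source_r_folder()` is hypothetical and only illustrates the idea; it does not claim to reproduce how the package sources files.

```r
# Hypothetical helper: source every .R file in <home>/R into an environment.
source_r_folder <- function(home, env = globalenv()) {
  files <- list.files(file.path(home, "R"), pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in sort(files)) {
    sys.source(f, envir = env)
  }
  invisible(files)
}

# Usage: build a throwaway project with one function file, then load it.
home <- tempfile("org_home_")
dir.create(file.path(home, "R"), recursive = TRUE)
writeLines(
  "clean_data <- function() data.frame(x = 1:3)",
  file.path(home, "R", "clean_data.R")
)

demo_env <- new.env()
source_r_folder(home, env = demo_env)
d <- demo_env$clean_data()
nrow(d)
```

With the functions loaded this way, the main script only needs to call them in order.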
Example Project Structure
Here’s a complete example of how to structure your project:
# Initialize the project
org::initialize_project(
  env = .GlobalEnv,
  home = "/git/analyses/2019/analysis3/",
  results = "/dropbox/analyses_results/2019/analysis3/",
  data_raw = "/data/analyses/2019/analysis3/"
)
# Document changes in archived results
txt <- glue::glue("
2019-01-01:
Included:
- Table 1
- Table 2
2019-02-02:
Changed Table 1 from mean -> median
", .trim=FALSE)
org::write_text(
  txt = txt,
  file = fs::path(org::project$results, "info.txt")
)
# Load required packages
library(data.table)
library(ggplot2)
# Run analysis
d <- clean_data() # Accesses data from org::project$data_raw
table_1(d) # Saves to org::project$results_today
figure_1(d) # Saves to org::project$results_today
figure_2(d) # Saves to org::project$results_today
Research Article Versioning
When writing research articles, you often need multiple versions (initial submission, resubmissions). org helps manage this by using date-based versioning:
- Initial submission:
  - Rename Run.R to Run_YYYY_MM_DD_submission_1.R
  - Rename R/ to R_YYYY_MM_DD_submission_1/
- Resubmission:
  - Create new files with updated dates
  - Keep old versions for reference
This preserves the code that produced results for each submission, ensuring all changes are deliberate and intentional.
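The renaming step for an initial submission can be scripted. The sketch below uses only base R; `archive_submission()` is a hypothetical helper written for this example and is not part of the package.

```r
# Hypothetical helper: freeze Run.R and R/ under a dated submission name.
archive_submission <- function(home, date = Sys.Date(), submission = 1) {
  tag <- sprintf("%s_submission_%d", format(date, "%Y_%m_%d"), submission)
  ok_run <- file.rename(
    file.path(home, "Run.R"),
    file.path(home, paste0("Run_", tag, ".R"))
  )
  ok_dir <- file.rename(
    file.path(home, "R"),
    file.path(home, paste0("R_", tag))
  )
  stopifnot(ok_run, ok_dir)  # fail loudly if either rename did not happen
  invisible(tag)
}

# Usage on a throwaway project.
home <- tempfile("org_versioning_")
dir.create(file.path(home, "R"), recursive = TRUE)
writeLines("# main script", file.path(home, "Run.R"))
archive_submission(home, date = as.Date("2019-01-01"), submission = 1)
list.files(home)
```

After renaming, the archived run script still refers to the old folder name, so it needs to be pointed at the renamed functions folder before it can be re-run.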
Team Collaboration
When working with team members who have different folder structures, you can specify multiple possible paths. The org package will automatically select the first path that exists:
# Team member setup - org will use the first existing path
org::initialize_project(
  env = .GlobalEnv,
  home = c(
    "/Users/teammate1/projects/analysis3/",  # Mac user
    "/home/teammate2/analysis3/",            # Linux user
    "C:/Users/teammate3/analysis3/"          # Windows user
  ),
  results = c(
    "/Users/teammate1/Dropbox/results/",
    "/home/teammate2/dropbox/results/",
    "C:/Users/teammate3/Dropbox/results/"
  ),
  data_raw = c(
    "/Users/teammate1/data/analysis3/",
    "/home/teammate2/data/analysis3/",
    "C:/shared_drive/data/analysis3/"
  )
)
This approach allows the same initialization code to work across different team members’ machines without modification.
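The "first path that exists" rule is easy to reason about with a small base R sketch. `first_existing_dir()` is a hypothetical helper used here for illustration only; the package's internal logic may differ in its details, such as how it reports the case where no candidate exists.

```r
# Hypothetical helper: return the first candidate directory that exists.
first_existing_dir <- function(candidates) {
  found <- candidates[dir.exists(candidates)]
  if (length(found) == 0) {
    stop("None of the candidate folders exist: ", paste(candidates, collapse = ", "))
  }
  found[[1]]
}

# Usage: the first candidate is missing, so the second one is chosen
# even though the third also exists.
here <- tempfile("org_team_")
dir.create(here)
chosen <- first_existing_dir(c("/no/such/folder/for/org/demo", here, tempdir()))
identical(chosen, here)
```

Order matters: if two candidates exist on the same machine, the earlier one wins, so list the most specific location first.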
Best Practices
Recommended Structure
Store your project components in appropriate locations:
# Code (GitHub)
git/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/           # org::project$home
    │   │   ├── Run.R
    │   │   └── R/
    │   │       ├── clean_data.R
    │   │       ├── descriptives.R
    │   │       ├── analysis.R
    │   │       └── figure_1.R
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/
# Results (Dropbox)
dropbox/
└── analyses_results/
    ├── 2018/
    │   ├── analysis_1/           # org::project$results
    │   │   ├── 2018-03-12/       # org::project$results_today
    │   │   │   ├── table_1.xlsx
    │   │   │   └── figure_1.png
    │   │   ├── 2018-03-15/
    │   │   └── 2018-03-18/
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/
# Data (Local)
data/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/           # org::project$data_raw
    │   │   └── data.xlsx
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/
Alternative Structures
RMarkdown Project
For projects on a shared network drive without GitHub/Dropbox:
project_name/                 # org::project$home
├── Run.R
├── R/
│   ├── CleanData.R
│   ├── Descriptives.R
│   ├── Analysis1.R
│   └── Graphs1.R
├── paper/
│   └── paper.Rmd
├── results/                  # org::project$results
│   └── 2018-03-12/           # org::project$results_today
│       ├── table1.xlsx
│       └── figure1.png
└── data_raw/                 # org::project$data_raw
    └── data.xlsx
Single Folder Project
For projects with limited access:
project_name/                 # org::project$home
├── Run.R
├── R/
│   ├── clean_data.R
│   ├── descriptives.R
│   ├── analysis.R
│   └── figure_1.R
├── results/                  # org::project$results
│   └── 2018-03-12/           # org::project$results_today
│       ├── table_1.xlsx
│       └── figure_1.png
└── data_raw/                 # org::project$data_raw
    └── data.xlsx
Path Naming Conventions
Understanding path components is important:
Example | Term |
---|---|
/home/richard/test.src | Absolute (file)path |
richard/test.src | Relative (file)path |
/home/richard/ | Absolute (directory) path |
./richard/ | Relative (directory) path |
richard | Directory |
test.src | Filename |
A path specifies a location in a directory structure, a filename is only the name of the file itself, and a directory is only the name of a folder.
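These terms map directly onto base R's path helpers. The snippet below is a small illustration using the example path from the table; `is_absolute_unix()` is a hypothetical helper and deliberately handles only Unix-style paths.

```r
p <- "/home/richard/test.src"

basename(p)           # filename: "test.src"
dirname(p)            # absolute directory path: "/home/richard"
basename(dirname(p))  # directory: "richard"

# Hypothetical check for Unix-style absolute paths (an assumption of this sketch:
# Windows drive letters such as "C:/" are not handled).
is_absolute_unix <- function(path) startsWith(path, "/")

is_absolute_unix(p)                   # TRUE
is_absolute_unix("richard/test.src")  # FALSE
```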
Function Reference
The org package provides several key functions for project management:
Core Functions
- org::initialize_project(): Set up project structure and source R files
- org::set_results(): Modify results folder after project initialization
- org::project: Environment containing all project folder locations
File Operations
- org::path(): Construct cross-platform file paths
- org::ls_files(): List files with optional pattern matching
- org::move_directory(): Move directories safely
- org::write_text(): Write text files with consistent formatting
Utility Functions
- org::package_installed(): Check if packages are installed
- org::create_project_quarto_internal_results(): Create Quarto projects with internal results
- org::create_project_quarto_external_results(): Create Quarto projects with external results
Common Workflows
Setting Up a New Analysis
# 1. Initialize project structure
org::initialize_project(
  env = .GlobalEnv,
  home = "/path/to/your/analysis/",
  results = "/path/to/results/",
  data_raw = "/path/to/data/"
)
# 2. Create analysis functions in R/ folder
# 3. Run analysis from Run.R
# 4. Results automatically saved to org::project$results_today
Working with Existing Projects
# Reinitialize existing project
org::initialize_project(
  env = .GlobalEnv,
  home = "/existing/analysis/path/",
  results = "/existing/results/path/"
)
# Update results location if needed
org::set_results("/new/results/path/")
Environment Management
Recommendation: Always use .GlobalEnv. It makes life so much easier! All your functions will be directly accessible without having to worry about environment scoping issues.
# Recommended approach - use .GlobalEnv
org::initialize_project(env = .GlobalEnv, ...)
# Only use custom environments in special cases (e.g., package development)
my_env <- new.env()
org::initialize_project(env = my_env, ...)
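The scoping difference behind this recommendation can be seen with base R alone. The sketch below sources a one-line script into a custom environment; `org_demo_hello()` is a hypothetical function name chosen so it is unlikely to clash with anything in your session.

```r
# Write a tiny script that defines one function.
script <- tempfile(fileext = ".R")
writeLines("org_demo_hello <- function() 'hello'", script)

# Source it into a custom environment instead of the global one.
my_env <- new.env()
sys.source(script, envir = my_env)

# The function lives in my_env only, so it must be reached through it.
exists("org_demo_hello", envir = my_env, inherits = FALSE)       # TRUE
exists("org_demo_hello", envir = globalenv(), inherits = FALSE)  # FALSE unless defined elsewhere
my_env$org_demo_hello()                                          # "hello"
```

With a custom environment every call has to go through `my_env$...`, which is the extra bookkeeping that sourcing into the global environment avoids.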
Path Construction and Cross-Platform Compatibility
The org::path() function ensures your code works across different operating systems:
# Cross-platform path construction
data_file <- org::path(org::project$data_raw, "survey_data.csv")
output_file <- org::path(org::project$results_today, "analysis_results.xlsx")
# Handles multiple path components
nested_path <- org::path("folder1", "subfolder", "file.txt")
# Removes double slashes automatically
clean_path <- org::path("folder//", "//file.txt") # Returns "folder/file.txt"
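For intuition about what slash-normalizing path construction involves, here is a base R sketch. `join_path()` is a hypothetical helper, not the package's implementation, and it makes a simplifying assumption: every run of slashes is collapsed, which would also damage URLs or UNC network paths.

```r
# Hypothetical helper: join components with "/" and collapse repeated slashes.
join_path <- function(...) {
  joined <- paste(c(...), collapse = "/")
  gsub("/+", "/", joined)
}

join_path("folder1", "subfolder", "file.txt")  # "folder1/subfolder/file.txt"
join_path("folder//", "//file.txt")            # "folder/file.txt"
```

Forward slashes are accepted by R on Windows as well, which is why building paths this way stays portable.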
Troubleshooting
Common Issues
Path Issues
- Always use org::path() for cross-platform compatibility
- Avoid hardcoded absolute paths in shared code
- Check that all specified directories exist and are accessible
- Ensure you have write permissions to results directories
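The last two checks can be automated before initializing a project. The sketch below uses base R only; `check_folders()` is a hypothetical helper written for this example.

```r
# Hypothetical helper: report whether each folder exists and is writable.
check_folders <- function(paths) {
  data.frame(
    path = paths,
    exists = dir.exists(paths),
    # file.access() returns 0 when the requested access (mode 2 = write) is allowed.
    writable = file.access(paths, mode = 2) == 0,
    stringsAsFactors = FALSE
  )
}

# Usage: one folder that exists and one that does not.
report <- check_folders(c(tempdir(), "/no/such/folder/for/org/demo"))
report
```

Running a check like this on the home, results, and data folders gives a quick overview of which location is causing a failure.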
Getting Help
- Check the package documentation: help(package = "org")
- View function help: ?org::initialize_project
- Report issues at: https://github.com/csids/org/issues