The ‘Package Manager Snapshot Approach’ - Introduction
At cynkra, we advocate an approach to extension package management that we call the Package Manager snapshot approach. This approach works particularly well with a centralized Posit Workbench installation when multiple people collaborate on a project. The idea can be further enhanced by combining {renv}, Posit Package Manager and {cynkrathis}.
The following {cynkrathis} functions form an interface to this opinionated project workflow:
cynkrathis::get_snapshots()
cynkrathis::renv_switch_r_version()
cynkrathis::renv_downgrade()
These little helper functions play an important role in making this workflow fun to apply. We will explain their use throughout this article.
The key idea is to couple a specific R version with a Package Manager snapshot. Using a specific Package Manager snapshot in options(repos = ) means that the project only has access to a static package source, in contrast to the dynamic one provided by a classic CRAN mirror such as https://cloud.r-project.org/.
In this case, static refers to a snapshot of all available CRAN packages on a specific day. Consider the following example: By default, CRAN mirrors provide new package updates as they are released on CRAN. Hence, if one called update.packages() daily, this would eventually trigger updates of some packages. This might be desired for R package development but is problematic for analysis projects and production use, as every package update has the potential to break the analysis. In contrast, a “static package source” refers to a CRAN snapshot of a specific day and provides no CRAN package versions other than those available on that day.
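As a minimal sketch of the difference, the two kinds of package source can be written as values for options(repos = ). The snapshot date 2021-05-18 is the one used in the lockfile example of this article; any other available snapshot date would work the same way. The variable names are chosen for this example only.

```r
# Dynamic source: a classic CRAN mirror whose contents change with every release.
dynamic_repo <- c(CRAN = "https://cloud.r-project.org/")

# Static source: the Package Manager snapshot of one specific day.
static_repo <- c(CRAN = "https://packagemanager.posit.co/cran/2021-05-18")

# Point the current session at the static source and keep the old value
# so that it can be restored later with options(old_opts).
old_opts <- options(repos = static_repo)

# install.packages() and update.packages() now only see the package versions
# offered by the snapshot of 2021-05-18.
active_repo <- getOption("repos")[["CRAN"]]
print(active_repo)
```

In a project managed by {renv}, this option is normally derived from renv.lock rather than set by hand, so the snippet is mainly useful for seeing the mechanism in isolation.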
Using a “static package source” for analysis projects has the following advantages:
- Stable project environment by making use of packages which are likely to play together well.
- Package versions which are known to work with the R version used.
- Possibility of controlled package updates between Package Manager snapshots.
The combination of a specific R version and a Package Manager snapshot is ensured by {renv}. Both the R version and the snapshot (as part of the repository URL) are listed in renv.lock:
{
  "R": {
    "Version": "4.1.0",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://packagemanager.posit.co/cran/2021-05-18"
      }
    ]
  },
  [...]
}
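To illustrate how this coupling can be checked, the following sketch extracts the R version and the snapshot date from lockfile text and compares the former with the running R session. It uses plain pattern matching on the lines shown above so that it runs without extra packages; for real lockfiles a proper JSON parser would be more robust. The helper name lock_field() is made up for this example.

```r
# Lockfile excerpt from above (the elided package section is left out).
lock_lines <- c(
  '{',
  '  "R": {',
  '    "Version": "4.1.0",',
  '    "Repositories": [',
  '      {',
  '        "Name": "CRAN",',
  '        "URL": "https://packagemanager.posit.co/cran/2021-05-18"',
  '      }',
  '    ]',
  '  }',
  '}'
)

# Hypothetical helper: return the value of the first `"key": "value"` line.
# The "R" block comes first in a lockfile, so the first "Version" is R's own.
lock_field <- function(lines, key) {
  pattern <- sprintf('^[[:space:]]*"%s":[[:space:]]*"([^"]*)".*$', key)
  hits <- grep(pattern, lines, value = TRUE)
  if (length(hits) == 0L) stop("Field not found: ", key)
  sub(pattern, "\\1", hits[1])
}

locked_r_version <- lock_field(lock_lines, "Version")
snapshot_url     <- lock_field(lock_lines, "URL")
snapshot_date    <- basename(snapshot_url)

# TRUE only if the running session uses the R version recorded in the lockfile.
matches_session <- identical(locked_r_version, as.character(getRversion()))
```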
The Nature of Snapshot-Centered Workflows
When a project is initiated with {renv} and packages are installed, only packages from this specific snapshot are installed. A “snapshot” here means that the repo where packages are looked up contains only the CRAN sources of this specific day in time. No additional packages will be available/added to this snapshot in the future.
This might seem limiting at first glance, but we are convinced that this strict limitation is a feature. When collaborating, multiple people often install packages at different stages of a project’s lifetime.
Let’s compare the scenario of an install.packages("dplyr") call eight weeks after project start. Without a fixed snapshot, this would look as follows:
- A user installs {dplyr} eight weeks after the project has started.
- Meanwhile, {dplyr} was updated and with it some of its dependencies (maybe even their minimum versions were bumped). Hence, install.packages("dplyr") will also install more recent versions of other packages, not just {dplyr}.
- These updates have the potential to break some analysis in the project, possibly without anyone realizing.
In contrast, when using a snapshot-centered approach, install.packages("dplyr") eight weeks after project start
- will not cause any updates of already installed packages, because the {dplyr} version that is going to be installed is the one available at the point in time referenced by the Package Manager snapshot.
- will not install already installed packages again (thanks to {renv}) and, most importantly, will not install any updates.
Practical Implications and Usage
Next, let’s discuss some practical questions that will come up at some point during the analysis:
- How do I upgrade/downgrade a single/all packages?
- How do I upgrade the R version?
Using Packages That Are Not Available in Snapshot
Often enough there is a need to use packages which are not available in the Package Manager snapshot, for example packages living only on GitHub or newer versions of packages. In general, we recommend trying to get by with the packages available via the snapshot. If this is not possible, one can always install a specific package from GitHub and track its exact version via {renv}.
The possible downside is that this installation will most likely update multiple packages in the project library and hence possibly break the stability of the fixed project library. While this is unavoidable if a package is only available on GitHub, the case is slightly different if a newer version of a package available on CRAN is needed. In this case, one might want to consider bumping the entire snapshot to a state that satisfies the requirement. This leads to using an alternative snapshot, which we discuss in a separate section.
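A minimal sketch of the GitHub route could look as follows. The repository someuser/somepkg and the tag v0.1.0 are placeholders, and github_spec() and install_from_github() are helpers invented for this example; renv::install() and renv::snapshot() are the {renv} calls that do the actual work. Pinning a tag or commit instead of the default branch keeps the installed version reproducible.

```r
# Hypothetical helper: build a remote specification of the form owner/repo@ref.
github_spec <- function(owner, repo, ref) {
  sprintf("%s/%s@%s", owner, repo, ref)
}

# Placeholder coordinates, not a real repository.
spec <- github_spec("someuser", "somepkg", "v0.1.0")

# Install the pinned GitHub version into the project library and record it
# in renv.lock. Wrapped in a function because it needs an renv project and
# network access.
install_from_github <- function(spec) {
  renv::install(spec)
  renv::snapshot()
}

# install_from_github(spec)  # not run here
```

Before taking the snapshot, it is worth reviewing which other packages the installation changed, since those are the updates that can affect the stability of the project library.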
Using Alternative Snapshots for Specific R Versions
By default, {cynkrathis} couples an R version with the snapshot of the day on which the R version was released (see get_snapshots()).
However, certain events during the time window until the next R version release (usually about 2 to 4 months) can lead to additional snapshots being assigned to a specific R version:
- Important updates to certain R packages might happen which provide added value to a project.
- The snapshot assigned on the release date of the R version might by chance contain incompatibilities between packages. For example, a package of your choice might have been updated the day before the snapshot and now does not play well with another package.
In such cases, cynkra adds another snapshot to an R version. The source for these assignments lives in a JSON file in the cynkrathis package.
Now you might wonder where and how you can see which snapshots belong to which R version without looking at this JSON file all the time? Good question! Let’s discuss this in the next section.
Retrieving Snapshot Information
cynkrathis::get_snapshots() returns snapshot information. Let’s look at the output and break down what information can be extracted from it.
- Column id refers to the internal Package Manager (packagemanager.posit.co) ID of the respective snapshot.
- Each observation lists the snapshot date (date) and the respective release date (r_release_date) of the associated R version.
- Column note gives some information about a particular snapshot.
- Column type denotes the default snapshot of a particular R version when it’s tagged as “recommended”.
- Each R version can have multiple snapshots assigned.
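The structure described in this list can be sketched with a small stand-in table. This is not the real output of get_snapshots(): the column r_version is an assumed name, id is left as NA because real IDs are not reproduced here, and the two rows only reuse the R versions and snapshot dates that appear elsewhere in this article. The function recommended_snapshot() is a hypothetical helper that shows how the “recommended” tag can be used to pick a repository URL.

```r
# Illustrative stand-in for the table returned by cynkrathis::get_snapshots().
snapshots <- data.frame(
  r_version      = c("4.0.5", "4.1.0"),               # assumed column name
  r_release_date = as.Date(c("2021-03-31", "2021-05-18")),
  date           = as.Date(c("2021-04-23", "2021-05-18")),
  id             = NA_integer_,                        # real IDs omitted
  type           = c("recommended", "recommended"),
  note           = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical helper: repository URL of the recommended snapshot for an R version.
recommended_snapshot <- function(snapshots, r_version) {
  hit <- snapshots[
    snapshots$r_version == r_version & snapshots$type == "recommended",
  ]
  if (nrow(hit) != 1L) {
    stop("Expected exactly one recommended snapshot for R ", r_version)
  }
  sprintf("https://packagemanager.posit.co/cran/%s", format(hit$date))
}

url_405 <- recommended_snapshot(snapshots, "4.0.5")
print(url_405)
```

Because an R version can have several snapshots, the helper insists on exactly one row tagged “recommended” instead of silently taking the first match.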
Updating Snapshots
By default, we recommend always updating the snapshot when you update your R version, so that the R version is always coupled with a snapshot listed in get_snapshots().
There are two different scenarios:
- Updating/downgrading an R version
- Upgrading/downgrading snapshots only and keeping the same R version
Upgrading/Downgrading R Versions
When you decide to update a snapshot because of an R version upgrade, you can make use of renv_switch_r_version(). This function knows which snapshot is the default for which R version. Hence, there is no need to look up which snapshot ID to use when switching between R versions.
Let’s say one wants to switch to R 4.0.5. In this case, you can do
renv_switch_r_version("4.0.5")
→ Replacing R Version and Package Manager snapshot in renv.lock.
✓ New R Version: 4.0.5.
✓ New RSPM snapshot: 2021-04-23.
Upgrading Snapshots and Keeping the Same R Version
Here, it is hard to guess which snapshot is desired, because renv_switch_r_version() always uses the snapshot ID tagged as “recommended” in get_snapshots(). Most often this is already the snapshot ID in use for the particular R version. If a different snapshot should be used, manual adjustments are required.
First, look up the snapshot you want to use by calling get_snapshots(). Then simply replace the snapshot ID in line 7 of renv.lock.
Any available snapshot ID/date can be used; the ones returned by get_snapshots() are only cynkra’s recommendations and might not work for you personally. If you have good arguments why another snapshot might be helpful for a particular R version, you are welcome to open a PR and share your thoughts with us!
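The manual adjustment can also be scripted. The sketch below swaps the snapshot date in the repository URL of lockfile text; set_snapshot_date() is a hypothetical helper, not part of {cynkrathis} or {renv}, and it assumes that the snapshot is referenced by a date in a packagemanager.posit.co URL as in the lockfile example of this article.

```r
# Hypothetical helper: swap the snapshot date in the Package Manager URL.
set_snapshot_date <- function(lock_lines, new_date) {
  # as.Date() fails early if `new_date` is not a valid date.
  new_date <- format(as.Date(new_date), "%Y-%m-%d")
  pattern <- "https://packagemanager\\.posit\\.co/cran/[0-9]{4}-[0-9]{2}-[0-9]{2}"
  replacement <- paste0("https://packagemanager.posit.co/cran/", new_date)
  # Lines without a snapshot URL are returned unchanged.
  sub(pattern, replacement, lock_lines)
}

lock_lines <- c(
  '        "Name": "CRAN",',
  '        "URL": "https://packagemanager.posit.co/cran/2021-05-18"'
)
updated <- set_snapshot_date(lock_lines, "2021-04-23")
print(updated[2])

# On a real project (not run):
# writeLines(set_snapshot_date(readLines("renv.lock"), "2021-04-23"), "renv.lock")
```

Editing the file only changes the package source; the installed packages stay as they are until the project library is synchronized.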
OK - Let’s assume that you have updated/downgraded your snapshot: the last step would be to update the actual installed packages. Now that there is another snapshot in place, you have new package sources available and should synchronize your project library to it.
That said, for updating, call renv::update(). {renv} will show you a list of changes which will be applied and which packages will be updated to which version. Here is a short example:
- gh [1.2.0 -> 1.3.0]
- highr [0.8 -> 0.9]
- jquerylib [0.1.3 -> 0.1.4]
Finally, call renv::snapshot() to record the new versions in the lockfile.
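To make the change list above easier to read, the following sketch produces the same kind of report from two named version vectors. It is an illustration only and not how {renv} computes its report; version_changes() is a made-up helper, and unchangedpkg is a placeholder entry that shows that packages with identical versions are not listed.

```r
# Hypothetical helper: list packages whose version differs between two states.
version_changes <- function(before, after) {
  common  <- intersect(names(before), names(after))
  changed <- common[before[common] != after[common]]
  sprintf(
    "- %s [%s -> %s]",
    changed, unname(before[changed]), unname(after[changed])
  )
}

# Versions in the project library before and after renv::update().
before <- c(gh = "1.2.0", highr = "0.8", jquerylib = "0.1.3", unchangedpkg = "1.0.0")
after  <- c(gh = "1.3.0", highr = "0.9", jquerylib = "0.1.4", unchangedpkg = "1.0.0")

changes <- version_changes(before, after)
writeLines(changes)
```

Swapping the two arguments gives the corresponding report for a downgrade.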
Downgrading Snapshots and Keeping the Same R Version
Downgrading is a bit more complicated, but do not worry: {cynkrathis} has got you covered. There is no dedicated downgrade() function in {renv} because by default such a function would not know which version to match against for the downgrade (remember, Package Manager snapshots are not used by default by {renv}). One option is to go with renv::revert() and restore the lockfile contents of a previous commit. However, this approach has the downside that it does not account for changes one has made to the lockfile in the meantime.
What you actually want is to restore all packages listed in the lockfile with their version available in the configured snapshot.
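installed_pkgs <- unname(installed.packages(lib.loc = .libPaths()[1])[, "Package"])  # names of all packages in the project library
renv::install(installed_pkgs)  # reinstall each one from the configured repository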
The snippet above will install all installed packages again, using the versions from the snapshot date listed in renv.lock. Luckily, there is a function in {cynkrathis} to simplify this: renv_downgrade(). It restarts the session to ensure that the correct repo option is picked up and then executes the calls shown above.
Next, call renv::snapshot() and you’re good to go.
The output should look similar to the following:
- testthat [3.0.2 -> 3.0.1]
- textshaping [0.3.4 -> 0.2.1]
Default {renv} Options and Settings
We also recommend setting some default {renv} options and settings to make project work even more enjoyable. Further settings might need adjustment when administering a Posit Workbench instance for multiple people.
Name | Value | Type |
---|---|---|
renv.config.auto.snapshot | FALSE | option |
RENV_PATHS_PREFIX_AUTO | TRUE | Env var |
- renv.config.auto.snapshot: There are different scenarios in which renv.config.auto.snapshot = TRUE can lead to undesired situations:
  - If the R session was not restarted for some time and a collaborator updated renv.lock in the meantime on the git remote, one does not get a reminder that the package library is out of sync. If one now installs a new package, auto-snapshot kicks in, turning renv.lock into a potential merge conflict. If this file is then committed before changes are pulled (possibly due to some automation), the push is rejected and the conflicts need to be resolved first.
  - Even if one gets a reminder and restores the remote state first, the restore may fail. One might frantically try to install packages manually, which works, but then auto-snapshot writes a bogus state into the lockfile.
  - If everything works out great, and one actually manages to restore cleanly and install new packages, everybody else now immediately sees all intermediate experiments with packages that one test-drove and later removed again. “Ideally,” one commits straight to the main branch, so that everybody can track the flow of one’s exploration and happily installs all the packages which have been touched into their library, only to remove them again later.

  These are just a few scenarios which actually happened in practice. With manual snapshots, we decouple snapshotting from restoring and installing, which gives us a bit more control. This works because in a multi-user project there are often one or a few contributors who define the package environment, and the others just follow (and don’t need this setting to begin with).
- RENV_PATHS_PREFIX_AUTO: When this environment variable is set to TRUE, {renv} creates OS-aware project libraries. This prevents conflicts after OS upgrades, especially on Linux. For example, binary packages on Ubuntu are not compatible across LTS versions because they link against specific system libraries which differ between these OS versions. Without this setting, {renv} would try to re-use already existing packages, which then leads to errors at load time.
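As a sketch, both settings can be expressed in R code. Where they are placed is an assumption of this example: the option would typically go into the project’s .Rprofile before {renv} is activated, and the environment variable would on a shared server more commonly be written as the line RENV_PATHS_PREFIX_AUTO=TRUE in an Renviron file so that it applies to all users.

```r
# Take lockfile snapshots manually instead of automatically.
options(renv.config.auto.snapshot = FALSE)

# Ask renv for OS-aware project libraries. Environment variables are strings,
# so the value is passed as "TRUE".
Sys.setenv(RENV_PATHS_PREFIX_AUTO = "TRUE")

# Check what the current session sees.
auto_snapshot <- getOption("renv.config.auto.snapshot")
prefix_auto   <- Sys.getenv("RENV_PATHS_PREFIX_AUTO")
print(auto_snapshot)
print(prefix_auto)
```

Both values need to be in place before {renv} loads in a session; setting them afterwards in a running session may not affect that session.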