Local (Internal) Packages - Introduction

Thanks to modern tooling and templates, an R (extension) package is a great framework for building well-documented, easily deployable data science and analytics software. Hence, creating local, organization-specific R packages has become a popular practice. In this section, we cover different options to host, manage and distribute such non-public, internal packages.

CRAN-like Structure

To be able to install custom packages via install.packages(), one needs to provide the packages in a so-called “CRAN-like” repository. This means providing a web resource that stores the essential package components in a defined directory structure:

bin/
PACKAGES
src/

where

  • bin/ is the directory containing the binary builds of the packages (e.g. for Windows and macOS)
  • PACKAGES is a plain-text file serving as an index which lists all available packages in this repository; R expects one such file next to the package files, e.g. in src/contrib/
  • src/ contains the source tarballs of each package, which are used if the package is “installed from source”
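The layout above can be sketched with base R only. This is a minimal illustration, not a complete setup: the directory name and the example.com URL are placeholders, and the repository stays empty until built tarballs are copied into it.

```r
# Hypothetical local path, used only for illustration.
repo_dir <- file.path(tempdir(), "internal-repo")

# Source packages and their PACKAGES index live under src/contrib/.
src_contrib <- file.path(repo_dir, "src", "contrib")
dir.create(src_contrib, recursive = TRUE, showWarnings = FALSE)

# After copying built tarballs (e.g. mypkg_0.1.0.tar.gz) into src/contrib/,
# (re)generate the index for that directory.
tools::write_PACKAGES(src_contrib, type = "source")

# contrib.url() shows where R looks for source packages of a given repository.
# The URL is a placeholder.
src_url <- utils::contrib.url("https://example.com/internal-repo", type = "source")
src_url
# "https://example.com/internal-repo/src/contrib"
```

Binary builds follow the same idea below bin/, with the index generated via the matching type argument of tools::write_PACKAGES().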

Package Updates

When choosing a package distribution approach, consider that alpha- or beta-stage packages are often already widely used within an organization because they remove an important bottleneck in its toolstack.

Hence, it is important to ensure a smooth update process so that new features and versions are adopted quickly and are easily accessible to the whole team. In other words, if you change your internal package and install it again into your local library, you need to tell all your collaborators to do the same - otherwise they might run into conflicts because they operate on an old code base. Thinking about the deployment approach right from the start is therefore an important task during the initial development process.

Posit Package Manager

Posit’s Package Manager is probably the most convenient solution to serve local packages. It automatically takes care of the steps required to create a CRAN-like repository and to serve it. In a nutshell, the Package Manager

  • builds an R package from a source (e.g. GitHub) automatically (e.g., when a new version has been tagged) and puts it into a CRAN-like internal structure
  • provides a URL to install the packages which can be injected into options(repos = )
  • provides a browsable interface to see which packages - and their respective versions - are available

cynkra’s own package repository is an example of a package manager instance operated by cynkra to host our own R packages.
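As a minimal sketch of how such a URL is typically used, the internal repository can be registered next to CRAN. The internal URL below is a placeholder and not an actual Package Manager address; the repository names INTERNAL and CRAN are chosen for illustration.

```r
# Hypothetical URL; replace it with the one shown by your Package Manager instance.
internal_repo <- "https://packages.example.com/internal/latest"

# Register the internal repository first so its packages are found,
# and keep CRAN for public dependencies.
options(repos = c(
  INTERNAL = internal_repo,
  CRAN = "https://cloud.r-project.org"
))

getOption("repos")
```

Placed in a project or user .Rprofile, this lets install.packages() and update.packages() consult both repositories without extra arguments.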

Self-Hosted CRAN-Like Repositories (drat, etc.)

This method is the first fallback if no Package Manager is available or desired and your team has at least a basic understanding of CI/CD, web servers and package distribution.

There are two R packages helping with the creation of “CRAN-like” repositories: {drat} and {miniCRAN}.

While you can use {drat} or {miniCRAN} yourself, we have written a convenience helper function, cynkrathis::deploy_minicran_package(). This function automates most of the manual steps required to deploy a local package to a CRAN-like repository. The only input required is the URL of the respective git repository (see the function reference for more details).

In principle, the remote source could also be an S3 bucket. Note, however, that deploy_minicran_package() does not support S3 buckets yet and only works with git repositories.

In the end, the packages get served via the ‘Pages’ functionality of the respective git provider (e.g., GitHub Pages provides the web server in the background and serves the repository via https://). The resulting URL can be used as an input for install.packages(repos = ) or options(repos = ).
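A rough sketch of both sides of this workflow using {drat} directly could look as follows. The helper names publish_to_drat() and install_internal() as well as the Pages URL are hypothetical; only drat::insertPackage() and install.packages() are existing functions. Committing and pushing the updated repository to the git provider is left out.

```r
# Hypothetical helper: adds a built tarball to a local clone of the
# repository that is later published via the provider's Pages feature.
publish_to_drat <- function(tarball, repo_dir) {
  if (!requireNamespace("drat", quietly = TRUE)) {
    stop("Package 'drat' is required for this sketch.")
  }
  drat::insertPackage(tarball, repodir = repo_dir)
}

# Hypothetical Pages URL under which the repository is served.
pages_url <- "https://my-org.github.io/drat"

# Hypothetical helper: installs from the served URL and falls back
# to CRAN for public dependencies.
install_internal <- function(pkg, repo = pages_url) {
  install.packages(pkg, repos = c(repo, "https://cloud.r-project.org"))
}
```

A maintainer would call publish_to_drat("mypkg_0.1.0.tar.gz", "path/to/drat-clone") after building a release, while users only need install_internal("mypkg").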

Installing from Local Package Sources

We do not recommend this approach and only list it here for completeness.

During the development phase of an internal R package, you usually have access to the source code (either because you are the developer or the source code is hosted in a git repository within your organization). Hence you can run devtools::install() or remotes::install_local() to install the desired package from source into your personal R library.
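For illustration, such a local install could be wrapped as shown below. The path and the helper name install_from_local() are placeholders; the base R fallback is an addition for cases where {remotes} is not installed.

```r
# Hypothetical path to a local checkout of the package.
pkg_path <- "path/to/mypkg"

# Hypothetical helper: installs the package from a local source directory.
install_from_local <- function(path = pkg_path) {
  if (requireNamespace("remotes", quietly = TRUE)) {
    remotes::install_local(path)
  } else {
    # Base R fallback: install a source directory or tarball without a repository.
    install.packages(path, repos = NULL, type = "source")
  }
}
```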

Even though static installs from source get you started quickly, there are other aspects to consider. Maintenance and the onboarding of other early-stage users are more difficult than necessary. Without a central source to install the package from, collaboration becomes complicated. There is a high probability that people will run different versions of the package and might not even be aware of where to find the latest version. While tools like {renv} are able to deal with local static tarballs, their practical handling is cumbersome, as one either needs to commit the complete tarball into version control or provide a file-system link to the source, which only works if all collaborators work on a shared system.

To work around this, one might set up a continuous integration pipeline which automatically builds the package and deploys it to a shared location. Setting up such a pipeline is not trivial and also requires a functional CI/CD tool.
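The build-and-deploy step of such a pipeline might be sketched as an R script like the one below. It assumes the shared location is a directory laid out as a CRAN-like repository and that R CMD build is available on the CI runner; the function name deploy_package() and all paths are hypothetical.

```r
# Hypothetical CI step: build a source tarball and publish it to a shared,
# CRAN-like repository directory.
deploy_package <- function(pkg_dir, repo_dir) {
  # Resolve paths before changing the working directory.
  pkg_dir <- normalizePath(pkg_dir, mustWork = TRUE)
  src_contrib <- file.path(repo_dir, "src", "contrib")
  dir.create(src_contrib, recursive = TRUE, showWarnings = FALSE)
  src_contrib <- normalizePath(src_contrib)

  # Build the tarball in a scratch directory so stale builds are not picked up.
  build_dir <- tempfile("build-")
  dir.create(build_dir)
  old_wd <- setwd(build_dir)
  on.exit(setwd(old_wd), add = TRUE)

  status <- system2(
    file.path(R.home("bin"), "R"),
    c("CMD", "build", shQuote(pkg_dir))
  )
  if (status != 0) stop("R CMD build failed")

  # Copy the tarball into the repository and refresh the index.
  tarball <- list.files(build_dir, pattern = "\\.tar\\.gz$", full.names = TRUE)
  file.copy(tarball, src_contrib, overwrite = TRUE)
  tools::write_PACKAGES(src_contrib, type = "source")
}
```

The CI job would call deploy_package() with the checkout directory and the path of the shared repository, for instance on every tagged release.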