Choosing a license for your package lets you declare how you want your code to be used. It is also an important step if you use or include code written by someone else, so that you respect their work and the license attached to it. Thankfully, you don’t need to be an expert; you just need to follow a few guidelines. If you have any doubt, or if your situation is unusual (for example, if you sell your package), don’t hesitate to consult the legal department of your company or institute, or a lawyer.
Note
In this section we only give you the basic rules, so that you can ask yourself the right questions and go deeper into the subject if necessary. For that, you will find useful material in this website for open-source license comparisons, in chapter 12 of the R Packages book [1], which gives a good overview of the license topic, in the book Licensing R written by Colin Fay [2], or at least in section 1.1.2 of the R manual “Writing R Extensions”.
8.1 Global overview
There are three main kinds of licenses in software development: open source, proprietary and source-available licenses. Beyond software development, other families of licenses exist, depending on the kind of creation. Examples include the Creative Commons (CC) family for creative works (art, photography, music, …), the Open Data Commons (ODC) licenses for data and databases, and the GNU Free Documentation License for documentation and educational content such as manuals, textbooks, tutorials and academic materials. Here we focus on the software development family (with a short detour through licenses dedicated to data) because our example only deals with an R package.
Important
Keep in mind that defining a license is an important step. Skipping it is possible, but if you do, default copyright law applies: no one is allowed to make a copy of your code without your express permission.
8.2 Software development licenses
For this kind of creation, three major families are available.
The first one is open source licenses. These licenses allow users to access, modify and redistribute the source code freely. They promote transparency, collaboration, and innovation.
The second one is proprietary licenses. These restrict access to the source code and limit how the software can be used, modified or distributed. The aim is usually to protect commercial interests and to keep control over the software.
The last one is source-available licenses. These make the source code visible but do not grant the full freedoms of open source licenses. This allows transparency while preserving business models or ethical boundaries.
Here we mainly talk about open source licenses, because we use the FAIR principles as a guideline for our work. We will see below how to specify a proprietary license if necessary, but I leave it to you to go deeper into proprietary and source-available licenses if you think your work needs one of them.
8.3 Open source licences
Within the family of open source licenses for software development, there are two major kinds of licenses:
Permissive licenses are the simplest to use. With this kind of license, the code can be freely copied, modified and published, and the only restriction is that the license must be preserved. For example, most of the resources associated with the tidyverse are under the MIT license. If you dig into the question of licensing in the tidyverse, you will find that a discussion took place in 2021 about selecting the best license for it. It shows that no license is perfect and that, when selecting one, you have to be careful to stay compatible with the licenses already applied (we discuss that later in Section 8.5.3.1). Apache is another widely used permissive license.
Copyleft licenses are stricter. One of the most popular is the GNU General Public License (GPL), currently at version 3.0. It allows you to freely copy and modify the code for personal use, but if you publish modified versions or bundle the code with other code, the modified version or the complete bundle must also be licensed under the GPL. This kind of license ensures that the software remains free and open in all its future versions.
You will find below a quick summary of these two kinds of licenses (table 1).
Table 1: Comparison of copyleft and permissive licenses.

| Aspect | Copyleft license | Permissive license |
| --- | --- | --- |
| Freedom to modify | Yes, but derivatives must keep the same license | Yes, with no obligation to keep the same license |
| Redistribution rules | Must include source code and remain open | Can be redistributed as closed source (under a proprietary license, for example) |
| Protection of openness | Strong, prevents privatization of derivatives | Weak, allows integration into proprietary software |
| Commercial flexibility | Less flexible for companies | Highly flexible, widely adopted in industry |
| Legal complexity | More restrictive, requires careful compliance | Simple and lightweight |
| Philosophy | Preserve freedom for all users | Maximize adoption, even in closed environments |
8.4 Location of an R package license
When you develop an R package and want to add a license, the associated information lives in three or four places in the source package.
The first location is the License field of the DESCRIPTION file (Section 7.4). Its value has to follow a standard form, for example to pass R CMD check (we will see that in the section to define). Briefly, it can take these forms:
a name and a version specification, for example GPL (>= 2) or Apache License (== 2.0). When you specify versions, be aware of the compatibility between them (check Section 8.5.1),
a standard abbreviation, for example GPL-2 or MIT,
the name of a license “template” and an associated file (located at the root of the source package, see the next location). For example, the License field could contain MIT + file LICENSE. This form is only necessary for some licenses, not all. If you apply a GPL license to your package, + file LICENSE is not necessary because CRAN can identify the license automatically. To make your life simpler, use the usethis function associated with each license to add it to your package (the correct form and the associated file(s) will be added automatically),
a pointer to the full text of a non-standard license (for example a proprietary license), with file LICENSE in the License field (associated, as before, with a LICENSE file).
Fortunately, in the majority of cases you will use the form License_name_version + file LICENSE in the License field of the DESCRIPTION file, but be aware that more complicated specifications exist, combining several licenses for your package (if you bundle code from another R package, for example). Take a look at section 1.1.2 of the R manual “Writing R Extensions” for concrete examples.
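As a rough illustration of these forms, the sketch below lists one License value per case and writes one of them into a throwaway DESCRIPTION file. It only uses base R (write.dcf() and read.dcf()); the package name and version are placeholders, and the snippet does not validate the value the way R CMD check does.

```r
# One example value per form of the License field described above.
license_forms <- c(
  name_and_version = "GPL (>= 2)",
  abbreviation     = "GPL-3",
  template_file    = "MIT + file LICENSE",
  custom_file      = "file LICENSE"
)

# Write a minimal, throwaway DESCRIPTION in a temporary directory
# (package name and version are placeholders).
desc_path <- file.path(tempdir(), "DESCRIPTION")
write.dcf(
  data.frame(
    Package = "macfly",
    Version = "0.0.1",
    License = license_forms[["template_file"]]
  ),
  file = desc_path
)

# Read the License field back from the file.
license_value <- read.dcf(desc_path, fields = "License")[1, "License"]
print(license_value)
```

In a real package you would not write the field by hand: the usethis helpers mentioned above set it for you.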
Related to the first location, the second is the LICENSE file mentioned above. This file can be a template that requires a few additional details to be complete (in the end it contains the year and the copyright holder), or it can contain the full text of a non-standard or non-open source license.
Important
If your license is a standard license, you are not permitted to include its full text in the LICENSE file.
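For the MIT case, the template LICENSE file is only two lines long. The sketch below writes such a file to a temporary directory with base R; the year and the holder name are placeholders, and in practice the usethis helper for the MIT license creates the file for you.

```r
# Template LICENSE file that accompanies "License: MIT + file LICENSE".
# The year and the holder name are placeholders to adapt.
license_lines <- c(
  "YEAR: 2025",
  "COPYRIGHT HOLDER: macfly authors"
)

license_path <- file.path(tempdir(), "LICENSE")
writeLines(license_lines, license_path)

# Only these two fields are present: the MIT text itself is not in the file.
readLines(license_path)
```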
The third location is a LICENSE.md file. This file includes a copy of the full text of the license (unlike the LICENSE file of the second location, where you are not allowed to include the full text, except for non-standard or non-open source licenses). Most open source licenses oblige you to include a copy of the full license text when you distribute your software. For example, the MIT license contains the following sentence:
“The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.”.
The only problem is that if you want to publish your package on CRAN, CRAN doesn’t permit you to include a copy of a standard license in your package. To get around that, we will use the .Rbuildignore file (see Section 8.7 below) to make sure this file is not sent to CRAN.
The last location is optional: a LICENSE.note file. We develop this case in Section 8.5.3 below but, in short, it arises when you bundle code from other packages in your package and need to keep the licenses associated with the source packages.
8.5 Application cases
Because it is difficult to give a full overview of all licenses (the right one really depends on your needs), the following sections develop several cases that can occur during the life of your package.
8.5.1 For the code you write
Naturally, we start with the code that you write. The list is not exhaustive, but it should give you enough material to make your first move. At the end of the section you will find a comparative table with the usethis functions that create the license resources in your package repository.
One of the most popular open source licenses is the MIT license. It is a permissive license created in the late 1980s at the Massachusetts Institute of Technology. Broadly, you are allowed to use the software for personal or commercial projects, modify the code however you like, distribute original or modified versions and even sell the software or include it in proprietary products. Furthermore, the license is short and easy to understand, compatible with other licenses, and encourages collaboration and reuse. As an example, the majority of tidyverse R packages are under the MIT license.
Another popular open source license is the GPL (General Public License), created by Richard Stallman in 1989 as part of the GNU Project. Unlike the MIT license, the GPL is a copyleft license. That means that the software, and any derivative work, remains free and open. If you distribute a modified version of the software, you must release your changes under the GPL. You are allowed to use the software freely, modify the source code and share the original or modified version, but you can’t make it proprietary. There are two major versions in use, GPL version 2 and GPL version 3. In short, version 3 is an update of version 2 that is more in line with modern practice: it is more complete and better protects users and developers against patent lawsuits and hardware restrictions (such as tivoization or DRM, Digital Rights Management).
Important
Be careful because the two versions are not compatible with each other (you can’t bundle GPLv2-only and GPLv3 code in the same project). To solve that, it’s generally recommended to license your package as GPL (>= 2) or GPL (>= 3) so that future versions of the GPL license also apply to your code.
Apache 2.0 is a permissive open source license created by the Apache Software Foundation. It is similar to the MIT license but longer and more detailed, and it includes an explicit patent grant. Be careful: it is only compatible with GPLv3 (not GPLv2), and it does not address the question of hardware restrictions. Basically, it gives you freedom with responsibility, which is ideal for businesses and developers who want flexibility without the copyleft obligations of the GPL. It is widely used in the Android ecosystem.
The LGPL v3 (GNU Lesser General Public License) is a middle ground between strict copyleft licenses like the GPL and permissive licenses like MIT or Apache. Created by the Free Software Foundation, it allows developers to use open source libraries inside proprietary software without forcing the entire application to be open source. It is a weak copyleft designed mainly for software libraries. Concretely, you can use an LGPL library in proprietary or open source applications and you are not obliged to open your own code, but you have to link to the library dynamically or provide the means to relink the application with a modified version of the library. Finally, if you modify and distribute the library itself, you have to release it under the LGPL.
The AGPL v3 (GNU Affero General Public License) is a license with a stronger copyleft. It is based on GPL version 3 but closes one of its loopholes: the use of the software over a network, as on SaaS (Software as a Service) platforms (deploying an R Shiny application on a web server is one example). The AGPL requires that the source code be shared even with users who only interact with the software remotely. In addition, you have to keep the software under the AGPL if you distribute or deploy it.
If your package only shares data, the previous licenses do not fit well because they are designed specifically for source code. In this case, you can take a look at two Creative Commons licenses.
The first one is the CC0 license (Creative Commons Zero). It’s a permissive license equivalent to the MIT license, but it applies to data. There are no restrictions: anyone may copy, modify, distribute and use the work without attribution, even for commercial purposes. In addition, there is no need to cite the author (unless required by local law, as in France). Overall, CC0 retains no rights. It is a radical alternative that aims to facilitate sharing and reuse without constraints.
The second one is the CC-BY license (the latest version is Creative Commons Attribution 4.0 International). With it, others can copy and redistribute the material, and also remix, transform and build upon it for any purpose, even commercially. The only requirement is to give appropriate credit to the original creator by naming the author, providing a link to the license and indicating whether changes were made. In short, choose it if you want to require attribution when someone else uses your data.
To conclude this section, you will find below two comparative tables to support your choice of license (table 2 and table 3). In addition, you can visit the following website, which gives you additional support.
Table 2: Comparison between “classic” open source software development licenses.
If you need to define a proprietary license (not open source), you can use the function use_proprietary_license() from the usethis package.
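The usethis helpers for the licenses discussed in this section can be gathered in one place. The sketch below is only an illustration: add_license() and its short labels are hypothetical names introduced here, while the use_*_license() functions are the ones exported by usethis. Defining the function has no side effect; calling it modifies the active package.

```r
# Hypothetical helper: maps a short label to the usethis function that sets
# up the corresponding license. Calling it edits the active package
# (DESCRIPTION, LICENSE files, .Rbuildignore), so call it from your package
# directory only.
add_license <- function(license, copyright_holder = NULL) {
  known <- c("MIT", "GPL-3", "LGPL-3", "AGPL-3", "Apache-2.0",
             "CC0", "CC-BY", "proprietary")
  if (!license %in% known) {
    stop("No helper known for license: ", license, call. = FALSE)
  }
  if (!requireNamespace("usethis", quietly = TRUE)) {
    stop("Package 'usethis' is required.", call. = FALSE)
  }
  switch(
    license,
    "MIT"         = usethis::use_mit_license(copyright_holder),
    "GPL-3"       = usethis::use_gpl_license(version = 3, include_future = TRUE),
    "LGPL-3"      = usethis::use_lgpl_license(version = 3, include_future = TRUE),
    "AGPL-3"      = usethis::use_agpl_license(version = 3, include_future = TRUE),
    "Apache-2.0"  = usethis::use_apache_license(version = 2, include_future = TRUE),
    "CC0"         = usethis::use_cc0_license(),
    "CC-BY"       = usethis::use_ccby_license(),
    "proprietary" = usethis::use_proprietary_license(copyright_holder)
  )
}
```

For example, add_license("MIT", copyright_holder = "macfly authors") would be run once, from the root of the package.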
8.5.2 For the code given to you
Many times in the life of your package, you will include code that you did not write. The first possibility is that other people contribute code to your package, for example through a Git pull request.
In most cases, when someone contributes code to your package, they license that contribution under the same terms as your package, and we assume they have the right to license it under those terms. If you use GitHub, this is stated very clearly in the GitHub terms of service.
However, it’s important to keep in mind that the author of the code retains copyright of their code, which means that you can’t change the license without their permission (we discuss that in Section 8.6). If you want to keep the possibility of changing the license without needing anything from the author, you need a CLA (Contributor License Agreement) in which the author explicitly reassigns the copyright.
It is also good practice to acknowledge contributors (it’s very nice to receive this kind of recognition). You have several options, depending on what you prefer:
In the section on documenting your package (put section here) we will introduce a file called NEWS.md. In this file you record the changes and updates of your package, and it’s a good place to mention and acknowledge contributions to your package,
Another option is to list contributors in the Author field of the DESCRIPTION file with the role cph. If you add every contributor, the DESCRIPTION file can become hard to read, but it is a possibility.
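As a sketch of the second option, base R’s person() can describe the authors and their roles. It assumes the Authors@R form of the author field and adds the ctb (contributor) role next to cph, which is an assumption beyond what is stated above; every name, e-mail address and comment is a placeholder.

```r
# Hypothetical authors: names, e-mail and comment are placeholders.
authors <- c(
  person("Marty", "McFly",
         email = "marty@example.org",
         role = c("aut", "cre", "cph")),
  # A contributor who sent a pull request and keeps copyright on that code.
  person("Emmett", "Brown",
         role = c("ctb", "cph"),
         comment = "Contributed the plotting helpers")
)

# Printing shows each person with their roles in brackets.
print(authors)
```

The same c(person(...), person(...)) expression is what would go after Authors@R: in the DESCRIPTION file.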
8.5.3 For the code you bundle
The second way external code enters your package is when you bundle it. This happens, for example, if you copy a piece of R code directly from another package to avoid taking a dependency on it (?sec-dependencies). This has to be done with caution: you do avoid a dependency on another package (and reduce the complexity of your package and its management), but the bundled code is not updated automatically (for example when a bug is fixed upstream) and you have to deal with license compatibility (Section 8.5.3.1). In practice, it’s really useful when you need only a very small part of the code of a big package, or if the package is no longer updated.
8.5.3.1 License compatibility
We began to discuss this in table 2 and table 3, through the column “Other licences compatibility”. Before bundling someone else’s code into your package, you have to check that its license is compatible with the license of your package. Keep in mind that you can add restrictions, but you can’t remove them, which means that license compatibility is not symmetric.
In addition to the information in the tables, the following cases can guide you:
If your license and their license are the same, it’s OK to bundle,
If their license is MIT or Apache, it’s OK to bundle (with the exception noted in Section 8.5.1: Apache 2.0 code is not compatible with a GPLv2 package),
If their code has a copyleft license and your code has a permissive license, you can’t bundle their code (the copyleft license requires you to keep the software open source),
If the code comes from Stack Overflow, it’s licensed with the Creative Commons CC BY-SA license, which is only compatible with GPLv3.
If you are in none of the previous cases, you have to do a little research. You will find several license compatibility diagrams on the internet, and you will have to deal with terms like permissive compatibility, copyleft compatibility, collective work compatibility or derivative work compatibility.
If your package isn’t open source, things are more complicated and the best option could be to check with your legal department first.
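The rules of thumb above can be written down as a small decision function. This is only a sketch: can_bundle() and the short license labels are hypothetical, the function encodes nothing beyond the cases listed in this section (including the Apache 2.0 and GPLv2 caveat from Section 8.5.1), and it returns NA whenever further research is needed. It is not legal advice.

```r
# Hypothetical labels for the licenses discussed in this chapter.
permissive <- c("MIT", "Apache-2.0")
copyleft   <- c("GPL-2", "GPL-3", "LGPL-3", "AGPL-3")

# Can code under `theirs` be bundled into a package under `mine`?
# TRUE / FALSE for the cases listed above, NA when you need to research.
can_bundle <- function(mine, theirs) {
  if (identical(mine, theirs)) return(TRUE)
  if (theirs == "MIT") return(TRUE)
  # Apache 2.0 is described above as compatible with GPLv3 but not GPLv2.
  if (theirs == "Apache-2.0") return(mine != "GPL-2")
  if (theirs %in% copyleft && mine %in% permissive) return(FALSE)
  # Stack Overflow content (CC BY-SA) is described as GPLv3-only.
  if (theirs == "CC-BY-SA") return(identical(mine, "GPL-3"))
  NA
}

can_bundle(mine = "GPL-3", theirs = "MIT")    # permissive code into copyleft
can_bundle(mine = "MIT", theirs = "GPL-3")    # copyleft code into permissive
can_bundle(mine = "GPL-3", theirs = "GPL-2")  # not covered by the rules
```

The asymmetry between the first two calls is the point made above: restrictions can be added but not removed.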
8.5.3.2 How to include in your package
If all the licenses involved in your package are compatible, you can bring the code into your package. It is then important to follow these steps:
if you’re including a small piece of code, the best practice is to put it in its own file and make sure that the file has a copyright statement and a license description at the top,
if you’re including multiple files, put them in a directory and put a LICENSE file in that directory,
you need to update the Author field in the DESCRIPTION file and use the role cph to declare the authors of the bundled code. In addition, you can use the comment field to describe what they are the author of. You can find an example here for a package named diffviewer,
if you plan to submit your package to CRAN and the bundled code has a different license, you have to include a LICENSE.note file that describes the overall license of the package and the specific license of each individual component. Take a look at the file from the package diffviewer for an example.
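The first and last steps can be sketched with base R. Everything below is illustrative: the bundled package otherpkg, its version, its authors and the file names are invented, and the wording of LICENSE.note is free text chosen for this example rather than a required format.

```r
# Work in a throwaway copy of the package tree.
pkg_dir <- file.path(tempdir(), "macfly")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# 1. A single bundled file carries its own copyright and license header.
bundled_file <- file.path(pkg_dir, "R", "import-otherpkg-clamp.R")
writeLines(c(
  "# Copied from the (hypothetical) package 'otherpkg', version 1.2.0.",
  "# Copyright (c) 2020 otherpkg authors.",
  "# Licensed under the MIT license.",
  "",
  "clamp <- function(x, lower, upper) pmin(pmax(x, lower), upper)"
), bundled_file)

# 2. LICENSE.note: overall license plus the license of each bundled component.
note_file <- file.path(pkg_dir, "LICENSE.note")
writeLines(c(
  "The macfly package as a whole is distributed under GPL (>= 3).",
  "",
  "It bundles the following component under a different license:",
  "- R/import-otherpkg-clamp.R: from 'otherpkg', MIT license,",
  "  copyright (c) 2020 otherpkg authors."
), note_file)

list.files(pkg_dir, recursive = TRUE)
```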
8.5.4 For the code you use
The last application case is directly related to the use of R itself. R is licensed under the GPL, version 2 or later.
So the question is: does the code I write with R have to be licensed under GPL (>= 2)?
To be clear, this question has already been discussed by the R Foundation, and you will find a similar approach in the R Packages book by Hadley Wickham and Jennifer Bryan ([1]). To really understand the subject, it is important to be clear about what counts as redistributing R’s code. This point is crucial because if you redistribute even a small portion of R’s code, you have to follow the license compatibility rules associated with R’s license (here, the GPL). Basically, you redistribute when you:
copy an R function into your own package (even partially) and share your package,
modify an R function and redistribute it,
integrate R source files into your own project and share it.
When you work in R and use its functions (or functions from R packages), you use R as an environment (an API, or Application Programming Interface). You use the system but you don’t redistribute the core. In that case you can choose the license you want.
One concrete example is the R package ggplot2. It uses R’s base functions like data.frame() or plot(), but it does not include their source code. It interacts with R through its public API, which is allowed without triggering GPL obligations. Therefore, ggplot2 is (and can be) licensed under MIT, even though R is under the GPL, because it’s not redistributing or modifying R’s code.
The answer and its justification are subtle, but they make all the difference for your developments. You will find below a brief table summarizing the topic (table 4).
Table 4: Scope of R’s GPL license obligations.

| Action | Redistributes R’s code? | License impact |
| --- | --- | --- |
| Calling R/packages functions | No | No GPL obligation |
| Copying/modifying R source code | Yes | Must use GPLv2 or later |
| Bundling R with your software | Yes | Must comply with GPLv2 or later |
| Writing a package that runs on R | No | Free to choose license |
8.6 Relicensing
This last point focuses on relicensing your package. Even though choosing the best license for your purpose from the start is better, you may need to change it during your package’s lifecycle. If you are in this situation, and it happens even to the biggest projects, the following steps can help:
first look at the Author field in the DESCRIPTION file and list all the contributors (with role cph),
if your package is versioned with Git, look at the Git history or at the contributors page (on GitHub). If you have done things right, you can also find this information in the NEWS.md file,
clean the list if necessary and remove people who only contributed typo fixes and similar,
also remove people with whom you have a CLA (Contributor License Agreement), because you don’t need their approval for relicensing (you can of course inform them about the process),
the last step is to inform all the remaining contributors of your wish to change the license and ask for their agreement. If your package is hosted on a Git platform, you can create an issue that lists them and ask them there.
When you have all the approvals, you can change your license.
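The first step can be sketched with base R, assuming the authors are declared in the Authors@R field of DESCRIPTION (an assumption: the text above only mentions the Author field). The DESCRIPTION content below is a throwaway example with placeholder names; in your package you would point desc_path at the real file.

```r
# Throwaway DESCRIPTION with three hypothetical people.
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: macfly",
  "Version: 0.0.1",
  "Authors@R: c(",
  '    person("Marty", "McFly", role = c("aut", "cre", "cph")),',
  '    person("Emmett", "Brown", role = c("ctb", "cph")),',
  '    person("Jennifer", "Parker", role = "ctb")',
  "  )"
), desc_path)

# Authors@R holds R code, so it is parsed and evaluated to get person objects.
authors_code <- read.dcf(desc_path, fields = "Authors@R")[1, 1]
authors <- eval(parse(text = authors_code))

# Keep the people who hold copyright (role "cph"): they must agree to a
# license change unless a CLA already covers them.
has_cph <- vapply(
  seq_along(authors),
  function(i) "cph" %in% authors[i]$role,
  logical(1)
)
holders <- format(authors[has_cph], include = c("given", "family"))
print(holders)
```

This list is only a starting point: contributors who appear in the Git history but not in DESCRIPTION still have to be added by hand.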
8.7 Application for my package example
If you have made it this far, you are now an expert in licenses :)
Now we just have to define a license for our example R package, called “macfly”. After a very hard debate in my mind, I chose the GPL v3 license. Behind that choice, I like the idea of keeping all future work derived from my package (and there will be some, because my package name contains the word awesome) free and open. The MIT license was at the top of the list during the discussion, and I also thought about the AGPL because of the potential network use of my package (if I decided to add R Shiny, for example), but we keep it simple here. I am joking, but this is the kind of discussion you should have, and it can spare you pain if you had to change your license in the middle of the life of your package.
To define the license of my package, I have to run the following command in the working directory of my package.
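Judging from the console output and from the include_future argument discussed below, the call is most likely the usethis helper for the GPL. The sketch assumes usethis is installed; the guard on DESCRIPTION is an addition made here so that nothing happens outside a package directory.

```r
# Run from the package root (the directory that contains DESCRIPTION).
if (file.exists("DESCRIPTION") && requireNamespace("usethis", quietly = TRUE)) {
  # Sets License: GPL (>= 3), writes LICENSE.md and updates .Rbuildignore.
  usethis::use_gpl_license(version = 3, include_future = TRUE)
}
```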
✔ Setting active project to "C:/Users/mdepe/Documents/macfly".
✔ Adding "GPL (>= 3)" to 'License'.
✔ Writing 'LICENSE.md'.
✔ Adding "^LICENSE\\.md$" to '.Rbuildignore'.
In the R console you should see something like the lines above:
in my case, if you look at the DESCRIPTION file, you should see that the License field has been updated to GPL (>= 3). The argument include_future = TRUE creates a specification that also covers future versions of the license: a future version 4 of the GPL will be accepted alongside the currently selected version 3,
in the package directory, the file LICENSE.md has been created. It contains the full license text. As we said before, you are obliged to ship the full license text with your software (remember, the license itself requires it), but CRAN doesn’t allow this file when you check your package against CRAN standards (or when you submit it to CRAN),
to fix that, the next line added ^LICENSE\.md$ to .Rbuildignore. We will learn more about this file when we start to build our package, but the idea is to exclude specific elements when the package is built. To clarify, the added text is a regular expression that means: if a path is exactly LICENSE.md, from the start of the string to its end, exclude this file when building the package.
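To see what the pattern does and does not match, it can be tried on a few file paths with grepl(). The paths are examples chosen for this sketch; perl = TRUE and ignore.case = TRUE mimic the way R CMD build applies the patterns of .Rbuildignore (Perl-like, case-insensitive, against paths relative to the package root).

```r
# The pattern as written in .Rbuildignore is ^LICENSE\.md$ ; inside an R
# string the backslash has to be doubled.
pattern <- "^LICENSE\\.md$"

# Example paths, relative to the package root.
paths <- c("LICENSE.md", "LICENSE", "LICENSE.note",
           "docs/LICENSE.md", "LICENSE_md")

ignored <- grepl(pattern, paths, perl = TRUE, ignore.case = TRUE)

# Only the top-level LICENSE.md is excluded from the built package.
print(setNames(ignored, paths))
```

The escaped dot matters: without the backslash, the dot would match any character, so a file such as LICENSE_md would be excluded too.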
1. Wickham H, Bryan J (2023) R packages: Organize, test, document, and share your code (2nd edition). O’Reilly Media