Harvard Library Open Source Project Considerations

These are some factors to consider for any project that Harvard Library Technology Services (LTS) plans to release as open source.

Business factors

Business Factors | Explanation | Comments
Business strategy | What are the business goals in releasing the software as open source? To share with the community? To gain product usage and feedback? To gain development partners? As a condition of a funding source? Other? | This would likely be determined by the business owner in the Harvard Library.
Time to market | Are there external timing factors driving the project? Is it desirable to be first to market? | Usually, being first is very important to gain mind share and market share, even if the initial product is not perfect.
Define minimum viable product | From Wikipedia: "A minimum viable product has just those core features that allow the product to be deployed, and no more. The product is typically deployed to a subset of possible customers, such as early adopters that are thought to be more forgiving, more likely to give feedback, and able to grasp a product vision from an early prototype or marketing information. It is a strategy targeted at avoiding building products that customers do not want, that seeks to maximize the information learned about the customer per dollar spent." | The first release as open source may or may not be an MVP.

Software application factors

These are some software application factors to consider for any project that Harvard LTS would release as open source for re-use by others.

Basic Factors | Explanation | Comments
Usefulness | There should be a targeted user for the software, and the software should be useful more or less "as is". Others should be able to install it and run it to at least some degree given the documentation that accompanies the code. See "minimum viable product." | To gain uptake, it must be useful without huge amounts of effort.
Interoperability | If the software interoperates with other software tools, the open source project should have well documented, preferably standards based, interfaces to external code (web services, class interfaces, or other). | Ideally, the software can interoperate with other successful open source projects.
License | The software should be released with a license statement. GPL, LGPL, Apache, MIT, and BSD are common, in increasing order of permissiveness. We need to examine dependencies in the codebase and determine the license requirements implied by other open source modules being used. AGPL v3.0 is a strong copyleft open source license; releasing under it allows the project to use other software with restrictive (copyleft) licenses such as GPL v3. | See the Library Open Source License Best Practice recommendation.
Copyright | Each source module should include a copyright statement. |
Source control | GitHub. The branching and merging strategy should be documented. GitHub also provides a wiki and issue tracking. | GitHub is the default at this time.
Documentation | There should be design and operating documentation, including code organization. This could go on the GitHub wiki. |
Code | Source file and class level documentation, at a minimum. |
Issue tracking | GitHub. Do we mirror that work in JIRA for internal project management? |
Deployment packaging | We may or may not provide an executable version or reference implementation. This could be a zip file, like FITS, or a Docker and/or VirtualBox image. | Dataverse uses Vagrant to describe their reference implementation: http://guides.dataverse.org/en/latest/developers/tools.html#vagrant
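
The copyright factor in the table above lends itself to an automated check before a release. The following is a minimal sketch only, not an existing LTS tool: the function names, the list of file suffixes, the 20-line header window, and the regular expression are all assumptions made for illustration and would need to match the project's actual copyright wording.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed pattern: "Copyright", an optional "(c)" or "©", then a four-digit year.
COPYRIGHT_RE = re.compile(r"copyright\s+((\(c\)|©)\s*)?\d{4}", re.IGNORECASE)


def has_copyright(text: str, max_lines: int = 20) -> bool:
    """Return True if a copyright statement appears in the first max_lines lines."""
    head = "\n".join(text.splitlines()[:max_lines])
    return bool(COPYRIGHT_RE.search(head))


def files_missing_copyright(
    root: str, suffixes: Tuple[str, ...] = (".java", ".py")
) -> List[str]:
    """List source files under root whose header lacks a copyright statement."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            if not has_copyright(text):
                missing.append(str(path))
    return missing
```

Calling files_missing_copyright(".") from the top of a checkout would return the paths that still need a header; an empty list means every matching file passed the check.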

 

Community collaboration and governance

These are some factors to consider for any Harvard LTS open source project for which we will accept community input and contributions.

Factor | Explanation | Comment
Ownership | For example, we could require that contributors provide contributed code under the same license as the project. | Samvera has an institutional agreement that must be signed before code can be contributed.
Committers | Clearly define the requirements to be a committer. It could be just Harvard if we want to maintain control, or we could approve others once they have demonstrated a sufficient level of trust. |
Pull request policy | Under what conditions would we accept code contributions? Will we require a Contributor License Agreement? What would our policy be about acting on pull requests? |
Unit test coverage requirements | What testing would we require for pull requests to be accepted? What test framework would we support? |
Documentation requirements | What documentation would we require for submitted pull requests to be accepted? |
Communication | Email list, Google group, etc. Useful for communication with the community. |

 

Background reading:

http://www.smashingmagazine.com/2013/01/starting-an-open-source-project/

and if you have a lot of time:

http://producingoss.com/en/

http://oss-watch.ac.uk/

Not perfect, but a few good points: http://www.wikihow.com/Have-a-Successful-Open-Source-Project

Stanford University Libraries are leaders in various open source library projects – including Blacklight and Samvera. A lot can be gleaned from looking at how they do things. For example, the governance and communication links from https://samvera.org/ and the “support” and “contributing” sections of https://github.com/projectblacklight/blacklight/wiki

We (LTS) have released a lot of software on GitHub as “open source” (https://github.com/harvard-library and https://github.com/harvard-lts, and no doubt more). But it's currently more like a code dump than an open source project: it is not well organized or documented, and it has no clear community model for how to get updates or to what extent we’ll accept pull requests. We have a goal of developing an open source community process right now for FITS. It would be great to have a model that works for all our new projects.

iPres 2015 - Roles & Responsibilities for Sustaining Open Source Platforms & Tools

Grainne's notes on this workshop:

iPres 2015 - Roles & Responsibilities for Sustaining Open Source Platforms & Tools

iPres 2015 Open Source Workshop - brown bag presentation (powerpoint presentation)

 

OSS4Pres 2.0: Design Requirements for Better Open Source Tools

https://saaers.wordpress.com/2017/04/25/oss4pres-2-0-design-requirements-for-better-open-source-tools/

 

Licenses

Resources

http://opensource.org/licenses/

http://opensource.org/faq#copyleft

http://opensource.org/faq#permissive

http://choosealicense.com/licenses/

https://en.wikipedia.org/wiki/Comparison_of_free_and_open-source_software_licenses

 

 

Hydra project

There are three primary drivers for asking for CLAs:

1. it helps protect the project (and all its users) against copyright infringement lawsuits; with a CLA on file, we will be in a better position to defend against any claims that the Hydra code in question wasn't intended to be distributed as OSS. This is the primary reason that Apache and others require CLAs of contributors.

2. it helps protect contributors, by ensuring an explicit understanding of the nature of and intent of any contributions to the project. If a contributor (or his/her employer) hadn't fully considered the rights and implications of including their software in the project, the CLA makes the license as clear as possible.

3. it enables the project to relicense the software at future points; if/when Apache 3.0 were to emerge as a license, e.g., the project could reissue the entire codebase under the new license without having to contact each individual copyright holder and ask for permission (an untenable position for a code base as diverse and widespread as Hydra's). This is actually somewhat common, and something that many long-lived and successful projects have had to face in the changing legal environment.

It's important to note that it's possible to sign a CLA and be a code contributor without becoming an official Hydra Partner. So yes, while it will definitely add some overhead, we feel like it's an important step in laying a solid foundation for defensible and long-term use of a cohesively-licensed code base.

Google

One of the reasons that Google chooses the Apache License (2.0) as the default for the software it open-sources: it is permissive like BSD, but (unlike BSD) actually happens to mention the rights under copyright law and gives you a license under those rights. In other words, it actually knows what it is doing, unlike some of the other permissive licenses.

DSpace

Licensing of Contributions

Any third-party libraries (e.g. JARs / Maven Dependencies) required to compile or run DSpace must be included. The license of any required jar/dependency MUST be compatible with BSD. It must not prevent any commercial use of DSpace, nor have any impact on the rest of the code by its inclusion. It is not acceptable to require additional downloads of JARs/dependencies to make DSpace compile or function.

Non-Java third-party web frameworks or tools (e.g. XSLT, CSS, Images) should follow these same licensing guidelines.

Examples of acceptable licenses:

  Apache License 2.0
  BSD
  Common Development and Distribution License (CDDL)
  Common Public License (CPL)
  GNU Library or "Lesser" General Public License (LGPL)
  MIT / X11 License
  Mozilla Public License
  Additional examples may be found in our LICENSES_THIRD_PARTY file in the source code. This file lists the licenses of all third-party libraries used by DSpace.

Examples of unacceptable licenses:

  GNU General Public License (GPL)
  GNU Affero General Public License (AGPL)
  European Union Public Licence (EUPL)
      Similar to GPL, this license is "share alike" / "strong copyleft" and may require usage of the same license for redistribution or derivatives. For more information, see the EUPL FAQ (specifically the questions "What about compatibility issues?" and "Are there limitations to the use of the software?")
  Any license which strictly forbids "sublicensing" as detailed at http://choosealicense.com/licenses/
  Any license which limits commercial use/redistribution of binary code
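
The two lists above can be read as an allow-list and a deny-list for dependency licenses. The sketch below illustrates that reading under stated assumptions: it is not DSpace's actual tooling, the short license labels are informal names chosen for this example, and the dependency names in the usage note are hypothetical. Anything on neither list is flagged for manual review rather than guessed at.

```python
from typing import Dict, List

# Informal labels for the licenses named in the lists above (assumed for this sketch).
ACCEPTABLE = {"Apache-2.0", "BSD", "CDDL", "CPL", "LGPL", "MIT", "MPL"}
UNACCEPTABLE = {"GPL", "AGPL", "EUPL"}


def classify(license_name: str) -> str:
    """Map a license label to 'acceptable', 'unacceptable', or 'needs review'."""
    if license_name in ACCEPTABLE:
        return "acceptable"
    if license_name in UNACCEPTABLE:
        return "unacceptable"
    return "needs review"


def review_dependencies(deps: Dict[str, str]) -> Dict[str, List[str]]:
    """Group dependency names (sorted) by the status of their declared license."""
    report: Dict[str, List[str]] = {
        "acceptable": [],
        "unacceptable": [],
        "needs review": [],
    }
    for name in sorted(deps):
        report[classify(deps[name])].append(name)
    return report
```

For example, review_dependencies({"lib-a": "MIT", "lib-b": "GPL", "lib-c": "Unknown"}) places lib-a under "acceptable", lib-b under "unacceptable", and lib-c under "needs review".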

Why is GPL (and similar) unacceptable?

DuraSpace feels it is important for commercial entities and service providers to be able to customize the entire codebase and redistribute/repackage/sell it in a binary form. GPL licenses prevent this, as noted in the following FAQ questions:

  Can I release a modified version of a GPL-covered program in binary form only?
  If I distribute GPL'd software for a fee, am I required to also make it available to the public without a charge?

In addition, the Apache Software Foundation has a good explanation of why they are also forced to avoid GPL-based (copyleft) licenses because of its one-way compatibility with Apache License 2.0:

“This licensing incompatibility applies only when some Apache project software becomes a derivative work of some GPLv3 software, because then the Apache software would have to be distributed under GPLv3.

We avoid GPLv3 software because merely linking to it is considered by the GPLv3 authors to create a derivative work. We want to honor their license. Unless GPLv3 licensors relax this interpretation of their own license regarding linking, our licensing philosophies are fundamentally incompatible. This is an identical issue for both GPLv2 and GPLv3.”

While DSpace is released under BSD licensing, the same issues exist between BSD licenses and GPL-based licenses.

JDBC drivers for databases are an exception since:

  They must correspond to the database version and not the DSpace version.
  They are not required for DSpace to compile and run; a variety of databases, including open source databases, may be used.