Generating SPDX files with licensecheck

This week I had to provide an SPDX file to a customer. SPDX seems to be a way to describe the licensing of software components, to help with open source compliance. Here is an official example (though it is probably not up to date with the current SPDX specification).

However, there are no open source tools to create or edit SPDX files, so I created a little openismus-spdx-generator.py Python script that uses Debian’s licensecheck utility to scan a project and then outputs a skeleton SPDX file in RDF format. It is a quick hack with no real error checking and I have barely read the SPDX specification, so please do improve it.
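The general approach can be sketched as follows. This is not the actual openismus-spdx-generator.py script, just a minimal illustration. It assumes that licensecheck prints one "path: license" line per file and accepts --recursive, that the SPDX RDF terms live under http://spdx.org/rdf/terms#, and that File, fileName and licenseComments are valid terms there. The function names are made up for this example, so check everything against the licensecheck man page and the SPDX specification before relying on it.

```python
import subprocess
import xml.etree.ElementTree as ET

# Assumed namespaces; verify them against the SPDX specification you target.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SPDX_NS = "http://spdx.org/rdf/terms#"


def run_licensecheck(project_dir):
    """Run licensecheck over a directory tree and return its text output."""
    result = subprocess.run(
        ["licensecheck", "--recursive", project_dir],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_licensecheck(output):
    """Split lines of the assumed form 'path: license' into (path, license) pairs."""
    pairs = []
    for line in output.splitlines():
        path, sep, license_name = line.partition(": ")
        if sep:  # skip lines that do not match the assumed format
            pairs.append((path.strip(), license_name.strip()))
    return pairs


def build_skeleton(pairs):
    """Return a bare RDF/XML string with one spdx:File element per scanned file."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("spdx", SPDX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for path, license_name in pairs:
        file_el = ET.SubElement(root, f"{{{SPDX_NS}}}File")
        ET.SubElement(file_el, f"{{{SPDX_NS}}}fileName").text = path
        # The raw licensecheck string is not an SPDX license identifier,
        # so it is stored as a comment rather than as a license reference.
        ET.SubElement(file_el, f"{{{SPDX_NS}}}licenseComments").text = license_name
    return ET.tostring(root, encoding="unicode")
```

Calling build_skeleton(parse_licensecheck(run_licensecheck("."))) would then give a starting point that still needs the package-level fields filled in by hand.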

My first impression is that SPDX is rather unwieldy. The RDF (XML) format is verbose and seems to focus on being a snapshot of software via checksums of all its source files, rather than specifying a particular version or revision of the software as a whole. I don’t see any attempt to list dependencies and their licenses. More strangely, it looks like the .XLS (Microsoft Excel) spreadsheet format is the preferred format, which sets off my corporate-drones-doing-painfully-silly-things-that-they-believe-are-normal alarm bells.
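To give an idea of what that per-file snapshot involves, here is a rough sketch that computes a checksum for every file in a source tree. It assumes SHA1 is the checksum algorithm wanted and that a .git directory should be skipped. Both are assumptions for illustration, and the function names are hypothetical.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root_dir):
    """Map each file path (relative to root_dir) to its SHA1 digest."""
    checksums = {}
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Assumption: version control metadata is not part of the snapshot.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            checksums[os.path.relpath(full_path, root_dir)] = sha1_of_file(full_path)
    return checksums
```

Any change to any file changes the result, which is why such a file list describes one exact snapshot rather than a named release.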

There are official Java-based SPDX tools to convert between the various SPDX formats, and maybe to validate SPDX files. You’ll need to build them with the Ruby-based buildr build system. Then you are left with some .jar files that have to be run via “java -jar target/whatever.jar the-spdx-file” after setting JAVA_HOME correctly. Java programs are hard enough to package on Linux distros, and I’m sure that buildr makes packaging even less likely.

Anyway, the tools crash for me on the provided example files. The git repository has no branches or tags, so it’s hard to know which version is supposed to work and I don’t have confidence that the specification, example files, and tools are in sync with each other at any particular time.

Most of the SPDX file contents will be the result of a scan anyway, so rather than demanding that source code is supplied to me with an SPDX file, I’d generally prefer that the software just had proper COPYING files and source code headers. That seems like an easier requirement to comply with.

It’s all a bit linuxfoundationy.

10 thoughts on “Generating SPDX files with licensecheck”

  1. Current version of the SPEC is at: https://spdx.org/content/spdx-specification. If you’ve got questions after reading it, please join us in #spdx on freenode.
    There are some crufty examples on the site left over from earlier versions, which definitely need to be cleaned up and updated to the latest spec, 1.1 – which was just published last week. So…time for cleanup.

    The SPEC supports two official publication formats – text based tag/value (following DEP5 file style) and RDF. Translation between the two formats should be possible with the tool, and yes, for ease of working with some corporation legal and compliance folk we made sure it could be translated to a spreadsheet file. If you find a problem with the translation tools, please file a bug at: https://bugs.linuxfoundation.org/enter_bug.cgi?product=SPDX or better yet, contribute a patch to fix it ;) (Gary would welcome some help). The package section in the specification is the way to specify the overall license as a whole, the file level just provides evidence that the whole package has been scanned, and nothing has changed since the last time the package was analyzed. Problem being addressed is staleness of the COPYING file, where someone changes a file in the package (with a possibly different license), and the COPYING file doesn’t get updated.

    There are already two efforts in progress to adapt current open source tools to generate SPDX output: FOSSology announced they’ll be creating an agent to output SPDX, as did the Ninka project in the press announcement for 1.1. More tools are welcome though! :)

  2. You may have caught us mid-update on the Java SPDX tools – there is now a tag for the 1.1 version in git and the current examples should work with the Java tools.

    In terms of using the tools, you can find executable jar files to download at spdx.org/tools (avoiding the need to build them yourself). I personally use the Eclipse based builders for building the software – the project meta data should be present in the source.

    If you run into any problems, please submit a bug at https://bugs.linuxfoundation.org/ under SPDX tools or send an email to the spdx-tech mailing list (spdx-tech@spdx.org).

    We are actively discussing the “unwieldy nature” of SPDX in the tech and legal teams. This may lead to a lighter option – stay tuned.

    Fair feedback on the difficulty of running Java tools. The executable jar files should help, but there is also some interest in developing Python based tools and perhaps a web based service to convert and pretty print SPDX files. These efforts are limited by resources available, so contributions are certainly welcome.

    Gary

    1. > You may have caught us mid-update on the Java SPDX tools – there is now a tag for the 1.1 version in git

      Are you sure? I don’t see any tags, either here:
      http://git.spdx.org/?p=spdx-tools.git;a=summary
      or with “git tag”

      > The executable jar files should help, but there is also some interest in developing Python based tools and perhaps a web based service to convert and pretty print SPDX files.

      I’d probably set up a quick GWT-based web service on AWS for you if you were using maven and publishing in a maven repository, but that’s my personal preference.

  3. Thanks.

    > The SPEC supports two official publication formats – text based tag/value (following DEP5 file style) and RDF.

    > The package section in the specification is the way to specify the overall license as a whole, the file level just provides evidence that the whole package has been scanned

    The website could really do with having this introductory information right there on the front page, rather than just having links to the specification or other PDFs.

    > https://bugs.linuxfoundation.org/enter_bug.cgi?product=SPDX

    You should really use that link here instead of the generic advice to use bugs.linuxfoundation.org:
    https://spdx.org/content/tools
    It’s much less annoying.

  4. The tag is now in the Linux foundation git repository (forgot to push the tag update).

    I also have been looking into AWS. I played around with a servlet based solution and got the AWS server setup to host it, but had to put the work on hold to finish up the 1.1 updates. I’ll add pushing to a Maven repository to my list of future things to do. This would definitely help other developers using the SPDX libraries.

  5. I’d generally prefer that the software just had proper COPYING files and source code headers.

    Unfortunately, that really doesn’t scale. It is quite common now in even fairly trivial Android or website deployments to talk about 100s of packages, and it’s really quite silly, error-prone, and inefficient to be relying on manual parsing of completely non-standardized data when you need to understand your licensing situation.

    [Which is not to say that SPDX as currently constituted is the answer. But I can say with great confidence that COPYING and source code headers definitely aren’t the answer – to deal with the huge size of the systems that we’re now building out of open code, we really need better, standardized data formats and tools.]

    1. Yes, I mean that I’d rather demand that packages have COPYING files than that they have SPDX files, which would just be generated on the basis of the COPYING files anyway. I guess people want SPDX files for packages (dependencies) that they have no control over, but I am not convinced yet of how this will really be used.

      1. And when you just want a more structured license identification, instead of relying on recognition of the COPYING text, I’d rather state it simply in the .doap file, which all GNOME projects must now have (though I don’t think the license line is demanded yet):
        http://en.wikipedia.org/wiki/Description_of_a_Project

        Presumably all this has been discussed already by more knowledgeable people, but the reasoning should be on the website.
