Repository Analyzer improvements

I’ve done some more work on the Debian Repository Analyzer that I mentioned before. It can help companies with license compliance by discovering the licenses of the software in their Debian-based products, and by letting them navigate the dependencies with that information.

Now it identifies some standard free-software and open-source licenses (such as GPL, LGPL, MIT, X11, and Boost), though you’ll have to sanity-check the results, because there are several ways that it can guess wrong when automatically examining the tarballs and diffs. There are Open Tarball and Open Diff buttons so you can take a quick look at the files for yourself.
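To give a feel for why the guesses need checking, here is a minimal sketch of phrase-based license guessing. It is not the analyzer’s actual code: the `LICENSE_MARKERS` table and the `guess_license` function are hypothetical names made up for illustration, and the patterns are deliberately crude.

```python
import re

# Hypothetical marker phrases (an assumption for illustration only).
# Order matters: LGPL texts usually mention the GPL too, so LGPL is checked first.
LICENSE_MARKERS = [
    ("LGPL", re.compile(r"GNU\s+(Lesser|Library)\s+General\s+Public\s+License", re.I)),
    ("GPL", re.compile(r"GNU\s+General\s+Public\s+License", re.I)),
    ("Boost", re.compile(r"Boost\s+Software\s+License", re.I)),
    ("MIT/X11", re.compile(r"Permission\s+is\s+hereby\s+granted,\s+free\s+of\s+charge", re.I)),
]


def guess_license(text):
    """Return the name of the first matching license, or None if nothing matches."""
    for name, pattern in LICENSE_MARKERS:
        if pattern.search(text):
            return name
    return None
```

A file that merely mentions a license (in a changelog, say) would match just as well as one that is actually under it, which is one of the ways this kind of guess goes wrong.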

It still ends up with lots of apparently-unique licenses: for instance, there seem to be quite a few minor variations of the BSD, MIT, and X11 licenses. Only a human can decide whether they are really equivalent, so I added a Remove As Duplicate button to let a human do that. That was a nice exercise of Glom’s pygda and pygtk support (this requires Glom 1.2).
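As a rough sketch of how likely duplicates could be surfaced for that human review, two license texts can be compared after stripping case, punctuation, and whitespace. This is an assumption about one possible approach, not something the analyzer does today; `license_words` and `license_similarity` are hypothetical helper names, and the comparison uses Python’s standard `difflib` module.

```python
import difflib
import re


def license_words(text):
    """Lower-case the text and keep only its words, dropping punctuation and layout."""
    return re.findall(r"[a-z0-9]+", text.lower())


def license_similarity(text_a, text_b):
    """Return a ratio from 0.0 (nothing in common) to 1.0 (same words in the same order)."""
    matcher = difflib.SequenceMatcher(
        None, license_words(text_a), license_words(text_b), autojunk=False
    )
    return matcher.ratio()
```

A high ratio only suggests that two texts are candidates for Remove As Duplicate; a single changed word can change what a license means, so the decision still belongs to a person.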

There are a lot of bug fixes too. In particular, it now does a better job of handling license files in various text encodings. This is still non-expert Python code, so I’d welcome any cleanup patches.
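For anyone curious about the encoding problem, one common way to cope with license files in unknown encodings is to try a short list of encodings in order. The sketch below is written for Python 3 and is only an illustration of that idea, not the analyzer’s real code; `decode_license_bytes` and its default encoding list are assumptions.

```python
def decode_license_bytes(data, encodings=("utf-8", "latin-1")):
    """Decode raw file bytes by trying each encoding in turn."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if every listed encoding failed: keep the readable parts
    # and substitute a replacement character for the bytes that were not valid.
    return data.decode("utf-8", errors="replace")
```

Because latin-1 accepts any byte sequence, it works as a last resort in the default list, though it may render some characters wrongly when the file was really in another encoding.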