Murray's Blog

SQL: joins and duplicates.

Here’s a little SQL problem for the lazy web. It’s something that I’m trying to implement in Glom for the Repository Analyzer:

Let’s say we have these tables:


package_name package_description
something something description
somethingelse something else description
somethingmore something more description

and Package Scans:

package_name version license_id
something 0.1 43
something 0.2 43
something 0.3 44
somethingelse 1.5 43
somethingmore 0.9 40

Now, I want to get the package.package_description for all packages that appear in package_scans with license_id 43, which would look like this:

‘something description’
‘something else description’

The best I can do so far is a SELECT on package_scans, doing a LEFT OUTER JOIN:

SELECT “relationship_package”.”package_description” FROM “package_scans” LEFT OUTER JOIN “packages” AS “relationship_package” ON (“package_scans”.”package_name” = “relationship_package”.”package_name”) WHERE “package_scans”.”license_id” = 43;

which gives me duplicates, like so:

‘something description’
‘something description’
‘something else description’

If possible I’d like to do this without GROUP BY. I feel there must be a simpler way to say “give me a row for each record in packages for which the (indirect) relationship is true”. If the main FROM table could somehow be packages, instead of package_scans, then the LEFT OUTER JOIN would cause me to have only one row for each relevant packages record. In general, Glom never gives you repeat rows because that’s confusing.

If I can figure out what SQL should be generated, I could imagine that I might get that result in Glom by defining a relationship in terms of a doubly-related field. So the user could say “Show me records from the packages_with_package_scan_license_id” relationship. That relationship would be defined as something like

records from packages where licenses.license_id == packages::package_scans::license_id

But maybe a GROUP BY option really is the clearest.

Update: I feel like the sub-select idea might be what I want, if I can figure the syntax out. I like the idea of showing records from a relationship that is itself defined by a link between a value and a field in another related set of records.

Exit mobile version