Perl Weekly Challenge 132: Hash Join

Mapping instructions for a relational database to a language like Perl is never easy. But these instructions are particularly tricky, mostly because of this:

If the size of the hash table equals the maximum in-memory size

In Perl, when you reach the maximum in-memory size, you have consumed all the memory the OS is willing to give you. Perl's response to a request for more memory that the OS refuses to grant is to crash (which cannot be trapped). You will not have any memory left to do something useful.

Note that dealing with running out of memory is part of the challenge. We're not asked to implement the standard hash join algorithm. Right above the quoted algorithm, Wikipedia writes:

This algorithm is simple, but it requires that the smaller join relation fits into memory, which is sometimes not the case. A simple approach to handling this situation proceeds as follows:

So we really do have to create an algorithm which deals with running out of memory. (And this raises the question: "what piss-poor hardware is the weekly challenge running on, that such a small example runs out of memory?")

Now, there is a way out of this. If you build perl with the -DPERL_EMERGENCY_SBRK compilation option, and if you build perl so that it uses its own malloc (which it doesn't do by default on most platforms), then you can allocate some emergency memory using $^M.

This gives us a trappable "running out of memory" event, but it's very unlikely the program will be left in a state from which we can continue. So, whatever algorithm we come up with, it will be extremely flimsy, and unlikely to work. Given that, what is presented below is untested; we could not be bothered to recompile perl.

Solution

Perl

First, we check whether the program is run by a perl which can actually trap an out of memory event. If not, that's it. Else, we allocate some emergency memory. (We're using 1 MB of emergency memory. No idea whether that is realistic or not.)
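A minimal sketch of that check, as an illustration and not the original code. It assumes the build options are visible through the Config module: usemymalloc set to 'y', and -DPERL_EMERGENCY_SBRK appearing in ccflags. The helper name can_trap_oom is made up for this example, and the sketch prints a message where the real program would simply stop.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: true if the build settings suggest that $^M
# can serve as an emergency memory pool. Assumes the define shows up
# in the ccflags recorded by Config.
sub can_trap_oom {
    my ($usemymalloc, $ccflags) = @_;
    return (($usemymalloc // '') eq 'y' &&
            ($ccflags     // '') =~ /-DPERL_EMERGENCY_SBRK\b/) ? 1 : 0;
}

if (can_trap_oom ($Config{usemymalloc}, $Config{ccflags})) {
    # Reserve 1 MB (1 << 20 bytes) of emergency memory.
    $^M = 'a' x (1 << 20);
}
else {
    # The real program would give up here; the sketch just reports it.
    print "This perl cannot trap an out of memory event\n";
}
```

On a stock perl, the else branch is the one that runs.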

We now create a __DIE__ handler to trap an out of memory event. When one occurs, the handler calls a subroutine named flush.
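A sketch of what such a handler could look like. Since an ordinary perl cannot produce a trappable out of memory event, the event is simulated here with a plain die; the assumption is that the real message starts with "Out of memory". The flush in this sketch is a counting stand-in, not the real subroutine.

```perl
use strict;
use warnings;

my $flushes = 0;
sub flush {$flushes ++}      # Stand-in: only counts how often it is called.

# Call flush only for messages which look like an out of memory event.
$SIG{__DIE__} = sub {
    my ($message) = @_;
    flush () if $message =~ /^Out of memory/;
};

# Simulated events. A __DIE__ handler also fires for a die inside an eval.
eval {die "Out of memory during request for 1024 bytes\n"};
eval {die "Some other error\n"};

print "flush called $flushes time(s)\n";    # flush called 1 time(s)
```

Note that once a __DIE__ handler returns, perl carries on processing the exception as if the handler were not there, which is one more reason to doubt that the program can usefully continue.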

The main loop of the program populates the output structure %output. This implements step 1.1 of the algorithm. The hash %output uses the ages as keys; each value is an array of first names.

If we are running out of memory, flush is called. It processes the %output structure, and sends results to standard output. To get the memory needed, we undefine $^M, do our thing, release the memory used by %output, and set $^M again. This implements step 1.2 of the algorithm above.

As a final step, after the main loop, we have to call flush again. This is step 2 of the algorithm above.
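The overall shape of the loop can be tried out on any perl by swapping the out of memory trap for a simple row cap. The sketch below does that; it is a substitute technique, not the program described above. The sample data, the cap $MAX_ROWS, the counter $rows and the @results array are all assumptions made for the illustration, and results are collected in an array before being printed so they can be inspected.

```perl
use strict;
use warnings;

# Illustrative sample data (an assumption for this sketch).
my @player_ages  = ([20, "Alex"],  [28, "Joe"],   [38, "Mike"],
                    [18, "Alex"],  [25, "David"], [18, "Simon"]);
my @player_names = (["Alex", "Stones"], ["Mike",  "Jones"],
                    ["Alex", "Jones"],  ["Simon", "Duane"]);

my $MAX_ROWS = 4;    # Assumed cap; stands in for the out of memory event.
my %output;          # Age => array of first names.
my $rows = 0;        # Number of rows currently held in %output.
my @results;         # Joined rows.

sub flush {
    foreach my $age (sort {$a <=> $b} keys %output) {
        foreach my $first (@{$output{$age}}) {
            # Step 1.2.1: scan the probe input for matching tuples.
            foreach my $name (@player_names) {
                next unless $$name[0] eq $first;
                push @results => "$age, $first, $$name[1]";
            }
        }
    }
    # Step 1.2.2: reset the hash table.
    %output = ();
    $rows   = 0;
}

foreach my $player (@player_ages) {
    my ($age, $first) = @$player;
    push @{$output{$age}} => $first;       # Step 1.1
    flush () if ++ $rows >= $MAX_ROWS;     # Step 1.2
}
flush ();                                  # Step 2

print "$_\n" for @results;
```

With a cap of 4, flush fires once inside the loop and once after it, and six joined rows come out. In the version described above, flush would additionally undefine $^M before doing its work and set it again afterwards.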
