Denormalisation aggregate function for Postgresql

A handy function that can be used to build vectors from tables.

Since posting about Vector Denormalisation in PostgreSQL, I’ve continued to experiment with improvements. (And I’m still working on the followup article to that one.)

When a denormalised table has been created, it is necessary to populate it with incoming data. Initially I did this by writing a loader function in PL/pgSQL. The function would loop over the rows in an import table, and construct the appropriate vectors for the denormalised table. The loader combines — aggregates, even — several normalised values into a single denormalised value. The input values are grouped by various criteria.

What if there was an aggregate function that could do this? One that would fit in a query, working seamlessly with SQL grouping constructs?

CREATE OR REPLACE FUNCTION denorm(ANYARRAY, INTEGER, ANYELEMENT) RETURNS ANYARRAY
IMMUTABLE
LANGUAGE 'plpgsql'
AS $$BEGIN
    $1[$2] = $3;
    return $1;
END;$$;

CREATE AGGREGATE denorm(INTEGER, ANYELEMENT)
(
    SFUNC = denorm,
    STYPE = ANYARRAY,
    INITCOND = '{}'
);

It’s surprisingly simple; it’s almost as if the designers of PostgreSQL anticipated my use case.

denorm(index, value) is a polymorphic aggregate; that means the types of the input values and of the output array are not specified. When it is used, PostgreSQL will attempt to infer the types from the context (typically, the inputs to the aggregate) and fill them in as part of the definition.

My first version was for DOUBLE PRECISION values only, and defaulted to a NULL result for empty input. When making it polymorphic I had to specify an INITCOND (initial condition), because otherwise PostgreSQL was unable to determine the type of the array to create for the first row. This seems reasonable since the type information must propagate through the parser, planner, aggregate-running and pl/pgSQL systems. In any case, I was unable to think of any cases where it really matters whether the default result is NULL or the empty array.

As an example of its use, let me one again denormalise my hardware sensor log:

CREATE TABLE hardware.hlog AS
SELECT time, denorm(sensor_id, value) FROM hardware.log GROUP BY time ORDER BY time;
SELECT 1025029
Time: 144165.408 ms

Seems fast enough. A quick sanity check:

SELECT * FROM log WHERE time BETWEEN '2011-09-21 15:05:30' AND '2011-09-21 15:05:30' AND sensor_id BETWEEN 11 AND 13;
        time         | sensor_id |  value
---------------------+-----------+----------
 2011-09-21 15:05:30 |        11 |       25
 2011-09-21 15:05:30 |        12 | 3947.368
 2011-09-21 15:05:30 |        13 | 200.6738
(3 rows)
SELECT time, values[11:13] FROM hlog WHERE time BETWEEN '2011-09-21 15:05:30' AND '2011-09-21 15:05:30';
        time         |         values
---------------------+------------------------
 2011-09-21 15:05:30 | {25,3947.368,200.6738}
(1 row)

The full definition and accompanying scant documentation is at http://code.google.com/p/ejrh/source/browse/trunk/utils/denorm.sql.

This entry was posted in Programming and tagged . Bookmark the permalink.

1 Response to Denormalisation aggregate function for Postgresql

  1. Pingback: Vector re-normalisation with views in PostgreSQL | EJRH

Leave a comment