Join Relations

  1. Go to the task repository (if you are not already logged in, use your LDAP account to log in).
  2. Click on the fork button.
  3. Clone your newly forked repository using git.
  4. Make changes to the code. After every pushed commit the code will be re-evaluated and the best submission with the lowest overall time will appear on the leaderboard.
Go ahead and submit the sample solution to get into the leaderboard.
Task id is JoinRelations.
Start: 15. 10. 2018 02:00:00
End: 14. 12. 2018 12:00:00

The task is currently not active.

Problem introduction

The task is to evaluate join queries on a set of pre-defined relations. Each join query specifies a pair of relations, (equality) join predicate and aggregation of selected columns. The challenge is to execute the join queries as fast as possible. The input to your program will be provided on the standard input, and the output must appear on the standard output.

You can use any of the supported languages (, however we recommend C or C++ for greater control over your program's performance.

Important dates
Start of the competition:15.10.2018
The first lecture:16.10.2018, 12:30, lecture room B4
Final submission deadline:14.12.2018, 12:00 CET


The fastest submission of each student received before the deadline will be evaluated by measuring the overall processing time on additional private datasets. Prices according to the final standings will as follows:

  1. 8000 kč for the student with the best time.
  2. 4000 kč for the runner-up.
Input format

All input to your program will be sent to its standard input and all output should be sent to the standard output.

Your program will first receive multiple lines, each line will contain a filename of a relation (lines are delimited with the line feed character '\n'). After the relations your program will receive the string 'Done'. The relation files are in a binary format and thus do not require manual parsing. Our sample solution already contains code that loads the relations into memory. The binary format of a relation consists of a header and a data section. The header contains two numbers: (1) the number of tuples (rows) and (2) the number of columns. The data section then stores all tuples using column storage. Hence, all of the values of one column are stored sequentially one after another, followed by the values of the next column, and so on:

struct Relation {
  uint64_t rows;
  uint64_t columns;
  uint64_t data[1];

Your program will then receive a set of join queries (each line represents a query). The join queries use relation-column pairs written as Rx.Cy, which refers to a column Cy of a relation Rx (indices start at 0). A join query consists of two parts (separated by the pipe symbol '|'):

  1. Join predicates - The join predicates specify on which columns the relations should be joined. A join predicate contains two relation-column pairs connected with an equality ('=') operator.
  2. Projections - Contains a list of columns that are needed to compute the final checksum that we use to verify that the join was done properly. Similar to the join predicates, columns are denoted as relation-column pairs. Each selection is delimited by a space character (' ').

The following join query

0.1=1.2|0.0 1.1 1.2

can be translated to this SQL snippet:


The end of a batch is indicated by a line containing the character 'F'. Our test harness will then wait for the results to be written to your program's standard output. For each join query, your program is required to output a line containing the checksums of the individual projections separated by spaces (e.g., "42 4711"). If there is no qualifying tuple, each checksum should return "NULL" like in SQL.
For your checksums, you do not have to worry about numeric overflows as long as you are using 64-bit unsigned integers.
Your solution will be evaluated for correctness and execution time.


Input to your program:

0.1=1.1|0.0 1.0
0.0=1.0|0.0 1.0

Relation r0 (here in text form for clarity):

4 1
5 2
6 7
8 6

Relation r1 (here in text form for clarity):

2 1
3 2
3 1
9 8

Expected output:

13 8
Name Timeout (seconds) Best time (seconds)
Small 60 0.001
Medium 60 1.334
Large 60 1.726
Submission count: 74