In my recent work I’ve encountered the following problem:
There is a table TASKS having the columns as (each column has the type NUMBER):
The ID is a primary key on the table.
The OPERATION_ID groups the task into operations.
The CREATION_ORDER defines a partial-order-relation on the TASKS for each OPERATION_ID (if the values for two records are equals they can be executed concurrently).
The SUB_OPERATION_ID works as an “include” and refers to the OPERATION_ID of this same table (here is the recursion! :-)).
The TASK_TYPE_ID is a foreign key to the to-be-executed task description.
The last two fields are excluding each other! (In other words: there is EXACTLY one of them being not null!)
For a given OPERATION_ID, I have to give back the graph of tasks in the form:
where the first two columns refer to the ID field of the TASKS table. (PARENT_ID is NULL for the root tasks).
Restrictions and other information:
1. The number of rows affected in the query is a few dozens (not more than 100).
2. The whole table contains a few thousands of rows.
3. The algorithm could work in parallel and distributed environment.
4. Time is not essential while it is not more than a few seconds.
5. I cannot use tables for storing temporary data.
Take this example:
ID OP_ID CR_OR SUBO TASK
1 1 0 1
2 1 1 2
3 1 1 3
4 1 2 2
5 2 0 3
6 2 1 4
7 3 0 4
8 4 0 5
The result should be:
PAR ID TASK
1 5 3
1 7 4
5 8 5
7 4 2
8 4 2
1. In the result, the parent not always the immediate ancestor in the original database, because all the sub-operation calls removed from the graph. For example, the 1-2-5-7-8-4 is reduced to 1-5-8-4.
2. It is not a tree. There could be more than one parent of any node, so I could not take the parent by a stored function.
I tried a dozen of ways, using START WITH ... CONNECT BY, but either received wrong result, exceeded the time threshold or bumped into SQL restrictions.
Thank you for any idea,