Cooperative Multiple Inheritance in Python: Theory (English)

Sunday, March 24, 2019

Cooperative Multiple Inheritance in Python: Theory (English)

$C^3$ Linearization Algorithm was first published at the 1996. It was adapted to the Open Dylan implementation in January 2012. It has been chosen as the default algorithm for method resolution in Python 2.3 (and newer).

Another name of this algorithm is MRO - Method Resolution Order.

Theoretically, not for any inheritance graph $C^3$ Linearization Algorithm can be linearized (see example in Appendix A). Linearization is more restricted version of topological sort of the graph. In practice, it is rarely happen, and if it did, Python will emit en Error.

That's right, the Diamond problem can be solved (practically). We can use inheritance where both object that has the state, and this will not cause any problem. No more virtual inheritance (as C++ did), or multiple interface inheritance (as Java did).

Note: We will examine below only so called new-style classes. All classes in Python 3 are new style. Everything listed here holds in Python 2 if there no classic style class involve in inheritance graph.

Algorithm
Suppose, you have class definition $class X(Y_1, Y_2,,..Y_N): pass$, then

(*) $L[X(Y_1, Y_2,...,Y_N)] = X + L[(Y_1,..,Y_N)] = X + merge(L[Y_1], L[Y_2],...,L[Y_N], Y_1, Y_2,...,Y_N)$
$L(\emptyset)=\emptyset$, where $\emptyset$ denotes empty list, and + means list concatenation. Note: () also means empty list.

In English, Let $X$ be the class that extends classes $Y_1$, $Y_2$,...,$Y_N$. Then, linearization of class $X$ is (recursively) defined as list with head X and tail is merge procedure of the result of the (recursive call of) the algorithm on $Y_1$, $Y_2$,...,$Y_N$-these are first $N$ terms inside merge-following by $Y_1$, $Y_2$,...,$Y_N$.

The merge is done by selecting the first head of the lists which does not appear in the tail (all elements of a list except the first) of any of the lists. Note, that a good head may appear as the first element in multiple lists at the same time, but it is forbidden to appear anywhere else. The selected element is removed from all the lists where it appears as a head and appended to the output list. The process of selecting and removing a good head to extend the output list is repeated until all remaining lists are exhausted. If at some point no good head can be selected, because the heads of all remaining lists appear in any one tail of the lists, then the merge is impossible to compute due to inconsistent orderings of dependencies in the inheritance hierarchy and no linearization of the original class exists (abnormally).

There is more below.
Ниже есть продолжение.

Example 1

Based on https://www.python.org/download/releases/2.3/mro/

Suppose, we have following class hierarchy:

$class D(O): pass$

$class E(O): pass$

$class B(D, E): pass$

where $O$ denotes $object$.

We can visualize the inheritance graph as follows:

$L[B(D, E)]=B+merge(L[D], L[E], D, E)=B+merge(D+L[O], E+L[O], D, E]=$

$=B+merge(D+O+L[\emptyset], E+O+L[\emptyset], D, E]=$

$=B+merge(D+O+\emptyset, E+O+\emptyset, D, E]=$

$=B+merge(D+O, E+O, D, E)=B+D+merge(O, E+O, \emptyset, E)=$

We can't pull O, E+0 disagree.

$=B+D+E+merge(O,O, \emptyset, \emptyset)=B+D+E+O$

Note, that we did some computations twice. We can do it more efficiently.

Shorter way.
Let's make calculation by-level. Let's calculate $L[0], L[D], L[E], L[B]$.

$L[O]=O+L[()]=O+L[\emptyset]=O+\emptyset=O$
$L[D(O)]=D+L[O]=D+O+merge(L[())],())]=D+O+merge(\emptyset, \emptyset)=D+O+\emptyset=D+O$

that is $L[D(O)]=D+O.

$L[E[0]]=L[E(O)]=E+L[O]=E+O+merge(L[())],())]=E+O+merge(\emptyset,\emptyset=E+O$

that is $L[E[0]]=E+O$

$L[B(D, E)]=B+merge(L[D], L[E], D, E)=${applying calculation above}$=B+merge(D+O, E+O, D, E)=$

We see that D is a good head, therefore we take it.

$=B+D+merge(O,E+O,\emptyset,E)=$

Now O is not a good head, since it is in the tail of the sequence E+O. In this case, we have to skip to the next list. Then we see that E is a good head; we take it and we are reduced to compute $merge(O,O, \emptyset, \emptyset)$ which gives O.

$=B+D+E+merge(O,O, \emptyset, \emptyset)=B+D+E+O$.

Q.E.D

Now, when we have some filling how algorithm work, let's prove some propositions.

First of all, Let's rewrite (*) explicitly for cases that we have 0,1,2 immediate parents, $n=0,1,2$.

(1.0) $L[X]=X+L[()]=X+L[\emptyset]=X+\emptyset=X$
(1.1) $L[X(Y)]=X+L[Y]=X+merge(L(Y), Y]$
(1.2) $L[X(Y_1, Y_2)]=X+merge(L[Y_1], L[Y_2], Y_1, Y_2)$

Proposition 0.0
Suppose, you have class definition $class X(Y_1, Y_2,,..Y_N): pass$, then MRO of X will start from X.

Proof.
We should prove that $L[X]=X+...$

It directly follows from (*).

$L[X(Y_1, Y_2,...,Y_N)] = X + L[(Y_1,..,Y_N)]=X+...$

Q.E.D

$C^3$ linearization algorithm has following constraints:

0. The parents MRO's remain consistent: Inheritance graph determines the structure of method resolution order.
1. The local MRO's remain consistent, i.e., the super class will appear only after the method of the local classes will appear.
2. Monotonicity: An MRO is said to be monotonic if C1 precedes C2 in linearization of C, then C1 precedes C2 in the linearization of any subclass of C.

or rephrasing it more concisely:

a) Children precede their parents.
b) If a class inherits from multiple classes, they are kept in the order specified in the tuple of the base class.

See Appendix B for the prove how there properties follows from the $C^3$ linearization algorithm.

Proposition 1.0. (uniqueness)

1) If $C^3$ linearization algorithm produce linearization $X_1,...,X_n$,
than this order is unique.

Prove

Let's assume, that we have two different linearization lists: $X_1,...,X_n$ and $Y_1,...,Y_n$. Let denote by $X_i$ the last element where $X_j=Y_j$ (that is for $j<=i$ $X_j=Y_j$).

We have inheritance graph, this means that we have (finite) DAG. Applying formula (*) recursively we will travel on inheritance graph from childs to parent. By proposition (0.0) $X_1$=$Y_1$.

Let's look when $X_i$ was pulled out from merge (any other $X_i$, $Y_i$, i>1 was pulled out from merge). $X_1,...,X_n$ was produced by $C^3$ linearization algorithm, this means, it means that this merge procedure was resolved successfully, so after it was pulled out, there is no $X_j$ or $j<=i$ left in the merge procedure. So, when we do pulling out we have exactly $MRO_1$ of $X_i$ without any $X_j$ $j<i$.

$Y_1,...,Y_n=X_1,...,X_{i-1}, X_i,Y_{i+1},...,Y_n}$ was also produced by $C^3$ linearization algorithm. Let's look when $X_i$ was pulled out. $X_1,...,X_{i-1}, X_i,Y_{i+1},...,Y_n}$ was produced by $C^3$ linearization algorithm, this means, it means that this merge procedure was resolved successfully, so after it was pulled out, there is no $X_j$ or $j<=i$ left in the merge procedure. So, when we do pulling out we have exactly $MRO_2$ of $X_i$ without any $X_j$ $j<i$.

Because we have some finite graph, exists some $X_i_1$, $X_i_2$ that are listed in $MRO_1$ is some order and in $MRO_2$ in reverse order. Exists (first) merge stage when $X_i_1$ was pulled out by $MRO_1$, $X_i_2$ was pulled out by $MRO_2$.

Let's look on the merge structure:

$merge(L_1, L_2,...,L_N, L_(N+1}, L_(N+2},...,L_{2N})$

There are 3 different options, in each case we will have contradiction:

1) $X_i_1$, $X_i_2$ belong to some $L_j$.
2) $X_i_1$ is head of $L_j$, $X_i_1$ is head of $L_k$
3) $X_i_1$ is tail of $L_j$, $X_i_1$ is tail of $L_k$

1) $X_i_1$, $X_i_2$ belong to some $L_j$.

* if $X_i_1$ precedes $X_i_2$, than $X_i_2$ is in tail of $L_j$. So, we have contradiction with pulling out $X_i_2$ by $MRO_2$ (we can't pull out if exists tail with $X_i_2$).
* if $X_i_2$ precedes $X_i_1$, than $X_i_1$ is in tail of $L_j$. So, we have contradiction with pulling out $X_i_1$ by $MRO_1$ (we can't pull out if exists tail with $X_i_1$).

2) $X_i_1$ is head of $L_j$, $X_i_1$ is head of $L_k$

We can pull either $X_i_1$ or $X_i_2$ but not both (we're pulling out by the order of lists). We have contradiction.

3) $X_i_1$ is tail of $L_j$, $X_i_1$ is tail of $L_k$

We can't pull not $X_i_1$ not $X_i_2$ by merge procudure (ther're in the tail!). We have contradiction.

Q.E.D.

Proposition 2.0
`object` has to be last in (any) MRO.

Proof.
Let's assume, by contradiction, that exists some object X before object `object` in some MRO. This something (X) has `object` as (possibly indirect) parent. By b) parent's go after their children, so `object` has to go after X. Contradiction with our assumption. Q.E.D.

Proposition 2.1
If we have MRO of some node $A_1$, than we chose some node in that MRO, it's MRO will be subchain of $A_1$ in MRO.

Proof

Let G be some inheritance graph with at least one node. Let chose some node $A_1$. MRO of $A_1$ (if exists) will define some order on graph G

$L[A_1]=A_1,A_2...,A_n$

We have inheritance graph, this means that we have (finite) DAG. Applying formula (*) recursively we will travel on inheritance graph from childs to parent. For each node $A_i$ exists stage in the $C^3$ linearization algorithm in which exists merge procedure where $A_i$ is listed as list of single elements (DAG is connected graph, we're traveling from child to it's parent). Because, by assumption, $C^3$ linearization algorithm produce linearization $A_1,...,A_n$, it means that this merge procedure was resolved successfully, so our $A_i$ was pulled out on some stage (S), after it was pulled out, there is no $A_i$ left in the merge procedure. Each $A_j$, for $j<i$ was pulled out earlier, so stage (S) contain exactly MRO of $A_i$ written explicitly.

(As I show above it can't contain any $A_j$, for $j<i$. If the reminder will produce list that differ from MRO of $A_i$ this will means that exists $A_i_i$, $A_i_2$ that in MRO of $A_1$ are listed in one order and in MRO of $A_i$ in reverse order. This contradicts uniqenss of MRO (see proposition 1.0).

Q.E.D.

Proposition 2.2
Let G be some inheritance graph with at least one node. Let chose some node $A_1$. Let MRO of $A_1$ be $L[A_1].
If we define new class $B(A_1, A_2...A_n)$, that is class that inherits from n classes, then MRO of B will be $L[B]=B+A_1+...$.

Proof

We have inheritance graph, this means that we have (finite) DAG. Applying formula (*) recursively we will travel on inheritance graph from childs to parent.

$L[B(A_1, A_2,...,A_N)] = X + L[(A_1,..,A_N)]=X + merge(L[A_1], L[A_2],...,L[A_N], A_1, A_2,...,A_N)=X + merge(A_1+merge(...), L[A_2],...,L[A_N], A_1, A_2,...,A_N)$

We will prove that A_1 can be safely pull out from the merge. A_1 is head of first list, so this is the first candidate for pulling out. Last n element $A_1,...,A_N$ consist of 1 element, so they can't prevent from pulling out $A_1$. So, it is sufficient to prove that L[A_2],...,L[A_N] don't prevent pulling out.

Let's assume, by contradiction, that exists $i=2,..,N$ such that L[A_i] prevent for $A_1$ to be pulled out.

By property b) $A_1$ precceds $A_i$.

By merge procedure definition, this means, that tail of L[A_i] contains $A_1$. By constraint 0) this means that $A_i$ is subclass of $A_1$, so by property a) $A_i$ precceds $A_1$. This contradicts uniqenss of MRO ( proposition 1.0).

Q.E.D

Proposition 2.3 Let $n=1$ in Proposition 2.2 , than we get following proposition:

Let G be some inheritance graph with at least one node. Let chose some node $A_1$. Let MRO of $A_1$ be $L[A_1].
If we define new class $B(A_1)$, that is class that inherits from n classes, then MRO of B will be $L[B]=B+A_1+...$.

Q.E.D.

That is, we proved that if inherit from single class case $n=1$ in Proposition 2.2, the MRO will starts from B and than (single) parent class $A_1$.

Proposition 2.4 Let $n=2$ in Proposition 2.2 , than we get following proposition:
Let G be some inheritance graph with at least one node. Let chose some node $A_1$. Let MRO of $A_1$ be $L[A_1].
If we define new class $B(A_1, A_2)$, that is class that inherits from 2 classes, then MRO of B will be $L[B]=B+A_i+...$.

Q.E.D.

That is, we proved that if inherit from two classes, case $n=2$ in Proposition 2.2. the MRO will starts from B and then first class in the inheritance tuple $A_1$. There is no guarantee that the next class will be $A_2$, actually it can be it's sub-class, see Example 2 below.

Example 1-Indirect way

Based on https://www.python.org/download/releases/2.3/mro/

Suppose, we have following class hierarchy:

$class D(O): pass$

$class E(O): pass$

$class B(D, E): pass$

where $O$ denotes $`object`$.

We can visualize the inheritance graph as follows:

Now, lets find L[B(D, E)] without actual calculation.

By proposition (0.0) $L[B(D, E)]=B...$. By proposition (2.0) "the last class is `object` $L[B(D, E)]=...,`object`=O$. So, we know, that

$L[B(D, E)]=B,...,O$

We assume, that MRO exists, so we have 2 possibilities

1) $L[B(D, E)]=B, D, E, O
2) $L[B(D, E)]=B, E, D, O

(D, E) appears in tuple inherits of B. By) b) the order that they appears should be preserved, so D should go before E, so option 2) is impossible, that's why (we assumes that MRO exists and by proposition 1.0 it's unique) the answer is

$L[B(D, E)]=B, D, E, O

Q.E.D

Example 2.

Let's look on another example. Python have standard library class Dict. There is also OrderDict that extends Dict. Python have also Counter class. Counter inherit from Dict. It means, that it count have many time specific character appear in the string, but the order that it will print out the result is unspecified.

Let's define custom class

$class OrderedCounter(Counter, OrderDict): pass$

As you can see in the picture, we have diamond inheritance problem here.

We will look why this actually works in 2 different ways, by direct application of MRO algorithm and using only properties a) b).

Indirect approach

We will not calculate full MRO, but we will put enough constaint on it to establish that first object in the MRO of OrderedCounter is Counter and OrderDict goes before Dict.

So, we have to prove:

I. MRO of OrderedCounter starts from OrderCounter, Counter.
II. object has to be last in MRO.
III. OrderDict goes before Dict in MRO of OrderedCounter.

I. By proposition (0.0), MRO of OrderedCounter starts from OrderCounter. OrderCounter has 2 direct children Counter and OrderDict. By a) "children precede their parents" The first element in MRO has to be Counter or OrderDict. From b) "order specified in the inheritance tuple preserves" follows that Counter has to go before OrderDict, because they list in the inheritance tuple of OrderCounter. So, Counter has to first element in MRO.

II. By proposition (2.0)

III.
So, currently we established that

(#) L[OrderCount]=OrderCounter, Counter,...,object.

We have following options:

1) L[OrderCount]=OrderCounter, Counter, Dict, OrderDict, object
2) L[OrderCount]=OrderCounter, Counter, OrderDict, Dict, object

there is no other possible that will not contradict (#) and consists of objects of given inheritance graph. Let's write (1) in a bit explicit way:

1) L[OrderCount]=OrderCounter(Counter, OrderedDict), Counter(Dict), Dict(object), OrderDict(Dict), object

On one side, Dict is listed before OrderDict, on other side, we have OrderDict(Dict) and by a) "Children precede their parents" it follows that Dict should be listed before OrderDict. Contradiction.

Assuming that the given graph is linerizable (Python doesn't emit error), it has to be 2), that is

L[OrderCount]=OrderCounter, Counter, OrderDict, Dict, object

and OrderDict is called before Dict in MRO.

Q.E.D.

Direct MRO computation 1.

$L[OrderedCounter(Counter, OrderDict)]=OrderCounter+merge(L[Counter(Dict)], L[OrderDict(Dict)], Counter, OrderDict)=$

$=OrderCounter+merge(Counter+merge(L[Dict(Object)], Dict), OrderDict+merge(L[Dict(object)], Dict]), Counter, OrderDict)=$

$=OrderCounter+merge(Counter+merge(Dict+merge(L[Object], Object), Dict), OrderDict+merge(Dict+merge(L[object], object), Dict)), Counter, OrderDict)=$

$=OrderCounter+merge(Counter+merge(Dict+Object, Dict), OrderDict+merge(Dict+object, Dict)), Counter, OrderDict)=$

$=OrderCounter+merge(Counter+Dict+object, OrderDict+Dict+object, Counter, OrderDict)=$

$=OrderCounter+Counter+merge(Dict+object, OrderDict+Dict+object, \emptyset, OrderDict)=$

Note, we can't take Dict from the merge. $OrderDict+Dict+object$ don't agree.

$=OrderCounter+Counter+OrderDict+merge(Dict+object, Dict+object, \emptyset, \emptyset)=$

$=OrderCounter+Counter+OrderDict+Dict+merge(object, object\emptyset, \emptyset)=$

$=OrderCounter+Counter+OrderDict+Dict+object$

That is $L[OrderedCounter(Counter, OrderDict)]=OrderCounter+Counter+OrderDict+Dict+object$

Note, that OrderDict is listed before Dict in MRO.

Q.E.D.

Too complicate, let's try another way.

Direct MRO computation 2 (shorter way).

We will go level by level from root, downwards.

$L[object]=object$ by (1.0)
$L[Counter(Dict)]=Counter+merge(L[Dict(object)]=Counter+merge(Dict+merge(L[object])=Counter+merge(Dict+object)=Counter+Dict+object$
$L[OrderDict(Dict)]=OrderDict+merge(L[Dict(object)]=OrderDict+merge(Dict+merge(L[object])=OrderDict+merge(Dict+object)=OrderDict+Dict+object$

Then,

$L[OrderedCounter(Counter, OrderDict)]=OrderCounter+merge(L[Counter(Dict)], L[OrderDict(Dict)], Counter, OrderDict)=$

$=OrderCounter+merge(Counter+Dict+object, OrderDict+Dict+object, Counter, OrderDict)$

Note, we can't take Dict from the merge. $OrderDict+Dict+object$ don't agree.

$=OrderCounter+Counter+OrderDict+merge(Dict+object, Dict+object, \emptyset, \emptyset)=$

$=OrderCounter+Counter+OrderDict+Dict+merge(object, object\emptyset, \emptyset)=$

$=OrderCounter+Counter+OrderDict+Dict+object$

That is $L[OrderedCounter(Counter, OrderDict)]=OrderCounter+Counter+OrderDict+Dict+object$

Note, that OrderDict is listed before Dict in MRO.

As you can see, we did the same calculation, but such calculation order enables us not to make the same computation twice.

Q.E.D.

For more elaborate example see https://en.wikipedia.org/wiki/C3_linearization or here https://www.python.org/download/releases/2.3/mro/

****************

Appendix A - Example of correct inheritance graph that $C^3$ Linearization Algorithm fail to linearize.

$class X(object)$
$class Y(object)$
$class P(X, Y)$
$class Q(Y, X)$ (the problem is here, inconsistent orderings of parents with the line above)
$class Z(P, Q)$

Inheritance graph representation:

Note: This (finite) directly acyclic graph (DAG). So, it can be topologically sorted. But $C^3$ fails to linearize it. For example, Z+P+Q+X+Y is possible topologically sorted (but it violates properties a) "Children precede their parents" and b) "order specified in the inheritance order preserved" (we listed X before Y, but Q has inheritance tuple (Y, X)).

$L[X]=X$
$L[Y]=Y$
$L[P(X,Y)]=P+merge(L[X], L[Y], X, Y)=P+merge(X, Y, X, Y)=P+X+Y$
$L[Q(Y,X)]=Q+merge(L[Y], L[X], Y, X)=Q+merge(Y, X, Y, X)=Q+Y+X$
$L[Z(P,Q)]=Z+merge(L[P], L[Q], P, Q)=Z+merge(P+X+Y, Q+Y+X, P, Q)=$
$=Z+P+merge(X+Y, Q+Y+X, \emptyest, Q)=$

X can't be pulled out, Q+Y+X doesn't agree.

$=Z+P+Q+merge(X+Y, Y+X, \emptyest, \emptyest)=$

X can't be pulled out from first list, Y+X doens't agree.
Y can't be pulled out from second list,X+Y doens't agree.

We're stuck. No good head can be selected, because the heads of all remaining lists appear in any one tail of the lists. The merge is impossible to compute due to inconsistent orderings of dependencies in the inheritance hierarchy. According to the merge procedure, $C^3$ Linearization Algorithm stops with failure.

Q.E.D

***

Appendix B
prove that properties follows from the $C^3$ linearization algorithm.

$C^3$ linearization algorithm has following constraints:

(0.0) The parents MRO's remain consistent: Inheritance graph determines the structure of method resolution order.
(0.1) The local MRO's remain consistent, i.e., the super class will appear only after the method of the local classes will appear.
(0.2) Monotonicity: An MRO is said to be monotonic if C1 precedes C2 in linearization of C, then C1 precedes C2 in the linearization of any subclass of C.

or rephrasing it more concisely:

a) Children precede their parents.
b) If a class inherits from multiple classes, they are kept in the order specified in the tuple of the base class.

Prove
Inheritance graph means that we have (finite) DAG.

Let chose some node X. It's class definition is of form $class X(Y_1, Y_2,,..Y_N)$, then

(*) $L[X(Y_1, Y_2,...,Y_N)] = X + L[(Y_1,..,Y_N)] = X + merge(L[Y_1], L[Y_2],...,L[Y_N], Y_1, Y_2,...,Y_N)$
$L(\emptyset)=\emptyset$.

where $Y_1, Y_2,...,Y_N$ are parents of X (this follows from definition of X).

The merge is done by selecting the first head of the lists which does not appear in the tail (all elements of a list except the first) of any of the lists. Note, that a good head may appear as the first element in multiple lists at the same time, but it is forbidden to appear anywhere else. The selected element is removed from all the lists where it appears as a head and appended to the output list. The process of selecting and removing a good head to extend the output list is repeated until all remaining lists are exhausted. If at some point no good head can be selected, because the heads of all remaining lists appear in any one tail of the lists, then the merge is impossible to compute due to inconsistent orderings of dependencies in the inheritance hierarchy and no linearization of the original class exists.

Applying formula (*) recursively we will travel on inheritance graph from childs to parent. So (0.0) holds.

Prove of (0.1) and a) We want to prove that in X precedes every node in L[X].

(*) $L[X(Y_1, Y_2,...,Y_N)] = X + merge(L[Y_1], L[Y_2],...,L[Y_N], Y_1, Y_2,...,Y_N)$

Merge procedure ensures that when we pull out one element there, it doesn't remain in any list inside merge (we're pulling only from head when there is no tail that contain this class). Merge consists only from the lists of the parent's (immediate and distant) of X (if X has another children the algorithm will not visit them and X will be listed in the final list only once (we pull out X first, and merge doesn't contain X)). L[Y_1] is listed before Y_1, L[Y_2] is listed before Y_2,...,L[Y_n] is listed before Y_n, that ensure that we're pulling out node from merge we can't pull (immedate or distant) parent of $Y_i$ before we're pulling $Y_i$ ($Y_i$ will disagree; this is the reason we have merge not only with $L[Y_i]$, but also with $Y_i$). So X will be listed before any (immediate and distant) parent of X, on other words, we prove that X precedes every node in L[X].

Prove of (0.2) and b) We want to prove that in If a class inherits from multiple classes, they are kept in the order specified in the tuple of the base class

Let chose some node X. It is enough to prove that $Y_1, Y_2,...,Y_N$ preserves order (if X has another children the algorithm will not visit them and X is random).

Let's look on (*) again. X is listed first and he will not be pulled from merge (by a)). So it is enough to look on

(#1) $merge(L[Y_1], L[Y_2],...,L[Y_N], Y_1, Y_2,...,Y_N)$

and prove that merge procedure ensure that $Y_1, Y_2,...,Y_N$ preserves order.

Note: This is the reason we list all $L[Y_i]$ before all $Y_i$.

Let's assume, by contradiction, that exists $Y_i_1$, $Y_i_2$, such that $i_1<i_2$, but $Y_i_2$ is pulled out before $Y_i_1$ in some merge procedure.

There are 3 cases:

1) $Y_i_1$ is disjoint to $Y_i_2$
2) $Y_i_1$ is parent of $Y_i_2$
3) $Y_i_2$ is parent of $Y_i_1$

1) $Y_i_1$ is disjoint to $Y_i_2$
$Y_i_1$ not in any L[Y_j], so $Y_i_1$ will be pulled out before $Y_i_2$. Contradiction.

2) $Y_i_1$ is parent of $Y_i_2$

$Y_i_2$ is in L[$Y_i_1$].

Contradiction with 0).

3) $Y_i_2$ is parent of $Y_i_1$
According to property a) $Y_i_1$ is pulled out before $Y_i_2$. Contradiction.

Q.E.D.

***

Appendix C - Time complexity, halting of algorithm, coorectness
Algorithm
Suppose, you have class definition $class X(Y_1, Y_2,,..Y_N): pass$, then

(*) $L[X(Y_1, Y_2,...,Y_N)] = X + L[(Y_1,..,Y_N)] = X + merge(L[Y_1], L[Y_2],...,L[Y_N], Y_1, Y_2,...,Y_N)$
$L(\emptyset)=\emptyset$.

The merge is done by selecting the first head of the lists which does not appear in the tail (all elements of a list except the first) of any of the lists. Note, that a good head may appear as the first element in multiple lists at the same time, but it is forbidden to appear anywhere else. The selected element is removed from all the lists where it appears as a head and appended to the output list. The process of selecting and removing a good head to extend the output list is repeated until all remaining lists are exhausted. If at some point no good head can be selected, because the heads of all remaining lists appear in any one tail of the lists, then the merge is impossible to compute due to inconsistent orderings of dependencies in the inheritance hierarchy and no linearization of the original class exists.

Algorithm works only on (finite) directly acycled graph (DAG) G=(V,E), where V is notes of the graph, E - (directed) edges (from child to parent) of the graph. In DAG |E|=O(|V|). Let's denote |V|=C.

Lemma C.0
Each list in the merge procedure doesn't have repetitions.

Proof.
Let's assume, by contradiction, that there are at least one repetition. Applying formula (*) recursively we will travel on inheritance graph from childs to parent. So, if have repetition in list in the merge procedure, it means, that we hive cycle in the inheritance. But inheritance graph is DAG, so it contain no cycles.

Q.E.D.

Lemma C.1
List of all pulled out element from merge procedure doesn't have repetition.

Proof.
Let's assume, by contradiction, that there are at least 2 same elements $X$ that where pulled out from merge procedure. Let's look on first pulling out. When $X$ was pulled out, no more $X$ remain in the lists (if $X$ is the first element in the list, it is pulled out now, and if $X$ contain in some tail of any list, we can't pull it out by merge procedure. Because we did succeed in pulling, it means $X$ is not contain in any list in merge). So, we can't pull out $X$ again. Contradiction.

Q.E.D

Lemma C.2
Merge procedure have O(|V|) lists.

Proof.
Any node X can have O(|E|) classes in inheritance tuple. In the (finite) DAG O(|E|)=O(|V|). Merge list has O(|V|) element as number of the classes in inheritance tuple ($Y_1, Y_2,...,Y_N$) plus the same number of elements of there's MRO ($L[Y_1], L[Y_2],...,L[Y_N]$). In total its $O(|V|)+O(|V|)=O(|V|)$.

Q.E.D

Lemma C.3
If $C^3$ linearization algorithm produce linearization,
than each list in the merge procedure has O(|V|) elements.

Proof.
$C^3$ linearization algorithm is recursively described as
(*) $L[X(Y_1, Y_2,...,Y_N)] = X + merge(L[Y_1], L[Y_2],...,L[Y_N], Y_1, Y_2,...,Y_N)$ for some node X (we're computing MRO of X)

$C^3$ linearization algorithm pulls out element $X$ from merge procedure is a such a way, that after pulling out, $X$ is not contained in any list in the merge (if it did, and algorithm produce linearization, that $X$ will be pulled out twice in contradiction with lemma C.1). Such "polling out" can be done at most as the size of the largest list inside merge (we can't pull from the $\emptyset$ empty list).

We're pulling out one element at a time, at total we're pulling out $O(|V|)$ elements, so the size of the largest list inside merge has to be $O(|V|)$.

Q.E.D.

Proposition C.1 (time complexity). Time complexity of the algorithm is $O(|V|^3)=O(C^3)$

Proof.

$C^3$ linearization algorithm is recursively described as
(*) $L[X(Y_1, Y_2,...,Y_N)] = X + merge(L[Y_1], L[Y_2],...,L[Y_N], Y_1, Y_2,...,Y_N)$ for some node X (we're computing MRO of X)

Prepending X takes O(1). By Lemma C.2 merge have $O(|V|)$ lists. We take head of the first list $H$ and consults with tails of all other lists, number of them is $O(|V|)$. In the worst case, we can't pull out $H$, so we go to the next list and make same check. In the worst case, we will make $O(|V|)$ some checks.

By Lemma C.3 first list has in worst case (if algorithm is stucked, than we have a less work to do; we're checking that linearization can be done, during regular run of the algorithm) $O(|V|)$ elements, so in the worst case, the pulling out first list will take O(|V|)*O(|V|)=O(|V|^2). By Lemma C.2 merge have $O(|V|)$ lists, so it will take $O(|V|)*O(|V|^2)=O(|V|^3)=O(C^3)$ to empty all list in the worst case scenario.

Q.E.D.

Proposition C.2. (halting)
$C^3$ linearization algorithm halts.

Proof.
If there is some stage in which we can't pulled out new element from merge procedure, the algorithm is explicitly halts.

$L[\emptyset]=\emptyset$ by definition.

Let's assume, that merge procedure always succeeds. Than it produce some list $x_1,...,x_{k-1}, x_k$. By lemma C.1. such list doesn't have repetitions.

(*) $L[X(Y_1, Y_2,...,Y_N)] = X + merge(L[Y_1], L[Y_2],...,L[Y_N], Y_1, Y_2,...,Y_N)$

So, pulling out of each element, starting from the second one, corresponds to the pulling out element from merge.

Let's assume by math. induction, that $C^3$ linearization algorithm halts for all lists with $k-1$ elements.

For k=0, we have $L[\emptyset]=\emptyset$ that holds by definition.

For k=1, we have $L[X]=X+merge(L[\emptyset],\emptyset)=X+merge(\emptyset, \emptyset)=X+\emptyset=X$ that holds.

Let's look on pulling out $x_k$. By math. induction assumption, producing $x_1,...,x_{k-1}$ halts ($X=x_1+x_2+...x_{k-1}+x_k)$. By assumption, the merge produce succeeds (and it produces $x_k$). Appending $X_k$ to the list $x_1,...,x_{k-1}$ doesn't halt, so list producing list $x_1,...,x_{k-1}, x_k$ doesn't halt.

Q.E.D.

Note: As you can see in appendix A, not every inheritance graph has linearization.

Proposition C.3. (correctness)

1) If $C^3$ linearization algorithm produce linearization $X_1,...,X_n$,
than it's contain all elements of inheritance graph and, there is no repetition there.

2) Linearization that satisfies constraint 0.0)-0.2) exists iff (<=>) merge procedure can pull out any element at some stage of the $C^3$ linearization algorithm.

Proof.

2) <=) If merge procedure can pull out any element at every stage of the $C^3$ linearization algorithm than Linearization that satisfies constraint 0.0)-0.2). Inheritance graph means that we have (finite) DAG. Applying formula (*) recursively we will travel on inheritance graph from childs to parent. So 0.0) holds. The order in the list inside merge satisfies 0.1) "local MRO remain consistant" (see Appendix B, prove of 0.1) and a)). If merge procedure succeed in the pulling out element, this means that there is no tail in another list that forbids it's, that is property 0.2) "monotonicity" is satisifed. 2 =>) We should prove, if merge procedure can't pull out any element at some stage of the $C^3$ linearization algorithm than there is no linearizarion that satisfies constraint 0.0)-0.2).

Merge procedure can't pull out any element, if at some point no good head can be selected, because the heads of all remaining lists appear in any one tail of the lists, this means that on hand in order to satisfy 0.1) "local MRO remain consistant" we should pull out one of the head from the merge, on another hand this will break 0.2) "monotonicity" (see example in Appendix A).

1) We should prove, If $C^3$ linearization algorithm produce linearization $X_1,...,X_n$,
than it's contain all elements of inheritance graph and, there is no repetition there.

Prove

Inheritance graph means that we have (finite) DAG. Applying formula (*) recursively we will travel on inheritance graph from childs to parent. For each node $X_i$ exists stage in the $C^3$ linearization algorithm in which exists merge procedure where $X_i$ is listed as list of single elements (DAG is connected graph, we're traveling from child to it's parent). Because, by assumption, $C^3$ linearization algorithm produce linearization $X_1,...,X_n$, it means that this merge procedure was resolved successfully, so our $X_i$ was pulled out on some stage, so $C^3$ linearization algorithm produce linearization $X_1,...,X_n$ contain $X_i$.

Let's prove that there is no repetitions. By Lemma C.2 list of all pulled out element from merge procedure doesn't have repetition, the result of the $C^3$ linearization algorithm produce linearization $X_1,...,X_n$ is concatenation from recursive merge procedure and the first element where $C^3$ linearization algorithm starts. $X_1$ can't contain on the merge (merge works with his (direct or indirect parents only).

Q.E.D.