Communication Complexity

Introduction
Motivated by VLSI applications, research in communication complexity has so far mainly focused on lower bounds for protocols computing specific functions.

In this paper we take a look at communication complexity from the point of view of (“machine based”) complexity theory.
We find a rich structure of natural complexity classes, providing a structured framework for the classification of various concrete functions, by introducing notions of reducibility and highlighting complete problems in different classes. This structure may occasionally serve as a guide to finding lower bounds of significance to VLSI, although this should not be the primary objective of this theory.
An example is given in Corollary 9.6; the recognition that simple graph problems such as connectedness between a pair of points are PSPACE^cc-hard has led to an Ω(n) lower bound for the bounded-error probabilistic complexity of these problems.

The primary goal, however, is to gain insight into the nature of alternation, counting and probabilistic complexity in a context where the chances of progress might be greater than for the analogous questions in Turing machine complexity.

In the basic model, introduced by Yao [Ya1], two communicating parties (North and South) want to cooperatively determine the value f(x, y) of a Boolean function f in 2n variables. Both North and South have complete information about f and unlimited computational power but each receives only half of the input (x and y, resp.) (x, y ∈ {0, 1}^n). They exchange bits according to some protocol until one of them (South) declares the value of f(x, y). The objective is to minimize the number of bits exchanged.

This model and its bounded-error probabilistic version have proved a useful tool in obtaining area-time tradeoffs for VLSI computation [Th, Ya2]. Nondeterministic protocols were introduced by Lipton and Sedgewick [LS], mainly because it was apparent that the known lower bound techniques for deterministic protocols worked for nondeterministic ones as well. (Although the matrix rank lower bound used by Mehlhorn and Schmidt [MS] applies only to deterministic protocols, most lower bound proofs do work for nondeterministic protocols.) This phenomenon was explained by the surprising result of Aho, Ullman and Yannakakis [AUY] that if both f and its negation have nondeterministic protocols of length t then f has a deterministic protocol of length O(t^2). As pointed out by Papadimitriou and Sipser [PSr], this result is analogous to a P = NP ∩ coNP statement, and is the starting point of the present work. The statement corresponding to P ≠ NP is comparatively straightforward: checking the relation x ≠ y requires n bits of information transfer deterministically [Ya1] and only log n nondeterministically. (In [PSr], the same exponential speedup is proved in the more difficult model where the 2n input bits are divided between North and South "optimally". In this model equality becomes trivial to test deterministically; however, an exponential gap remains for the function "triangle-free graph". We shall not consider problems of this model here.) Bounded-error probabilistic protocols were introduced by Yao [Ya3]. They can be exponentially more powerful than deterministic or even nondeterministic protocols: "equality" can be tested with high probability at a cost of only O(log n) bits of communication ([Ya1, Ra, JKS], cf. also [MS]). The other side of the comparison between the powers of nondeterministic vs. bounded-error probabilistic protocols will be resolved in this paper: we prove that nondeterminism can be exponentially more powerful than bounded-error probabilistic protocols.
Such lower bounds are considerably more difficult to prove than deterministic lower bounds and have obvious potential significance for VLSI. Indeed, nontrivial lower bounds for a wide variety of graph properties follow immediately.
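To make the exponential deterministic vs. nondeterministic gap concrete, here is a small Python sketch (ours, not the paper's; all names are illustrative) of the nondeterministic protocol for certifying x ≠ y: an accepting computation guesses an index i where the inputs differ, and the verification costs only about log n bits.

```python
# Sketch of a nondeterministic protocol certifying x != y.
# The certificate is an index i with x[i] != y[i]; announcing i costs
# ceil(log2 n) bits and the bit x[i] one more, versus n bits deterministically.

import math

def verify_inequality(x: str, y: str, i: int) -> bool:
    """South's check: accept the certificate i iff x[i] != y[i]."""
    return x[i] != y[i]

def nondeterministic_cost(n: int) -> int:
    """Bits exchanged: ceil(log2 n) for the index plus 1 for the bit x[i]."""
    return math.ceil(math.log2(n)) + 1

def find_certificate(x: str, y: str):
    """An accepting computation exists iff some index distinguishes x and y."""
    for i in range(len(x)):
        if verify_inequality(x, y, i):
            return i
    return None  # x == y: no certificate exists, as required
```

When x = y no choice of certificate makes South accept, which is exactly the nondeterministic acceptance condition.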

Unbounded-error probabilistic protocols were introduced by Paturi and Simon [PSn]. Lower bound proofs in this model apparently require deeper mathematical tools such as those employed by Alon, Frankl and Rödl [AFR] in their proof that almost every Boolean function requires at least n − 5 bits of communication. It remains an open problem to exhibit even a single explicit example of a Boolean function requiring more than log n bits. (By log n we mean base 2 logarithm throughout.) It is conjectured that "inner product mod 2" (∑_{i=1}^n x_i y_i mod 2) requires n − 1 bits.

Complexity classes
Judging from what is commonly deemed "easy" and "difficult", the natural unit by which to measure communication complexity is the quantity p = log n. "Time" being the number of bits exchanged, we shall thus speak of a "polynomial time protocol" if at most p^c = (log n)^c bits are transferred for some constant c. We can now define the analogues of P, NP, BPP and PP as those classes of languages (having words of even lengths only) admitting polynomial time deterministic, nondeterministic, bounded-error and unbounded-error probabilistic protocols, resp. Note that in our definition, these classes are "non-uniform": to recognize a language L, each level L_n = L ∩ {0, 1}^{2n} has to be recognized by a separate protocol and this sequence of protocols may have no common pattern. ({0, 1}^{2n} is identified with {0, 1}^n × {0, 1}^n and L_n is identified with the Boolean function f : {0, 1}^n × {0, 1}^n → {0, 1} where f(x, y) = 1 iff (x, y) ∈ L_n.)
As we have done in the Abstract, we might denote these classes by P^cc, NP^cc, etc. For convenience (and added thrill) we shall omit the cc superscript and (ab)use the familiar notation to obtain such impressive statements as NP ≠ coNP. Throughout this paper, such notation will refer to communication complexity classes unless TM's are specifically mentioned. Thus P = NP ∩ coNP [AUY]. Again the analysis of "equality" [Ya1, Ra] shows that P ≠ BPP (and, consequently, BPP ⊄ NP). One of our main results is NP ⊄ BPP. Care has to be exercised in designating the analog of PP. We denote the class defined by polynomial time Paturi–Simon [PSn] unbounded-error probabilistic protocols by UPP and reserve the symbol PP for a different class. We shall define PSPACE and find that PP is a subclass of PSPACE while UPP is likely to be incomparable with PSPACE.

Alternation
For deterministic protocols, Duris, Galil and Schnitger [DGS] proved that the number of rounds in communication defines a strict hierarchy with exponential increase in power at each step.

We propose the notion of another hierarchy, one that corresponds to alternating Turing machines. The resulting complexity classes will be the members of the polynomial time hierarchy in communication complexity, denoted Σ_k and Π_k, k = 0, 1, …. "Unlimited" alternation will define PSPACE (although we do not have a notion corresponding to space). Separation of these classes is a major open problem.

In this section we give a definition, in terms of protocols, of the classes in the hierarchy. The equivalent characterization in terms of rectangle-based formulas to be given in the next section is so natural, however (at least once one has digested the notion that rectangles are the quintessence of communication complexity theory), that the reader may consider skipping this section and view the results of the next one as definitions.

As before, in an alternating communication protocol, North and South cooperatively try to evaluate f(x, y) where x is known to North, y to South, the Boolean function f is fully known to both and both have unlimited computational power. The difference is that now, North and South are just pawns (or, rather, referees) in an antagonistic game of two much more powerful players, East and West. (Once again, the fate of the North–South dialogue depends on the outcome of the East–West conflict. This analogy doesn't run very deep, though.) East and West are nondeterministic. In a previously agreed number of alternate moves they write {0, 1}-strings of previously agreed lengths on a board viewed by both North and South. After the East–West game, North and South exchange messages according to a deterministic protocol until one of them declares the winner (East or West). This way the Boolean function f is computed if f(x, y) = 1 precisely when East has a winning strategy. (East is the existential, West is the universal player.)
Fundamentals
Deterministic Protocols
Rank
Randomized Protocols
Numbers On Foreheads
Discrepancy
Compressing Communication
Information
Circuits and Proofs
Memory Size
Data Structures
Extension Complexity of Polytopes
Distributed Computing
Deterministic Protocols
To understand the mathematics of communication, we need to give a rigorous definition that specifies what a conversation between people really is, and say how we can measure its complexity. However,
the concept of communication is so fundamental to humans that we can appreciate the concept even before a rigorous definition is in place.

With that in mind, we begin by discussing some interesting examples of communication problems. This will set the stage for the rigorous discussion later. In the sections that follow, we provide mathematical definitions of communication protocols and related concepts, and prove some basic facts about them.

Some Examples of Communication Problems and Protocols
A communication protocol specifies a way for a set of people to have a conversation. Each person has access to a different source of information.

Their goal is to learn some feature of all the information that they collectively know.

Equality Suppose two people named Alice and Bob are given two n-bit strings.
Alice is given x and Bob is given y, and they want to know if x = y. There is a trivial solution: Alice can send her input x to Bob, and Bob can let her know if x = y.
This is a deterministic1 protocol that takes n + 1 bits of communication. Interestingly, we shall prove that no deterministic protocol is more efficient.
On the other hand, for every number k, there is a randomized protocol that uses only k + 1 bits of communication and errs with probability at most 2^-k: the parties can use randomness to hash their inputs and compare the hashes.

Cliques and Independent Sets Here Alice and Bob are given a graph G on n vertices. In addition, Alice knows a clique C in the graph, and Bob knows an independent set I in the graph. They want to know whether C and I share a common vertex or not, and they want to know this using a short conversation. Describing C or I takes about n bits, because in general the graph may have 2^n cliques or 2^n independent sets. So, if Alice and Bob try to tell each other what C or I is, that will lead to a very long conversation.
Here we discuss a clever interactive protocol allowing Alice and Bob to have an extremely short conversation for this task.

They will send at most O(log^2 n) bits. If C contains a vertex v of degree less than n/2, Alice sends Bob the name of v. This takes just O(log n) bits of communication. Now, either v ∈ I, or Alice and Bob can safely discard all the non-neighbors of v, since these cannot be part of C. This eliminates at least n/2 vertices from the graph.
Similarly, if I contains a vertex v of degree at least n/2, Bob sends Alice the name of v. Again, either v ∈ C, or Alice and Bob can safely delete all the neighbors of v from the graph, which eliminates about n/2 vertices. If all the vertices in C have degree at least n/2, and all the vertices in I have degree less than n/2, then C and I cannot share a vertex. The conversation can safely terminate.
So, in each round of communication, either the parties know that C ∩ I = ∅, or the number of vertices is reduced by a factor of 2. After k rounds, the number of vertices is at most n/2^k. If k exceeds log n, the number of vertices left will be less than 1, and Alice and Bob will know if C and I share a vertex or not. This means that at most log n vertices can be announced before the protocol ends, proving that at most O(log^2 n) bits will be exchanged before Alice and Bob learn what they wanted to know.
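The rounds described above can be simulated directly. The following Python sketch is our own illustration (function and variable names are ours, not the text's); it takes the graph as an edge list and reports whether the clique and the independent set intersect. Once an announced vertex is ruled out, it is deleted along with the vertices it eliminates.

```python
# Runnable sketch of the clique vs. independent set protocol.
# Vertices are 0..n-1; edges is a list of pairs; C is a clique, I an
# independent set. A vertex in C and I is never deleted, so the
# simulation answers True exactly when C and I intersect.

def cis_protocol(n, edges, C, I):
    adj = {v: set() for v in range(n)}
    for u, w in edges:
        adj[u].add(w); adj[w].add(u)
    alive = set(range(n))
    C, I = set(C), set(I)
    while alive:
        half = len(alive) / 2
        # Alice: announce a surviving clique vertex of low degree.
        v = next((u for u in C & alive if len(adj[u] & alive) < half), None)
        if v is not None:
            if v in I:
                return True
            alive &= adj[v]          # non-neighbors of v cannot lie in C
            continue
        # Bob: announce a surviving independent-set vertex of high degree.
        v = next((u for u in I & alive if len(adj[u] & alive) >= half), None)
        if v is not None:
            if v in C:
                return True
            alive -= adj[v] | {v}    # neighbors of v cannot lie in I
            continue
        # All C-vertices have high degree, all I-vertices low degree:
        # no surviving vertex can be in both, so C and I are disjoint.
        return False
    return False
```

Each round either answers or removes the announced vertex (and roughly half the survivors), so the loop terminates.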

Rank
Matrices give a powerful way to represent mathematical objects. Once an object is encoded as a matrix, the many tools of linear algebra can be used to understand properties of the object. This broad approach has been very fruitful in many areas of mathematics, and we shall use it to understand communication complexity as well.

We can represent a function g : X × Y → {0, 1} by an m × n matrix M, where m = |X| is the number of rows and n = |Y| is the number of columns, and the (i, j)'th entry is M_ij = g(i, j). If we interpret the inputs to the parties as unit column vectors e_i, e_j, we have g(i, j) = e_i^T M e_j.

The rank of a matrix is the maximum size of a set of linearly independent rows in the matrix. We write rank(M) to denote the rank of a matrix M. The driving question of this chapter is:
How does the rank relate to communication complexity?
Rank has many other useful interpretations that may seem different at first:
Fact 2.1. For an m × n matrix M, the following are equivalent:
• rank(M) = r.

• r is the smallest number such that M can be expressed as M = AB, where A is an m × r matrix, and B is an r × n matrix.

• r is the smallest number such that M can be expressed as the sum of r matrices of rank 1.

• r is the largest number such that M has r linearly independent columns.

The second characterization of rank discussed above suggests that the rank of a matrix is closely related to communication complexity: Alice and Bob can use a factorization M = AB, where A is an m × r matrix and B is an r × n matrix, to get a protocol for computing g. To compute g(i, j) = e_i^T M e_j = e_i^T A B e_j, Alice can send Bob e_i^T A, and then Bob can multiply this vector with B e_j. This involves transmitting a vector of at most r numbers. This is not quite an efficient protocol, because each of the numbers in the vector may require lots of bits to encode.

Randomized protocols
Access to randomness is an enabling feature in many computational processes. Randomized algorithms are often easier to understand and more elegant than their deterministic counterparts. In this chapter, we show how randomization can be used to give protocols that are far more efficient than the best deterministic protocols for many problems.

We start by giving some examples of protocols where the use of randomness gives an advantage that cannot be matched by deterministic protocols. Later, we give rigorous definitions for randomized protocols, and prove some basic facts about them.
Some Examples of Randomized Protocols
Equality Suppose Alice and Bob are given access to n bit strings x, y, and want to know if these strings are the same or not (1.1).

There is a simple randomized protocol, where Alice and Bob use a shared random string. Alice and Bob use the shared randomness to sample a random function h : {0, 1}^n → {0, 1}^k. Then, Alice sends h(x) to Bob, and Bob responds with a bit indicating whether or not h(y) = h(x). If h(x) = h(y), they conclude that x = y.

If h(x) ≠ h(y), they conclude that x ≠ y. The number of bits communicated is k + 1. The probability of making an error is at most 2^-k: if x = y then h(x) = h(y), and if x ≠ y then the probability that h(x) = h(y) is at most 2^-k.
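Here is a runnable sketch of this shared-randomness protocol, with a concrete hash of our choosing (the text does not fix one): k random inner products mod 2. If x = y the hashes always agree; if x ≠ y, each inner product disagrees with probability 1/2 over the random seed, so the error is at most 2^-k.

```python
# Sketch of the shared-randomness equality protocol.
# seeds plays the role of the shared random string; only h(x) and one
# answer bit are communicated, k + 1 bits in total.

import random

def hash_bits(x: str, seeds) -> tuple:
    """k inner products mod 2 between x and the shared random strings."""
    return tuple(sum(int(xb) & int(sb) for xb, sb in zip(x, s)) % 2
                 for s in seeds)

def equality_protocol(x: str, y: str, k: int, rng: random.Random) -> bool:
    n = len(x)
    seeds = ["".join(rng.choice("01") for _ in range(n)) for _ in range(k)]
    return hash_bits(x, seeds) == hash_bits(y, seeds)
```

The random-inner-product hash is one standard choice; any pairwise-uniform hash family gives the same guarantee.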

This protocol may seem less than satisfactory, because it uses a large number of shared random bits. The number of random bits can be reduced if we use an error correcting code. This is a function C : {0, 1}^n → ({0, 1}^k)^m, such that if x ≠ y, then C(x) and C(y) differ in all but m · 2^-Ω(k) coordinates. It can be shown that if m is set to 10n, then for any k, most functions C will be error correcting codes.

Given the code, Alice can pick a random coordinate of C(x) and send it to Bob, who can then check whether this coordinate is consistent with C(y). This takes log m + log 2^k = O(log n + k) bits of communication, and again the probability of making an error is at most 2^-Ω(k). In this protocol, the players do not require a shared random string, since the choice of C is made once and for all when the protocol is constructed, and before the inputs are seen.

Greater-than Suppose Alice and Bob are given numbers x, y ∈ [n] and want to know which one is greater (1.4). We have seen that any deterministic protocol for this problem requires log n bits of communication. However, there is a randomized protocol that requires only O(log log n) bits of communication.

Here we describe a protocol that requires only O(log log n · log log log n) communication. The inputs x, y can be encoded by ℓ-bit binary strings, where ℓ = O(log n). To determine whether x ≥ y, it is enough to find the most significant bit where x and y differ. To find this bit, we use the randomized protocol for equality described above, along with binary search. In the first step, Alice and Bob use the protocol for equality to exchange k bits that determine whether the first (most significant) ℓ/2 bits of x and y are the same. If they are the same, the parties continue with the last ℓ/2 bits. If not, the parties discard the second half of their strings. In this way, after O(log ℓ) = O(log log n) steps, they find the first bit of difference in their inputs. In order to ensure that the probability of making an error in all of these O(log log n) steps is small, we set k = O(log log log n). By the union bound, this guarantees that the protocol succeeds with high probability.
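The binary-search reduction can be sketched as follows (our illustration). For clarity we plug in an error-free equality test on prefixes; in the real protocol each comparison is the randomized equality protocol with error 2^-k, and the union bound over the O(log ℓ) rounds governs the choice of k.

```python
# Sketch of greater-than via binary search on the first differing bit.
# Strings are equal-length, most significant bit first.

def first_difference(x: str, y: str) -> int:
    """Index of the most significant differing bit; assumes x != y.
    Uses O(log l) prefix-equality tests."""
    lo, hi = 0, len(x)   # invariant: x[:lo] == y[:lo] and x[:hi] != y[:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if x[:mid] == y[:mid]:   # stand-in for the randomized equality test
            lo = mid
        else:
            hi = mid
    return hi - 1

def greater_than(x: str, y: str) -> bool:
    """Decide x > y by comparing the first bit where the inputs differ."""
    if x == y:
        return False
    return x[first_difference(x, y)] == "1"
```

Replacing the prefix comparison with the hashed equality protocol yields the stated O(log log n · log log log n) cost.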

Numbers On Foreheads
The number-on-forehead model of communication is one way to generalize the case of two party communication to the multiparty setting. In this model, there are k parties communicating, and the i'th party has an input drawn from the set X_i written on their forehead.

So, each party can see all of the inputs except the one written on her forehead. Since each party can see most of the inputs, the parties often do not need to communicate as much. Proving lower bounds for this model is usually more difficult than for the two party case.

Indeed, we do not yet know of any explicit examples of functions that require the maximum communication complexity in this model, in stark contrast to the models we have discussed before.

Some Examples of Protocols in the Number-On-Forehead Model We start with some examples of clever number-on-forehead protocols.

Equality We have seen that every deterministic protocol for computing equality in the two party setting must have complexity n + 1.

Perhaps surprisingly, the complexity of equality is quite different in the number-on-forehead model. Suppose 3 parties each have an n bit string written on their foreheads. Then there is a very efficient protocol for computing whether all three strings are the same: Alice announces whether or not Bob and Charlie’s strings are the same, and Bob announces whether or not Alice and Charlie’s strings are the same. This computes equality with 2 bits of communication.
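The 2-bit protocol above is short enough to write out exactly (the party names follow the text; the encoding is ours). Party i sees every string except the one on their own forehead.

```python
# The 3-party number-on-forehead equality protocol: each announced bit
# compares the two strings its speaker can see.

def nof_equality(inputs):
    x, y, z = inputs             # foreheads of Alice, Bob, Charlie
    alice_bit = (y == z)         # Alice sees Bob's and Charlie's strings
    bob_bit = (x == z)           # Bob sees Alice's and Charlie's strings
    return alice_bit and bob_bit
```

If the three strings are not all equal, at least one of the two announced comparisons fails, so two bits suffice.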

Discrepancy
The techniques we have developed for proving lower bounds in prior chapters all rely on the fact that efficient communication protocols lead to small partitions of the space into monochromatic sets.

In this chapter, we shall prove that some functions cannot even be partitioned into large nearly monochromatic rectangles. The ideas we develop will lead to lower bounds for randomized protocols, and the best lower bounds in the number-on-forehead model. To prove these stronger lower bounds, we start by developing techniques to bound the discrepancy of functions.

Discrepancy measures the degree to which a fixed object resembles a truly random object. In our context, a truly random 0/1 valued function is approximately balanced on every specific set: the number of 0 entries in the set is close to the number of 1 entries. Suppose g : D → {0, 1} is a function and μ is a distribution on the inputs to g. Suppose S ⊆ D is a subset of the domain, and let χ_S be the characteristic function of the set S, so χ_S(x) = 1 if x ∈ S, and χ_S(x) = 0 if x ∉ S. The discrepancy of g with respect to S and μ is defined as

disc_μ(g, S) = | Pr_{x∼μ}[x ∈ S, g(x) = 1] − Pr_{x∼μ}[x ∈ S, g(x) = 0] |.
For every fixed S and μ, a random function will have low discrepancy with high probability. However, every g : D → {0, 1} is monochromatic on a set of μ-measure at least 1/2, so has large discrepancy for some sets. We shall therefore restrict our attention to sets of a specific structure, like rectangles or cylinder intersections. To relate discrepancy to protocols, recall that the bias of a set S is defined to be

bias(S) = max_{b ∈ {0,1}} Pr_{x∼μ}[g(x) = b | x ∈ S].
A large set with large bias must lead to high discrepancy:

Fact 5.1. If S is a set of inputs with Pr_{x∼μ}[x ∈ S] ≥ d, and bias(S) ≥ 1 − e, then the discrepancy of g with respect to S is at least (1 − 2e)d.

Proof. Only points inside S contribute to its discrepancy. Since a (1 − e) fraction of these points have the same value under g, the discrepancy is at least d((1 − e) − e) = d(1 − 2e).

Functions that have small discrepancy with respect to rectangles must have large randomized communication complexity:

Theorem 5.2. For a fixed distribution μ, if the discrepancy of g with respect to every rectangle is at most γ, then any protocol computing g with error e when the inputs are drawn from μ must have communication complexity at least log((1 − 2e)/γ).

Proof. Suppose π is a deterministic protocol of length c with error e with respect to μ, and let π(x, y) denote its output. By Lemma 1.4, the leaves of the protocol π correspond to some rectangles R_1, . . . , R_t, with t ≤ 2^c. Then we have

1 − 2e ≤ Pr_μ[π(x, y) = g(x, y)] − Pr_μ[π(x, y) ≠ g(x, y)] ≤ ∑_{i=1}^t disc_μ(g, R_i) ≤ 2^c γ,

since π is constant on each R_i. Rearranging gives c ≥ log((1 − 2e)/γ).
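For intuition, rectangle discrepancy can be computed by brute force on tiny examples. The following Python sketch (our illustration, using the definition disc(g, S) = |Pr[x ∈ S, g = 1] − Pr[x ∈ S, g = 0]| under the uniform distribution) enumerates every rectangle for inner product mod 2 on 2-bit inputs.

```python
# Brute-force maximum rectangle discrepancy under the uniform distribution.

from itertools import chain, combinations

def subsets(items):
    return chain.from_iterable(combinations(items, r)
                               for r in range(len(items) + 1))

def max_rectangle_discrepancy(g, X, Y):
    total = len(X) * len(Y)
    best = 0.0
    for R in subsets(X):          # rectangle = row set x column set
        for C in subsets(Y):
            ones = sum(1 for x in R for y in C if g(x, y) == 1)
            zeros = len(R) * len(C) - ones
            best = max(best, abs(ones - zeros) / total)
    return best

# Inner product mod 2 on 2-bit inputs.
ip = lambda x, y: (x[0] & y[0]) ^ (x[1] & y[1])
X = Y = [(a, b) for a in (0, 1) for b in (0, 1)]
```

A constant function has maximum discrepancy 1, while inner product has small discrepancy on every rectangle, which is exactly what Theorem 5.2 exploits.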

Information
Shannon's seminal work defining entropy has had a big impact on many areas in mathematics. Shannon's goal was to quantify the amount of information or entropy contained in a random variable X. His definition leads to a theory that is both elegant and useful in many areas, including communication complexity. We begin this chapter with some simple examples that help to demonstrate the usefulness of the concepts of information. Later, we show how these concepts can be used to understand communication complexity.

Entropy
In many cases, the amount of information contained in a message is not the same as the length of the message. Here are some examples:
• Consider a protocol where Alice's first message to Bob is a c-bit string that is always 0^c, no matter what her input is. This message does not convey any information to Bob. Alice and Bob may as well imagine that this first message has already been sent, and so reduce the communication of the first step to 0.

• Consider a protocol where Alice's first message to Bob is a uniformly random string from a set S ⊆ {0, 1}^c, with |S| ≪ 2^c. In this case, the parties could use log |S| bits to index the elements of the set, reducing the communication from c to log |S|.

• Consider a protocol where Alice's first message to Bob is the string 0^n with probability 1 − ε, and is a uniformly random n bit string with probability ε. In this case, one cannot encode every message using fewer than n bits. However, Alice can send the bit 0 to encode the string 0^n, and the string 1x to encode the n bit string x. Although the first message is still quite long in the worst case, the expected length of the message is 1 − ε + ε(n + 1) = 1 + εn ≪ n.
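The first two examples can be checked numerically against Shannon's formula H(X) = ∑_x p(x) log2(1/p(x)) (the third behaves similarly: its entropy is about εn + H(ε), far below n for small ε). This small sketch is ours; distributions are dictionaries mapping messages to probabilities.

```python
# Shannon entropy of a finite distribution, applied to the examples above.

import math

def entropy(dist):
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

constant = {"0" * 8: 1.0}                                # always 0^c: H = 0
uniform_S = {format(i, "04b"): 1 / 4 for i in range(4)}  # |S| = 4: H = log2 4
```

The constant message carries no information, and the uniform message over S carries exactly log2 |S| bits, matching the savings described above.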

Compressing Communication
Is there a way to define the information of a protocol in analogy with Shannon's definition of the entropy of a single message? We would like to extend Shannon's ideas and measure the information contained in all the messages of the protocol in a way that captures the interactive nature of a communication protocol. Such a definition would shed new light on our understanding of communication complexity.

In this chapter, we explore how to do this for 2-party communication protocols. As we shall see, these definitions lead to several results in communication complexity that do not concern information at all.

Suppose we are working in the distributional setting, where the inputs X,Y to Alice and Bob are sampled from some known distribution m, and in addition the protocol is randomized. We start with several examples to illustrate what the information of protocols ought to be.

• Consider a protocol where all the messages of the protocol are 0, no matter what the inputs are. The messages of the protocol are known ahead of time, and Alice and Bob might as well not send them. This protocol does not convey any information.

• Suppose X,Y are independent uniformly random n bit strings. Alice sends X as her first message and Bob sends Y in response. In this case, the protocol cannot be simulated with less than 2n bits.

• Suppose X,Y are independent uniformly random n bit strings. In the protocol, Alice privately samples k uniformly random bits, independently of X, and sends them to Bob. This protocol can be simulated by a randomized communication protocol with no communication. Alice and Bob can use shared randomness to sample the k bit string, so that they do not have to send it.

Circuits and Proofs
Although communication complexity ostensibly studies the amount of communication needed between parties that are far apart, it is deeply involved in our understanding of many other concrete computational models and discrete systems. In this chapter, we discuss the applications of communication complexity to Boolean circuits and proof systems.

Boolean Circuits
Boolean circuits are the most natural model for computing boolean functions f : {0, 1}^n → {0, 1}. A boolean circuit is a directed acyclic graph whose vertices, often called gates, are associated with boolean operators or input variables. Every gate with in-degree 0 corresponds to an input variable, the negation of an input variable, or a constant bit, and all other gates compute either the logical AND (denoted by the symbol ∧) or the OR (denoted by the symbol ∨) of the inputs that feed into them. Usually, the fan-in of the gates is restricted to being at most 2. We adopt this convention, unless we explicitly state otherwise.

Every gate v in a circuit computes a boolean function fv of the inputs to the circuit. We say that a circuit computes a function f if f = fv for some gate v in it. Every circuit is associated with two standard complexity measures: The size of the circuit is the number of gates, and the depth of the circuit is the length of the longest directed path in the underlying graph. The size corresponds to the number of operations the circuit performs, and the depth to the parallel time it takes the computation to end, using many processors.
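The evaluation of such a circuit is easy to make concrete. Here is a minimal Python sketch (our encoding, not the text's): gates are listed in topological order, each either an input, a negated input, a constant, or a fan-in-2 AND/OR of earlier gates.

```python
# Evaluating a boolean circuit given as a topologically ordered gate list.
# Gate forms: ('in', name), ('not', name), ('const', b),
#             ('and', i, j), ('or', i, j)  with i, j earlier gate indices.

def eval_circuit(gates, assignment):
    vals = []
    for g in gates:
        if g[0] == "in":
            vals.append(assignment[g[1]])
        elif g[0] == "not":
            vals.append(1 - assignment[g[1]])
        elif g[0] == "const":
            vals.append(g[1])
        elif g[0] == "and":
            vals.append(vals[g[1]] & vals[g[2]])
        else:  # "or"
            vals.append(vals[g[1]] | vals[g[2]])
    return vals  # the value computed at every gate

# XOR of two bits with fan-in-2 gates: (x OR y) AND (NOT x OR NOT y).
xor = [("in", "x"), ("in", "y"), ("not", "x"), ("not", "y"),
       ("or", 0, 1), ("or", 2, 3), ("and", 4, 5)]
```

In this encoding the size is the length of the gate list, and the depth is the longest chain of index references.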

Understanding the complexity of computation with boolean circuits is extremely important, because they are a universal model of computation. Any function that can be computed by an algorithm in T(n) steps can also be computed by circuits of size approximately T(n). So, to prove lower bounds on the time complexity of algorithms, it is enough to prove that there are no small circuits that can carry out the computation. Counting arguments imply that almost every function requires circuits of exponential size. However, we know of no explicit function for which we can prove a superlinear lower bound, highlighting the difficulty of proving such lower bounds.

Memory Size

The memory used by an algorithm is an important resource. In this chapter, we explore two related models that measure the amount of memory used, and prove lower bounds on the best possible algorithms optimizing this resource, using communication complexity.

A branching program of length ℓ and width w is a layered directed graph whose set of vertices is a subset of [ℓ + 1] × [w]. Each layer from 1 to ℓ is associated with a variable in x_1, . . . , x_n. Every vertex in layers 1 to ℓ has d out-going edges, each labeled by a distinct symbol from [d]. Edges go from layer u to layer u + 1 for u ≤ ℓ. The vertices on layer ℓ + 1 are labelled with an output of the program.

Computing a function f (x) using a branching program is straightforward.

On input x ∈ [d]^n, the program is executed by starting at the vertex (1, 1) and reading the variables associated with each layer in turn. These variables define a path through the program. The program outputs the label of the last vertex on this path.

Intuitively, if an algorithm uses only s bits of memory, then it can be modeled as a branching program of width at most 2^s. Every function f : [d]^n → {0, 1} can be computed by a branching program of width d^n and length n, and counting arguments show that most such functions require exponential width.
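Executing a branching program can be sketched in a few lines (our encoding: each layer holds a variable index and a transition table from states and symbols to next states, and the final layer carries output labels).

```python
# Running a width-w branching program layer by layer.

def run_branching_program(var, edges, labels, x):
    state = 0                                # start at vertex (1, 1)
    for u in range(len(var)):
        state = edges[u][state][x[var[u]]]   # follow the edge labeled x[var[u]]
    return labels[state]

# Width-2, length-2 program over {0, 1} inputs computing x[0] AND x[1].
var = [0, 1]
edges = [
    {0: {0: 0, 1: 1}},                       # layer 1 branches on x[0]
    {0: {0: 0, 1: 0}, 1: {0: 0, 1: 1}},      # layer 2 branches on x[1]
]
labels = {0: 0, 1: 1}
```

Here the two states per layer are the 2^s = 2 memory configurations of a 1-bit algorithm that remembers whether every bit seen so far was 1.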
Another motivation for understanding branching programs comes from a powerful theorem due to Barrington.
Data Structures
A data structure is a way to efficiently maintain access to data. Many well-known algorithms rely on efficient data structures for their efficiency. Lower bounds on the performance of data structures are often proved by appealing to arguments about communication complexity. In this chapter, we explore some methods for proving such lower bounds.

There are several ways to measure the efficiency of data structures, and usually there is a tradeoff between these different measures.

Here we focus on tradeoffs between the total number of memory cells used by the data structure and the time that it takes to perform operations, like updates or queries to the data. Time is measured by counting the number of memory cells that are accessed in order to perform the needed operation.

We begin this chapter with several examples of useful and clever data structures. Later on, we explain how to prove lower bounds on data structures using ideas from communication complexity.

Dictionaries
A dictionary is a data structure that maintains a subset S ⊆ [n], allowing us to add and delete elements from the set. We would also like to ask membership queries of the form: is i ∈ S?
The most straightforward implementation is to maintain a string x ∈ {0, 1}^n that is the indicator vector of S. This allows us to add and delete elements, as well as support membership queries, in time 1, but requires n memory cells.
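The indicator-vector dictionary is simple enough to write out (our sketch; the class name is ours). Every operation touches exactly one memory cell, which is the time-1 guarantee, at the cost of n cells of space.

```python
# The indicator-vector dictionary: one memory cell per element of [n].

class BitVectorDictionary:
    def __init__(self, n):
        self.x = [0] * n         # n memory cells, all initially 0

    def add(self, i):
        self.x[i] = 1            # accesses exactly one cell

    def delete(self, i):
        self.x[i] = 0            # accesses exactly one cell

    def member(self, i):
        return self.x[i] == 1    # accesses exactly one cell

d = BitVectorDictionary(8)
d.add(3)
```

The tradeoff discussed above asks whether the space can be reduced without slowing the operations down.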
Extension Complexity of Polytopes
Polytopes are subsets of Euclidean space that can be defined by a finite number of linear inequalities. They are fundamental geometric constructs that have been studied by mathematicians for centuries.
Any n × d matrix A and n × 1 vector b define a polytope P:
P = {x ∈ R^d : Ax ≤ b}.

In this chapter, we explore some questions about the complexity of representing polytopes—when can a complex polytope be expressed as the shadow of a simple polytope? Loosely speaking, a complex polytope is one that requires many inequalities to describe, and a simple polytope is one that can be described by very few inequalities.

Besides being mathematically interesting, this question is relevant to understanding the complexity of algorithms based on linear programming.

Basic properties and features of polytopes
Polytopes have many nice properties that make them easy to manipulate and understand. For example, a polytope P is always convex: whenever x and y are in P, all the points of the line segment between x and y are also in P. Indeed, if γ ∈ [0, 1], then A(γx + (1 − γ)y) = γAx + (1 − γ)Ay ≤ γb + (1 − γ)b = b.

Although the definition of a polytope seems to involve only inequalities, sets defined using equalities are also polytopes. For example, the set of solutions (x, y, z) ∈ R^3 with
x = y + z + 1,
z ≥ 0,
is a polytope, because it can be expressed as:
x − y − z ≤ 1,
−x + y + z ≤ −1,
−z ≤ 0.
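The conversion above can be sanity-checked numerically. This small Python sketch (ours) tests membership in {p : Ap ≤ b} for the three-inequality system, and confirms it agrees with the original equality-plus-inequality description on a few points.

```python
# Membership test for a polytope given in the form {p : Ap <= b}.

def in_polytope(A, b, point):
    return all(sum(a * p for a, p in zip(row, point)) <= bi
               for row, bi in zip(A, b))

A = [[1, -1, -1],    #  x - y - z <= 1
     [-1, 1, 1],     # -x + y + z <= -1
     [0, 0, -1]]     # -z <= 0
b = [1, -1, 0]
```

A point satisfies the two one-sided inequalities exactly when x = y + z + 1, and the third row enforces z ≥ 0.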

A half space H is a particular type of polytope, one that is defined by a single inequality,
H = {x ∈ R^d : h · x ≤ c},
where h is a 1 × d matrix, and c is a real number. A polytope P can therefore be viewed as the intersection of the n half spaces given by the n inequalities that define the polytope. Moreover, every linear inequality that the points of the polytope satisfy can be derived from the inequalities that define the polytope:
Fact 11.1. If a polytope P = {x ∈ R^d : Ax ≤ b} is contained in a half space H = {x ∈ R^d : hx ≤ c}, then there is a 1 × n row vector u ≥ 0 such that
uA = h and ub ≤ c.

Distributed Computing
Distributed computing is concerned with the study of algorithms and protocols for computers operating in a distributed environment. In such an environment, there are n parties that are connected by a communication network, yet no single party knows what the whole network looks like. Nevertheless, the parties wish to solve computational problems together.
More formally, the network is defined by an undirected graph on n vertices, where each of the vertices represents one of the parties. Each protocol begins with each party knowing its own name, and perhaps some part of the input. In each round, each of the parties can send a message to all of their neighbors. In this way, the parties seek to achieve some common goal. The setup is often interesting even when the parties wish to learn something about the structure of the network and have no input besides their own names.

Coloring the Network
Suppose the parties in a distributed environment want to properly color themselves so that no two neighboring parties have the same color. Let us start with the example that n parties are connected together so that every party has at most 2 neighbors.

Here is a simple protocol that finds a coloring with 3 colors and takes O(n) rounds and communication in connected networks. The party whose name is 1 colors itself 1 and sends its color to its neighbors.

In the next round, each party that receives a message colors itself with one of the 3 colors, choosing one that differs from the colors of its already-colored neighbors, and then sends its color to its neighbors. In this way, in each round some new vertex colors itself. Thus, every vertex obtains a color after n rounds.

Here is another protocol that finds a proper coloring with a constant number of colors in O(log* n) rounds of communication. Initially, each party colors itself with its name. This is a proper coloring.

The goal now is to iteratively reduce the number of colors.

In each round, the parties send all of their neighbors their current color. If a ∈ {0, 1}^t denotes the color of one of the parties in a round, and b, c ∈ {0, 1}^t denote the colors assigned to its neighbors, then the party sets i to be a number such that a_i ≠ b_i, and j to be a number such that a_j ≠ c_j. Its new color is set to be (i, j, a_i, a_j). The new coloring is proper. Indeed, if the neighbor whose color was b also gets the color (i, j, a_i, a_j), then we see that b_i = a_i, contradicting the choice of i.
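One round of this color reduction, for the degree-2 case described above, can be sketched directly (our function name; the tuple is exactly the new color from the text).

```python
# One color-reduction round: given a party's t-bit color a and its two
# neighbors' colors b and c, record an index where a differs from each.

def reduce_color(a: str, b: str, c: str):
    """New color (i, j, a[i], a[j]) with a[i] != b[i] and a[j] != c[j].
    Assumes the current coloring is proper (a != b and a != c)."""
    i = next(k for k in range(len(a)) if a[k] != b[k])
    j = next(k for k in range(len(a)) if a[k] != c[k])
    return (i, j, a[i], a[j])
```

The new color takes about 2 log t + 2 bits instead of t, which is the per-round saving that drives the O(log* n) bound.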

In this way, the length of the colors has been reduced from t bits to O(log t) bits. After O(log* n) rounds, the number of colors is constant. This coloring protocol can be generalized to handle arbitrary graphs of constant degree d. Any graph of degree d can be colored using d + 1 colors. Here we give a protocol that uses O(log* n) rounds to find a coloring using O(d^2 log d) colors.