Michael Data

For a data structure that mixes updates and queries, [http://en.wikipedia.org/wiki/Persistent_data_structures Persistence] refers to the ability to query and possibly update old versions of data.

Partial persistence requires each update to apply to the latest data version and can return a version id which can be used for queries. Full persistence allows updates to old data versions.

Operations

Updates: push, pop

query: top, length

construction from an array of objects

Techniques for Persistence

Fat Nodes

Make each object (or each cell of memory) into a data structure that remembers its version history. A persistent array might track version history for each array cell individually.
This approach applies to almost everything.

Path Copying

Used in tree structures represented only by child pointers (no parent or sibling links).
A version is prepresented as a pointer to the root of the tree. Queries are the same as for non-persistent trees. Updates create an entirely new path from the root to the updated node. The new version refers to the root of the copied path tree. This is basically how Persistent Stacks work.

Queries are fast here, but it requires more space for all the duplicated nodes.

Implementations

Persistent Stacks

Stack node object. Contains data pointer, previous pointer, and a link number.

  • construction():
  • return new node with length 0
  • top(stack obj S):
  • if s.length == 0:
  • error
  • return s.data
  • length(s):
  • return s.length
  • push(s,x):
  • make new node n
  • n.previous = s
  • n.length = s.length + 1
  • n.data = x
  • return n
  • pop(s):
  • if s.length == 0:
  • error
  • return s.previous

This structure is typically non-persistent. A persistent one can be created by storing the pointer results from each update operation. This results in a fully-persistent data structure.

Applications

Closures

Software “closures” invented for Scheme. Python version:

  • def function1(x,y,z):
  • do something…
  • def function2(p,q):
  • function3(x,y,z,p,q)
  • return function2
  • F = function1(x,y,z)
  • x = F(p,q)

The function2 returned by function1 must persistently store variables x,y,z. A persistent stack is a way to preserve them, as opposed to the non-persistent stack typically used for function calls.

Geometry

Given a set of disjoint polygons (with no vertical sides) and a set of (x,y)-coordinate pair points, return a reference for each point to the polygon that contains it.

A non-persistent solution uses binary search trees: sweep a vertical line from left to right across the polygons. At each time step, the intersection of the line with the set of polygons forms a set of intervals on the line. Each interval is associated with a polygon; multiple intervals can occur for a single polygon. Each point or polygon vertex in order determined by x-coordinate requires updating the interval data structure or checking for point membership. It will maintain a binary search tree of interval endpoints in top-to-bottom order This works in O((m+n) log n) time for n points and m polygon vertices.

A persistent solution pre-processes the polygon input into a data structure which will handle future queries about a point at a time in arbitrary order. The persistent search tree improves query time.

There is a hybrid of fat-node and path-copying which stores O(n) space and O(log n) query time.