Tree (data structure)

Definition

A tree is a widely used abstract data type in computer science, representing hierarchical relationships between objects or sets of objects. It is a non-linear data structure compared to arrays, linked lists, stacks and queues which are linear data structures. A tree can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value, together with a list of references to nodes (the "children"), with the constraints that no reference is duplicated, and none points to the root.

Terminology

In a tree data structure, the following terms are commonly used:

- Node: Each element in the tree is called a node. A node is a part of the tree that can have a name, which we call the key. A node may also have additional information, which we call the payload.

- Edge: An edge connects two nodes to show that there is a relationship between them. Every node (except the root) is connected by exactly one incoming edge from another node. Each node may have several outgoing edges.

- Root: The root of the tree is the only node in the tree that has no incoming edges. In a tree, there is exactly one node with no incoming edges.

- Path: A path is an ordered list of nodes which are connected by edges. For example, Mammal → Carnivora → Felidae → Felis → Cat is a path.

- Children: The set of nodes c that have incoming edges from the same node to are said to be the children of that node.

- Parent: A node is the parent of all the nodes it connects to with outgoing edges.

- Sibling: Nodes in the tree that are children of the same parent are said to be siblings.

- Leaf: A leaf is a node with no children.

- Subtree: A subtree is a set of nodes and edges comprised of a parent and all the descendants of that parent.

- Level: The level of a node n is the number of edges on the path from the root node to n. By definition, the level of the root node is zero.

- Height: The height of a node is the number of edges on the longest path from the node to a leaf. The height of a tree is the height of the root.

Types of Trees

There are several types of trees in computer science, each with its own unique characteristics and use cases.

- Binary Tree: A binary tree is a tree-type non-linear data structure with a maximum of two children for each parent.

- Binary Search Tree (BST): A binary search tree is a binary tree with the property that the key in each node must be greater than all keys stored in the left sub-tree, and not greater than the keys in the right sub-tree.

- AVL Tree: An AVL tree is a self-balancing binary search tree, and it was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one.

- Red-Black Tree: A red-black tree is a type of self-balancing binary search tree where each node has an extra bit for denoting the color of the node, either red or black.

- Heap: A heap is a special tree-based data structure in which the tree is a complete binary tree. Generally, Heaps can be of two types: Max-Heap and Min-Heap.

- B-Tree and B+ Tree: They are used in databases and file systems. In B tree, a node can have more than two children. B+ tree is an extension of B tree which allows efficient traversal of records.

- Trie (Prefix Tree): A trie, also called digital tree and sometimes radix tree or prefix tree, is a kind of search tree—an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.

- Suffix Tree: A suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values.

Tree Traversals

Tree traversal is a process of visiting (checking and/or updating) each node in a tree data structure, exactly once. Unlike linked lists, one-dimensional arrays and other linear data structures, which are traversed in linear order, trees may be traversed in multiple ways. They may be traversed in depth-first or breadth-first order.

There are three common ways to traverse a tree in depth-first order:

- In-order (LNR): The left subtree is visited first, then the root and later the right sub-tree. We should always check if the node is a leaf node.

- Pre-order (NLR): The root node is visited first, then the left subtree and finally the right subtree.

- Post-order (LRN): First we traverse the left subtree, then the right subtree and finally the root node.

In a breadth-first traversal, level order traversal is used where we visit every node on a level before going to a lower level.

Applications of Trees

Trees are one of the most useful data structures in computing, and they are used in many areas of computer science, including:

- Operating Systems: Trees are used in file systems of the Operating System. Directories are structured as a tree.

- Databases: Databases use B-tree data structures to store information.

- Router Algorithms: Trees are used in routers algorithms in the form of a routing table.

- Artificial Intelligence: Trees are used in AI for the construction of decision trees, game trees etc.

- Networks: Trees are used in network or graph algorithms like the minimum spanning tree, max flow min cut theorem etc.

- Compiler Design: Trees are used in syntax analysis.

Definition

Terminology

Types of Trees

Tree Traversals

Applications of Trees

See Also