Symbol Tables In Compiler Design

A compiler from the user’s perspective is a software that reads input source files and compiles them. The output of the compiler is usually one main executable file and some auxiliary files. The compiler should be fast and should generate optimized code.

But for the compiler designer a compiler is a beautiful balance between data structures and algorithms. Both are needed to quickly scan source files, to parse the tokens, to generate intermediate code, to optimize it and to link modules. Each compiler stage needs the data in some format. Even highly optimized algorithm would be inefficient if the data would not be stored efficiently. One of the most important data structures in every compiler is symbol table.

Symbol table is a special data structure that holds all symbols, from identifiers to internally generated nodes. Compiler symbol table must contain data structures that will hold string values for symbol names, integer values for data pointers, bit values for boolean flags and fields for special purposes. The organization of the symbol table must be such that it is possible to quickly search for a symbol, to quickly move to the next one. to easily add a new symbol at any position, to easily move data from one place to another and not to use much memory. When you try to combine all the requirements you will find that it is not so easy to decide in which form the data should be stored. One of the compromises is to use different symbol tables for different kind of data.

For example, the symbol table that stores the identifiers needs to efficiently store strings of variable length with associated attributes. One of the functions most frequently called during the scanning of the source file is to check whether the identifier is already in the symbol table. The brute force method to check all identifiers would be very inefficient. Therefore, a better method has to be found. A common approach is to use hash tables. There is a hash function that for each identifier calculates some integer value. This value should only depend on the identifier name. This value needs to be power of 2 and few bits is enough. For each hash value there is a separate linked list of identifiers. So the hash function determines in which list the identifier will be stored. This way we can minimize the search count.

Another example is symbol table that, for example, holds the nodes of program control flow. You need to be able to quickly move in both directions starting from any node. This requirement implies use of two-way linked lists.

The best way to learn about symbol tables, hash functions, linked lists and algorithms is to examine the code of some compiler. You will need some time to become familiar with the functions and the data used but then you will have an overview of the whole picture. Every compiler is a symphony of data structures and algorithms.



Source by I. Funa

Prince

Recent Posts

Putting Together Your Cleaning Business Portfolio

If you are already trying to put together your cleaning business portfolio, then this already…

2 years ago

Ideas for Financing a New Embroidery Company

Because the area of financing can be confusing, yet crucial to the success of any…

2 years ago

6 Tips To Becoming A Successful Entrepreneur

Its best for an average person to have an entrepreneurship mindest and build on their…

2 years ago

Entrepreneurship and Innovation: The Inconvenient Truth

Entrepreneurship and innovation: The popular beliefEntrepreneurs are widely believed to be the agents behind economic…

2 years ago

Work at Home Business Ideas

Here are some excellent businesses that you can start, operate and grow from your home.…

2 years ago

Three Reasons For Starting a Family Business

You want to start a business but also know you do not want to do…

2 years ago