Heap File

What
A heap file is a file whose records are not kept in any particular order.

Record Level Operations
To handle record level operations, the heap file must keep track of a few things:
 * Pages in the file
 * Free space on the pages
 * Records on the pages

Implementations

 * List
 * Directory of Pages

List Implementation
The list implementation keeps track of free and used pages by maintaining two doubly linked lists: one for the full pages and another for the pages with some free space.

The DBMS can remember where the first page is located by maintaining the pair (heap_file_name, page_1_addr). The first page is called the header page.
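The two-list idea can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Page`, `PageList`, and `ListHeapFile` are made up for this example, pages hold a fixed number of records (`capacity`), and everything lives in memory rather than on disk. The `ListHeapFile` object plays the role of the header page, since it is the one place that knows where both lists start.

```python
class Page:
    """Hypothetical page with a fixed record capacity and list links."""
    def __init__(self, page_id, capacity):
        self.page_id = page_id
        self.capacity = capacity
        self.records = []
        self.prev = None
        self.next = None

    def is_full(self):
        return len(self.records) >= self.capacity


class PageList:
    """Minimal doubly linked list of pages."""
    def __init__(self):
        self.head = None

    def push_front(self, page):
        page.prev, page.next = None, self.head
        if self.head is not None:
            self.head.prev = page
        self.head = page

    def remove(self, page):
        # Unlink the page by rewiring its neighbours.
        if page.prev is not None:
            page.prev.next = page.next
        else:
            self.head = page.next
        if page.next is not None:
            page.next.prev = page.prev
        page.prev = page.next = None

    def page_ids(self):
        ids, node = [], self.head
        while node is not None:
            ids.append(node.page_id)
            node = node.next
        return ids


class ListHeapFile:
    """Stands in for the header page: it knows the start of both lists."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.full_pages = PageList()
        self.free_pages = PageList()
        self.next_page_id = 0

    def insert(self, record):
        page = self.free_pages.head
        if page is None:
            # No page has room, so allocate a new one.
            page = Page(self.next_page_id, self.capacity)
            self.next_page_id += 1
            self.free_pages.push_front(page)
        page.records.append(record)
        if page.is_full():
            # The page just filled up: move it to the full list.
            self.free_pages.remove(page)
            self.full_pages.push_front(page)
        return page.page_id
```

For example, with `capacity=2`, inserting five records fills pages 0 and 1 (which move to the full list) and leaves page 2 on the free list.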

Disadvantages

 * Virtually every page will end up on the free list if records have variable length, because almost every page will have at least a little free space. Finding a page with enough room for a new record may then mean examining many pages on the free list.

Directory of Pages
In this approach the DBMS keeps a directory of pages, where each entry identifies one page of the heap file. This solves the earlier problem of the list implementation, where nearly all of the pages were likely to end up on the free list when records have variable length.

Each directory entry can hold either a bit or a count. The bit says whether the page the entry points to has any free space; the count says exactly how much free space it has.
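As a rough sketch of the count-per-entry variant: the class below keeps one free-space count per page and uses it to pick a page for a new record without touching the pages themselves. The name `PageDirectory`, the `page_size` of 100 bytes, and the first-fit search are all assumptions made for illustration, not something the notes prescribe.

```python
class PageDirectory:
    """Hypothetical directory: entry i stores the free bytes left on page i."""
    def __init__(self, page_size=100):
        self.page_size = page_size
        self.free_bytes = []

    def add_page(self):
        # A new page starts out completely empty.
        self.free_bytes.append(self.page_size)
        return len(self.free_bytes) - 1

    def find_page(self, record_size):
        # First fit: only directory entries are inspected, not the pages.
        for page_id, free in enumerate(self.free_bytes):
            if free >= record_size:
                return page_id
        return None

    def insert(self, record_size):
        if record_size > self.page_size:
            raise ValueError("record does not fit on a single page")
        page_id = self.find_page(record_size)
        if page_id is None:
            page_id = self.add_page()
        self.free_bytes[page_id] -= record_size
        return page_id
```

With a bit instead of a count, `find_page` could only tell that a page has some free space, not whether a record of a given size fits, which is why the count is the more useful choice for variable length records.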

Cost of Operations
For the calculations, we will use the following values:
 * B: The number of data pages
 * R: The number of records per page
 * D: (Average) time to read or write a disk page

Scan All Records
Scanning all records costs $$BD$$ because we must read every page in the file, and D is the cost per page. Reading a page brings in all R of its records at once, so R does not multiply the I/O cost.

Equality Search
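Assume the search is on a key, so exactly one record matches, and that the record is equally likely to be on any page. Sometimes it will be near the beginning of the file and sometimes near the end, so on average we scan half of the pages before finding it. Therefore, the cost of equality search will be $$0.5BD$$. If no record matches, or if several records may match, we have to scan the whole file and the cost becomes $$BD$$.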
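The half-the-file figure can be checked by counting page reads directly. The sketch below builds an in-memory stand-in for a heap file with unique keys, searches for every key once, and averages the number of pages read. The function names are invented for this example, and the keys happen to be laid out in order only to keep the setup short; the average is the same for any placement as long as every key is searched equally often.

```python
def pages_read_for_key(pages, key):
    """Scan pages front to back; return how many pages were read."""
    for pages_read, page in enumerate(pages, start=1):
        if key in page:
            return pages_read
    return len(pages)  # key absent: every page was read


def average_pages_read(num_pages, records_per_page):
    """Average page reads over one search for each key in the file."""
    # Page p holds keys p*R .. (p+1)*R - 1, so every key is unique.
    pages = [
        list(range(p * records_per_page, (p + 1) * records_per_page))
        for p in range(num_pages)
    ]
    total_keys = num_pages * records_per_page
    total_reads = sum(pages_read_for_key(pages, k) for k in range(total_keys))
    return total_reads / total_keys


print(average_pages_read(100, 10))  # 50.5
```

The exact average is (B + 1) / 2 pages, which is where the 0.5B approximation comes from once B is large.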

Range Search
For the range search on a heap file, since the data is not ordered, any record on any page might satisfy the range condition. Therefore, we must search the entire file and end up with a cost of $$BD$$ for our troubles.

Insert
Insertion into a heap file is pretty painless because we are not enforcing any order. Assuming new records always go at the end of the file, we only need to read the last page, add the record, and write the page back, netting us a cost of $$2D$$.

Delete
Deletion from a heap file is slightly more complicated because we have to find the record before we can delete it. Therefore, we must tack on an equality search, giving us a total cost of $$(0.5B + 1)D$$. The extra $$D$$ is the cost of writing the modified page back out.
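To get a feel for the numbers, the search, insert, and delete formulas can be plugged into a small helper. The function name and the sample values (1000 pages, 10 ms per page I/O) are assumptions picked for illustration; the equality and delete costs assume a search on a key with exactly one match.

```python
def heap_file_costs(B, D):
    """Return the I/O cost of each operation for B pages and page time D."""
    return {
        "equality_search": 0.5 * B * D,  # half the pages on average
        "range_search": B * D,           # every page must be read
        "insert": 2 * D,                 # read one page, write it back
        "delete": (0.5 * B + 1) * D,     # find the record, write the page
    }


costs = heap_file_costs(B=1000, D=10)  # D in milliseconds (assumed)
for operation, cost in costs.items():
    print(f"{operation}: {cost} ms")
```

Under these sample values a delete costs 5010 ms, only one page write more than the 5000 ms equality search it depends on, while an insert costs just 20 ms.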