
In-Memory vs. On-Disk Database Systems

CIOL Bureau

12 Feb 2009 00:00 IST

Updated On 12 Feb 2009 19:39 IST


BANGALORE, INDIA: In-memory database systems (IMDSs) store records in main memory; the records never go to disk. By eliminating disk access, IMDSs claim significant performance gains over more familiar database management systems (DBMSs).


But the fact that retrieval from memory outperforms disk access is nothing new in database knowledge. Traditional databases have long offered caching as a way to hold frequently used records in memory and so increase responsiveness. Do in-memory databases really offer anything new?

As it turns out, physical disk I/O is just the last, and most visible, link in a chain of processing that is inherent to traditional databases (let’s call them on-disk databases) and premised on the idea that data must ultimately reside in permanent storage. Caching boosts a DBMS’s performance, especially when an application is reading records; however, all DBMS updates are ultimately written through the cache to disk.

IMDSs gain their advantage by eliminating or greatly mitigating this bundle of processing “costs,” including the following:


Caching

Ironically, while on-disk DBMSs use caching to boost performance, IMDSs gain considerable speed by dispensing with it. Caching involves several processes: cache coherence, which makes sure that the image of a database page in cache is consistent with the physical database page on disk and (when applicable) with other caches in a distributed cache setting; cache lookup, which determines whether data requested by the application is in cache and, if not, retrieves the page; and least recently used (LRU) algorithms within the general cache management logic, which keep frequently accessed data in cache and flush out less frequently accessed data.

Caching functions play out every time the application makes a function call to read a record from disk, draining CPU cycles and consuming memory. IMDSs impose no such overhead. Eliminating these processes (and others) also simplifies the overall design of IMDSs, resulting in a smaller code size and lower demands for memory and CPU cycles.
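The cache lookup and LRU bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `PageCache` class, its capacity, and the dictionary standing in for the disk are all assumptions made for the example.

```python
from collections import OrderedDict


class PageCache:
    """Tiny LRU page cache in front of a slow 'disk' (here just a dict)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # hypothetical stand-in for on-disk pages
        self.capacity = capacity      # maximum number of pages held in memory
        self.pages = OrderedDict()    # page_id -> page, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, page_id):
        # Cache lookup: is the requested page already in memory?
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id]
        # Miss: fetch from "disk", then evict the LRU page if over capacity.
        self.misses += 1
        page = self.disk[page_id]
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # drop the least recently used page
        return page


disk = {i: f"page-{i}" for i in range(4)}
cache = PageCache(disk, capacity=2)
for page_id in (0, 1, 0, 2, 1):
    cache.read(page_id)
```

Every `read` pays for the lookup and the reordering even when the page is already cached. In the in-memory case the equivalent access would simply be `disk[page_id]`, with no hit/miss accounting or eviction at all.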


Data Transfer Overhead

Data transfer also causes on-disk databases to lag. With such DBMSs, the application works with a copy of the data, held in a program variable, that is several times removed from the database. Consider the "handoffs" shown in the figure below for an application to read a piece of data from an on-disk DBMS, modify it, and write that data back to the database.


In contrast, with an IMDS, there are at most two copies of the data: The copy within the database and possibly a working copy in local storage during the scope of a database transaction.

Operating System Dependency

Operating system dependency presents another significant performance variable. On-disk databases use the underlying file system to access data within the database. The quality of the data-seeking functions provided by a particular OS (such as lseek() under Linux) will affect performance, for better or worse. In contrast, an in-memory database operates independently of the OS file system and is highly optimized for data access.
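To make the contrast concrete, the sketch below reads the same record two ways: through the file system, using Python's `os.lseek()` and `os.read()` wrappers around the OS calls, and directly from a list held in memory. The fixed-size record layout and the helper names are hypothetical, chosen only for illustration.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: 4-byte unsigned id plus 12-byte name.
RECORD = struct.Struct("<I12s")


def read_record_from_file(fd, index):
    # On-disk path: ask the OS to position the file offset, then read.
    os.lseek(fd, index * RECORD.size, os.SEEK_SET)
    return RECORD.unpack(os.read(fd, RECORD.size))


def read_record_from_memory(records, index):
    # In-memory path: a direct index into a structure the process owns.
    return records[index]


records = [(i, f"rec{i}".encode().ljust(12, b"\0")) for i in range(3)]

fd, path = tempfile.mkstemp()
try:
    for rec in records:
        os.write(fd, RECORD.pack(*rec))
    on_disk = read_record_from_file(fd, 2)
finally:
    os.close(fd)
    os.remove(path)

in_memory = read_record_from_memory(records, 2)
```

Both paths return the same record, but the first involves two system calls whose cost depends on the OS and file system, while the second never leaves the process.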


Transaction Processing Overhead

In transaction logging, every change to the database is recorded in a journal (the transaction log). In the event of a catastrophic failure, the database can recover, upon restart, by committing or rolling back transactions from log files. Disk-based databases are hard-wired to keep these logs, and to flush log files and cache to disk after transactions are committed.

With IMDSs, this journaling approach is typically optional. For example, in addition to providing transaction logging, McObject’s eXtremeDB IMDS offers an in-memory alternative: the database maintains a before-image of the objects that are updated or deleted, and a list of database pages added during a transaction. Upon transaction commit, the memory that held the before-images and page references is returned to the database memory pool (a very fast and efficient process). If the database must abort a transaction (for example, if the in-bound data stream is interrupted), the before-images are restored to the database and the newly inserted pages are returned to the database memory pool.
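The before-image idea can be sketched with a toy key-value store. This is not the eXtremeDB API; the `InMemoryDB` class and its methods are hypothetical, and the sketch tracks individual keys rather than database pages.

```python
class InMemoryDB:
    """Toy key-value store with before-image rollback (illustrative only)."""

    _MISSING = object()   # marks "key did not exist before the transaction"

    def __init__(self):
        self.data = {}
        self.before = None   # before-images for the open transaction

    def begin(self):
        self.before = {}

    def _remember(self, key):
        # Save the before-image once, the first time a key is touched.
        if key not in self.before:
            self.before[key] = self.data.get(key, self._MISSING)

    def put(self, key, value):
        self._remember(key)
        self.data[key] = value

    def delete(self, key):
        self._remember(key)
        self.data.pop(key, None)

    def commit(self):
        # Commit just discards the before-images; nothing is flushed to disk.
        self.before = None

    def abort(self):
        # Abort restores every before-image and removes newly inserted keys.
        for key, old in self.before.items():
            if old is self._MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self.before = None


db = InMemoryDB()
db.begin()
db.put("a", 1)
db.commit()

db.begin()
db.put("a", 2)
db.put("b", 3)
db.delete("a")
db.abort()   # "a" returns to its committed value; "b" disappears
```

Commit is cheap because it only releases the saved images, which mirrors the article's point that the memory simply goes back to the pool.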


In the event of catastrophic failure, the in-memory database image is lost. In many instances this is acceptable, which points to a difference between the typical uses of IMDSs and those of the more business-oriented DBMSs. In embedded applications, an IMDS can often simply be re-provisioned upon restart, as with a program guide application in a set-top box that receives continual updates from content providers, a network switch that discovers network topology on startup, or a wireless access point that is provisioned by a server upstream.

This does not preclude data persistence. At startup or any other point, the application can open a stream (a socket, pipe, or a file pointer) and instruct the IMDS to read or write a database image from, or to, the stream, for example, to create and maintain boot-stage data. Persistence can also be obtained by pairing an IMDS with non-volatile memory such as battery-backed RAM. Other applications guarantee in-memory database survival by maintaining one or more synchronized database copies (replication).
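Saving and restoring a database image through a stream might look like the sketch below. The `save_image` and `load_image` helpers and the JSON encoding are assumptions made for the example, not a real IMDS interface; an `io.BytesIO` buffer stands in for the file or pipe.

```python
import io
import json


def save_image(db, stream):
    # Serialize the whole in-memory database to any writable binary stream.
    stream.write(json.dumps(db, sort_keys=True).encode("utf-8"))


def load_image(stream):
    # Rebuild the in-memory database from a readable binary stream.
    return json.loads(stream.read().decode("utf-8"))


# Hypothetical program-guide data, as in the set-top box example.
db = {"channels": {"101": "News", "102": "Sports"}}

stream = io.BytesIO()      # stand-in for a file or pipe
save_image(db, stream)
stream.seek(0)             # rewind before reading the image back
restored = load_image(stream)
```

Because the helpers only rely on `write()` and `read()`, the same code would work with a file opened in binary mode, which is one way an application could keep boot-stage data.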

More recently, vendors have introduced hybrid database technology that combines in-memory and on-disk data storage in the same database instance. With McObject’s eXtremeDB Fusion, a notation in the database design or "schema" causes certain record types to be written to disk, while others are managed entirely in memory, enabling the developer to make fine-grained tradeoffs between persistence and performance.

Many application types will continue to use DBMSs that inherently store records to disk. But for software requiring a performance edge, or an exceptionally small footprint, in-memory database systems can provide both.
