Vertica Database


Vertica is designed to manage terabytes of data faster and more reliably than conventional row-oriented relational databases (e.g., Oracle, DB2, and Microsoft SQL Server). Its most salient features include:

Column orientation – Queries run 50x-200x faster because only the columns a query needs are read, avoiding costly disk I/O

“Scale-out” MPP architecture – Scale out just by adding new servers to the grid

Aggressive data compression – Reduces storage costs by up to 90%

Automatic high availability – Runs non-stop with automatic replication, failover and recovery

Deployment flexibility – Deploy on Linux servers, industry-standard hardware appliances, VMware, or the Amazon cloud to handle a wide variety of projects.


The key to Vertica’s performance is threefold:

1. Vertica organizes data on disk as columns of values from the same attribute, as opposed to storing it as rows of tabular records. This organization means that when a query needs to access only a few columns of a particular table, only those columns need to be read from disk. By contrast, a row-oriented database typically reads entire rows from disk, including values from columns the query never uses, which wastes I/O bandwidth.

2. Vertica employs aggressive compression of data on disk, as well as a query execution engine that can operate directly on compressed data. Compression is particularly effective in Vertica because values within a column tend to be similar to one another, so columns often compress by up to 90%.

3. Because data is compressed so aggressively, Vertica has sufficient space to store multiple copies of the data, both to ensure fault tolerance and to improve concurrent and ad hoc query performance. Logical tables are decomposed and physically stored as overlapping groups of columns, called “projections,” and each projection is sorted on a different attribute (or set of attributes), which optimizes it for answering queries with predicates on its sort attributes. A Vertica database is composed exclusively of these query-optimized structures on disk, without the overhead of base tables.
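The I/O argument in point 1 can be made concrete with a back-of-the-envelope model. This is a minimal sketch under stated assumptions: the table, its column widths, and the query are hypothetical, values are treated as fixed-width and uncompressed, and the functions simply count how many bytes each layout must read when a query touches two of seven columns.

```python
# Toy I/O model: fixed-width values, no compression, every row scanned.
# The table layout and widths are hypothetical, chosen only for illustration.
COLUMN_WIDTHS = {  # column name -> bytes per value
    "order_id": 8,
    "customer_id": 8,
    "order_date": 4,
    "status": 1,
    "amount": 8,
    "shipping_address": 100,
    "notes": 200,
}


def bytes_read_row_store(num_rows: int, widths: dict) -> int:
    """Row layout: every column of every scanned row is read from disk."""
    return num_rows * sum(widths.values())


def bytes_read_column_store(num_rows: int, widths: dict, needed: list) -> int:
    """Column layout: only the columns referenced by the query are read."""
    return num_rows * sum(widths[col] for col in needed)


if __name__ == "__main__":
    rows = 1_000_000
    # e.g. SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;
    needed = ["order_date", "amount"]
    row_bytes = bytes_read_row_store(rows, COLUMN_WIDTHS)
    col_bytes = bytes_read_column_store(rows, COLUMN_WIDTHS, needed)
    print(f"row layout:    {row_bytes:,} bytes")
    print(f"column layout: {col_bytes:,} bytes")
    print(f"reduction:     {row_bytes / col_bytes:.1f}x")
```

With these made-up widths, the column layout reads 12 of the 329 bytes in each row, roughly 27x less data. The real ratio depends entirely on the schema and on which columns a query references.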
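Point 2, that similar column values compress well and can be processed without decompressing them, can be illustrated with run-length encoding (RLE). RLE is used here only as one common columnar encoding chosen for illustration; this sketch does not describe Vertica's internal storage format, and the function names are hypothetical. Each run of identical adjacent values is stored once with its length, and a filtered count or a sum is computed directly from the runs.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full column."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def count_equal(runs, target):
    """Count rows equal to target by touching one entry per run, not one per row."""
    return sum(length for value, length in runs if value == target)


def sum_runs(runs):
    """Sum a numeric column directly from its runs: value * run_length for each run."""
    return sum(value * length for value, length in runs)


if __name__ == "__main__":
    # A sorted, low-cardinality column compresses into very few runs.
    status = ["closed"] * 3 + ["open"] * 4 + ["shipped"] * 5
    runs = rle_encode(status)
    print(runs)  # [('closed', 3), ('open', 4), ('shipped', 5)]
    print(count_equal(runs, "shipped"))  # 5
    assert rle_decode(runs) == status  # lossless round trip
```

Twelve stored values shrink to three runs, and both helper queries do work proportional to the number of runs rather than the number of rows. RLE pays off most when equal values sit next to each other, which is one reason sort order matters for compression.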
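The projection idea in point 3 can be modeled in a few lines, assuming in-memory Python rows rather than on-disk storage. In this sketch the same logical table is stored as two overlapping column groups, each sorted on a different attribute. A query with a predicate on one of those attributes picks the projection sorted on it and binary-searches that projection instead of scanning every row. `Projection`, `choose_projection`, and the `sales` data are hypothetical names for illustration, and the selection rule is deliberately simplistic; a real query optimizer would weigh more factors.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Projection:
    """Toy projection: a subset of a table's columns, stored sorted on one column."""
    name: str
    columns: tuple
    sort_key: str
    rows: list  # projected rows, ordered by sort_key
    keys: list  # sort_key values in the same order, searched with bisect

    @classmethod
    def build(cls, name, table, columns, sort_key):
        # Keep only the projected columns, then order rows by the sort key.
        projected = sorted(({c: row[c] for c in columns} for row in table),
                           key=lambda r: r[sort_key])
        return cls(name, tuple(columns), sort_key, projected,
                   [r[sort_key] for r in projected])

    def range_query(self, low, high):
        """Rows with low <= sort_key <= high, found by binary search rather than a full scan."""
        start = bisect_left(self.keys, low)
        end = bisect_right(self.keys, high)
        return self.rows[start:end]


def choose_projection(projections, predicate_column, needed_columns):
    """Return a projection sorted on the predicate column that holds every needed column."""
    for p in projections:
        if p.sort_key == predicate_column and set(needed_columns) <= set(p.columns):
            return p
    return None  # no sorted match; a real planner would fall back to a scan


# Hypothetical logical table, used only as input for building the projections.
sales = [
    {"sale_id": 1, "customer": "c3", "sale_date": "2024-01-05", "amount": 40},
    {"sale_id": 2, "customer": "c1", "sale_date": "2024-01-02", "amount": 15},
    {"sale_id": 3, "customer": "c2", "sale_date": "2024-01-09", "amount": 70},
    {"sale_id": 4, "customer": "c1", "sale_date": "2024-01-07", "amount": 25},
]

# Overlapping column groups: both carry sale_date and amount, so those values are stored twice.
by_date = Projection.build("sales_by_date", sales, ["sale_date", "amount"], "sale_date")
by_customer = Projection.build("sales_by_customer", sales,
                               ["customer", "sale_date", "amount"], "customer")

if __name__ == "__main__":
    p = choose_projection([by_date, by_customer], "customer", ["customer", "amount"])
    print(p.name, sum(r["amount"] for r in p.range_query("c1", "c1")))
```

Because `sale_date` and `amount` appear in both projections, losing either copy still leaves those columns available, which mirrors the fault-tolerance argument; the price is extra storage, which aggressive compression helps offset.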

Vertica provides a standard SQL interface to users, as well as compatibility with existing ETL, reporting, and business intelligence (BI) tools. Vertica is designed to run on inexpensive clusters, or “grids,” of off-the-shelf Linux servers that use local disk for storage. No expensive SANs or high-end servers are required to run large data warehouses on Vertica, although it also performs well on shared SAN storage if that is the preferred deployment route. Vertica both reduces hardware costs (often by up to 90% relative to other data warehouse databases) and makes it possible to answer more queries, for more users, against more data.