onprema

Row vs Column Store

Source material:

Row-oriented Column-oriented
MySQL, PostgreSQL BigQuery, RedShift, Clickhouse
--- ---
OLTP (transactional) use cases OLAP (analytical) use cases
--- ---
Software engineers concerned with throughput, latency, consistency (ACID properties), concurrency, performance of CRUD operations, HA, scalability Data scientists and analysts who do complex queries on large datasets "offline" are concerned with how efficiently data is stored and how long their queries have to run
--- ---
Optimized for reading and writing entire records/rows Optimized for aggregating data from a few columns
--- ---
Data is stored contiguously in the same disk block(s) - Fast retrieval of an entire record since all data is physically co-located Data is stored across a number of disk blocks, each disk location represents values from a single column
--- ---
Inserting a new record is easy (append to a single disk block) Inserting a new record is costly (have to write to multiple disk locations)
--- ---
If SELECTing for a single field, all fields (the entire row) must be loaded into memory because rows are stored contiguously on blocks and the entire block needs to be loaded into memory Slow to add new data or update existing data because it requires writing to multiple disk locations
--- ---
More difficult to compress a disk block because records have varying data types Columns can be compressed more efficiently because compression algorithms like data of the same type.

#databases