Row vs Column Store
Source material:
Row-oriented | Column-oriented |
---|---|
MySQL, PostgreSQL | BigQuery, RedShift, Clickhouse |
--- | --- |
OLTP (transactional) use cases | OLAP (analytical) use cases |
--- | --- |
Software engineers concerned with throughput, latency, consistency (ACID properties), concurrency, performance of CRUD operations, HA, scalability | Data scientists and analysts who do complex queries on large datasets "offline" are concerned with how efficiently data is stored and how long their queries have to run |
--- | --- |
Optimized for reading and writing entire records/rows | Optimized for aggregating data from a few columns |
--- | --- |
Data is stored contiguously in the same disk block(s) - Fast retrieval of an entire record since all data is physically co-located | Data is stored across a number of disk blocks, each disk location represents values from a single column |
--- | --- |
Inserting a new record is easy (append to a single disk block) | Inserting a new record is costly (have to write to multiple disk locations) |
--- | --- |
If SELECT ing for a single field, all fields (the entire row) must be loaded into memory because rows are stored contiguously on blocks and the entire block needs to be loaded into memory |
Slow to add new data or update existing data because it requires writing to multiple disk locations |
--- | --- |
More difficult to compress a disk block because records have varying data types | Columns can be compressed more efficiently because compression algorithms like data of the same type. |