I often get questions from developers about how Apache Cassandra's storage works internally, so I think it is a good idea to explain how Cassandra deals with data under the hood. This article describes the data structures Cassandra uses to provide such fast access (especially write access) to data.
NoSQL Does Not Mean Schema-less
"NoSQL equals schema-less": this statement isn't valid, especially for Cassandra. Cassandra (starting from version 0.7) encourages developers to share schema information to achieve more transparency.
For example, creating a ColumnFamily (table) with CQL:
CREATE TABLE timeline (
PRIMARY KEY (user_id));
Looks pretty similar to SQL? But it does not work in a similar way.
In an RDBMS, the storage engine is typically based on B-trees, while Apache Cassandra implements a log-structured merge-tree.
The rough difference between an RDBMS and Cassandra: if you insert only a primary key (not a full row) into an RDBMS, resources are allocated for the complete row. In Cassandra, by contrast, each row is sparse: it stores just the columns present in the inserted data. This is possible in part because of the log-structured merge-tree.
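To make the idea of sparse rows on top of a log-structured merge-tree more concrete, here is a minimal Python sketch. It is a toy model, not Cassandra's implementation: the class name TinyLSM, the memtable_limit parameter, and the sample keys and columns are all hypothetical, and it skips the commit log, per-cell timestamps, deletes, and compaction. It only shows that writes go to an in-memory table, that flushed segments are never modified in place, and that a row holds only the columns that were actually written.

```python
from typing import Dict, List, Optional

# A row is a mapping of column name -> value.
# Columns that were never written are simply absent (sparse row).
Row = Dict[str, str]


class TinyLSM:
    """Toy log-structured merge store (hypothetical, for illustration only)."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, Row] = {}         # mutable, in memory
        self.sstables: List[Dict[str, Row]] = []   # flushed segments, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key: str, columns: Row) -> None:
        # A write only touches the memtable; flushed segments stay untouched.
        self.memtable.setdefault(key, {}).update(columns)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable into a segment sorted by key, then start a new one.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key: str) -> Optional[Row]:
        # Merge the row's fragments from oldest to newest so newer values win.
        # (Simplification: this sketch relies on segment order, not timestamps.)
        merged: Row = {}
        found = False
        for segment in self.sstables + [self.memtable]:
            if key in segment:
                merged.update(segment[key])
                found = True
        return merged if found else None


store = TinyLSM(memtable_limit=2)
store.write("alice", {"name": "Alice"})              # only one column stored
store.write("bob", {"name": "Bob", "city": "Kyiv"})  # second key triggers a flush
store.write("alice", {"city": "Lviv"})               # new fragment, old segment untouched
print(store.read("alice"))
```

Note that the row for "alice" ends up split across a flushed segment and the memtable, and the read path stitches the fragments together. That is the trade-off of this design: writes are cheap appends, while reads may have to consult several segments.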