NoSQL Databases Tips

Introduction

We explore the basics of NoSQL databases and delve into specific examples like MongoDB, Cassandra, and Cloudant. NoSQL databases offer a non-relational approach and are known for their scalability. We discuss the benefits of using Key-Value stores for basic operations and how Column-based NoSQL databases allow flexibility in managing columns and rows. Additionally, we touch upon working with distributed data, the importance of ACID transactions in financial institutions, and data sharding techniques.

Within the context of MongoDB, we highlight its scalability options, schema-free nature, and code-first approach. We also introduce the concept of replica sets and the ability to consolidate different types of data into a single view. The MongoDB shell is mentioned as a useful tool for database interaction.

Moving on to Cassandra, we emphasize its availability even in the event of cluster loss and the use of tables and keyspaces for data organization. We provide insights into modeling data with partition keys, clustering keys, and primary keys. Blobs are mentioned as suitable for storing multimedia objects, and we touch upon some basic commands and operations.

Lastly, we discuss Cloudant, focusing on its language-specific libraries for application development and the use of a document database for security and querying purposes. Geospatial technology is highlighted as ideal for specific industries, and we mention considerations regarding continuous replication and curl commands for interacting with Cloudant.

Throughout the content, we provide valuable insights into the features, capabilities, and best practices associated with each NoSQL database, offering a comprehensive overview of their functionalities and potential use cases.

Basics

NoSQL databases vary widely in style and technology, but they all share a common trait: they are non-relational.
Scalability may be the most common reason to use NoSQL databases.
Because Key-Value stores are represented as a hashmap, they are powerful for basic Create-Read-Update-Delete operations.
In column-based NoSQL databases, rows can share all, a subset, or none of their columns, and columns can be added to any number of rows.
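
The key-value and column-based ideas above can be sketched with plain Python dictionaries. This is a toy in-memory model, not the API of any real database; the keys, row names, and column names are made up for illustration.

```python
# Toy in-memory models; not the API of any real database.

# Key-value store: a plain hashmap gives direct CRUD access by key.
kv_store = {}
kv_store["user:1"] = "Ada"            # Create
name = kv_store.get("user:1")         # Read
kv_store["user:1"] = "Ada Lovelace"   # Update
del kv_store["user:1"]                # Delete

# Column-based store: each row carries only the columns it needs.
rows = {
    "row1": {"name": "Ada", "email": "ada@example.com"},
    "row2": {"name": "Grace"},              # shares a subset of row1's columns
    "row3": {"last_login": "2024-01-01"},   # shares none of them
}

# Adding a column to one row does not affect the other rows.
rows["row2"]["city"] = "Arlington"

# Columns that row1 and row2 have in common.
shared = set(rows["row1"]) & set(rows["row2"])
```

Each CRUD operation is a single hashmap access, which is why key-value stores suit simple lookups by key better than queries that span many records.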

Working with Distributed Data

Financial institutions will almost exclusively use ACID databases for money transfers because these operations depend on the atomic nature of ACID transactions.
Data sharding, the process of breaking data into smaller pieces for storage in a distributed system, is also called data partitioning by some NoSQL databases.
According to the CAP theorem, a distributed system can guarantee only two of the three characteristics desired when designing, implementing, and deploying applications: consistency, availability, and partition tolerance.
In NoSQL, the data model is driven by the queries your application will make. Models should be based on how the app interacts with the data rather than how the data can be stored as rows in one or more tables.
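
As a rough illustration of sharding, the sketch below assigns each record to one of several shards by hashing its key. Hash-modulo placement is only one possible strategy and is an assumption here; real databases often use ranges or consistent hashing instead. The function names and the sample records are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def shard_records(records: dict, num_shards: int) -> list:
    """Split a dict of records into num_shards smaller dicts."""
    shards = [{} for _ in range(num_shards)]
    for key, value in records.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards


# Hypothetical sample data: 100 small user records spread over 4 shards.
records = {f"user:{i}": {"id": i} for i in range(100)}
shards = shard_records(records, num_shards=4)

# A lookup only needs to visit the one shard that owns the key.
owner = shards[shard_for("user:42", 4)]
```

Because the hash is stable, the same key always lands on the same shard, so reads and writes for a key can be routed without scanning every node.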

Basics of MongoDB

Scalability provided by MongoDB means that as your data needs grow, you can scale vertically by introducing bigger, faster, better hardware. You can also scale horizontally by partitioning your data.
MongoDB does not need table structures to hold data, allowing you to focus on the data you are writing and how you’re going to read it.
MongoDB follows a Code-first approach, implying that you don’t have to start with defining the schema and table structures before you can write data into the database.
A typical MongoDB deployment is a three-node replica set. Through the replication process, the data on the primary node is copied to the other data-bearing nodes in the cluster.
MongoDB lets you bring together different types of data from different sources and consolidate them into a single view.
The Mongo shell is a command-line tool provided by MongoDB for connecting to your databases.
A typical MongoDB cluster is made of three data-bearing nodes. All three nodes hold the same data, hence the name ‘replica set’. Data is written to the primary node and then replicated to the secondary nodes.
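
The replica set and schema-free ideas above can be sketched with a toy in-memory model. This is not MongoDB's API or its replication protocol: the class name and methods are hypothetical, and the copy to the secondaries happens immediately here, whereas real replication is asynchronous.

```python
class ToyReplicaSet:
    """Toy model of a replica set: one primary plus secondaries."""

    def __init__(self, num_nodes: int = 3):
        # Each node is a dict of documents keyed by their _id.
        self.nodes = [{} for _ in range(num_nodes)]
        self.primary = 0  # index of the primary node

    def insert(self, doc: dict) -> None:
        # Writes go to the primary first...
        self.nodes[self.primary][doc["_id"]] = dict(doc)
        # ...and are then copied to every secondary.
        for i, node in enumerate(self.nodes):
            if i != self.primary:
                node[doc["_id"]] = dict(doc)

    def read(self, doc_id, node_index: int = 0):
        # Return the document stored on the chosen node, or None.
        return self.nodes[node_index].get(doc_id)


rs = ToyReplicaSet()
# Schema-free: documents in the same store can have different fields.
rs.insert({"_id": 1, "name": "Ada", "tags": ["math"]})
rs.insert({"_id": 2, "title": "Invoice", "total": 99.5})

# After replication, a secondary returns the same document as the primary.
from_secondary = rs.read(2, node_index=2)
```

No table structure is declared before the inserts, which mirrors the code-first approach: the shape of the data is decided by the documents being written.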

Cassandra Basics

Cassandra’s availability: even if you lose part of your cluster, there will still be nodes available to answer service requests, but the returned data might be inconsistent.
The Cassandra data model has two logical entities: tables, which organize data storage at the cluster and node level according to a declared schema, and keyspaces, which each contain one or more tables.
For modeling data, choose a partition key that starts answering your query but that also spreads the data uniformly around the cluster.
For modeling data, build a clustering key that helps you reduce the amount of data that needs to be read by ordering your clustering key columns according to your query.
For modeling data, build a primary key that allows you to minimize the number of partitions read in order to answer a certain query.
Blobs are typically used to store images, audio, or other multimedia objects.
ALTER TABLE can add new columns to a table schema.
Introducing IF in INSERT and UPDATE instructs Cassandra to look for the data, read it, and only then perform the given operation.
CQL does not support JOIN statements. Instead, you can model your tables so that the data is stored already joined.
Using DROP KEYSPACE will lead to the removal of all the keyspace tables and the data those contain.
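
The roles of the partition key and the clustering key can be sketched with a toy in-memory table. This is only a model of the idea, not Cassandra's storage engine or driver API; the class, the sensor names, and the date strings are hypothetical.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy table: rows grouped by partition key, sorted by clustering key."""

    def __init__(self):
        # partition key -> list of (clustering key, row), kept sorted
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        entries = self.partitions[partition_key]
        keys = [ck for ck, _ in entries]
        idx = bisect.bisect_left(keys, clustering_key)
        if idx < len(entries) and entries[idx][0] == clustering_key:
            entries[idx] = (clustering_key, row)  # same primary key: overwrite
        else:
            entries.insert(idx, (clustering_key, row))

    def query(self, partition_key, start=None, end=None):
        """Read one partition, optionally limited to clustering keys in [start, end]."""
        entries = self.partitions.get(partition_key, [])
        return [
            row
            for ck, row in entries
            if (start is None or ck >= start) and (end is None or ck <= end)
        ]


# Hypothetical model: partition by sensor, cluster by reading date.
readings = ToyTable()
readings.insert("sensor-a", "2024-01-02", {"temp": 21.5})
readings.insert("sensor-a", "2024-01-01", {"temp": 20.0})
readings.insert("sensor-b", "2024-01-01", {"temp": 18.2})
readings.insert("sensor-a", "2024-01-03", {"temp": 22.1})

# Answered from a single partition, with rows already in date order.
recent = readings.query("sensor-a", start="2024-01-02")
```

The query touches only the "sensor-a" partition, and the rows come back ordered by date, which is the effect the partition key and clustering key guidelines above aim for.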

Cloudant Basics

You can develop your own applications using language-specific libraries (wrappers that help you work with an API).
Cloudant uses a document database for security access reasons, as you can apply access roles at the database level, and for querying, as you can’t index or query across databases within a single API call.
Geospatial is the IBM Cloudant technology ideal for applications involving oil, gas, or transportation that use tracking analytics. Examples include satellites and transportation vehicles.
Continuous replication requires extra system calls, which will likely increase the cost to an organization running multi-tenant instances. This is why continuous replication is not enabled by default.
In a curl command, a variable must be preceded by a dollar sign.
When you run a command in curl without a specific method, it defaults to using the GET HTTP method.
Each JSON object must be less than 1 megabyte and can contain any number of strings, numbers, Booleans, arrays, and nested objects.
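
The default-GET behavior and the document size limit can be mirrored in Python's standard library. The sketch below only builds requests and never sends them; the URL and database name are hypothetical placeholders, and treating the limit as 1,000,000 bytes is an assumption about how "1 megabyte" is counted.

```python
import json
import urllib.request

# Hypothetical URL and database name, for illustration only.
BASE_URL = "https://example.invalid/mydb"
MAX_DOC_BYTES = 1_000_000  # assumed reading of the 1 megabyte limit


def build_request(url, doc=None):
    """Build (but do not send) a request; it is a GET unless a body is supplied."""
    if doc is None:
        return urllib.request.Request(url)
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def fits_size_limit(doc, limit=MAX_DOC_BYTES):
    """Return True if the serialized JSON document is under the limit."""
    return len(json.dumps(doc).encode("utf-8")) < limit


# No body, so the method defaults to GET, as with curl.
read_req = build_request(BASE_URL + "/_all_docs")

# A JSON document mixing strings, numbers, Booleans, arrays, and nested objects.
doc = {"name": "Ada", "age": 36, "active": True, "tags": ["a"], "address": {"city": "London"}}
write_req = build_request(BASE_URL, doc)
```

Checking the size before sending is a cheap way to catch oversized documents early, rather than waiting for the server to reject them.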

Dany Djeudeu, PhD

Dany Djeudeu is a versatile Freelance Data Scientist, Statistician, and AI Engineer with extensive industry experience. With a passion for solving complex data challenges, Dany is committed to helping organizations unlock the full potential of their data. His strong track record of delivering high-quality solutions and exceeding client expectations has earned him a reputation as a trusted partner.