September 2020 – Journey of an Evangelist

As business use-cases are changing very rapidly there are many data points which can be discussed before finalizing any particular database technologies. With this post I am going to highlight few data point which will may help any technologist in selecting databases specific to there use-cases.

1)COST : Before selecting any technology for your use-cases it is better to consider cost factor upfront. Don’t ignore this thing initially as it will be become of the deciding factor when me move ahead. Below mentioned are some of the data-point which can be used to evaluate cost factor related to any databases:

Open source or licensed database technology is needed?

Maintenance cost of this database technology?

Cost to scale current database server?

Does it provide trial version to do initial POC?

Short-term costs like subscription fees, per-use charges, and “hidden” which are buried in the product offering?

Support fees related to this database technology?

2)SLA: Service level agreement defines the level of service we expect from any database vendor.We have to first decide on Recovery Point Objective(RPO) and Recovery Time Objective(RTO) based on our business requirement.Once RPO and RTO are decided we can use below mentioned queries which help us decide database which are better suited to our requirement:

Availability support should be at-least 99.9%?

Durability support should be at-least 99.99%?

Onsite support business hours and procedures for off-hours support?

Maximum time taken by database technology provider to acknowledge any support ticket raised by team?

Does it provide multi node cluster support?

Does it have any disaster recovery solution?

Time taken to restore any customer data from a backup?

3)Third Party Support:One important aspect before selecting any technology is its compatibility with respect to various third party which comes into picture like Operating Systems, Data Format and Cloud adaptability. Below mentioned data point will highlight some important third party questionnaire’s related to database technology.

Multi Operating Support with respect to Linux,Windows etc is required?

Should be supported by Key Cloud Vendors like AWS ,GOOGLE or Microsoft?

SQL language support is needed?

Supported programming languages will be java,C,C++,Python?

Support of XML,JSON,SPATIAL and unstructured data is needed?

Does it support any third party integration?

4)Data Model:This defines how logical structure of database is modeled. It is the one of the important aspect look by developer/architect before selecting any databases as per business requirement. Below mentioned data point will help in selecting desired database based on its data modeling capabilities:

Schema less design is required?

ACID support is needed?

Limited Query Pattern is needed?

No complex queries, stored procedures, or views are present?

Normalization is required with respect to database?

Every table should include a Timestamp column in order to manage concurrent access by multiple users if optimistic locking is enabled?

Logical data independence support is required or not?

Physical data independence support is required or not?

Easy or hard is it to adapt or extend the underlying data model?

Level of functional dependency between different schema objects?

Single Huge table model is required?

Create read update and Delete operation support is needed?

Degree of the relationships between different entities?

Multi-table relational database is needed?

5)Performance:There may be many specific features present in different database technology which can improve performance of database which in turn convert to better user experience while accessing the application.Below mentioned are some of the parameter which should be taken into consideration while selecting desired databases:

Partitioning of data can be done?

Replication method to be used will be Master-Master or Master-Slave?

In Memory Capabilities is there or not?

OLTP, data warehousing (DW) or mixed (OLTP & DW) database workloads support is needed?

If Analytics, BI or Reporting is required?

Analyzing large quantities of data in read mode only?

Requires distributed computing and storage?

Datatype to be inserted is supported by storing systems?

Complex dynamic queries requiring stored procedures, or view?

6)Security: One of the most important aspect of today’s technology world is security options provided by particular database vendor. Each business use case may require different security techniques to be adapted. Below mentioned data-points will help in selecting desired database technology:

Application requires secures client communication mechanism with respect to database via using SSL or Encryption Protocol?

Auditing Feature with respect to database is required?

Any sensitive data should be kept in encrypted form?

Access rights for users, groups and roles based on security policies?

Technologies that allow you control access and permissions at object/table-level, at feature-level, and at field-level?

Ensure your system will be safe and that only authenticated users will have access by using single-sign-on, two-factor-authentication features?

7)Usability: Even though we went through each key point which was mentioned above one thing which will definitely help use chooses particular technology is it’s usability factor. In simple terms how well we can adapt to this new technology be it through some technical session or practical training. Below mentioned key point will help you narrow down your option:

Implementation process is simple or required some level of expertise?

How strong and connected is the community of users and partners around the database technology?

How widely available is information that will help you troubleshoot or improve your implementation of the database technology?

How frequent is the patch released with respect to any bugs?

Does it have proper Backup and Recovery Solution?

Is there any feasible support site to raise tickets with respect to any bugs?

Overall I tried my best to narrow down points which may help any technologist to select any databases vendors based on there business requirement. Similar concepts can be applied while selecting different technology.

Before reading this post do have a look into NOSQL DATABASE OVERVIEW – Part 1 and NOSQL DATABASE OVERVIEW – Part 2.

This post outlines some fundamental concepts, techniques and patterns that are common among NoSQL datastores and not unique to only one class of non-relational databases or a single NoSQL store.

a) CAP THEOREM: A distributed database has three main desirable properties

Consistency: The data available on all machines should be same in all respects. A distributed system is typically considered to be consistent if after an update operation of some writer all readers see his updates in some shared data source.
Availability: Availability meaning that a system is designed and implemented in a way that allows it to continue operation in case of any failures.
Partition Tolerance: – Unlike the other two requirements, this property can be seen as a statement regarding the underlying system: communication among the servers is not reliable, and the servers may be partitioned into multiple groups that cannot communicate with each other.

The CAP theorem states that: You can have at most two of these properties for any NOSQL systems.

b) BASE: BASE stands for Basically, Available, Soft state, and Eventual consistency. BASE is reverse of ACID properties which is core to RDBMS system. The distributed nature of NoSQL brings possibilities of data being partially available when some parts of the distributed database are not operation or cannot be reached hence, the term Basically Available. Soft State allows data to vary overtime with or without input. Eventually Consistent guarantees that data will become consistent in future and not immediately after an operation

c)Query Model: NOSQL databases offer many querying capabilities based on different NOSQL datastore. Key/Value stores by design only provide a lookup by primary key or some other id field and lack capability to query any further fields, other datastore like document store provide complex query capabilities. Column family stores only provide range queries and some operations like “in”, “and/or” and regular expression, if they are applied on row keys or indexed values. Graph databases can be queried in two different ways. Graph pattern matching strategies try to find parts of the original graph, which match a defined graph pattern.

d)Partitioning: Assuming that data in large scale systems exceeds the capacity of a single server and to ensure better performance of read/write operation data need to be partitioned across multiple clustered machine. NOSQL Database differ in their way to distribute data on multiple machine. Mostly data model of key/value stores, document stores and columnar stores are key oriented data can be partitioned across multiple cluster based on there key value. This key value is generated by a Hash function. Graph information is not gained by simple key lookups but by analyzing relationships between entities. On the one hand, nodes should be distributed on many servers evenly, on the other hand, heavily linked nodes should not be distributed over large distances, since traversals would cause huge performance penalty due to heavy network load. Therefore, one has to trade between these two limitations.

Journey of an Evangelist

Month: September 2020

Database Selection Criteria

NOSQL DATABASE OVERVIEW – Part 3