A lot has been written about transactions, and I have read a lot of it. Yet here is some more, written in the context of what we face in real-life situations while developing enterprise applications. The next few posts will be dedicated to transactions before I move on to other pain areas that we developers face in actual development.
To start with, let's consider what a transaction is. By the book, it is a unit of work that either completes as a whole or fails as a whole. Dig deeper and it starts getting confusing, with isolation levels, ACID properties and a lot more. But here we shall not rewrite what is already covered in plenty of documentation.
What we shall deal with today is the intent of transactions. Why do we have to consider transactions in the first place? Understanding that is imperative for developers to correctly identify "transaction boundaries" when it comes to real implementation.
Any piece of code executes as an interaction between modules, systems or methods.
When it comes to transactions, here are the questions developers should ask themselves while writing code, to decide whether they need a transaction at all and, if so, where to start and end it.
a. Is the interaction going to persist some information into persistent storage? Storage can be, but is not limited to, databases. We can define a transaction boundary around reads too, but developers should use discretion while doing so. If a part of the application just reads data and does not update or insert data based on what it reads, it can live without transactions for those reads. Believe me, there are a lot of such scenarios in real life, no matter what the gurus may say.
b. What happens if the storage of that data fails? If it does not matter, such as in the case of a user submitting a tweet, the developer can save precious processing time by staying away from transactions. This is a major reason why Twitter should store tweets in Cassandra, but that shall be a topic for another day. (Important: Twitter is not using C* for storing tweets but for geolocation and analytics, though that is something only Twitter knows. C* is a good fit for tweets: if the write fails, the user may just click again. He won't even know.)
c. Where does the interaction start storing data and where exactly does it end? If multiple pieces of data are being stored, which of them are related and need to be stored "together"? That is the unit of work. The point where we start storing it and the point where we finish storing all the pieces define our transaction boundary.
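To make question (c) concrete, here is a minimal sketch in Python using the standard `sqlite3` module. The `orders` and `order_lines` tables and the `place_order` helper are hypothetical names invented for this illustration; they are not from any real application. The assumption is that an order header and its lines are the pieces of data that must be stored "together", so the transaction boundary wraps exactly those writes and nothing else.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical tables, invented for this sketch.
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE order_lines ("
        "order_id INTEGER NOT NULL, item TEXT NOT NULL, "
        "qty INTEGER NOT NULL CHECK (qty > 0))"
    )


def place_order(conn, customer, lines):
    """Store an order header and all of its lines as one unit of work."""
    # Boundary starts here. 'with conn' commits if the block finishes
    # and rolls back if any statement inside it raises.
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer) VALUES (?)", (customer,)
        )
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_lines (order_id, item, qty) VALUES (?, ?, ?)",
            [(order_id, item, qty) for item, qty in lines],
        )
    # Boundary ends here: either everything above is stored, or nothing is.
    return order_id


def count_rows(conn, table):
    # 'table' is only ever a hard-coded name in this sketch.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    place_order(conn, "alice", [("pen", 2), ("ink", 1)])
    try:
        # The second line violates the CHECK constraint, so the whole
        # order, header included, should be rolled back.
        place_order(conn, "bob", [("pen", 1), ("ink", 0)])
    except sqlite3.IntegrityError as exc:
        print("order rejected:", exc)
    print("orders stored:", count_rows(conn, "orders"))
    print("lines stored:", count_rows(conn, "order_lines"))
```

The point of the sketch is where the `with conn:` block starts and ends, not the schema. If the header and the lines did not need to succeed or fail together, the answer to question (b) might well be that no transaction is needed here.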
It is extremely useful to understand the transaction boundaries of an application in order to implement and use transactional capabilities efficiently in enterprise applications. A lot of enterprise applications end up running their entire processing logic within container transactions, while only a very small part of that code actually performs any real "transactions". That leads to an unnecessary waste of precious system resources and processing time.
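As a rough illustration of keeping the boundary small, the sketch below separates processing from storage, again in Python with the standard `sqlite3` module rather than a real container. The `report` table and the function names are hypothetical, chosen only for this example. The assumption is that the calculation needs no storage access, so only the final writes sit inside the transaction.

```python
import sqlite3


def build_report_rows(raw_readings):
    """Pure processing: touches no storage, so it needs no transaction."""
    rows = []
    for sensor, values in raw_readings.items():
        rows.append((sensor, sum(values) / len(values), max(values)))
    return rows


def save_report(conn, rows):
    """Only the writes that belong together run inside the transaction."""
    with conn:  # commit on success, roll back if a statement raises
        conn.execute("DELETE FROM report")
        conn.executemany(
            "INSERT INTO report (sensor, mean, peak) VALUES (?, ?, ?)", rows
        )


def refresh_report(conn, raw_readings):
    rows = build_report_rows(raw_readings)  # outside any transaction
    save_report(conn, rows)                 # short transaction, writes only


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, invented for this sketch.
    conn.execute("CREATE TABLE report (sensor TEXT, mean REAL, peak REAL)")
    refresh_report(conn, {"s1": [1.0, 3.0], "s2": [4.0]})
    print(conn.execute("SELECT * FROM report ORDER BY sensor").fetchall())
```

Wrapping all of `refresh_report` in one transaction would also work, but it would hold the transaction open for the whole calculation, which is the waste described above.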