Learn why the shift from mutability to immutability in data analytics has occurred and the benefits.
Data analytics is a field that’s been continually growing for the better part of thirty years. Back in 2003, five exabytes of data was the total amount that Google had created - documenting everything on the internet at that point. By 2021, this amount was created every 30 minutes or so, demonstrating the huge growth in data production.
As data production has grown, our efforts to manage and tame that data have equally increased - hence the growth in the field of data analysis. Over the years, this field has gone through many changes, introduced new tools, and revolutionized its common practices.
The landscape of data analytics looks incredibly distinct in 2022 than it did back in 2003. But, looking at even more recent history, the past few years have seen a major shift in how analytics use data itself. In this article, we’ll be discussing mutability and immutability in data analytics, pointing toward how and why the shift from one to the other has occurred.
Let’s get right into it.
Data from smartwatches and fitness trackers is mutable data
Mutable data is any information that you can change. Think of a Word document - if you save it after writing something new, those changes now become the new normal. Instead of creating a new document, the document simply adapts to show new information - this is mutable data.
Anything that can change is mutable, with this definition also extending to data itself. With the flexibility that mutable data offers, it is incredibly handy for business analysis. Considering the sheer quantity of data produced those businesses need to manage, if they couldn’t change and adapt data to reflect recent updates, the sheer scale of data would be unmanageable.
Most of the current databases deal in mutable data, lending themselves to live and changeable data analytics.
Healthcare industry thrives when using immutable data.
Unsurprisingly, immutable data is the direct opposite of mutable data. Immutable data cannot be changed or deleted. Instead of selecting a piece of data and saving over it with new information, you can only add appendices to immutable data. In short, this means that your updates are either put into a new location, or a completely new data form is created. After all, you cannot just replace immutable data.
Immutable data was once actually thought of as extremely useful. When working inside of a distributed system, immutable data makes scaling a system incredibly easy. If data is always changing, as it is with mutable data, then data engineers would have to waste resources always checking it against the original source. If there was a change during the scaling process, the data would be incorrect, creating a much larger problem.
Beyond this, specific industries still thrive when using immutable data. Take the healthcare industry as an example. If users were able to just delete prescription records or medicines administered, then you would never be able to truly trace a medical history. This would make practicing health care much more difficult, as you couldn’t verify the truth of the history you are working with.
Yet, as useful as immutable data is, there are an equal number of pitfalls.
Immutable data certainly has its place. Yet, as time goes on, people have realized that the ability to change and adapt data is vital. Beyond just business, if we had to create a new file every time we wanted to update something, we would be producing data at a rate that looks ridiculous - even for a society that’s already producing 2,5 quintillion bytes per day.
Equally, the notion that we cannot destroy some data poses some problems. With the range of privacy laws, we live under, GDPR regulations require some data to have the ability to disappear after a certain amount of time. Wiping immutable data clean is nearly impossible, which creates issues.
While modern cloud data warehouses do have methods of getting around immutable databases, they are still difficult to employ. Take two leading cloud data warehouses, Apache Pinot vs Druid. While trying to beat the other out in terms of efficiency or speed, Apache devotes a whole section of their site to mentioning copy-on-write. This is a strategy they use to index data that is seen as immutable, with the ability to do this being a big selling point for the business.
Although there are ways around how stagnant immutable data can be, more and more businesses are now doing a 180 and only focusing on mutable data.
This is mainly because things naturally must change. If we want live weather updates, we know that it’s not going to be the same all day. Equally, real-time analytics rely on things shifting and changing. We don’t want to run the calculations from zero every time. We’d rather just change the final output based on the current shift in source data.
One of the biggest changes of the past few years has been a complete focus on AI technologies and machine learning. As exciting new fields, which rely on being able to adapt and change at will, these industries rely directly on mutable data.
Being able to read, understand, and adapt data in front of an AI tool relies on being able to enact change. If we think of Facebook flagging some posts as potentially spam or false advertising, this relies on being able to mark that data. If data wasn’t mutable, Facebook would have to create the data again with this new change, effectively doubling the amount of data it was providing.
AI tools that simply add a flag or spam tag to the data provide an accessible and quick way of adapting to circumstances. From changing data to allowing AI tools to learn, this is essential in our modern world.
While both mutable and immutable data do have their place in data analytics, the former is now the preferred source for data engineers. As an inherently more flexible format, mutable data lends itself to modern business practices. Things are rarely black and white, with mutable data living in this gray area.
At least for the next few years, as AI and big data continue to gain popularity, we’re likely to see a lot more of a focus on the importance of mutable data. As a source that can shift, change, and adapt as we like it, mutable data is vital for business analytics. Especially when seeking real-time updates, mutable data provides an easy and accessible method of providing this service.