Data is expensive
Or rather: data is actually very cheap, which is the only reason that crypto has been allowed to proliferate in the first place. Only through the abundance of data and the ability to transfer it quickly was something so inefficient as a decentralized redundant database allowed to exist.
But still, data that's been backed up across hundreds or thousands of nodes is pretty damn expensive as a whole. This is why increasing the blocksize can escalate into such a heated debate (that and the lack of financial incentives to run a node).
Clearly the answer is rooted in incentives and sharding.
Incentives are pretty obvious. If we want users to provide infrastructure to the network, they should be paid for providing it. Not only does Bitcoin refuse to incentivize its node infrastructure, it basically can't do so on a technical level at this point. It's locked in; let the chips fall where they may.
The process of sharding is also very important. The main concept behind sharding is pretty simple: not every node in the network has to store the exact same copy of all the data. The data can be split up into chunks, and each chunk stored across enough nodes that it stays safe and secure without being a burden to everyone. As a result, consensus can also be reached without permission from every single node in the network.
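Here is a rough Python sketch of the placement half of that idea. It assumes rendezvous (highest-random-weight) hashing as the rule for deciding which nodes hold which chunk; the chunk size, replication factor, and node names are made-up illustration values, not how Hive or any specific network actually shards.

```python
import hashlib

CHUNK_SIZE = 4  # bytes per chunk; tiny so the example stays readable (hypothetical)
REPLICAS = 3    # how many nodes keep a copy of each chunk (hypothetical)


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def score(node: str, chunk_id: int) -> int:
    """Deterministic pseudo-random weight for a (node, chunk) pair."""
    digest = hashlib.sha256(f"{node}:{chunk_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign(chunk_id: int, nodes: list[str], replicas: int = REPLICAS) -> list[str]:
    """Rendezvous hashing: the `replicas` highest-scoring nodes store this chunk."""
    return sorted(nodes, key=lambda node: score(node, chunk_id), reverse=True)[:replicas]


def placement(data: bytes, nodes: list[str]) -> dict[int, list[str]]:
    """Map each chunk index to the nodes responsible for it."""
    return {i: assign(i, nodes) for i in range(len(split_into_chunks(data)))}


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(10)]
    for chunk_id, holders in placement(b"who owns which card, and who won", nodes).items():
        print(chunk_id, holders)
    # With 10 nodes and 3 replicas, each node holds about 30% of the chunks
    # on average instead of 100% of the data.
```

A handy property of rendezvous hashing here: when a node leaves, only the chunks it was holding need a new home; every other chunk keeps the same holders.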
It would be a bit of an oversight if I failed to mention that the Speak network is basically trying to accomplish this exact type of thing: get those big chunks of data off-chain and have them stored on hard drives across the world within a robust and incentivized ecosystem. Kudos to them!
However, I'm coming at this from a slightly different perspective. That perspective is usually gaming, because I'm convinced that once WEB3 gaming gets any kind of momentum and breaks into the mainstream, it will go viral immediately. I'm 100% certain that at least one of the insane bull markets crypto tends to get will be driven by gaming alone. That narrative will get its day eventually, although from what I can tell it still seems quite far away. Gradually, and then all at once, as they say.
And gaming is completely different from something like a YouTube video.
A video either exists and is accessible... or it isn't. Pretty black and white, that. Gaming, on the other hand, involves a lot more ones and zeros. Like, if you play a game, does a recording of that game need to be saved? Not really. Definitely not a video recording; there'd be way too many floating around to save them all anyway.
These technicalities lead me to ask what actually needs to be stored on-chain. The number one answer is clearly who owns what. If there's going to be one thing stored on chain it damn well better be the ledger of which players own which assets.
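At its simplest, that ledger is a mapping from asset IDs to owners, where a transfer is only accepted if the sender currently owns the asset. The Python sketch below assumes exactly that minimal model; the asset and account names are hypothetical, and a real chain would also check the sender's signature before applying a transfer.

```python
from dataclasses import dataclass, field


@dataclass
class OwnershipLedger:
    """Minimal ownership ledger: asset_id -> owner account (hypothetical model)."""
    owners: dict[str, str] = field(default_factory=dict)

    def mint(self, asset_id: str, owner: str) -> None:
        if asset_id in self.owners:
            raise ValueError(f"{asset_id} already exists")
        self.owners[asset_id] = owner

    def transfer(self, asset_id: str, sender: str, recipient: str) -> None:
        # Only the current owner may move an asset. A real chain would also
        # verify the sender's signature before applying this.
        if self.owners.get(asset_id) != sender:
            raise PermissionError(f"{sender} does not own {asset_id}")
        self.owners[asset_id] = recipient

    def assets_of(self, account: str) -> list[str]:
        return sorted(a for a, o in self.owners.items() if o == account)


if __name__ == "__main__":
    ledger = OwnershipLedger()
    ledger.mint("card-0001", "alice")
    ledger.mint("skin-0042", "alice")
    ledger.transfer("card-0001", "alice", "bob")
    print(ledger.assets_of("alice"))  # ['skin-0042']
    print(ledger.assets_of("bob"))    # ['card-0001']
```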
Unfortunately even this is reductive.
Because there are so many different types of games that employ different types of ownership. A free-to-play MOBA or a first-person shooter only needs to track skin ownership and ladder rank. But something like an MMO might be tracking multiple currencies, thousands of characters, and millions of items. Those are two very different things even though they are both "games".
Win : Loss ratio
It all kind of reminds me of a Magic: The Gathering tournament I played in way back in the day. I beat my first opponent in a best-of-three match, and then it was up to me to report back to the host that I had won. It was a weird experience, actually, because I immediately started asking myself questions like, "What happens when there's a dispute, or someone just flat out lies about the outcome?" Clearly they have some kind of protocol for that, but I didn't care enough to follow it up any further than my own internal monologue.
So the question I would ask today would be the exact same one.
Can't we just play web3 games offline and trust that the outcomes are reported correctly, even though we can't monitor them? Weirdly enough, I think the answer in at least some cases is 'yes'. That would be pretty helpful within a decentralized system, considering how much more expensive data is within a web3 ecosystem.
How would that work?
Well, a turn-based 1v1 card game is the best and easiest example. First we create a version that works on-chain. Everyone is happy until the game gets actual adoption and now the entire chain is riddled with gameplay data. This should sound familiar, as this exact story has already played out with Splinterlands.
So instead of keeping all that data on-chain, we just move it off the chain. That's not a big deal in a lot of circumstances, because even if those off-chain servers had a financial incentive to lie, lying would still be very difficult. Why? Because players sign every move with their private key, and the game developers don't have access to any of those keys. So even though the games are played off-chain, they are nearly impossible for a third party to counterfeit.
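A minimal Python sketch of what "signing every move" could look like, assuming each move is a small JSON record that also commits to the hash of the previous move, so moves can't be reordered, dropped, or rewritten without breaking the chain. To stay runnable on the standard library alone it uses textbook RSA with hard-coded, publicly known primes, which is completely insecure and only a stand-in; Hive accounts actually use secp256k1 (ECDSA) keys. All function names are hypothetical.

```python
import hashlib
import json

# Toy textbook-RSA keys from public Mersenne primes. Deliberately INSECURE:
# anyone can factor these. Only a stdlib stand-in for real secp256k1 keys.
E = 65537


def make_keypair(p: int, q: int) -> tuple[tuple[int, int], tuple[int, int]]:
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(E, -1, phi)  # modular inverse (Python 3.8+)
    return (n, d), (n, E)  # (private key, public key)


def digest(obj: dict) -> int:
    data = json.dumps(obj, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(record: dict, private_key: tuple[int, int]) -> int:
    n, d = private_key
    return pow(digest(record) % n, d, n)


def verify(record: dict, signature: int, public_key: tuple[int, int]) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest(record) % n


def play_move(log: list, game_id: str, player: str, move: str, private_key) -> None:
    """Append a move that commits to the previous entry, signed by the mover."""
    prev = hex(digest(log[-1])) if log else "genesis"
    record = {"game": game_id, "seq": len(log), "player": player, "move": move, "prev": prev}
    log.append({"record": record, "sig": sign(record, private_key)})


def verify_log(log: list, public_keys: dict) -> bool:
    """Every move must be signed by its player and chained to the one before it."""
    for i, entry in enumerate(log):
        record = entry["record"]
        expected_prev = hex(digest(log[i - 1])) if i else "genesis"
        if record["seq"] != i or record["prev"] != expected_prev:
            return False
        key = public_keys.get(record["player"])
        if key is None or not verify(record, entry["sig"], key):
            return False
    return True


if __name__ == "__main__":
    alice_priv, alice_pub = make_keypair(2**89 - 1, 2**127 - 1)
    bob_priv, bob_pub = make_keypair(2**61 - 1, 2**107 - 1)
    keys = {"alice": alice_pub, "bob": bob_pub}
    log = []
    play_move(log, "game-1", "alice", "play card-0001", alice_priv)
    play_move(log, "game-1", "bob", "attack", bob_priv)
    print(verify_log(log, keys))  # True
    log[1]["record"]["move"] = "concede"  # a third party rewrites Bob's move
    print(verify_log(log, keys))  # False: the signature no longer matches
```

The game server in this model only relays the log; since it never holds a private key, anything it alters fails verification.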
If they say something happened and can't explicitly prove it with an off-chain signature, we can just assume they're lying. Kind of like Craig Wright claiming he's Satoshi Nakamoto: everyone knows he's not, because he can't produce a signature from any of Satoshi's keys. The burden of proof is on the person making the claim. We can apply this same concept to off-chain storage.
So I win the game and I report to the host (Hive) that I won.
If my opponent tries to lie and tell Hive that they won... well, my computer has all the receipts. I have a history of the entire game stored locally on my own computer: every move I played and every move he played, which together resulted in a victory for me. None of my opponent's moves could possibly be forged unless I somehow had access to their private key, in which case we'd likely have much bigger problems on our hands than some random card game.
At the same time, we must assume that during a dispute both parties would be required to hand over their data to some kind of arbiter. So unless my opponent can somehow forge a convincing victory, they would automatically lose the dispute and be punished in some way for even trying to cheat. Even if a believable forgery did exist, it would be publicly obvious to the entire community that someone had found an exploit in the game design (two valid-looking claims that contradict each other). All the red flags would go up immediately, and it likely wouldn't take long for someone to figure out how it was done and patch it.
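The arbiter's decision table can be sketched in a few lines of Python. This assumes both sides submit their full local transcript and that some validity check exists (hypothetical here: signature verification plus a replay of the game rules); `toy_is_valid` is a stand-in so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Claim:
    submitted_by: str
    winner: str       # who this transcript says won
    transcript: list  # the move log the player kept locally


@dataclass
class Ruling:
    winner: Optional[str]
    penalize: Optional[str]
    flag: Optional[str]


def arbitrate(a: Claim, b: Claim, is_valid: Callable[[list], bool]) -> Ruling:
    """Settle a dispute using only the transcripts both sides hand over."""
    valid_a, valid_b = is_valid(a.transcript), is_valid(b.transcript)
    if valid_a and valid_b:
        if a.winner == b.winner:
            return Ruling(a.winner, None, None)
        # Two valid-looking but contradictory transcripts: a hole in the game design.
        return Ruling(None, None, "exploit: valid transcripts disagree")
    if valid_a:
        return Ruling(a.winner, b.submitted_by, None)  # b could not back up its claim
    if valid_b:
        return Ruling(b.winner, a.submitted_by, None)
    return Ruling(None, None, "no valid proof from either side")


def toy_is_valid(transcript: list) -> bool:
    """Hypothetical stand-in: each entry is (player, move, signature_ok)."""
    return all(sig_ok for _, _, sig_ok in transcript)


if __name__ == "__main__":
    honest = Claim("alice", "alice", [("alice", "attack", True), ("bob", "block", True)])
    liar = Claim("bob", "bob", [("alice", "attack", True), ("bob", "win", False)])
    print(arbitrate(honest, liar, toy_is_valid))
    # Ruling(winner='alice', penalize='bob', flag=None)
```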
Timeout
One of the more annoying and unsavory things sore losers can do is flip the board and refuse to play the game while running out the clock. The solution to this is pretty trivial when operating on a centralized system. If you're playing chess and your opponent's clock runs out: you automatically win. If you're playing Hearthstone and your opponent's clock runs out: you automatically get to start your next turn. Simple.
However, this is not so simple on a decentralized off-chain system. You could open a dispute on-chain claiming your opponent is stalling, but then your opponent could open a counter-claim saying, "That's a lie, here's my next move," right before getting disqualified. How annoying would it be to have to open one of these trolling claims on every single move?
Luckily, there could simply be an option to port the game back on-chain and finish it there publicly. Then none of the timestamps could be forged, and all the complexities and potential pitfalls of the off-chain system could be circumvented. That's why it's so important that the game itself be playable on an unhackable data layer in the first place, even if the ultimate endgame is to move 99% of that data off-chain as the game gains popularity.
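Once the game is back on-chain, the block number itself becomes a clock neither side can forge. A minimal Python sketch, assuming the last state both players signed is what gets ported on-chain and a hypothetical 60-second turn limit expressed in blocks (Hive produces a block roughly every three seconds):

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_INTERVAL_S = 3    # Hive produces roughly one block every 3 seconds
TURN_LIMIT_SECONDS = 60  # hypothetical rule
TURN_LIMIT_BLOCKS = TURN_LIMIT_SECONDS // BLOCK_INTERVAL_S  # 20 blocks


@dataclass
class OnChainGame:
    players: tuple[str, str]
    to_move: str
    last_move_block: int  # block containing the previous move (or the port itself)

    def _other(self, player: str) -> str:
        a, b = self.players
        return b if player == a else a

    def forfeit_winner(self, head_block: int) -> Optional[str]:
        """If whoever is on the clock has run past the limit, the other player wins."""
        if head_block - self.last_move_block > TURN_LIMIT_BLOCKS:
            return self._other(self.to_move)
        return None

    def submit_move(self, player: str, block_num: int) -> None:
        if player != self.to_move:
            raise ValueError("not your turn")
        if self.forfeit_winner(block_num) is not None:
            raise ValueError("too late: the turn timer already expired")
        self.last_move_block = block_num
        self.to_move = self._other(player)


if __name__ == "__main__":
    # The last mutually signed state is ported on-chain at block 1000; Bob is to move.
    game = OnChainGame(players=("alice", "bob"), to_move="bob", last_move_block=1000)
    game.submit_move("bob", 1015)     # Bob answers within the limit
    print(game.forfeit_winner(1030))  # None: Alice still has time
    print(game.forfeit_winner(1040))  # bob: Alice ran out the clock
```

Because block numbers come from the chain, a stalling player can't back-date a move, which removes the claim/counter-claim loop.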
Condensing a competitive game down to 1 op.
Imagine the average card game takes 50 actions before one of the players wins. That's a lot of data for a blockchain to keep track of. With this in mind, moving that data off-chain and posting only the result could reduce load by something like 99% (50 operations collapse into 1, a 98% cut, and longer games save even more), depending on the complexity and length of the game. As long as we know who won and who lost, creating a ranking system with monetary rewards becomes a trivial task.
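A Python sketch of the "1 op" idea under a few assumptions: the 50 moves stay off-chain, a single result op carries the winner, the loser, and a SHA-256 hash of the full transcript (so a locally stored log can later be checked against the chain), and rankings use standard Elo updates with a hypothetical K-factor of 32. The op layout is only loosely modelled on Hive's `custom_json` operation, and the `cardgame_result` id is made up.

```python
import hashlib
import json

K_FACTOR = 32  # hypothetical Elo K-factor


def condense(game_id: str, moves: list[dict], winner: str, loser: str) -> dict:
    """Replace every per-move op with one result op that commits to the full log."""
    transcript = json.dumps(moves, sort_keys=True).encode()
    payload = {
        "game": game_id,
        "winner": winner,
        "loser": loser,
        "moves": len(moves),
        # Anyone holding the off-chain log can later prove it matches this hash.
        "transcript_sha256": hashlib.sha256(transcript).hexdigest(),
    }
    # Loosely shaped like a custom_json op, posted by the winner as the reporter.
    return {"id": "cardgame_result", "required_auths": [],
            "required_posting_auths": [winner], "json": json.dumps(payload)}


def elo_update(winner_rating: float, loser_rating: float,
               k: float = K_FACTOR) -> tuple[float, float]:
    """Standard Elo: the winner gains exactly what the loser gives up."""
    expected_win = 1 / (1 + 10 ** ((loser_rating - winner_rating) / 400))
    delta = k * (1 - expected_win)
    return winner_rating + delta, loser_rating - delta


if __name__ == "__main__":
    moves = [{"game": "game-1", "seq": i, "player": "alice" if i % 2 == 0 else "bob",
              "move": f"action-{i}"} for i in range(50)]
    per_move_bytes = sum(len(json.dumps(m)) for m in moves)
    result_bytes = len(json.dumps(condense("game-1", moves, "alice", "bob")))
    print(per_move_bytes, result_bytes)  # one small op instead of fifty
    print(elo_update(1500, 1500))        # (1516.0, 1484.0)
```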
Read > Write
Is it easier to read, or is it easier to write? Hopefully this rhetorical question needs little explanation. Reading a book is easier than writing a book. Downloading is easier than uploading. Storing data is harder than broadcasting data that has already been stored.
So even though we could potentially be storing data off-chain, that doesn't mean we shouldn't be keeping tabs on what gets written to the ledger. Take the example above: what happens if your opponent falsely accuses you of something and requests your disqualification? If you don't have a data feed set up to pull that on-chain information, you could be cheated out of a win by a dirty dirty cheater.
Luckily, Hive blocks (and really all blocks) are very small because, again, the data is backed up 1000 times over and is relatively expensive. A feed of everything flowing through Hive tops out at roughly 20KB/sec, and that includes everything: blog posts, upvotes, the game, everything on the network. In fact, a special API could be created just for the game, which not only filters out non-game data but also only sends each active player the information that applies to their own account. A setup like that could basically scale to infinity just like a centralized one would (better, actually, because anyone could run one of these nodes).
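A game-specific feed could be as simple as the filter below, sketched in Python. It assumes blocks arrive as dicts whose transactions list operations as `[type, body]` pairs, with `custom_json` bodies carrying `id`, `required_auths`, `required_posting_auths`, and a `json` string, roughly how Hive's condenser API presents them; the `cardgame` id and the payload fields (`action`, `players`) are hypothetical. The bandwidth constants are assumed round numbers (a 64KB block cap and one block every 3 seconds), which line up with the ~20KB/sec figure.

```python
import json
from typing import Iterator

GAME_ID = "cardgame"         # hypothetical custom_json id used by the game
MAX_BLOCK_BYTES = 64 * 1024  # assumed block size cap
BLOCK_INTERVAL_S = 3         # Hive produces roughly one block every 3 seconds


def game_ops_for(block: dict, account: str) -> Iterator[dict]:
    """Yield only this game's payloads that involve `account`."""
    for tx in block.get("transactions", []):
        for op_type, body in tx.get("operations", []):
            if op_type != "custom_json" or body.get("id") != GAME_ID:
                continue  # blog posts, votes, other apps: skip
            payload = json.loads(body["json"])
            signers = body.get("required_auths", []) + body.get("required_posting_auths", [])
            if account in signers or account in payload.get("players", []):
                yield payload


def sample_block() -> dict:
    """A made-up block: one vote, one dispute against alice, one unrelated game result."""
    def game_op(signer: str, payload: dict) -> list:
        return ["custom_json", {"id": GAME_ID, "required_auths": [],
                                "required_posting_auths": [signer], "json": json.dumps(payload)}]
    return {"transactions": [
        {"operations": [["vote", {"voter": "carol", "author": "dave", "permlink": "post", "weight": 10000}]]},
        {"operations": [game_op("bob", {"action": "dispute", "game": "game-1",
                                        "players": ["alice", "bob"], "claim": "alice is stalling"})]},
        {"operations": [game_op("erin", {"action": "result", "game": "game-2", "players": ["erin", "frank"]})]},
    ]}


if __name__ == "__main__":
    print(MAX_BLOCK_BYTES / BLOCK_INTERVAL_S)  # worst-case bytes per second for the whole chain
    print([p["action"] for p in game_ops_for(sample_block(), "alice")])  # ['dispute']
```

Alice's client only ever sees the dispute filed against her, which is exactly the on-chain event she needs to react to.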
Conclusion
Okay well that's the gist of it. The ironic future of crypto is likely a situation in which only the very most important data is saved to the chain because clutter is very expensive and inefficient. During these early iterations of blockchain we have been getting away with posting a lot of data that will one day likely be considered blatant spam.
Hell, maybe even this blog post would qualify as spam one day. It all depends on how scarce blockspace is combined with the actual market demand to use it. We already somewhat got a taste of this during the last bull market. How many EVM chains popped up out of nowhere once Ethereum started boasting absurd $200 operation fees? People have a way of fleeing to the cheapest solution, and for good reason. All we can do is try to maintain those inexpensive solutions and make sure they're good enough to actually deliver on their promises.
So how do we consolidate and archive that expensive data back off-chain where it belongs? I suppose the root strategy is providing financial incentives that reward users for storing their proof locally and keeping other market participants honest. Easier said than done. The devil is in the details.