Concrete, privacy-by-design guidance for using blockchains in line with the GDPR. The guiding principle: keep personal data off the chain, put only verification artefacts on it, and choose cryptographic techniques deliberately so that data subjects gain privacy rather than lose it.
Whenever possible, no personal data should be written to the chain at all. Store the record relating to a data subject completely off-chain and put only a verification artefact (e.g. a hash or a commitment) on-chain. This keeps the on-chain footprint minimal and preserves the ability to delete or correct the underlying data off-chain.
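The off-chain/on-chain split described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `off_chain_store` and `on_chain_anchors` are hypothetical stand-ins for a deletable database and an append-only ledger, and the plain SHA-256 digest assumes the record carries enough entropy to resist guessing (low-entropy records would additionally need a salt).

```python
import hashlib
import json

# Hypothetical stand-ins: a deletable off-chain database and an append-only ledger.
off_chain_store = {}
on_chain_anchors = []


def record_digest(record: dict) -> str:
    """Hash a canonical JSON encoding so the same record always yields the same digest."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def anchor(record_id: str, record: dict) -> str:
    """Store the full record off-chain and only its digest on-chain."""
    off_chain_store[record_id] = record
    digest = record_digest(record)
    on_chain_anchors.append(digest)
    return digest


def verify(record: dict) -> bool:
    """Anyone holding the record can check it against the on-chain anchors."""
    return record_digest(record) in on_chain_anchors


def erase(record_id: str) -> None:
    """Erasure happens off-chain; the on-chain digest stays but can no longer be matched."""
    off_chain_store.pop(record_id, None)
```

Note that the whole record is hashed and stored off-chain as one unit; nothing else about the data subject is written next to the digest.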
Combining a hash with further data on the chain creates a linkage problem: because the original object relates to an identifiable person, the on-chain hash can then tie that person to everything stored alongside it. Hashing only the sensitive parts of a record still leaves the rest of the record linkable to the data subject — so a record must be stored off-chain as a whole.
A hash on a blockchain can leak data in three ways. Understanding them tells you when salting or peppering is needed.
A salt is a random value added per entry before hashing; it defeats parallel and rainbow-table attacks and ensures that two equal inputs do not share a hash. It does not need to be secret and can be stored next to the hash. A pepper is a constant but secret key kept in a secure place; it also blocks rainbow-table attacks and makes brute force much harder. Neither is always necessary or practical: unlike passwords, many on-chain objects already carry enough entropy, and a pepper that must be known to a larger group cannot stay secret.
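A minimal sketch of the two techniques, using only the Python standard library. The function names, the 16-byte salt length and the use of HMAC-SHA-256 to mix in the pepper are illustrative assumptions, not requirements from any standard.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def salted_hash(data: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is drawn per entry unless one is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    return salt, hashlib.sha256(salt + data).digest()


def peppered_hash(data: bytes, pepper: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, tag). The pepper is a secret key and is never stored with the tag."""
    if salt is None:
        salt = secrets.token_bytes(16)
    return salt, hmac.new(pepper, salt + data, hashlib.sha256).digest()


def check_salted(data: bytes, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    return hmac.compare_digest(salted_hash(data, salt)[1], digest)
```

Because each entry gets its own salt, two entries with the same input produce different digests, so an observer cannot tell from the chain that they refer to the same value.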
A secret key (pepper) should be added if the hashing serves as an endorsement separate from the document itself. By deleting the secret key (pepper), this separate endorsement can be deleted. If the verification does not add additional information, it does not need to be separated by an additional key.
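The idea of deleting an endorsement by deleting its key can be sketched like this. `EndorsementKey` is a hypothetical helper, and the sketch assumes the key exists in exactly one place; in practice every copy, including backups, would have to be destroyed for the erasure to be effective.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class EndorsementKey:
    """Holds the secret key (pepper) off-chain; only the resulting tag goes on-chain."""

    def __init__(self) -> None:
        self._key: Optional[bytes] = secrets.token_bytes(32)

    def tag(self, document: bytes) -> str:
        """Compute the keyed endorsement tag for a document."""
        if self._key is None:
            raise RuntimeError("key has been erased")
        return hmac.new(self._key, document, hashlib.sha256).hexdigest()

    def verify(self, document: bytes, on_chain_tag: str) -> bool:
        """Verification needs both the document and the key."""
        if self._key is None:
            return False
        return hmac.compare_digest(self.tag(document), on_chain_tag)

    def erase(self) -> None:
        """Dropping the key makes the on-chain tag unverifiable, even for holders of the document."""
        self._key = None
```

After `erase()`, the tag on the chain remains, but nobody can connect it to the document any more, which is the sense in which the endorsement has been deleted.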
Like hashes, commitments and zero-knowledge proofs make it possible to separate the proof from the information itself. Unlike a hash, a zero-knowledge proof can even prove possession of a piece of information without revealing it, whereas verifying a hash requires the information itself so that it can be hashed and compared.
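To make the difference concrete, here is a toy sketch of a Schnorr proof of knowledge (made non-interactive with the Fiat-Shamir heuristic), one well-known kind of zero-knowledge proof. It is chosen here purely for illustration. The group parameters `P`, `Q` and `G` are deliberately tiny assumptions so the arithmetic is easy to follow; they offer no security, and a real system would use a vetted library and standard parameters.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters, far too small for real use: G generates a subgroup of prime order Q mod P.
P = 23
Q = 11
G = 2


def public_value(secret: int) -> int:
    """The value published (e.g. on-chain); it does not reveal the secret in a real-sized group."""
    return pow(G, secret, P)


def _challenge(y: int, t: int) -> int:
    """Fiat-Shamir: derive the challenge from the public values instead of asking a verifier."""
    data = f"{P}|{Q}|{G}|{y}|{t}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret: int) -> Tuple[int, int]:
    """Prove knowledge of `secret` without disclosing it."""
    r = secrets.randbelow(Q)           # one-time random nonce
    t = pow(G, r, P)                   # commitment to the nonce
    c = _challenge(public_value(secret), t)
    s = (r + c * secret) % Q           # response; r masks the secret
    return t, s


def verify(y: int, proof: Tuple[int, int]) -> bool:
    """Check G^s == t * y^c (mod P) using only public values."""
    t, s = proof
    c = _challenge(y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

The verifier only ever sees `y`, `t` and `s`. With a plain hash, by contrast, the verifier would need the secret itself in order to recompute and compare the digest.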
Chameleon hashes let anyone who holds a secret trapdoor key replace the hashed content without changing the hash value. This could allow invisible redactions. For some use cases (probably centralised ones) this might be the appropriate technical architecture. See the analysis of Accenture's blockchain redaction approach: Forrester (Bennett, Matzke, Hoppermann, McPherson).
A blockchain might also include a pruning mechanism that automatically removes transaction information under certain conditions, e.g. after a certain time. A book-keeping blockchain could, for example, automatically erase all blocks older than 10 years. Any information that is required for a longer period needs to be copied elsewhere before it is pruned.
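A time-based pruning rule of this kind could look roughly like the sketch below. The `Block` type, the `must_keep` callback and the 10-year constant are illustrative assumptions; a real chain would additionally need a trusted checkpoint so that the remaining blocks stay verifiable once their predecessors are gone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

RETENTION = timedelta(days=3653)  # roughly 10 years (assumed retention period)


@dataclass(frozen=True)
class Block:
    height: int
    timestamp: datetime
    payload: str


def prune(
    chain: List[Block],
    now: datetime,
    must_keep: Callable[[Block], bool],
) -> Tuple[List[Block], List[Block]]:
    """Return (remaining chain, archive copies).

    Blocks older than the retention period are dropped from the chain.
    Expired blocks for which `must_keep` is true are copied to the archive first.
    """
    cutoff = now - RETENTION
    remaining = [b for b in chain if b.timestamp >= cutoff]
    archived = [b for b in chain if b.timestamp < cutoff and must_keep(b)]
    return remaining, archived
```

The archive list stands for whatever separate system holds information with a longer retention requirement; everything else that has expired is simply gone after pruning.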
Decide deliberately between permissionless public and permissioned chains, and identify who — if anyone — determines purposes and means at the transaction, smart-contract and infrastructure level. The architecture choice drives both the privacy properties and the allocation of GDPR roles.
Where processing is likely to result in high risk, a data protection impact assessment is required — and even where it is not, it helps document necessity, proportionality and mitigations. Treat it as a design tool rather than a hurdle. See the DPIA section on the overview page. If you are considering a DPIA, I am happy to support you in carrying one out.
Cases from the BC4EU report where decentralised technology is used to strengthen — not weaken — the privacy of data subjects. Read the full report (PDF).
A smart contract stores only the hash of a diploma; a flag is added if it is revoked. Without revocation the hash is a mere verification artefact and reveals nothing about a person. The design enables independent, durable verification even after the issuer ceases to exist, limits the revocation information to those who hold a copy of the diploma, and — unlike a centralised revocation registry — leaves no trace of who checks a diploma, when or how often.
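The contract logic of this case can be simulated in a few lines. This is a plain Python stand-in, not contract code from the report: `DiplomaRegistry` and its method names are hypothetical, and access control (only the issuer may issue or revoke) is omitted.

```python
import hashlib
from typing import Dict


class DiplomaRegistry:
    """Simulates the on-chain state: a mapping from diploma hash to a revocation flag."""

    def __init__(self) -> None:
        self._revoked: Dict[str, bool] = {}

    @staticmethod
    def digest(diploma: bytes) -> str:
        return hashlib.sha256(diploma).hexdigest()

    def issue(self, diploma: bytes) -> str:
        """Record only the hash; the diploma itself never leaves the holder and issuer."""
        d = self.digest(diploma)
        self._revoked[d] = False
        return d

    def revoke(self, digest: str) -> None:
        """Set the revocation flag for an already issued diploma."""
        if digest not in self._revoked:
            raise KeyError("unknown diploma hash")
        self._revoked[digest] = True

    def status(self, diploma: bytes) -> str:
        """Only someone holding a copy of the diploma can compute the hash and look it up."""
        d = self.digest(diploma)
        if d not in self._revoked:
            return "unknown"
        return "revoked" if self._revoked[d] else "valid"
```

A verifier who has been handed the diploma calls `status()` against their own view of the registry; nobody without the document can tell which entry belongs to whom.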
To prove that a custodian's liabilities are covered without exposing individual balances, account balances are hashed into a Merkle tree and a zero-knowledge proof attests the total of all liabilities. The Merkle root (optionally anchored on-chain) is only a verification artefact and not personal data; each customer can still verify that their balance is included — privacy by design instead of an auditor inspecting personal data.
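The Merkle-tree part of this design can be sketched as follows; the zero-knowledge proof over the total is out of scope here. The leaf encoding, the per-customer `nonce` (added so that low-entropy balances cannot be guessed from a leaf hash) and the rule of duplicating the last node on odd levels are assumptions of this sketch; production designs handle odd levels and leaf encoding more carefully.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(account_id: str, balance: int, nonce: bytes) -> bytes:
    """Hash one customer's entry; the nonce is known only to custodian and customer."""
    return _h(b"leaf:" + nonce + account_id.encode("utf-8") + b":" + str(balance).encode("ascii"))


def _pad(level: List[bytes]) -> List[bytes]:
    """Duplicate the last node when a level has an odd number of nodes."""
    return level + [level[-1]] if len(level) % 2 == 1 else level


def _levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Build all tree levels, from the leaves up to the single root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = _pad(levels[-1])
        levels.append([_h(b"node:" + cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def merkle_root(leaves: List[bytes]) -> bytes:
    return _levels(leaves)[-1][0]


def inclusion_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Return the sibling hashes on the path to the root, each with a 'sibling is left' flag."""
    proof = []
    for level in _levels(leaves)[:-1]:
        level = _pad(level)
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its proof; no other customer's data is needed."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = _h(b"node:" + sibling + node) if sibling_is_left else _h(b"node:" + node + sibling)
    return node == root
```

Each customer receives only their own nonce and inclusion proof. Checking it against the published root shows that their balance was counted, without exposing any other account.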
Crypto-asset service providers must exchange originator and beneficiary data under the travel rule (FATF Rec. 16; in the EU the Transfer of Funds Regulation). With TRISA/TRP, the identity data is exchanged off-chain via mutually authenticated, encrypted, field-minimised messages (IVMS 101), while on-chain data stays pseudonymous and limited to what is technically required — a privacy-by-design implementation of a legal obligation.
From privacy-by-design architecture and DPIAs to evaluating an existing blockchain application — I advise at the intersection of technology and data protection law. Get in touch at joern@erbguth.net, or read the overview and the opinions.
This page provides general information, not legal advice, and does not create a client relationship.