What is what? Clarifying terminology for data privacy

So, I decided it would be a good idea to write a concise list of terms and definitions for techniques relevant to managing private data, so that we can avoid the same (cute) expression of confusion as in the photo above.

So, next time you think of anonymisation, think of this box:

In the financial domain, people speak more often about Tokenisation than about pseudonymisation. Tokenisation is:

It is mostly applied to the PAN (Primary Account Number).

Remember the arcade game centres where you got tokens instead of money to play? Well, it is pretty much the same idea.

There are mainly three ways to implement tokenisation:

Vault-based tokenisation: uses a large database table of lookup pairs that associate each token with the encrypted sensitive information.

Vault-less tokenisation: the token is generated from the original data and a secret key (or parameter), so the original data can be recalculated from the token and the secret key. Vault-less tokenisation does not require a database to store key-value pairs, which reduces the time needed to complete a transaction that requires PAN recovery.

Stateless tokenisation: a middle ground between vault-based and vault-less. It does not require building a database of lookup pairs; instead, tokens are generated from a pre-defined “static database” of pre-generated random values.
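To make the three approaches concrete, here is a minimal Python sketch that tokenises a 16-digit test PAN in each style. Everything in it is an assumption for illustration: the class names (`VaultTokeniser`, `VaultlessTokeniser`, `StatelessTokeniser`) are hypothetical, the vault keeps PANs in memory and in clear (a real vault would encrypt them), the vault-less variant uses a toy keyed digit shift derived with HMAC-SHA256, and the stateless variant uses per-position digit permutations built from a seeded, non-cryptographic PRNG. None of this is secure enough for production.

```python
import hashlib
import hmac
import random
import secrets

DIGITS = "0123456789"


class VaultTokeniser:
    """Vault-based: a lookup table pairs each random token with its PAN."""

    def __init__(self) -> None:
        self._vault: dict[str, str] = {}  # token -> PAN (stored in clear here; encrypt it in practice)
        self._index: dict[str, str] = {}  # PAN -> token, so a PAN always gets the same token

    def tokenise(self, pan: str) -> str:
        if pan in self._index:
            return self._index[pan]
        token = pan
        # Draw random digit strings until we get one that is new and differs from the PAN.
        while token == pan or token in self._vault:
            token = "".join(secrets.choice(DIGITS) for _ in pan)
        self._vault[token] = pan
        self._index[pan] = token
        return token

    def detokenise(self, token: str) -> str:
        return self._vault[token]  # requires access to the vault


class VaultlessTokeniser:
    """Vault-less (toy): the token is computed from the PAN and a secret key."""

    def __init__(self, key: bytes) -> None:
        self._key = key

    def _shifts(self, length: int) -> list[int]:
        # One keyed pseudo-random shift (0-9) per digit position, derived with HMAC-SHA256.
        return [hmac.new(self._key, str(i).encode(), hashlib.sha256).digest()[0] % 10
                for i in range(length)]

    def tokenise(self, pan: str) -> str:
        return "".join(str((int(d) + s) % 10) for d, s in zip(pan, self._shifts(len(pan))))

    def detokenise(self, token: str) -> str:
        return "".join(str((int(d) - s) % 10) for d, s in zip(token, self._shifts(len(token))))


class StatelessTokeniser:
    """Stateless (toy): tokens come from small static tables of pre-generated random values."""

    def __init__(self, seed: int, length: int = 16) -> None:
        rng = random.Random(seed)  # stands in for tables generated once and shipped to every node
        self._tables = []
        for _ in range(length):
            perm = list(range(10))
            rng.shuffle(perm)
            self._tables.append(perm)  # perm[d] is the substitute for digit d at this position
        self._inverse = [[perm.index(t) for t in range(10)] for perm in self._tables]

    def tokenise(self, pan: str) -> str:
        return "".join(str(self._tables[i][int(d)]) for i, d in enumerate(pan))

    def detokenise(self, token: str) -> str:
        return "".join(str(self._inverse[i][int(d)]) for i, d in enumerate(token))


pan = "4111111111111111"  # a well-known test card number
for tokeniser in (VaultTokeniser(), VaultlessTokeniser(b"demo-key"), StatelessTokeniser(seed=42)):
    token = tokeniser.tokenise(pan)
    assert tokeniser.detokenise(token) == pan
    print(type(tokeniser).__name__, token)
```

The trade-off shows up in the state each class keeps: the vault grows with every new PAN, the vault-less tokeniser only needs its key, and the stateless one only needs its fixed tables.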

Hashing versus encryption is often a source of confusion and debate, and with good reason, since the two concepts are closely intertwined.

Hashing is used for different things, like:

The hashing algorithm is called a hash function, and when it produces the same hash value for two different inputs, we say that the function has a collision.
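A small sketch of both ideas using Python's standard `hashlib`: SHA-256 always maps the same input to the same digest, and a deliberately weakened hash (a hypothetical `tiny_hash` that keeps only the first byte of the digest, so it has just 256 possible outputs) is guaranteed to collide once it sees 257 distinct inputs, by the pigeonhole principle. The `user-…` inputs are made up for illustration.

```python
import hashlib


def full_hash(data: str) -> str:
    """SHA-256 as a hex string: the same input always yields the same digest."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def tiny_hash(data: str) -> int:
    """Deliberately weak hash: only the first byte of SHA-256, so just 256 possible values."""
    return hashlib.sha256(data.encode("utf-8")).digest()[0]


def find_collision(hash_fn, inputs):
    """Return the first pair of different inputs that share a hash value, or None."""
    seen = {}
    for item in inputs:
        h = hash_fn(item)
        if h in seen and seen[h] != item:
            return seen[h], item
        seen[h] = item
    return None


print(full_hash("alice") == full_hash("alice"))  # True: hashing is deterministic
print(len(full_hash("alice")))  # 64 hex characters, whatever the input length

# 257 distinct inputs into 256 buckets: a collision is guaranteed (pigeonhole principle).
pair = find_collision(tiny_hash, (f"user-{i}" for i in range(257)))
print(pair)
```

Real hash functions also collide in principle (their output is fixed-length); a good one simply makes finding such a pair computationally infeasible.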

OK, let’s dive into the details of encryption. As mentioned before, encryption is used for reversible tokenisation and for sending messages securely over the Internet or other networks. Data that has not been encrypted is known as plaintext, while encrypted data is known as ciphertext.
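To illustrate the plaintext/ciphertext round trip without third-party packages, the sketch below uses a one-time pad (XOR with a random key as long as the message), a real but impractical cipher chosen here only for brevity. Production systems would normally use a vetted implementation of a standard cipher such as AES from a maintained cryptography library. Unlike a hash, the ciphertext can be turned back into the plaintext by anyone holding the key. The function names are hypothetical.

```python
import secrets


def generate_key(length: int) -> bytes:
    """One-time pad key: cryptographically random bytes, as long as the message, never reused."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the data")
    return bytes(b ^ k for b, k in zip(data, key))


def encrypt(plaintext: str, key: bytes) -> bytes:
    """Plaintext in, ciphertext out."""
    return xor_bytes(plaintext.encode("utf-8"), key)


def decrypt(ciphertext: bytes, key: bytes) -> str:
    """XOR with the same key undoes the encryption."""
    return xor_bytes(ciphertext, key).decode("utf-8")


message = "PAN 4111111111111111"
key = generate_key(len(message.encode("utf-8")))
ciphertext = encrypt(message, key)
print(ciphertext.hex())  # looks random, and differs on every run because the key is random
assert decrypt(ciphertext, key) == message
```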

There is a plethora of algorithms, but here are some of the most famous ones:

There is a lot of discussion about dynamic masking, so I thought it would be good to touch upon the topic. Basically, a solution that implements dynamic masking alters the data stream so that the data requester does not get access to the sensitive data. To do so, policies can be established that return an entire field tokenised, or that mask parts of a field in real time, depending on who the data requester is.

The important thing to keep in mind is that no physical changes are made to the original production data, and normally there is no need for complex solutions (hashing, etc.) because dummy values are used.
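Here is a minimal sketch of a dynamic masking layer, assuming a hypothetical policy table keyed by requester role. The roles, field names and masking rules are made up for illustration; the point is that `masked_view` builds a new, masked copy for each request while the stored record stays untouched, and that the dummy values are plain asterisks rather than hashes.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]
Rule = Optional[Callable[[str], str]]  # None means "show the value in clear"


def keep_last4(value: str) -> str:
    """Partial mask: replace everything except the last four characters."""
    return "*" * (len(value) - 4) + value[-4:]


def redact(value: str) -> str:
    """Full mask: a dummy value of the same length."""
    return "*" * len(value)


# Hypothetical policies: requester role -> field -> rule.
POLICIES: Dict[str, Dict[str, Rule]] = {
    "fraud_officer": {"name": None, "pan": None, "email": None},
    "support_agent": {"name": None, "pan": keep_last4, "email": redact},
    "analyst": {"name": redact, "pan": redact, "email": redact},
}


def masked_view(record: Record, role: str) -> Record:
    """Build the response for one requester; the stored record is never modified."""
    policy = POLICIES.get(role, {})  # unknown roles get no clear-text fields
    view: Record = {}
    for field, value in record.items():
        rule = policy.get(field, redact)  # fields without a rule are redacted by default
        view[field] = value if rule is None else rule(value)
    return view


customer = {"name": "Ana Silva", "pan": "4111111111111111", "email": "ana@example.com"}
print(masked_view(customer, "support_agent"))
print(masked_view(customer, "analyst"))
assert customer["pan"] == "4111111111111111"  # production data unchanged
```

Falling back to full redaction for unknown roles and unlisted fields is a deny-by-default design choice; a real product would let you configure this.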

When to use anonymisation? When to use masking? When to use what? Well… that depends on your data privacy needs. Here are a couple of questions you need to ask yourself (or the organisation you are working with) to help you decide which techniques or options to go for:

If you need to recover the original data, you need pseudonymisation solutions.

Stronger requirements call for more sophisticated solutions, for example encryption, while weaker requirements can be implemented via codebooks or lookup tables.

If so, you may choose vault-less or stateless tokenisation instead of vault-based tokenisation. Also, some kinds of encryption are more costly than others (symmetric, asymmetric or hybrid?), so that is another aspect to take into account.

If so, you may consider, for example, Format-Preserving Encryption.
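As a rough illustration of what format-preserving means, the sketch below encrypts a digit string into another digit string of the same length using a 10-round Feistel network whose round function is HMAC-SHA256. It borrows the general Feistel structure of standardised FPE modes such as FF1 (NIST SP 800-38G), but it is not FF1 or FF3-1 and has not been analysed; the function names and the key are hypothetical. Use a vetted FPE implementation for real data.

```python
import hashlib
import hmac

ROUNDS = 10  # even, so both halves end up with their original widths


def _round_value(key: bytes, n: int, i: int, x: int) -> int:
    """Keyed pseudo-random round function: HMAC-SHA256 over (length, round, half)."""
    digest = hmac.new(key, f"{n}:{i}:{x}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def _split(digits: str) -> tuple[int, int, int, int]:
    if len(digits) < 2 or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected a string of at least two decimal digits")
    u = len(digits) // 2
    v = len(digits) - u
    return u, v, int(digits[:u]), int(digits[u:])


def fpe_encrypt(key: bytes, digits: str) -> str:
    u, v, a, b = _split(digits)
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # width of the half being replaced this round
        c = (a + _round_value(key, len(digits), i, b)) % 10**m
        a, b = b, c
    return str(a).zfill(u) + str(b).zfill(v)


def fpe_decrypt(key: bytes, digits: str) -> str:
    u, v, a, b = _split(digits)
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (b - _round_value(key, len(digits), i, a)) % 10**m  # undo the modular addition
        a, b = c, a
    return str(a).zfill(u) + str(b).zfill(v)


key = b"demo-key-not-for-production"
pan = "4111111111111111"
token = fpe_encrypt(key, pan)
print(token)  # another 16-digit string
assert fpe_decrypt(key, token) == pan
```

Because the output has the same length and alphabet as the input, it can be stored in an existing 16-digit PAN column without schema changes, which is the main appeal of format-preserving encryption.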

The bottom line is that only after you have understood the differences between these concepts and techniques, and identified your data transformation needs, can you make an informed decision on how to treat sensitive data. As a rule of thumb, always make sure to follow these three steps:

I hope you found this article helpful in clarifying and consolidating some of the concepts that are essential for managing data privacy. If so, please clap and follow our data science blog posts!
