Data Filtering is a powerful new technique to ensure that sensitive data stays safely in the database. Many companies have already deployed Data Masking as part of their data protection strategy. In this post, we’ll review the key differences between data masking and data filtering, highlight the strengths and weaknesses of each approach, and finally show how they can be deployed together.
At the heart of your enterprise – and most likely the final layer of protection in your security architecture – is your data. Your business relies on it to function; creates, updates and deletes need to be carefully protected; and the data must only be read by those with the right to do so.
For most organisations, there is a subset of this data that is particularly sensitive, for business reasons, for regulatory compliance, or both. Many companies protect this data at the application layer, but increasingly there is a requirement to protect it better at the data layer itself.
Dynamic Data Masking
Data masking is a technique whereby sensitive data is obscured or obfuscated in some way to render it ‘safe’. “Static” masking means that the data is changed in the database itself (which usually means that the original raw data cannot easily be retrieved, if at all); “dynamic” data masking means that the data is masked at the point it is requested, while the stored values stay unchanged.
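To make the static/dynamic distinction concrete, here is a minimal sketch using Python’s built-in `sqlite3` module and an in-memory database. The `payment` table, its columns and the `mask()` SQL function (registered via `create_function`) are hypothetical illustrations, not part of any masking product: dynamic masking applies `mask()` only in the query, while static masking overwrites the stored values.

```python
import sqlite3

def mask(value):
    """Keep the last four characters and replace the rest with 'x'."""
    return None if value is None else "x" * max(len(value) - 4, 0) + value[-4:]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.create_function("mask", 1, mask)  # expose mask() to SQL
    conn.execute("CREATE TABLE payment (user TEXT, cardnumber TEXT)")
    conn.execute("INSERT INTO payment VALUES ('JOE', '123456789101')")
    return conn

# Dynamic masking: only the query result is masked; the stored value is untouched,
# so any query that does not call mask() still sees the raw card number.
dynamic = make_db()
print(dynamic.execute("SELECT user, mask(cardnumber) FROM payment").fetchall())  # [('JOE', 'xxxxxxxx9101')]
print(dynamic.execute("SELECT cardnumber FROM payment").fetchall())              # [('123456789101',)]

# Static masking: the table itself is overwritten, so the raw value is gone for good.
static = make_db()
static.execute("UPDATE payment SET cardnumber = mask(cardnumber)")
print(static.execute("SELECT cardnumber FROM payment").fetchall())               # [('xxxxxxxx9101',)]
```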
A simple example is the masking of credit card data:
Raw data: 1234 5678 9101
‘Masked’ data: xxxx xxxx 9101
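The transformation above can be sketched as a small Python function. The rule it implements (keep the last four digits, replace every other digit with ‘x’, leave spaces alone) is an illustrative assumption, not the rule of any particular product.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "x") -> str:
    """Mask all but the last `visible` digits, leaving spaces and other separators in place."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_seen = 0
    masked = []
    for ch in card_number:
        if ch.isdigit():
            digits_seen += 1
            # Only the trailing `visible` digits survive unmasked.
            masked.append(ch if digits_seen > total_digits - visible else mask_char)
        else:
            masked.append(ch)
    return "".join(masked)

print(mask_card_number("1234 5678 9101"))  # xxxx xxxx 9101
```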
Whilst it’s possible – at least in some cases – to implement data masking either directly in the database itself or at the application level, it’s most common to deploy data masking as a proxy that intercepts requests and then masks the result set as it is returned.
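A dynamic masking proxy can be sketched in a few lines, assuming Python’s `sqlite3` as the backing database and a hypothetical hard-coded set of sensitive column names. A real proxy would sit on the network between the application and the database; the hypothetical `MaskingProxy` class here is an in-process wrapper that forwards each statement unchanged and masks sensitive columns in the result set before handing it back.

```python
import sqlite3

# Hypothetical configuration: result-set columns the proxy must mask.
SENSITIVE_COLUMNS = {"cardnumber"}

def mask_value(value):
    """Keep the last four characters and replace the rest with 'x'."""
    text = str(value)
    return "x" * max(len(text) - 4, 0) + text[-4:]

class MaskingProxy:
    """In-process stand-in for a masking proxy between the application and the database."""

    def __init__(self, conn):
        self._conn = conn

    def execute(self, sql, params=()):
        # The statement itself is forwarded to the database unchanged.
        cursor = self._conn.execute(sql, params)
        if cursor.description is None:
            # Writes (INSERT/UPDATE/DELETE) return no result set and pass straight through.
            return [], []
        # Column names come from the actual result set, not from the SQL text.
        columns = [d[0].lower() for d in cursor.description]
        rows = [
            tuple(
                mask_value(v) if col in SENSITIVE_COLUMNS and v is not None else v
                for col, v in zip(columns, row)
            )
            for row in cursor.fetchall()
        ]
        return columns, rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (user TEXT, cardnumber TEXT, amount REAL)")
conn.execute("INSERT INTO payment VALUES ('JOE', '123456789101', 9.99)")

proxy = MaskingProxy(conn)
columns, rows = proxy.execute("SELECT * FROM payment WHERE user = ?", ("JOE",))
print(columns, rows)  # ['user', 'cardnumber', 'amount'] [('JOE', 'xxxxxxxx9101', 9.99)]
```

Because this sketch identifies sensitive columns from the result-set metadata (`cursor.description`), a `SELECT *` is still masked, although an alias such as `SELECT cardnumber AS c` would slip past it. It also forwards `UPDATE` and `DELETE` statements untouched, which is the first of the limitations discussed in the next section.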
The more advanced masking products provide more intelligent masking operations: for example, rather than replacing credit card numbers with ‘x’, they might change the number itself to another, known-invalid number.
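One way to produce such a format-preserving, ‘known invalid’ substitute is sketched below, assuming that ‘invalid’ means failing the Luhn (mod 10) checksum that payment card numbers carry. Real products may use other schemes, such as reserved number ranges; `substitute_invalid_number` is a hypothetical helper.

```python
import random

def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) check: double every second digit from the right, sum, test divisibility by 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def substitute_invalid_number(card_number: str, rng: random.Random) -> str:
    """Replace every digit with a random one, keep separators, and guarantee a failing Luhn check."""
    out = [str(rng.randrange(10)) if ch.isdigit() else ch for ch in card_number]
    positions = [i for i, ch in enumerate(card_number) if ch.isdigit()]
    if not positions:
        return card_number  # nothing to substitute
    if luhn_valid("".join(out[i] for i in positions)):
        # The rightmost digit is never doubled, so adding 1 (mod 10) shifts the Luhn sum
        # by +1 or -9, which cannot leave it divisible by 10.
        last = positions[-1]
        out[last] = str((int(out[last]) + 1) % 10)
    return "".join(out)

rng = random.Random(42)
print(substitute_invalid_number("1234 5678 9101", rng))  # same shape, random digits, fails Luhn
```

Bumping the final digit, rather than retrying until the check fails, keeps the function deterministic for a given random generator and always terminates.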
Data Masking Challenges
Whilst data masking does protect data from being inappropriately released, there are a number of limitations:
- Can’t protect data from writes
- May return data in a format the application isn’t expecting (in the example above: letters instead of numbers; but mismatches can be a lot more subtle), which can cause unpredictable exceptions
- The existence of the data is still revealed
- Depending on the implementation, some products can only protect a field when it is explicitly named in the query, so a ‘select * from table’ might reveal the unmasked data
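The field-name limitation can be illustrated with a deliberately naive, hypothetical rule that decides what to mask by reading the SELECT list in the SQL text rather than the columns actually returned. The regex and the `SENSITIVE_COLUMNS` set are assumptions for illustration only.

```python
import re

# Hypothetical configuration: columns that must be masked.
SENSITIVE_COLUMNS = {"cardnumber"}

def columns_to_mask(sql: str) -> set:
    """Naive rule: mask a sensitive column only if it is named in the SELECT list of the SQL text."""
    match = re.match(r"\s*select\s+(.*?)\s+from\s", sql, re.IGNORECASE | re.DOTALL)
    if match is None:
        return set()
    named = {part.strip().lower() for part in match.group(1).split(",")}
    return named & SENSITIVE_COLUMNS

print(columns_to_mask("SELECT user, cardnumber FROM payment"))  # {'cardnumber'}
print(columns_to_mask("SELECT * FROM payment"))                 # set()
```

Because `*` is not a column name, nothing is flagged for masking, and the raw card numbers flow straight through.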
The “Select *” problem is worth looking at in a little more detail.
We all know that “SELECT *” shouldn’t appear in production code. Unfortunately, it happens all too frequently. Most often, it’s not the result of lazy programming, but rather because the developer is under pressure to ship the project and, for whatever reason, the database design isn’t available. So an open-ended query gets thrown into the code and never gets cleaned up. If you own the code for the app, then at least you have the opportunity to refactor it later, but there are plenty of examples like this either in third-party applications or in custom code that you can’t easily change, for whatever reason.
For all these reasons: enter data filtering.
Axiomatics Data Filtering ensures that sensitive data cannot be extracted from the database. It acts via a proxy that rewrites the SQL query itself in real time, based on policies that are defined centrally using standard XACML. (For an overview of how ADAF MD works, see this page.)
The key difference here is that data that shouldn’t leave the database doesn’t leave the database. Take, for instance, the credit card example above, and imagine that the query is coded more like “SELECT * FROM PAYMENT WHERE USER = ‘JOE’”.
With masking, at least the credit card number would be protected… but unfortunately, lots of other information is also being returned, and we are now trusting that the application will handle it sensitively (and that it’s not compromised en route!).
With Data Filtering, we could write a policy such that only (say) an auditor with an accredited security clearance and working from within the HQ building could access fields other than the credit card number itself.
So now, if a user who does not meet these conditions tries to execute “SELECT * FROM PAYMENT”, the system detects an attempt to access fields other than the credit card number and returns an empty list of rows to the user. Internally, the SQL statement is rewritten to “SELECT * FROM PAYMENT WHERE 1 = 0”. But if the same user executes “SELECT CARDNUMBER FROM PAYMENT”, applying the policy results in a rewritten query of the form “SELECT CARDNUMBER FROM PAYMENT WHERE <condition>”, where <condition> determines which card numbers this particular user can access.
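The rewriting behaviour described above can be sketched as follows. This is a toy illustration, not how Axiomatics Data Filtering is implemented: the `Subject` attributes, the `is_privileged` check standing in for the XACML policy, the `USER = ?` row condition and the regex-based parser are all hypothetical, and the parser only handles `SELECT <columns> FROM <table>` with no WHERE clause. A real rewriter would parse the full SQL grammar.

```python
import re
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Subject:
    user_id: str
    role: str
    clearance_accredited: bool
    location: str

def is_privileged(subject: Subject) -> bool:
    # Stand-in for the centrally defined policy: auditor, accredited clearance, inside HQ.
    return subject.role == "auditor" and subject.clearance_accredited and subject.location == "HQ"

# Toy parser: only understands "SELECT <columns> FROM <table>" with no WHERE clause.
SIMPLE_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)\s*;?\s*$", re.IGNORECASE | re.DOTALL
)

def rewrite(sql: str, subject: Subject):
    """Return (rewritten_sql, params) for the given subject."""
    m = SIMPLE_SELECT.match(sql)
    if m is None:
        raise ValueError("query shape not supported by this sketch")
    if is_privileged(subject):
        return sql, ()  # privileged users get the query as written
    cols = [c.strip().upper() for c in m.group("cols").split(",")]
    if cols != ["CARDNUMBER"]:
        # Fields beyond the card number were requested: keep the statement, return no rows.
        return f"SELECT {m.group('cols')} FROM {m.group('table')} WHERE 1 = 0", ()
    # Only card numbers requested: restrict to rows this user may see (hypothetical condition).
    return f"SELECT CARDNUMBER FROM {m.group('table')} WHERE USER = ?", (subject.user_id,)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PAYMENT (USER TEXT, CARDNUMBER TEXT, AMOUNT REAL)")
conn.executemany(
    "INSERT INTO PAYMENT VALUES (?, ?, ?)",
    [("JOE", "123456789101", 10.0), ("ANN", "999988887777", 5.0)],
)
joe = Subject("JOE", "clerk", False, "branch")
for query in ("SELECT * FROM PAYMENT", "SELECT CARDNUMBER FROM PAYMENT"):
    sql, params = rewrite(query, joe)
    print(sql, conn.execute(sql, params).fetchall())
```

Returning the row condition with a bound parameter, rather than splicing the user ID into the SQL string, avoids opening an injection hole in the rewriter itself.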
In short, data filtering offers several advantages:
- Sensitive data never leaves the database unless explicitly allowed by policy
- Attribute-based (XACML) policies support even the most complex restrictions
- Applications can be protected, even if the code can’t be refactored
- The data that is returned is unchanged, reducing the chance of unexpected exceptions
Data Filtering and Data Masking
Although data filtering is much more powerful and flexible than data masking, there are still times when it might make sense to apply a data mask to the filtered results. In the credit card example above, for instance, even though the filter ensures that only the credit card number is returned, we might still want to mask that credit card number before sending it back to the application.
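The combination can be sketched as a two-stage pipeline: the filtering stage (here a hypothetical, already-rewritten query that returns only the user’s own card numbers) limits what leaves the database, and the masking stage then obscures what remains. The table, column and function names are illustrative assumptions.

```python
import sqlite3

def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Masking stage: keep the last `visible` characters, replace the rest with 'x'."""
    return "x" * max(len(card_number) - visible, 0) + card_number[-visible:]

def filtered_card_query(user_id: str):
    """Filtering stage (hypothetical rewrite output): card numbers only, the user's own rows only."""
    return "SELECT CARDNUMBER FROM PAYMENT WHERE USER = ?", (user_id,)

def fetch_filtered_then_masked(conn: sqlite3.Connection, user_id: str):
    sql, params = filtered_card_query(user_id)
    rows = conn.execute(sql, params).fetchall()  # filtering decides which rows and columns leave the DB
    return [(mask_card_number(card),) for (card,) in rows]  # masking obscures what did leave

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PAYMENT (USER TEXT, CARDNUMBER TEXT, AMOUNT REAL)")
conn.executemany(
    "INSERT INTO PAYMENT VALUES (?, ?, ?)",
    [("JOE", "123456789101", 10.0), ("ANN", "999988887777", 5.0)],
)
print(fetch_filtered_then_masked(conn, "JOE"))  # [('xxxxxxxx9101',)]
```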
In other words: data filtering and data masking do different, and often complementary, jobs.
If you’d like more information, or if you have a need for advanced Data Filtering and/or dynamic data masking in your project, please get in touch.