Frequently Asked Questions

What's the Difference Between "Typed" and "Flat" Blind Indexes?

If you need to store your blind indexes in a separate table, use "typed" indexes. If you're storing blind index values in the same table as the encrypted data (i.e. as an additional column), use "flat" indexes instead.

Typed: {"type": "somestring", "value": "somepseudorandomstring"}
Flat: "somepseudorandomstring"

That's literally the only difference. Use whichever is easier to work with.
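To make the two shapes concrete, here is a minimal sketch in plain PHP. The helper functions and the column names (`ssn_idx`, `index_type`, `index_value`) are hypothetical and are not part of the library; the assumption is that a separate index table keeps the type alongside the value so that entries from different indexes can be told apart.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not a library function): wrap a flat value in the
// "typed" shape shown above.
function toTypedIndex(string $type, string $value): array
{
    return ['type' => $type, 'value' => $value];
}

// Hypothetical helper: recover the flat value from either shape.
function toFlatIndex($index): string
{
    return is_array($index) ? $index['value'] : $index;
}

$flat = 'somepseudorandomstring';
$typed = toTypedIndex('somestring', $flat);

// Same-table storage: the flat value goes straight into its own column.
$sameTableRow = ['ssn_idx' => toFlatIndex($flat)];

// Separate-table storage (assumed layout): the type travels with the value.
$indexTableRow = [
    'index_type' => $typed['type'],
    'index_value' => $typed['value'],
];

echo json_encode($typed), PHP_EOL;
```

Either way the pseudorandom value is the same; only the wrapper around it changes.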

In version 1 of the PHP implementation, the default was "typed". In version 2 and beyond (as well as other programming language implementations), the default is "flat".

When Would It Make Sense to Create More Than One Blind Index on a Field?

This is best explained through example.

Scenario: You're building software for a health care provider.

Imagine you need to store a person's HIV status (reactive = TRUE, nonreactive = FALSE, untested = NULL) for a medical database, and you want to encrypt this value.

Imagine also that, at some point in the future, you will need to query it (e.g. "find all untested patients in a specific region to schedule an STI screening panel").

Furthermore, imagine that you use Social Security numbers for insurance paperwork and to deduplicate patients with the same first and last name (not super common, but it does happen). You'll want to encrypt this value too.

If you can easily imagine these operational requirements in the same organization, you can imagine the following blind indexes being created:

First Initial + Last Name
Last Four Digits of SSN
First Name + Last Four Digits of SSN
HIV Status + ZIP Code
HIV Status + Last Four Digits of SSN

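You almost never want a boolean value (such as HIV status) to be directly indexed, because its domain is tiny: every row with the same status would share the same index value, so anyone who can read the index column could group patients by status. Instead, combine it with other fields, depending on the kind of SELECT queries you're making.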
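The domain-size problem can be sketched with a toy stand-in for a blind index. The code below uses a truncated HMAC-SHA256, which is an assumption made for illustration and not the library's actual construction; the function names, the demo key, and the ZIP code range are all hypothetical.

```php
<?php
declare(strict_types=1);

// Toy stand-in for a blind index (illustration only, not the library's
// construction): HMAC-SHA256 truncated to $hexChars hex characters.
function toyIndex(string $plaintext, string $key, int $hexChars = 4): string
{
    return substr(hash_hmac('sha256', $plaintext, $key), 0, $hexChars);
}

// Count how many distinct index values a set of plaintexts produces.
function distinctIndexValues(array $plaintexts, string $key): int
{
    $seen = [];
    foreach ($plaintexts as $plaintext) {
        // The 'i' prefix keeps PHP from casting digit-only keys to integers.
        $seen['i' . toyIndex($plaintext, $key)] = true;
    }
    return count($seen);
}

$key = str_repeat("\x42", 32); // fixed demo key; use a random secret in practice

// Indexing HIV status alone: only three possible plaintexts.
$statuses = ['TRUE', 'FALSE', 'NULL'];

// Indexing HIV status + ZIP code: 3 statuses x 200 hypothetical ZIP codes.
$compound = [];
foreach ($statuses as $status) {
    for ($zip = 10000; $zip < 10200; $zip++) {
        $compound[] = $status . '|' . $zip;
    }
}

$directCount = distinctIndexValues($statuses, $key);
$compoundCount = distinctIndexValues($compound, $key);

printf("HIV status alone: %d distinct index values\n", $directCount);
printf("HIV status + ZIP code: %d distinct index values\n", $compoundCount);
```

With the status indexed alone, the whole table falls into at most three groups, and relative group sizes can hint at which group is which. The compound index spreads the same rows across far more values.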

Why Does a Blind Index Lookup Sometimes Find Rows that Don't Match My Input?

If you're using blind indexes as recommended, your database lookups will include a small degree of false positives that must be filtered out by your application code.

In order for blind indexes to be secure against sophisticated attackers, there has to be an allowance for false positives (but NOT for false negatives) in their design. The planners exist to make it easy to design your blind indexes so that this security property is achieved.

What's happening here: a large number of possible inputs, when run through the hash function (or whichever backend function powers your blind indexes), produce pseudorandom outputs that start with the same bit pattern.

The shorter the output size of your blind index, the more coincidences (output prefix collisions) you'll encounter.
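The relationship between index length and false positives can be sketched as follows. This again uses a truncated HMAC-SHA256 as a toy stand-in rather than the library's real backend, assumes a 64-bit PHP build, and uses made-up nine-digit strings in place of real data; all function names are hypothetical. For uniformly distributed outputs, a lookup against n other rows should return roughly n / 2^bits false positives.

```php
<?php
declare(strict_types=1);

// Toy blind index (illustration only): the first $bits bits (1 to 32) of
// HMAC-SHA256, returned as an integer. Assumes 64-bit PHP.
function toyIndexBits(string $plaintext, string $key, int $bits): int
{
    $raw = hash_hmac('sha256', $plaintext, $key, true);
    $first32 = unpack('N', substr($raw, 0, 4))[1];
    return $first32 >> (32 - $bits);
}

// Count rows whose index matches the target's index but whose plaintext differs.
function countFalsePositives(array $rows, string $target, string $key, int $bits): int
{
    $needle = toyIndexBits($target, $key, $bits);
    $count = 0;
    foreach ($rows as $row) {
        if ($row !== $target && toyIndexBits($row, $key, $bits) === $needle) {
            $count++;
        }
    }
    return $count;
}

$key = str_repeat("\x42", 32); // fixed demo key; use a random secret in practice

// 2000 distinct made-up nine-digit values.
$rows = [];
for ($i = 0; $i < 2000; $i++) {
    $rows[] = sprintf('%09d', 100000000 + $i * 7919);
}
$target = $rows[0];

$counts = [];
foreach ([4, 8, 16] as $bits) {
    $counts[$bits] = countFalsePositives($rows, $target, $key, $bits);
    $expected = (count($rows) - 1) / (2 ** $bits);
    printf(
        "%2d bits: %d false positives (about %.1f expected)\n",
        $bits,
        $counts[$bits],
        $expected
    );
}
```

Because a longer index here is a longer prefix of the same hash output, every match at 16 bits is also a match at 8 bits, so the false positive count can only shrink as the index grows.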

In order to make your software useful for end users, you need to apply some post-processing logic to the data returned by your database software.

For example (PHP code using EasyDB):

<?php
use ParagonIE\EasyDB\EasyDB;
use ParagonIE\CipherSweet\EncryptedRow;

/** @var EasyDB $dbh */
/** @var EncryptedRow $encRow */
$input = ['ssn' => '123-45-6789', 'hivstatus' => true];

// Get the blind index outputs:
$indices = $encRow->getAllBlindIndexes($input);

// Do a lookup (will contain false positives):
$unfiltered = $dbh->run(
    "SELECT * FROM contacts WHERE ssn_hivstatus_idx = ?",
    $indices['ssn_hivstatus_idx']
);

// Do some post-processing:
$return = [];
foreach ($unfiltered as $row) {
    // Decrypt the row:
    $decrypted = $encRow->decryptRow($row);
    
    // Do the SSN and hivstatus actually match?
    if (!hash_equals($decrypted['ssn'], $input['ssn'])) {
        continue;
    }
    if ($decrypted['hivstatus'] !== $input['hivstatus']) {
        continue;
    }
    
    // Append plaintext to array if we're still here:
    $return[] = $decrypted;
}

// Now $return has all of the decrypted rows without false positives

It may be tempting to just choose arbitrarily long blind indexes to "avoid false positives", but doing so is risky!

If you do that, false positives all but disappear, so anyone who can read the database will be able to tell when two rows have identical plaintext.

This undermines the security model of our library, and if you choose to do that, you do so at your own risk.

We strongly recommend that you do some application-layer post-processing instead. It's worth it to keep your users' data secure.