The LSHSearch class -- This class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries. More...
Public Member Functions | |
LSHSearch (const arma::mat &referenceSet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500) | |
This function initializes the LSH class. | |
LSHSearch (const arma::mat &referenceSet, const arma::mat &querySet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500) | |
This function initializes the LSH class. | |
void | Search (const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_t numTablesToSearch=0) |
Compute the nearest neighbors and store the output in the given matrices. | |
Private Member Functions | |
double | BaseCase (const size_t queryIndex, const size_t referenceIndex) |
This is a helper function that computes the distance of the query to the neighbor candidates and appropriately stores the best 'k' candidates. | |
void | BuildHash () |
This function builds a hash table with two levels of hashing as presented in the paper. | |
void | InsertNeighbor (const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) |
This is a helper function that efficiently inserts better neighbor candidates into an existing set of neighbor candidates. | |
void | ReturnIndicesFromTable (const size_t queryIndex, arma::uvec &referenceIndices, size_t numTablesToSearch) |
This function takes a query and hashes it into each of the hash tables to get keys for the query and then the key is hashed to a bucket of the second hash table and all the points (if any) in those buckets are collected as the potential neighbor candidates. | |
Private Attributes | |
arma::Col< size_t > | bucketContentSize |
The number of elements present in each hash bucket; should be secondHashSize. | |
arma::Col< size_t > | bucketRowInHashTable |
For a particular hash value, points to the row in secondHashTable corresponding to this value. | |
const size_t | bucketSize |
The bucket size of the second hash. | |
arma::mat * | distancePtr |
The pointer to the nearest neighbor distances. | |
double | hashWidth |
The hash width. | |
metric::SquaredEuclideanDistance | metric |
Instantiation of the metric. | |
arma::Mat< size_t > * | neighborPtr |
The pointer to the nearest neighbor indices. | |
const size_t | numProj |
The number of projections. | |
const size_t | numTables |
The number of hash tables. | |
arma::mat | offsets |
The list of the offset 'b' for each of the projection for each table. | |
std::vector< arma::mat > | projections |
The std::vector containing the projection matrix of each table. | |
const arma::mat & | querySet |
Query dataset (may not be given). | |
const arma::mat & | referenceSet |
Reference dataset. | |
const size_t | secondHashSize |
The big prime representing the size of the second hash. | |
arma::Mat< size_t > | secondHashTable |
The final hash table; should be (< secondHashSize) x bucketSize. | |
arma::vec | secondHashWeights |
The weights of the second hash. |
The LSHSearch class -- This class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries.
SortPolicy | The sort policy for distances; see NearestNeighborSort. |
Definition at line 59 of file lsh_search.hpp.
mlpack::neighbor::LSHSearch< SortPolicy >::LSHSearch | ( | const arma::mat & | referenceSet, | |
const arma::mat & | querySet, | |||
const size_t | numProj, | |||
const size_t | numTables, | |||
const double | hashWidth = 0.0 , |
|||
const size_t | secondHashSize = 99901 , |
|||
const size_t | bucketSize = 500 | |||
) |
This function initializes the LSH class.
It builds the hash on the reference set with 2-stable distributions. See the individual functions performing the hashing for details on how the hashing is done.
referenceSet | Set of reference points. | |
querySet | Set of query points. | |
numProj | Number of projections in each hash table (anything between 10-50 might be a decent choice). | |
numTables | Total number of hash tables (anything between 10-20 should suffice). | |
hashWidth | The width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearest-neighbor distance in general. | |
secondHashSize | The size of the second hash table. This should be a large prime number. | |
bucketSize | The size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. Default values are already provided here. |
mlpack::neighbor::LSHSearch< SortPolicy >::LSHSearch | ( | const arma::mat & | referenceSet, | |
const size_t | numProj, | |||
const size_t | numTables, | |||
const double | hashWidth = 0.0 , |
|||
const size_t | secondHashSize = 99901 , |
|||
const size_t | bucketSize = 500 | |||
) |
This function initializes the LSH class.
It builds the hash on the reference set with 2-stable distributions. See the individual functions performing the hashing for details on how the hashing is done.
referenceSet | Set of reference points and the set of queries. | |
numProj | Number of projections in each hash table (anything between 10-50 might be a decent choice). | |
numTables | Total number of hash tables (anything between 10-20 should suffice). | |
hashWidth | The width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearest-neighbor distance in general. | |
secondHashSize | The size of the second hash table. This should be a large prime number. | |
bucketSize | The size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. Default values are already provided here. |
double mlpack::neighbor::LSHSearch< SortPolicy >::BaseCase | ( | const size_t | queryIndex, | |
const size_t | referenceIndex | |||
) | [private] |
void mlpack::neighbor::LSHSearch< SortPolicy >::BuildHash | ( | ) | [private] |
This function builds a hash table with two levels of hashing as presented in the paper.
This function first hashes the points with 'numProj' random projections to a single hash table creating (key, point ID) pairs where the key is a 'numProj'-dimensional integer vector.
Then each key in this hash table is hashed into a second hash table using a standard hash.
This function does not have any parameters and relies on parameters which are private members of this class, intialized during the class intialization.
void mlpack::neighbor::LSHSearch< SortPolicy >::InsertNeighbor | ( | const size_t | queryIndex, | |
const size_t | pos, | |||
const size_t | neighbor, | |||
const double | distance | |||
) | [private] |
This is a helper function that efficiently inserts better neighbor candidates into an existing set of neighbor candidates.
This function is only called by the 'BaseCase' function.
queryIndex | This is the index of the query being processed currently | |
pos | The position of the neighbor candidate in the current list of neighbor candidates. | |
neighbor | The neighbor candidate that is being inserted into the list of the best 'k' candidates for the query in question. | |
distance | The distance of the query to the neighbor candidate. |
void mlpack::neighbor::LSHSearch< SortPolicy >::ReturnIndicesFromTable | ( | const size_t | queryIndex, | |
arma::uvec & | referenceIndices, | |||
size_t | numTablesToSearch | |||
) | [private] |
This function takes a query and hashes it into each of the hash tables to get keys for the query and then the key is hashed to a bucket of the second hash table and all the points (if any) in those buckets are collected as the potential neighbor candidates.
queryIndex | The index of the query currently being processed. | |
referenceIndices | The list of neighbor candidates obtained from hashing the query into all the hash tables and eventually into multiple buckets of the second hash table. |
void mlpack::neighbor::LSHSearch< SortPolicy >::Search | ( | const size_t | k, | |
arma::Mat< size_t > & | resultingNeighbors, | |||
arma::mat & | distances, | |||
const size_t | numTablesToSearch = 0 | |||
) |
Compute the nearest neighbors and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
k | Number of neighbors to search for. | |
resultingNeighbors | Matrix storing lists of neighbors for each query point. | |
distances | Matrix storing distances of neighbors for each query point. | |
numTablesToSearch | This parameter allows the user to have control over the number of hash tables to be searched. This allows the user to pick the number of tables it can afford for the time available without having to build hashing for every table size. By default, this is set to zero in which case all tables are considered. |
arma::Col<size_t> mlpack::neighbor::LSHSearch< SortPolicy >::bucketContentSize [private] |
The number of elements present in each hash bucket; should be secondHashSize.
Definition at line 235 of file lsh_search.hpp.
arma::Col<size_t> mlpack::neighbor::LSHSearch< SortPolicy >::bucketRowInHashTable [private] |
For a particular hash value, points to the row in secondHashTable corresponding to this value.
Should be secondHashSize.
Definition at line 239 of file lsh_search.hpp.
const size_t mlpack::neighbor::LSHSearch< SortPolicy >::bucketSize [private] |
The bucket size of the second hash.
Definition at line 225 of file lsh_search.hpp.
arma::mat* mlpack::neighbor::LSHSearch< SortPolicy >::distancePtr [private] |
The pointer to the nearest neighbor distances.
Definition at line 242 of file lsh_search.hpp.
double mlpack::neighbor::LSHSearch< SortPolicy >::hashWidth [private] |
The hash width.
Definition at line 216 of file lsh_search.hpp.
metric::SquaredEuclideanDistance mlpack::neighbor::LSHSearch< SortPolicy >::metric [private] |
Instantiation of the metric.
Definition at line 228 of file lsh_search.hpp.
arma::Mat<size_t>* mlpack::neighbor::LSHSearch< SortPolicy >::neighborPtr [private] |
The pointer to the nearest neighbor indices.
Definition at line 245 of file lsh_search.hpp.
const size_t mlpack::neighbor::LSHSearch< SortPolicy >::numProj [private] |
The number of projections.
Definition at line 204 of file lsh_search.hpp.
const size_t mlpack::neighbor::LSHSearch< SortPolicy >::numTables [private] |
The number of hash tables.
Definition at line 207 of file lsh_search.hpp.
arma::mat mlpack::neighbor::LSHSearch< SortPolicy >::offsets [private] |
The list of the offset 'b' for each of the projection for each table.
Definition at line 213 of file lsh_search.hpp.
std::vector<arma::mat> mlpack::neighbor::LSHSearch< SortPolicy >::projections [private] |
The std::vector containing the projection matrix of each table.
Definition at line 210 of file lsh_search.hpp.
const arma::mat& mlpack::neighbor::LSHSearch< SortPolicy >::querySet [private] |
Query dataset (may not be given).
Definition at line 201 of file lsh_search.hpp.
const arma::mat& mlpack::neighbor::LSHSearch< SortPolicy >::referenceSet [private] |
Reference dataset.
Definition at line 198 of file lsh_search.hpp.
const size_t mlpack::neighbor::LSHSearch< SortPolicy >::secondHashSize [private] |
The big prime representing the size of the second hash.
Definition at line 219 of file lsh_search.hpp.
arma::Mat<size_t> mlpack::neighbor::LSHSearch< SortPolicy >::secondHashTable [private] |
The final hash table; should be (< secondHashSize) x bucketSize.
Definition at line 231 of file lsh_search.hpp.
arma::vec mlpack::neighbor::LSHSearch< SortPolicy >::secondHashWeights [private] |
The weights of the second hash.
Definition at line 222 of file lsh_search.hpp.