Optimize multiple rows search in the PostgreSQL DB

Started by berto, Sep 14, 2022, 12:43 AM


berto (Topic starter)

How can I optimize the search process for a large number of strings in a database table? The task is to identify the missing rows in the DB table. The current method involves iterating through each line in a file and checking for its existence in the database using a SQL query. However, this approach takes a significant amount of time, especially when dealing with a large number of rows in the database.

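$strings = file('string.txt', FILE_IGNORE_NEW_LINES); // drop trailing newlines so the pattern can match
foreach ($strings as $string) {
    // One query per line of the file: this is the slow part
    $result = pg_query("select count(id) from public.files where fileid ilike '%{$string}';");
    if (pg_fetch_array($result)[0] == 0) {
        echo "String {$string} not found in database<br>";
    }
}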



One way to find the missing rows is to load the strings from the file into a separate (for example, temporary) table and then compare that table with the existing one in a single SQL query, such as an anti-join, instead of issuing one query per string.
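The load-then-compare idea can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes an open connection in `$connection`, the `public.files` table and `fileid` column from the question, and a temporary table called `wanted` (a made-up name). It keeps the suffix-style `ILIKE` match from the original loop; if an exact match is enough, replacing the condition with `f.fileid = w.val` should let PostgreSQL use a hash or index join instead.

```php
<?php
// Build "INSERT INTO wanted (val) VALUES ($1),($2),..." for $count parameters.
// "wanted" is a hypothetical temporary table name used only in this sketch.
function buildInsertSql(int $count): string
{
    $slots = [];
    for ($n = 1; $n <= $count; $n++) {
        $slots[] = '($' . $n . ')';
    }
    return 'INSERT INTO wanted (val) VALUES ' . implode(',', $slots);
}

// Load the strings into a temporary table, then ask the database once
// which of them match no row in public.files.
function findMissingViaTempTable($connection, array $strings, int $chunkSize = 1000): array
{
    pg_query($connection, 'CREATE TEMP TABLE wanted (val text)');
    foreach (array_chunk($strings, $chunkSize) as $chunk) {
        pg_query_params($connection, buildInsertSql(count($chunk)), $chunk);
    }
    $result = pg_query(
        $connection,
        "SELECT w.val FROM wanted w
         WHERE NOT EXISTS (
             SELECT 1 FROM public.files f WHERE f.fileid ILIKE '%' || w.val
         )"
    );
    $missing = pg_fetch_all_columns($result, 0);
    pg_query($connection, 'DROP TABLE wanted');
    return $missing;
}
```

A call would look something like `findMissingViaTempTable($connection, file('string.txt', FILE_IGNORE_NEW_LINES))`, which returns the strings that were not found.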

However, it's important to note that managing databases and performing comparisons between them can be complex and require expertise in programming and database management. If you are not experienced with PHP and PostgreSQL, it may be worthwhile to seek out tutorials or consult with experts to ensure that you're implementing the comparison correctly and efficiently.


To optimize the search process for a large number of strings in a database table, there are several approaches you can consider:

1. Batch Processing: Instead of querying the database for each string individually, you can process the strings in batches. This involves grouping multiple strings together and querying the database once for each batch. This approach can significantly reduce the number of database queries and improve the overall performance.

2. Indexing: Ensure that the columns you are searching on, such as the "fileid" column in your example, are indexed. Indexing can enhance the search speed by creating a data structure that allows for faster lookup operations.

3. Pre-processing: If possible, preprocess the strings or data before performing the search. This may involve formatting or transforming the strings to match the database format. By doing this pre-processing step, you can minimize the amount of processing required during the search and make it more efficient.

4. Optimized Queries: Review your SQL query and ensure it is optimized. Make sure you are using the appropriate operators and functions to perform your search efficiently. Additionally, check if there are any alternatives or optimizations available specific to the database you are using.

5. Parallelism: Consider distributing the search process across multiple threads or processes to leverage parallelism. This can help in speeding up the search by utilizing multiple resources concurrently.

6. Caching: If your strings are relatively static, you can cache the results of previous queries to reduce the load on the database. This way, if a string has been searched before and found to be missing, you can avoid querying the database again.

7. Use Prepared Statements: Instead of dynamically constructing the SQL query for each string iteration, consider using prepared statements. Prepared statements allow you to prepare the query once and bind parameters for each string separately. This can improve performance by reducing parsing and optimization time for each query.

8. Use Bulk Inserts: If you have control over the data loading process, consider using bulk inserts instead of inserting one row at a time. Bulk inserts can significantly speed up the process by eliminating overhead associated with individual insert operations.

9. Load Data into Memory: If the dataset is not too large and fits into memory, you can load the relevant data into memory for faster searching. This approach is especially useful if you need to perform multiple searches on the same dataset.

10. Denormalization: If your schema allows it, consider denormalizing your data by storing additional redundant information that speeds up the search. This can involve duplicating certain columns or aggregating data into a different structure optimized for efficient querying.

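11. Partitioning: If your database supports partitioning, consider partitioning your table based on certain criteria (e.g., range, list, or hash). When a query filters on the partition key, the planner can skip partitions that cannot contain matching rows (partition pruning), and partitions can be placed in different tablespaces. A search that does not filter on the partition key, such as a suffix match on `fileid`, still has to visit every partition.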

12. Use Search Indexing Technologies: Depending on your specific requirements, you can explore using search indexing technologies such as Elasticsearch or Apache Solr. These tools provide advanced features like full-text search, faceted search, and distributed search capabilities.
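As an illustration of item 9 (loading data into memory), here is a minimal sketch in plain PHP. It assumes the `fileid` values have already been fetched into an array (for instance with `pg_fetch_all_columns` on `SELECT fileid FROM public.files`), that the data is ASCII, and that the strings should match as case-insensitive suffixes, as in the original query. The function name is hypothetical, and unlike `ILIKE` it treats `%` and `_` as literal characters.

```php
<?php
// Return the needles that are not a case-insensitive suffix of any file ID.
// Assumes ASCII data and treats "%" and "_" literally, unlike ILIKE.
// Empty needles are skipped (ILIKE '%' would match every row anyway).
function findMissingInMemory(array $needles, array $fileIds): array
{
    // Collect the distinct needle lengths so each file ID is sliced only
    // once per length.
    $lengths = [];
    foreach ($needles as $needle) {
        if ($needle !== '') {
            $lengths[strlen($needle)] = true;
        }
    }

    // Index every suffix of the relevant lengths as a hash-set key.
    $suffixes = [];
    foreach ($fileIds as $fileId) {
        $lower = strtolower($fileId);
        foreach (array_keys($lengths) as $length) {
            if (strlen($lower) >= $length) {
                $suffixes[substr($lower, -$length)] = true;
            }
        }
    }

    // A needle is missing when its lowercase form is not in the set.
    $missing = [];
    foreach ($needles as $needle) {
        if ($needle !== '' && !isset($suffixes[strtolower($needle)])) {
            $missing[] = $needle;
        }
    }
    return $missing;
}
```

This trades memory for speed: the set grows with the number of rows times the number of distinct string lengths, so it is only reasonable when the table fits comfortably in memory.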

Here are some examples to illustrate the optimizations mentioned earlier:

1. Batch Processing:

$strings = file('string.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$batchSize = 100; // number of strings to check per query

foreach (array_chunk($strings, $batchSize) as $batch) {
    // pg_* functions use numbered placeholders ($1, $2, ...), not "?"
    $values = [];
    foreach (array_keys($batch) as $n) {
        $values[] = '($' . ($n + 1) . '::text)';
    }
    // Return only the strings in this batch that match no fileid
    $query = 'SELECT s.val FROM (VALUES ' . implode(',', $values) . ') AS s(val) '
           . "WHERE NOT EXISTS (SELECT 1 FROM public.files f WHERE f.fileid ILIKE '%' || s.val)";
    $result = pg_query_params($connection, $query, $batch);
    while ($row = pg_fetch_assoc($result)) {
        echo "String {$row['val']} not found in database<br>";
    }
}

2. Indexing:

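Ensure that the `fileid` column in the `public.files` table is indexed using a `CREATE INDEX` statement. Note that a plain B-tree index like the one below speeds up equality lookups and left-anchored, case-sensitive `LIKE` patterns (depending on collation or operator class), but PostgreSQL generally cannot use it for a pattern with a leading wildcard such as `ILIKE '%...'`; a trigram (`pg_trgm`) index is the usual option for that case. For example: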

CREATE INDEX idx_fileid ON public.files (fileid);
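Because the query in the question uses a leading wildcard, a trigram index may be a better fit than a plain B-tree index. The sketch below shows how one could be created from PHP. It assumes the `pg_trgm` extension is available on the server and that the connected role is allowed to run `CREATE EXTENSION`; the index name `idx_fileid_trgm` and both function names are made up for illustration.

```php
<?php
// SQL for a trigram index; the index name idx_fileid_trgm is made up here.
function trigramIndexStatements(): array
{
    return [
        'CREATE EXTENSION IF NOT EXISTS pg_trgm',
        'CREATE INDEX IF NOT EXISTS idx_fileid_trgm '
            . 'ON public.files USING gin (fileid gin_trgm_ops)',
    ];
}

// Run the statements on an open connection; stop at the first failure.
function createTrigramIndex($connection): bool
{
    foreach (trigramIndexStatements() as $sql) {
        if (pg_query($connection, $sql) === false) {
            return false;
        }
    }
    return true;
}
```

A trigram index generally helps only when the search strings are at least three characters long, so it is worth checking the plan with `EXPLAIN` on a real query before relying on it.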

3. Pre-processing:

If possible, preprocess the strings from the `string.txt` file to match the format in the database before performing the search. For example, if the database stores strings in lowercase, you can transform the strings in `string.txt` to lowercase before querying the database.
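A pre-processing step along these lines might look like the following sketch. It assumes the database stores lowercase ASCII values, which is only the example scenario described above, and the function name is hypothetical. Removing duplicates and empty lines also reduces the number of lookups.

```php
<?php
// Trim whitespace, lowercase, drop empty lines, and remove duplicates.
// Lowercasing assumes the database stores lowercase ASCII values.
function normalizeStrings(array $lines): array
{
    $seen = [];
    foreach ($lines as $line) {
        $value = strtolower(trim($line));
        if ($value !== '') {
            $seen[$value] = true; // array keys act as a set
        }
    }
    // PHP turns numeric-looking keys into integers; cast them back.
    return array_map('strval', array_keys($seen));
}
```

For example, `normalizeStrings(file('string.txt'))` would return the cleaned list. If the stored values really are lowercase, the query could then use `LIKE` instead of `ILIKE`.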

4. Optimized Queries:

Review your SQL query and ensure it is optimized. Experiment with different operators and functions, and check the resulting plan with `EXPLAIN ANALYZE`. For example, if you know that every `fileid` you are looking for starts with a specific prefix, you can add a prefix filter to the query (here `$1` is the search string, passed with `pg_query_params`):

$query = "SELECT COUNT(id) FROM public.files WHERE fileid LIKE 'prefix%' AND fileid ILIKE '%' || \$1";

The prefix filter reduces the number of rows that need to be scanned only if an index can serve it. A case-sensitive `LIKE 'prefix%'` can use a B-tree index (with the `text_pattern_ops` operator class when the database does not use the C locale), whereas `ILIKE 'prefix%'` generally cannot.
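Since the query in the question is a suffix match (`ILIKE '%...'`), one commonly used rewrite turns it into a prefix match on the reversed, lowercased value, which a B-tree expression index can serve. The sketch below assumes ASCII data and a hypothetical index created with `CREATE INDEX idx_fileid_rev ON public.files (reverse(lower(fileid)) text_pattern_ops);`. The function names are made up, and unlike the original query it treats `%` and `_` in the search string as literal characters.

```php
<?php
// Turn a suffix such as "ABC.txt" into the LIKE pattern "txt.cba%",
// to be compared against reverse(lower(fileid)). Assumes ASCII input.
function suffixToReversedPrefix(string $suffix): string
{
    $reversed = strrev(strtolower($suffix));
    // Escape LIKE metacharacters so they match literally.
    $escaped = strtr($reversed, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);
    return $escaped . '%';
}

// Check one suffix; returns true when at least one row matches.
function suffixExists($connection, string $suffix): bool
{
    $result = pg_query_params(
        $connection,
        'SELECT EXISTS (SELECT 1 FROM public.files WHERE reverse(lower(fileid)) LIKE $1)',
        [suffixToReversedPrefix($suffix)]
    );
    // PostgreSQL booleans arrive in PHP as the strings "t" and "f".
    return pg_fetch_result($result, 0, 0) === 't';
}
```

Whether the planner actually picks the expression index depends on the data and settings, so it is worth confirming with `EXPLAIN` before adopting this form.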