Finding Row Counts for All Tables in Oracle Database

In my previous post “Table Statistics vs. SELECT COUNT: What to Choose?” I covered two methods of finding row counts, using table statistics and SELECT COUNT(*), from a non-technical, mostly business-requirements perspective.

The debate centers on whether to rely on table statistics or to compute the exact row count with a SELECT COUNT(*) against the table.

The main reason for questioning the accuracy of table statistics is that they are often gathered using a sample (via ESTIMATE_PERCENT) or may simply be stale. If the table is not read-only and serves an actively used application, the statistics can never be truly up to date.

On the other side, what’s wrong with computing the exact row count of each table?

The performance overhead alone is a sufficient reason to avoid it.

Concerns arise when row counts are queried for all tables at once, not for just a single table.

Now, I want to examine this practice from a technical perspective and analyze whether, in Oracle, it really has as dramatic an impact as Connor McDonald says.

I will use the sample schemas (HR, SH, and CO) for testing on Oracle Database 23ai running on a MacBook M1. You may refer to my blog post “Install Oracle Database 23ai and Sample Schemas on Macbook M Series (Apple Silicon)“.

First, I will create a test table with exactly 22,971,075 rows to analyze whether table size impacts query execution statistics. The goal is to determine how many buffer gets are required to calculate the total row count of a large table. When the row count is retrieved from dictionary statistics via DBA_TABLES, the actual table size is irrelevant, since the query accesses metadata rather than the table itself.
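As an illustrative sketch (the post does not show the exact DDL, so the statements below are my assumption), such a table can be built with CTAS by multiplying SH.SALES, which contains 918,843 rows in the standard sample schema, by 25: 918,843 × 25 = 22,971,075.

```sql
-- Hypothetical CTAS (not the post's exact DDL): cross-join SH.SALES
-- with a 25-row generator to get exactly 22,971,075 rows.
CREATE TABLE sh.big_table AS
SELECT s.*
FROM   sh.sales s,
       (SELECT level AS n FROM dual CONNECT BY level <= 25);

-- Thanks to online statistics gathering, NUM_ROWS is populated
-- immediately after the CTAS, with no manual gather needed.
SELECT num_rows, last_analyzed
FROM   dba_tables
WHERE  owner = 'SH'
AND    table_name = 'BIG_TABLE';
```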

A manual DBMS_STATS.GATHER_TABLE_STATS call after a Create Table As Select (CTAS) operation is no longer necessary thanks to the “Online Statistics Gathering” feature introduced in Oracle Database 12c. This feature automatically collects statistics during bulk operations such as direct-path INSERT (APPEND), MERGE, and CTAS, ensuring that statistics stay up to date without a manual gather.

In Oracle Database 12c, online statistics gathering “piggybacks” a statistics gather on direct-path data loading operations such as create table as select (CTAS) and insert as select (IAS). Gathering statistics as part of the data load means no additional full data scan is required to have statistics available immediately after the data is loaded.

This behavior is controlled by the “_optimizer_gather_stats_on_load” parameter, which is set to TRUE by default.

With just 20 buffer gets, we can efficiently retrieve the row count for a single large table. But what happens when we add two more large tables, each containing 20 million records? Unsurprisingly, even then it’s not a big deal. Since all values are retrieved from the dictionary, table size plays no role. The NUM_ROWS column in DBA_TABLES reflects the row count as of the last statistics gathering; querying this dictionary view does not scan the table, so buffer gets stay minimal.

Retrieving the row counts for three tables does not result in a significant change in performance or resource usage compared to retrieving the count for a single table.

Determining the row counts for all tables in the sample schemas required minimal buffer gets, even when dealing with 28 tables, only three of which are significantly large.
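For reference, the statistics-based lookup is nothing more than a dictionary query along these lines (a minimal sketch; the post’s own query may filter further):

```sql
-- Row counts from the dictionary: no table blocks are touched,
-- so buffer gets stay minimal regardless of table size.
SELECT owner,
       table_name,
       num_rows,       -- row count as of the last statistics gather
       last_analyzed   -- when that gather happened (NULL = never analyzed)
FROM   dba_tables
WHERE  owner IN ('HR', 'SH', 'CO')
ORDER  BY num_rows DESC NULLS LAST;
```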

Next, I will try the other method: dynamically executing SELECT COUNT(*) for each table and computing row counts on the spot. While this method provides exact counts, it requires scanning the data directly, which may result in increased resource consumption, especially for larger tables. This comparison will help highlight the trade-offs between relying on table statistics and direct query execution for row count retrieval.

It’s a common misconception that COUNT(1) might be faster than COUNT(*) because the latter seems to process an entire row. However, this is not the case. In this context, the * is not related to selecting all columns (as in SELECT *) and has no special meaning.

I will begin testing with a single table, SH.BIG_TABLE, to serve as a baseline for understanding how buffer gets are affected by table size and by the counting method used. To ensure a full table scan when the row count is queried directly with COUNT(*), I did not create an index. Once this baseline is established, I will expand the test to multiple tables to evaluate the overall performance overhead.

For a single large table, executing a dynamic query to calculate the row count resulted in 163,258 buffer gets. In contrast, retrieving the count from table statistics required just 20 buffer gets: nearly 8,000 times fewer for almost the same information.

Now, let’s extend this test to three large tables to show the impact more clearly.

The calculation 489,779 ÷ 29 ≈ 16,889 shows that, in this case, querying the tables directly for row counts performs roughly 16,000 times worse than the dictionary lookup.

SQL> set autotrace traceonly statistics
SQL> WITH
  FUNCTION get_rows(p_owner IN VARCHAR2, p_tablename IN VARCHAR2)
    RETURN NUMBER
  AS
    result_rowcount NUMBER DEFAULT NULL;
  BEGIN
    EXECUTE IMMEDIATE
      'select count(*) from ' || '"' || p_owner || '"."' || p_tablename || '"'
      INTO result_rowcount;
    RETURN result_rowcount;
  END;
SELECT
  owner,
  table_name,
  get_rows(owner, table_name) cnt
FROM
  dba_tables
WHERE
  owner IN ('HR', 'SH', 'CO')
  AND table_name NOT LIKE 'MLOG$\_%' ESCAPE '\'
  AND NVL(iot_type, 'NOT_IOT') NOT IN ('IOT_OVERFLOW', 'IOT_MAPPING')
  AND temporary = 'N'
  AND nested = 'NO'
  AND secondary = 'N'
  AND external = 'NO'
  AND hybrid = 'NO';
Statistics
----------------------------------------------------------
28 recursive calls
0 db block gets
506704 consistent gets
493409 physical reads
0 redo size
1550 bytes sent via SQL*Net to client
133 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
28 rows processed

I will also try the XMLQUERY function, inspired by an idea from Solomon Yakobson in this thread on the Oracle forums.

XMLQUERY lets you query XML data in SQL statements. It takes an XQuery expression as a string literal, an optional context item, and other bind variables and returns the result of evaluating the XQuery expression using these input values.

SQL> set autotrace traceonly statistics
SQL> SELECT
  owner,
  table_name,
  XMLCAST(
    XMLQUERY(
      '/ROWSET/ROW/NUMROWS'
      PASSING DBMS_XMLGEN.GETXMLTYPE(
        'select count(*) NUMROWS from ' || '"' || owner || '"."' || table_name || '"'
      )
      RETURNING CONTENT
    ) AS NUMBER
  ) row_count
FROM
  dba_tables t
WHERE
  owner IN ('HR', 'SH', 'CO')
  AND table_name NOT LIKE 'MLOG$\_%' ESCAPE '\'
  AND NVL(iot_type, 'NOT_IOT') NOT IN ('IOT_OVERFLOW', 'IOT_MAPPING')
  AND temporary = 'N'
  AND nested = 'NO'
  AND secondary = 'N'
  AND external = 'NO'
  AND hybrid = 'NO';
Statistics
----------------------------------------------------------
140 recursive calls
0 db block gets
513727 consistent gets
493203 physical reads
0 redo size
1468 bytes sent via SQL*Net to client
133 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
28 rows processed

The results are nearly identical.

Additionally, the V$SQL.BUFFER_GETS column values verify the results we have observed so far. I will use the /*+ insaneDBA */ comment (it is not a real hint, just a marker) to easily identify my queries in the SQL area, and then perform a simple calculation.

Overall, the total of 506,953 buffer_gets corresponds to the SQL statement (5d91z601jy2xm) I executed. This total is the sum of all buffer_gets from the query itself, the subqueries running in the background (e.g., WITH clauses, PL/SQL function calls), recursive queries, dictionary lookups, and other internal operations (dynamic sampling, etc.).

The buffer_gets for the subqueries executed behind the scenes via EXECUTE IMMEDIATE are detailed below. In the sample schemas, 28 tables result in 28 separate SQL statements (one SELECT COUNT(*) per table) being run in the background.

490,162 + 16,770 = 506,932 buffer gets come from those background COUNT(*) queries.

The remaining 506,953 − 506,932 = 21 buffer gets are done by dictionary and internal calls.


Another approach, the “Speedy Way,” uses table statistics to provide an initial estimate of row counts and then refines that estimate with the recent changes captured in the DBA_TAB_MODIFICATIONS view. By combining these two data sources, it is possible to generate a near-real-time row count without the high overhead of full table scans. The method is explained in detail in Martin Widlake’s nearly 15-year-old blog post “Counting the Cost #4 – The Speedy Way”, which discusses combining dba_tables and dba_tab_modifications (DML tracking) to achieve speedy row count calculations.

Now I will also test all 28 tables with his script. We will call the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure just before the test to flush the in-memory monitoring information for all tables to the dictionary.
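A simplified sketch of the idea (my own reduction, not Martin’s exact script): take NUM_ROWS from the last gather and adjust it by the inserts and deletes tracked since then in DBA_TAB_MODIFICATIONS.

```sql
-- Flush in-memory DML monitoring info to the dictionary first.
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- NUM_ROWS from the last gather, adjusted by tracked DML since then.
-- Updates do not change the row count, so only inserts/deletes matter.
SELECT t.owner,
       t.table_name,
       t.num_rows
         + NVL(m.inserts, 0)
         - NVL(m.deletes, 0) AS estimated_rows   -- NULL if never analyzed
FROM   dba_tables t
LEFT   JOIN dba_tab_modifications m
       ON  m.table_owner    = t.owner
       AND m.table_name     = t.table_name
       AND m.partition_name IS NULL              -- table-level row only
WHERE  t.owner IN ('HR', 'SH', 'CO');
```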

The Speedy Way appears almost perfect at first glance since it returns the number of rows with near real-time accuracy. It is significantly more accurate than Option 1 (Relying on Table Statistics to Get Row Counts for Each Table in an Oracle Schema) and achieves this with a minimal amount of buffer gets. However, it has some pitfalls. For instance, if you have a misbehaving application that frequently rolls back table modifications, the non-applied changes are still counted as inserts or deletes. Additionally, it’s worth noting that direct-path load activity is not recorded in the DBA_TAB_MODIFICATIONS view. Instead, such operations trigger table statistics collection through the Online Statistics Gathering feature (_optimizer_gather_stats_on_load).

Another method for estimating row counts is block sampling. The following example demonstrates a ten percent (10%) sampling approach. However, this method may be less accurate than the other options if the table’s blocks hold significantly varying numbers of rows.
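A minimal sketch of the technique, using the test table from earlier in the post: count the rows in roughly 10% of the blocks, then scale the result back up by the sampling factor.

```sql
-- SAMPLE BLOCK (10) reads about 10% of the table's blocks;
-- multiplying by 100/10 = 10 scales the count back up. Accuracy
-- suffers when rows-per-block varies heavily across the segment.
SELECT COUNT(*) * 10 AS estimated_rows
FROM   sh.big_table SAMPLE BLOCK (10);
```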

Oracle Database 19c introduced real-time statistics to extend online statistics support to conventional DML statements. Because statistics can go stale between statistics collection tasks, real-time statistics help the optimizer generate more optimal plans.

The real-time statistics feature is only available on Engineered Systems such as Exadata and Exadata Cloud Service (as specified in the licensing manual). As of Oracle Database 19c RU10 it is disabled by default on on-premises Exadata systems, but it can be enabled using the OPTIMIZER_REAL_TIME_STATISTICS parameter.

If you have Real-Time Statistics enabled, you may prefer to use dba_tab_statistics and rely on Real-Time Statistics when querying table statistics. Real-Time Statistics information is not displayed in the dba_tables dictionary view; instead, it can be accessed through dba_tab_statistics.
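A simple way to see both kinds of statistics side by side (the NOTES value below is how real-time rows are flagged in this view):

```sql
-- Real-time statistics rows appear in DBA_TAB_STATISTICS with
-- NOTES = 'STATS_ON_CONVENTIONAL_DML'; regular rows have NOTES NULL.
SELECT owner,
       table_name,
       num_rows,
       last_analyzed,
       notes
FROM   dba_tab_statistics
WHERE  owner IN ('HR', 'SH', 'CO')
AND    object_type = 'TABLE'
ORDER  BY owner, table_name, notes NULLS FIRST;
```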

For more information, refer to Tim Hall’s “Real-Time Statistics in Oracle Database 19c” post.

You can further combine Martin Widlake’s Speedy Way (Option 3) with Real-Time Statistics, as outlined below.
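One possible sketch of the combination (my own, untested reduction, not the post’s exact script): per table, prefer the most recently analyzed statistics row, whether regular or real-time, since real-time rows carry their own LAST_ANALYZED timestamp. DBA_TAB_MODIFICATIONS deltas could then be layered on as in the Speedy Way, but beware of double counting DML that real-time statistics have already absorbed.

```sql
-- Per table, pick NUM_ROWS from whichever statistics row
-- (regular or real-time) was analyzed most recently.
SELECT owner,
       table_name,
       MAX(num_rows) KEEP (DENSE_RANK LAST ORDER BY last_analyzed)
         AS latest_num_rows
FROM   dba_tab_statistics
WHERE  owner IN ('HR', 'SH', 'CO')
AND    object_type = 'TABLE'
GROUP  BY owner, table_name;
```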

This analysis highlights the trade-offs between using table statistics and directly querying row counts with SELECT COUNT(*). While table statistics provide a quick and efficient way to estimate row counts with minimal resource consumption, their accuracy depends on how recently and comprehensively the statistics were gathered. On the other side, direct row count queries offer precise results but come at a significant performance cost due to the full table scans, index full scans, and index fast full scans they require, especially for large tables or when multiple tables are queried.

In our tests on Oracle 23ai with the sample schemas:

  • Retrieving row counts from table statistics required only 20–29 buffer gets, even for large tables.
  • Direct row count queries with SELECT COUNT(*) resulted in 163,258 buffer gets for a single table with 23 million records and up to 489,779 buffer gets for three large tables: an overhead of approximately 16,000 times compared to using num_rows value of dba_tables relying on table statistics.
  • Alternative methods, such as XML-based queries (XMLQUERY), showed results similar to “PL/SQL in the with clause” in terms of resource usage, without offering additional efficiency.
  • Other approaches, such as block sampling methodology, combining table statistics with table modifications, or even using real-time statistics with table modifications are also possible. However, they do not offer much additional benefit compared to Option 1 – Relying on Table Statistics.

This technical perspective demonstrates that, while direct row count queries can be useful in certain scenarios, their heavy performance impact makes them unsuitable for almost all operations, particularly in large-scale environments with tables containing millions of rows. Relying on up-to-date table statistics is the more practical choice for most applications, especially when performance is a key priority.

As a best practice, consider the following: rely on table statistics to find the number of rows in each table in the database, and leverage Oracle features such as Online Statistics Gathering (_optimizer_gather_stats_on_load) and Real-Time Statistics (optimizer_real_time_statistics) to keep statistics accurate.

Hope it helps.

