Migrating AWR Warehouse Repository to a new OEM Target

AWR Warehouse is tightly integrated with Enterprise Manager, both the database which acts as the repository plus any databases that will contribute to it need to be discovered as targets within OEM before they can be used. If you are already using OEM to manage your databases (and you should be), this means that configuration is as simple as selecting the appropriate instance from a drop-down list of discovered targets; however it does have some disadvantages.

One of these disadvantages is that if the Enterprise Manger target of the repository database changes, then AWR Warehouse configuration is lost with it. The target could be changed due agent problems on the host forcing you to re-discover it. Even though it is the same database it is treated it as a new target.

If you attempt to re-configure by pointing it at the new target, a new AWRW instance is created (note the EM_ID field on many of the internal tables refers to this instance id). It is possible to re-add all source databases to this new instance, however it involves a lot of manual steps manipulating the internal tables to get everything working.  While I was researching this with a view to providing a step by step guide I found an easier way.

I will reproduce this situation, and walk through my solution. My AWRW repository is called OMR, and I have a database DGA contributing to it.

I remove instance OMR from OEM, and then re-add it.
When I go to the AWR Warehouse page, I can still see my source database is still listed, but note the error message:
Screen Shot 2017-04-17 at 11.55.06 AM

To fix this, first identify the OEM GUID of the newly configured repository database:

 
SQL> SELECT
  2      target_guid
  3  FROM
  4      sysman.mgmt$target
  5  WHERE
  6          target_name = 'OMR'
  7      AND
  8          target_type = 'oracle_database';

TARGET_GUID
--------------------------------
4CAF735767113C73E0530201A8C0FFB8

Then update the AWRW repository table in OEM with this new GUID:

SQL> UPDATE SYSMAN.DB_CAW_REPOS_E SET TARGET_GUID = '4CAF735767113C73E0530201A8C0FFB8';
1 row updated.

SQL> COMMIT;

Commit complete.

Doing this seems to bypass the re-configuration process. I have tested that everything is working by performing “Upload Snapshots Now”, and all seems fine, however I haven’t performed extensive testing, and I would suggest exercising the usual caution before updating internal OEM tables.

AWR Warehouse and Data Guard – Conclusions

Unfortunately AWR Warehouse and Data Guard do not work together seamlessly, however by understanding some of the implementation details, and following some simple steps you can help to mitigate most problems that may be encountered.

It is not possible to add an instance to AWRW when it is in standby mode, worse still if you do configure the primary and then switch over to an instance not configured in AWRW, the extract jobs will continue to dump files to local storage, but those files will never be transferred or loaded.

For this reason, I would suggest if setting up a pair of instances in data guard configuration from scratch to consider adding both to AWRW as part of initial configuration process. If this is an existing system where you cannot switchover to add the standby to AWRW, you should consider adding it the next time you do switchover.

Whenever you add a standby instance to AWRW make sure to backup the extract metadata table beforehand, and restore it before performing the first extract (remember to fix the sequence on this table too), to make sure you do not extract the same snapshots from multiple instances.

While it is possible to update the mapping table to force AWRW to upload from both instances to the same DBID, I think it is not worth the risk you run of breaking things by performing such changes. The other downside is that removal of either instance from AWRW triggers the removal of AWR history from both instances from the repository.

The downside of having the different instances of the same database mapped to different DBIDs is extra complexity in queries you run against the AWRW for analysis, especially if you switchover for significant periods of time. In addition, AWR reports or ASH analytics will not be work on consolidation of both instances, you will have to perform those tasks against each instance separately.

In addition as long as you are on DB plugins 13.2.2 or later (and if you are not you should think about upgrading), you may also consider updating the properties file such that data is extracted to a location accessible from both instances. That way files extracted from one instance will still get uploaded from the other after switchover. If you do not, you may need to perform some manual steps to ensure snapshots are not missed during switchover.

AWR Warehouse and Data Guard (Part Five)

Continuing the investigation, I wanted to check the effect of deleting one of the two data guard instances from AWRW; remember I have updated the mapping table such that both DGA and DGA both map to the same DBID.

We can check from load metadata and AWR itself which snapshots have been loaded

SQL> SELECT
    dump_id,
    dbid,
    target_name,
    begin_snap_id,
    end_snap_id
FROM
    dbsnmp.caw_load_metadata
ORDER BY dump_id;  

   DUMP_ID     DBID TARGET_NAM BEGIN_SNAP_ID END_SNAP_ID
---------- ---------- ---------- ------------- -----------
    14 1852644603 DGA            12        38
    15 1852644603 DGB            39        41
    16 1852644603 DGB            42        44

SQL> SELECT
    MIN(snap_id),
    MAX(snap_id),
    COUNT(*)
FROM
    dba_hist_snapshot
WHERE
    dbid = 1852644603
ORDER BY snap_id;  

MIN(SNAP_ID) MAX(SNAP_ID)   COUNT(*)
------------ ------------ ----------
      12           44      33

So we have 33 snapshots, loaded from both DGA and DGB.
I then delete DGB from AWRW via the GUI. After waiting a few minutes for the scheduler task that actually does the deletion to kick in, I recheck the AWR data.

SQL> SELECT
    MIN(snap_id),
    MAX(snap_id),
    COUNT(*)
FROM
    dba_hist_snapshot
WHERE
    dbid = 1852644603
ORDER BY snap_id;
  
MIN(SNAP_ID) MAX(SNAP_ID)   COUNT(*)
------------ ------------ ----------
                   0

Because databases are mapped to the same DBID, all AWR history for both databases has been removed from AWR Warehouse.

One final thing I wanted to check was that with the updated mapping record, AWRW data was really accessible via OEM.

I re-added DGB back to AWRW, mapped to its real DBID using the steps before.
From the AWRW page I highlight DGA and select “Compare Periods->Compare Periods Report”. (Note this is a powerful feature of AWRW probably deserving of it’s own entry):

compare_periods

I make sure AWR Data Source is “Historical – AWR Warehouse”:

select_snapshot

I then check I can pick a pair of uploaded snapshots, in this case 49 and 50.
For the second period I can select a different database. Note in particular how this screen shows the AWRW mappings  I am then able to pick the exact same snapshots, 49 and 50, for DGB.
I am able to run the AWR comparison report for the same snapshots, but associated with the different databases.

AWR Warehouse and Data Guard (Part Four)

After a brief interlude I wanted to follow up on something I noticed on John Hallas’s blog.

He mentioned that specifying a shared location for the extract metadata did not work, and was waiting on a fix for bug number 21826657. I checked that this bug should be fixed in db plugins version 13.2.2 and that my test hosts dga and dgb are both running this version.

To test whether the fix was effective, I first updated the extract properties table to specify a location  shared between both hosts, /home/oracle/awrw, generated a couple of snapshots (current primary is DGA) and then ran an extract process.

SQL> update dbsnmp.caw_extract_properties
  2     set property_value = '/home/oracle/awrw'
  3     where property_name = 'dump_dir_1';

1 row updated.

SQL> commit;

Commit complete.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbsnmp.mgmt_caw_extract.run_job_now;

PL/SQL procedure successfully completed.

I double-check the files are extracted to /home/oracle/awrw as expected. You can see from the filename that this corresponds to snapshots with ids between 42 and 44.

[oracle@dgb awrw]$ ls -alrt /home/oracle/awrw
total 12796
drwx------. 20 oracle oinstall     4096 Apr  8 12:21 ..
drwxr-xr-x.  1 oracle oinstall      136 Apr  9  2017 .
-rw-r-----.  1 oracle oinstall 13066240 Apr  9  2017 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp
-rw-r--r--.  1 oracle oinstall    31656 Apr  9  2017 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.log
[oracle@dgb awrw]$

I then switch-over to DGB and then performed “Upload Snapshots Now”.
Reviewing the steps for the Transfer Task looks promising, below are the outputs of a couple of the transfer subtask; first of all createArchive:

conn_url: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=dgb.localdomain)(PORT=1521)))(CONNECT_DATA=(SID=DG))), user: system, role: normal
Connected to database
Number of files ready for transfer: 1
Maximum total size of dump files: 5368709120
file: 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp size(bytes): 13066240
Number of files picked up for transfer: 1
Updated status of dump files, ready to transfer
Creating archive file using [/app/oracle/agent/agent_13.2.0.0.0/bin/zip -v 4CA58C2B03FF5F03E0530201A8C0052E.zip 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp]
  adding: 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp .	(in=13066240) (out=1116898) (deflated 91%)
total bytes=13066240, compressed=1116898 -> 91% savings
Created archive 4CA58C2B03FF5F03E0530201A8C0052E.zip
Disconnected from database

Next prepareDestinationFiles:

Extracting dump files from /app/oracle/agent/agent_inst/awr_t/4CA58C2B03FF5F03E0530201A8C0052E.zip
Extracted 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp
extract complete
Deleted archive 4CA58C2B03FF5F03E0530201A8C0052E.zip
conn_url: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.2)(PORT=1521)))(CONNECT_DATA=(SID=OMR))), user: sys, role: sysdba
Connected to database
dumpfile: 1_4CB89422459F0DF3E0530401A8C065D5_1852644603_42_44.dmp, emid: 1, dbid: 1852644603, bid: 42, eid: 44, size: 13066240
source target: DGB, type: oracle_database
Added dump files to the import queue
Disconnected from database

The output of the loadAWR task reveals that 3 snapshots have been successfully loaded to AWRW:

conn_url: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.2)(PORT=1521)))(CONNECT_DATA=(SID=OMR))), user: sys, role: sysdba
Connected to database
Loaded 3 snapshots in the AWR Warehouse.
Disconnected from database

Reviewing load metadata you can see that the snapshots that were extracted from DGA have been successfully loaded from DGB.

SQL> select dump_id, target_name, begin_snap_id, end_snap_id, status from dbsnmp.caw_load_metadata order by dump_id;

 SQL> select begin_snap_id, end_snap_id, target_name, status from dbsnmp.caw_load_metadata order by dump_id;

	   12	       38 DGA		      3
	   39	       41 DGB		      3
	   42	       44 DGB		      3
SQL>

So it seems that it is now OK to update the location that the dumpfiles are extracted to. If this is a shared location accessible by both hosts in dataguard configuration, then as we have proved, any data extracted from one host, can subsequently be transferred and loaded from the another host without requiring manual intervention.
This may be of use with AWRW in a RAC configuration (something I haven’t yet investigated).

AWR Warehouse and Data Guard (Interlude)

Note, in preparing next entry, I reviewed the results from previous entries and found that although extracts were being transferred successfully they had all failed to load; I should have noticed they had status 4 (IMPORT_FAILED), in the load metadata table.

Reviewing dbms_scheduler_job_run_details of the load task revealed the following error:

"move error: 
ORA-20105: Unable to move AWR data to SYS
1852644603 1_4CB614B101EE2E5EE0530401A8C066F0_1852644603_12_37.dmp 1 DGA oracle_database 1852644603"

I removed both databases from ARWR, and playing around with the internal functions used to register and un-register the databases revealed that the partitions had got in a mess.

 
SQL> exec dbms_swrf_internal.unregister_database(1852644603);
SQL> exec dbms_swrf_internal.register_database(1852644603);
BEGIN dbms_swrf_internal.register_database(1852644603); END;

*
ERROR at line 1:
ORA-13509: error encountered during updates to a AWR table
ORA-14012: resulting partition name conflicts with that of an existing partition
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 105
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 184
ORA-06512: at line 1

To resolve this, I unregistered the database again using the internal function, then identified the ‘orphan’ partitions causing problems using the following SQL (1852644603 is the DBID of the problem database):

 
SELECT * FROM DBA_TAB_PARTITIONS WHERE PARTITION_NAME LIKE '%1852644603%'

Partitions for the unregistered database still existed for the following tables:

  • WRH$_CON_SYSMETRIC_HISTORY
  • WRH$_CON_SYSSTAT
  • WRH$_CON_SYSTEM_EVENT
  • WRH$_CON_SYS_TIME_MODEL
  • WRM$_PDB_IN_SNAP

I then dropped these problem partitions (this is just a play database, that is why I am able to be so reckless modifying internal Oracle tables, do you own research and testing before making such changes, and make sure you understand the risks.)

After I did this, I was able re-add DGA to ARWR and upload some snapshots successfully; this time I actually checked the records existed in AWRW using the following query:

SQL> SELECT DISTINCT DBID FROM DBA_HIST_SNAPSHOT;

1585153544
1852644603
SQL> 

I performed the same steps as before (backup extract metadata table, add DGB to AWRW, switchover to DGB, change the mapping record for DGA, restore the extract record, Upload Snapshots Now, then restore the mapping record for DGA). Snapshots from DGB also uploaded successfully with the original DBID>

It distinctly possible that I messed up the partitioning of the AWR tables by my hacks however there are a few things that make me think this is not the case. The original upload from DGA failed (this was before I had made any non-standard changes). Also repeating my steps after manually fixing the partitions, everything worked fine. It is also worth noting that the tables causing the problem are all new in 12.2 (DGA and DGB are fresh installed 12.2 instances, the AWR Warehouse repository is 12.2 upgraded from 12.1). We also had some similar issues on our production AWR Warehouse after we upgraded it from 12.1 to 12.2 (a different error, but also related to AWRW partitions). So I suspect some bug where AWRW is not properly handling partitions on new 12.2 AWR tables.

Messing around with the register/un-register functions also reminds me of another thing to be aware of regards my ‘hack’ on the CAW_DBID_MAPPING table. If you subsequently remove the either database from AWRW (via the GUI) it will also remove all records from AWR history from both databases. This may or may not be what you want, you should be aware of it.

AWR Warehouse and Data Guard (Part Three)

This entry continues with the setup configured in parts one and two. I stated that ideally you should perform “Upload Snapshots Now” on the primary database before performing switch-over to prevent remove the possibility of missing snapshots in AWRW. We will see what happens if you don’t, either because you forgot, or it was not possible due to fail-over, and review some options to resolve the situation.

Do your own due diligence before updating any internal AWR data, be prepared to accept the consequences if you break things.

Primary database is currently DGA, I generate a couple of snapshots and perform a manual extract (note during the normal course of operations, this extract runs independently of the transfer and load components, so it is possible for there to be snapshots which have been extracted, but not transferred to AWRW host). The extract metadata table shows one extraction with status 2, the data has been extracted but has not yet been transferred to AWRW.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbsnmp.mgmt_caw_extract.run_extract;

PL/SQL procedure successfully completed.

SQL> select extract_id, begin_snap_id, end_snap_id, status from dbsnmp.caw_extract_metadata order by extract_id;

EXTRACT_ID BEGIN_SNAP_ID END_SNAP_ID      STATUS
---------- ------------- ----------- ----------
      1            1       10           5
     22           11       13           5
     41           14       15           5
     61           16       18           5
     81           19       24           2

I perform switch-over so that DGB is primary. Because the extract metadata references a file on host dga, the transfer task on dga will fail (as the database is in standby mode), and the one on dgb will also fail because the extract file is actually on dga. As no additional extracts have taken place, this ‘orphan’ extract record can be deleted (backup the table first if you are playing it safe). The next extract on DGB will re-extract the missing snapshots.

SQL> delete from dbsnmp.caw_extract_metadata where extract_id = 81;

1 row deleted.

SQL> commit;

Commit complete.

Sure enough, when I run “Upload Snapshots Now” against DGB, the job succeeds, and reviewing the load metadata table reveals that the snapshots initially extracted from DGA have been re-extracted successfully from DGB (along with some other snapshots that have since taken place).

SQL> select dump_id, target_name, begin_snap_id, end_snap_id, status from dbsnmp.caw_load_metadata order by dump_id;

   DUMP_ID TARGET_NAM BEGIN_SNAP_ID END_SNAP_ID     STATUS
---------- ---------- ------------- ----------- ----------
	 4 DGA			  1	     10 	 4
	 5 DGB			 11	     13 	 4
	 6 DGB			 14	     15 	 4
	 7 DGA			 16	     18 	 4
	 8 DGB			 19	     28 	 4

However if you don’t do this in time,  subsequent snapshots may get extracted, transferred and loaded successfully creating a gap in the snapshots held in AWRW. To reproduce this situation, I create some more snapshots and kick of the extract process.
Notice this time that the run_extract doesn’t actually do anything (in the output we can see that as it has run recently so it doesn’t actually do anything). This is part of the design of the ETL process to have minimal impact on the source database. I override this behavior by calling the procedure run_job_now.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> set serveroutput on size unlimited
SQL> exec dbsnmp.mgmt_caw_extract.run_extract;
Starting CAW Extract...
Fetched CAW properties.
Retrying most recent failure...
No recent failure to retry.
Preparing for new Extract...
Upload interval is 24 hr.
Time elapsed since last successful extract: 0 hr.
Not enough time elapsed. Quitting.
No non-recoverable extracts to retry.

SQL> exec dbsnmp.mgmt_caw_extract.run_job_now;
run_job_now completed.

PL/SQL procedure successfully completed.

SQL> select extract_id, begin_snap_id, end_snap_id, status from dbsnmp.caw_extract_metadata order by extract_id;

EXTRACT_ID BEGIN_SNAP_ID END_SNAP_ID	 STATUS
---------- ------------- ----------- ----------
	 1	       1	  10	      5
	22	      11	  13	      5
	41	      14	  15	      5
	61	      16	  18	      5
       101	      19	  28	      5
       121	      29	  30	      2

6 rows selected.

I then switch-back (primary reverts to DGA), create a couple more snapshots and perform “Upload Snapshots Now”. The process completes successfully, however checking the load metadata reveals that some snapshots have indeed been missed.

SQL> select target_name, dbid, begin_snap_id, end_snap_id from dbsnmp.caw_load_metadata order by dump_id;

TARGET_NAM	 DBID BEGIN_SNAP_ID END_SNAP_ID
---------- ---------- ------------- -----------
DGA	   1852644603		  1	     10
DGB	   1852644603		 11	     13
DGB	   1852644603		 14	     15
DGA	   1852644603		 16	     18
DGB	   1852644603		 19	     28
DGA	   1852644603		 31	     32

6 rows selected.

Checking the extract metadata in reveals the status of this extract is 5 (transferred successfully). (I don’t know why; the load metadata showed that it has not been received into AWRW).

SQL> select extract_id, BEGIN_SNAP_ID, END_SNAP_ID, STATUS from dbsnmp.caw_extract_metadata order by extract_id;

EXTRACT_ID BEGIN_SNAP_ID END_SNAP_ID	 STATUS
---------- ------------- ----------- ----------
	 1	       1	  10	      5
	22	      11	  13	      5
	41	      14	  15	      5
	61	      16	  18	      5
       101	      19	  28	      5
       121	      29	  30	      5
       141	      31	  32	      5

7 rows selected.

To resolve this situation, I update the extract metadata record to error status, to force these snapshots to be are re-extracted.

SQL> update dbsnmp.caw_extract_metadata set status = 3, e_retry_count = 0 where extract_id = 121;

1 row updated.

SQL> commit;

Commit complete.

After running Upload Snapshots Now, reviewing the extract metadata reveals that the missing snapshots have been re-extracted (the record has been updated to status 5).

SQL> select extract_id, BEGIN_SNAP_ID, END_SNAP_ID, STATUS from dbsnmp.caw_extract_metadata order by extract_id;

EXTRACT_ID BEGIN_SNAP_ID END_SNAP_ID	 STATUS
---------- ------------- ----------- ----------
	 1	       1	  10	      5
	22	      11	  13	      5
	41	      14	  15	      5
	61	      16	  18	      5
       101	      19	  28	      5
       121	      29	  30	      5
       141	      31	  32	      5

7 rows selected.

Reviewing load metadata reveals that the missing snapshots have finally been loaded successfully into AWRW, the gap has been eliminated.

SQL> select target_name, dbid, begin_snap_id, end_snap_id from dbsnmp.caw_load_metadata order by dump_id;

TARGET_NAM	 DBID BEGIN_SNAP_ID END_SNAP_ID
---------- ---------- ------------- -----------
DGA	   1852644603		  1	     10
DGB	   1852644603		 11	     13
DGB	   1852644603		 14	     15
DGA	   1852644603		 16	     18
DGB	   1852644603		 19	     28
DGA	   1852644603		 31	     32
DGA	   1852644603		 29	     30

7 rows selected.

In summary even though out of the box, AWR Warehouse does not support data-guard configuration, with a little bit of hacking it is possible to get it to work reasonably well.

Here’s hoping that in a later release these issues will have been worked out, and it will work out of the box.

Reasons to love AWR Warehouse

I realize that the majority of blog posts I have made are for problems I have encountered. I think is only natural as most of my time is spent investigating the few occasions where things don’t work as expected rather than the 99.99% of the time when things just work. Even when there are problems the code is normally very well instrumented, and there is such a wealth of great technical information available, troubleshooting issues can be almost enjoyable.

With a view to re-balance this state of affairs somewhat, I thought I would put down a few thoughts on AWR Warehouse. I am continually surprised that so few people seem to be using it, so I will try to do my part address this, and hopefully persuade you to give it a shot. If you are not using it, feel free to share via a comment why not; if you are using it also feel free to share any thoughts, good or bad about the product.

Licensing

One of the main reasons to consider using AWRW can be found in the Database Licensing Information User Manual. I quote the relevant section:

A separate single instance Oracle Database can be installed and used as an infrastructure repository for RMAN, Oracle Enterprise Manager Cloud Control, Automatic Workload Repository (AWR) Warehouse, Global Data Services Catalog, and Grid Infrastructure Management Repository without additional license requirements, provided that all the targets are correctly licensed. It may not be used or deployed for other uses.

The infrastructure repositories for RMAN, Oracle Enterprise Manager Cloud Control, AWR Warehouse, and Global Data Services Catalog can be shared in one database, or deployed in separate databases. Enterprise Edition must be used for the infrastructure repository database(s).

If Oracle RAC is used for the infrastructure database, then Oracle RAC licenses for all servers, and Enterprise Edition licenses for all but the first server, are required.

If Data Guard is used for the infrastructure database, then an Enterprise Edition license for the standby server is required.

So paraphrasing (note I am neither a lawyer nor a licensing expert), as long as the source databases are correctly licensed,  you don’t use RAC or DataGuard, and you don’t use it for anything else, you don’t need to worry to worry licensing on AWR Warehouse Repository.  This can (in fact must be) Enterprise Edition as it makes extensive use of the partitioning option.

There are not many occasions when Oracle corporation gives stuff out for free, when they do I suggest to take advantage.

The licensing advantages become even more attractive when you realize that AWRW actually saves you licensing on source databases by allowing you to offload AWR data storage and analysis from the source database.

Note, don’t even think about using an existing database for this purpose, especially not the OEM Repository database; trust me, on this, you’ll thank me later.

Tight Integration into Enterprise Manager

Database administrators  that have not embraced Enterprise Manager (OEM) for managing their databases, may consider this disadvantage, however I believe that all companies that manage more than a handful of databases, should be fully utilizing the power of OEM; its benefits far outweigh any overhead in administration.

AWRW initial configuration and the addition of database sources is basically a case of making sure they are discovered within OEM, defining preferred credentials, and then clicking a few buttons.

Once databases are contributing to AWRW, making use of the historical data is again just a case of clicking a few buttons.

AWR reports can be easily run for data in AWRW rather than just local AWR data, comparisons between different databases from at the same, or different, periods of time can be made.

ASH analytics is another area where the power of AWRW and OEM combine to provide some powerful tools for in-depth analysis.

Database Versions

The AWRW documentation states that:

(The AWR Warehouse Repository) must be an equal or higher database version of the source databases it accommodates.

You might consider this a disadvantage, however I see it differently. It is an opportunity to try out upgrading and using new database versions with non-trivial (actually substantial) volumes of data before rolling out to databases where developers or users complain in case things go wrong.

We recently upgraded our AWRW Repository to 12.2.  We had a few issues (as expected), but we could take our time resolving them, because we (the DBAs) were the only ones who noticed any outage.  After this upgrade completed, we then felt brave enough to upgrade our OMR Repository database which went through much more smoothly.

Of course perform due diligence, practice on a test system, and make sure you have backups  which will allow you to roll back in case things go wrong.

Summary

It’s license free, extends the power of Enterprise Manager, and gives you low-risk exposure to new database versions.

Why aren’t you using it?  Seriously;  let me know via comments.

AWR Warehouse and Data Guard (Part Two)

We will continue with setup configured for the previous blog entry.
This post includes details about updating some internal AWRW tables. These steps are  provided for education purposes only; if you do this on a real system make sure you know exactly what you are doing and understand the risks. If you break anything, Oracle Support or I may not be able or willing to help you.

First I delete the two instances DGA and DGB previously configured from AWR Warehouse with a view to re-configuring them to resolve some of the problems we encountered last time.

I re-add DGA and perform “Upload Snapshots Now” and make sure an entry is created in the DBID mapping table.

SQL> SELECT target_name, old_dbid, new_dbid FROM DBSNMP.CAW_DBID_MAPPING order by mapping_id;

TARGET_NAME   OLD_DBID	 NEW_DBID
----------- ---------- ----------
cawr	    1585153544 1585153544
DGA	    1852644603 1852644603

I switch-over to instance DGB. Before re-adding DGB to AWR Warehouse, I backup the extract metadata table.

SQL> create table dbsnmp.caw_extract_metadata_backup as select * from dbsnmp.caw_extract_metadata;

Table created.

I then re-add DGB to AWRW, but before uploading snapshots I update the mapping for DGA temprarily to set new_dbid column to a different value 1.

SQL> update DBSNMP.CAW_DBID_MAPPING set new_dbid=1 where target_name='DGA';
1 row updated.
SQL> COMMIT;
Commit complete.

I then verify that caw_extract_metadata is empty (it has been  recreated when adding DGB to AWRW) and restore the record from my backup table. This should fool AWR extract process into realizing which snapshots have already been uploaded from DGA. I also update the relevant sequence so there will not get an error when creating  a new extract metadata record.  I create a couple of snapshots for testing.

SQL> select count(*) from dbsnmp.caw_extract_metadata;
  COUNT(*)
----------
	 0
SQL> insert into dbsnmp.caw_extract_metadata select * from dbsnmp.caw_extract_metadata_backup;
1 row created.
SQL> commit;
Commit complete.
SQL> select dbsnmp.caw_extract_id_seq.nextval from dual;
21
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.

Finally I perform “Upload Snapshots Now” on DGB.  The process completes successfully.  Let’s check the contents of the dbid mapping table:

SQL> select mapping_id, target_name, old_dbid, new_dbid from DBSNMP.CAW_DBID_MAPPING;
MAPPING_ID TARGET_NAME	 OLD_DBID   NEW_DBID
---------- ----------- ---------- ----------
	 1 cawr        1585153544 1585153544
	 4 DGA	       1852644603	   1
	 5 DGB	       1852644603 1852644603

Because we had temporarily re-mapped DGA, OEM has not had to remap the dbid for DGB.  I now revert the  change we made to DGA mapping and review the mappings:

SQL> update DBSNMP.CAW_DBID_MAPPING set new_dbid = 1852644603 where target_name = 'DGA';
1 row updated.
SQL> commit;
Commit complete.
SQL> select mapping_id, target_name, old_dbid, new_dbid from DBSNMP.CAW_DBID_MAPPING;
MAPPING_ID TARGET_NAME	 OLD_DBID   NEW_DBID
---------- ----------- ---------- ----------
	 1 cawr        1585153544 1585153544
	 4 DGA	       1852644603 1852644603
	 5 DGB	       1852644603 1852644603

To double check everything is OK after this change, I create some new snapshots, and perform “Upload Snapshots Now” again on DGB.  I then switch back to DGA, create some more snapshots, and again perform “Upload Snapshots Now”.
Both tasks complete successfully, and I can check everything is as hoped from the load metadata table:

SQL> select target_name, dbid, begin_snap_id, end_snap_id from dbsnmp.caw_load_metadata order by dump_id;

TARGET_NAME	  DBID BEGIN_SNAP_ID END_SNAP_ID
----------- ---------- ------------- -----------
DGA	    1852644603		   1	      10
DGB	    1852644603		  11	      13
DGB	    1852644603		  14	      15
DGA	    1852644603		  16	      18

One thing to note with this ‘hack’ is that is best to perform “Upload Snapshots Now” on the active database before a switch-over is performed to make sure there are no extracts ‘left behind’ when the database become standby. Obviously this may not be possible in fail-over situations. I will try to cover work-arounds for this in a follow-up blog post.

AWR Warehouse and Data Guard (Part One)

Note, if you are not interested in the nitty gritty of my testing and results, you may just want to skip ahead to the blog entry i have made that summarizes my findings, and makes some recommendations.

One limitation of AWR Warehouse (AWRW) is that it doesn’t play nicely with Data Guard.  I have been spending some time recently to reproduce issues that we have seen on our production systems. With luck some of the workarounds we have implemented to minimize there problems will be discussed in a follow up post.

First here are some details about my testing environment; note in particular that it does not follow the best practice of separating the OEM and AWR Warehouse repositories:

  • Host dga holds database instance DGA
  • Host dgb holds database instance DGB
  • DGA and DGB are configured as primary and standby via Oracle Data Guard
  • Host oem hosts Enterprise Manager (OEM) and database instance OMR
  • OMR holds both OEM and AWRW repositories
  • All hosts are running Oracle Linux 7.3
  • All databases are Oracle 12.2.0.1
  • All hosts and database instances are configured as targets in OEM
  • Preferred credentials have been defined for all databases.

Let’s add DGA as an AWRW target in AWRW, from AWR homepage in OEM we click the Add button and select DGA.  After a few seconds we can see DGA in our list of targets:

DGA_Uploading

I then highlight this target and select “Upload Snapshots Now” from Actions drop-down list.

Success. We can see more information in CAW_MAPPINGS table in the AWRW repository:

SQL> SELECT mapping_id, target_name, old_dbid, new_dbid FROM DBSNMP.CAW_DBID_MAPPING;

MAPPING_ID TARG   OLD_DBID   NEW_DBID
---------- ---- ---------- ----------
	 1 cawr 1585153544 1585153544
	 2 DGA	1852644603 1852644603

SQL>

Let’s try to add DGB (currently running as standby) as a target:

Screen Shot 2017-03-26 at 5.08.32 PM

As you can see, adding the standby instance fails.

Next switchover is performed such that DGB is primary, and DGA is standby.

One thing to note is that AWR tables now exist in DGB (they were created on DGA when it was primary and have been copied as part of normal standby redo apply).

Addition of DGB to AWRW now completes successfully:

Screen Shot 2017-03-26 at 5.12.35 PM

“Upload Snapshots Now” also runs without problems successfully. However if looking closer shows some problems.  The CAW_MAPPINGS table reveals that during the addition, OEM detected it already had an instance in AWRW with the same DBID, so it re-mapped DGB to a new DBID, 1.

SQL> select mapping_id, target_name, old_dbid, new_dbid from dbsnmp.caw_dbid_mapping order by mapping_id;

MAPPING_ID TARG   OLD_DBID   NEW_DBID
---------- ---- ---------- ----------
	 1 cawr 1585153544 1585153544
	 2 DGA	1852644603 1852644603
	 3 DGB	1852644603	    1
SQL> 

When performing analysis of AWRW data, this mapping would need to be considered.

Analysis of table CAW_LOAD_METADATA reveals further problems:

SQL> select target_name, begin_snap_id, end_snap_id from dbsnmp.caw_load_metadata;

TARG BEGIN_SNAP_ID END_SNAP_ID
---- ------------- -----------
DGA		 1	     6
DGB		 1	     7

AWR history for snapshot ids 1 through 6 have been uploaded twice, once for DGA, once for DGB. In a real environment where the volume of AWR data on the source database is large, this could be an nasty surprise. Also any analysis of AWRW data will likely be invalid with some snapshots counted twice under different DBIDs.

Switch-over such that DGA is returned to primary role is then performed, and “Upload Snapshots Now” run.

The task completes successfully, and this time, checking CAW_LOAD_METADATA we can see that we don’t have any additional duplicates.

SQL> select target_name, begin_snap_id, end_snap_id from dbsnmp.caw_load_metadata order by dump_id;

TARG BEGIN_SNAP_ID END_SNAP_ID
---- ------------- -----------
DGA		 1	     6
DGB		 1	     7
DGA		 8	    10

SQL>

It seems the main problem was caused during the initial configuration of the target once this has occurred, that in general, uploads take place with no problems after switch-over, with the caveat that uploads from DGB will be mapped to the new DBID 1.

Note in particular that the mapping is determined by the instance that performed the upload which may or may not be the instance were the snapshot was made.

There are exceptions however; in particular, note that during normal operations the Extract, Transfer and Load operations are independent tasks, so that if the extract completes successfully on one instance and then switch-over occurs before the transfer runs, the transfer tasks (on both primary and standby instances) will fail to upload the relevant snapshots.

More details about this case, and as promised, some of the workarounds we have implemented to follow.

Transfer Tasks failing with ORA-28040 after Upgrading AWR Warehouse to 12cR2

Checking the AWR warehouse health after we upgraded the repository database from 12cR1 to 12cR2 I noticed that for some source databases the transfer task had been failing.
Performing “Upload Snapshots Now” I was also able to reproduce the problem and identify the error message “ORA-28040: No matching authentication protocol”.

This was initially puzzling as the source database was version 11gR2, and in 12cR2 the default value of the parameter SQLNET.ALLOWED_LOGON_VERSION_SERVER is 12, which should allow connections from 11gR2 databases.

A bit of troubleshooting identified that the transfer task uses the Instant Client bundled with the OEM agent for this task.  The agent version was 12cR4, (we can’t use a later version of the agent as this is running on RHEL5) and the Instant Client bundled with it was 11GR1.

I updated SQLNET.ALLOWED_LOGON_VERSION_SERVER in sqlnet.ora of the AWRW repository database, and the AWR transfer task completes without issue.