Sunday, March 4, 2018

Beware: do not abort compact -REPLICA -RESTART

Because some of our DBs had encountered the "Unable to extend an ID table" error, I scheduled compact -REPLICA -RESTART to run on these DBs every weekend.

One fine weekend, our users decided to access these DBs while compact -REPLICA was still running on one large DB.

Since compact -REPLICA replicates the DB to a new .REPL file, I assumed that if I quit the compact task, Domino would be smart enough to stop the local replication, delete the new .REPL file, and leave everything untouched.

Boy, was I wrong, and it left me with several angry users.

What happened?

compact -REPLICA -RESTART was running on a large DB. The documents were already synced, and it was syncing views to the .REPL file.

I stopped the compact task in the Domino Administrator client, and the remaining view syncing suddenly "completed" very quickly (in reality, the remaining views were not synced at all):

[1BA8:0004-2388] Compact -REPLICA, syncing view: Unxxxxx, ID: 003FF26A, from DB: E:\Domino\data\xxx.nsf, to DB: E:\Domino\data\xxx.REPL 
[1BA8:0004-2388] Compact -REPLICA, syncing view: Utxxx, ID: 0000A7A2, from DB: E:\Domino\data\xxx.nsf, to DB: E:\Domino\data\xxx.REPL                   
[1BA8:0004-2388] Compact -REPLICA, syncing view: Wexxxx, ID: 0000A57A, from DB: E:\Domino\data\xxx.nsf, to DB: E:\Domino\data\xxx.REPL 
[1BA8:0004-2388] Compact -REPLICA, syncing view: Wixxx, ID: 0000A552, from DB: E:\Domino\data\xxx.nsf, to DB: E:\Domino\data\xxx.REPL                     
[1BA8:0004-2388] Compact -REPLICA, bring database online E:\Domino\data\xxx.REPL
[1BA8:0004-2388]         Database Name            Refs Mod  FDs  LockWaits/AvgWait #Waiters MaxWaiters Online
[1BA8:0004-2388] Compact -REPLICA, initial population complete for DB: E:\Domino\data\xxx.nsf
[1BA8:0004-2388] Compact -REPLICA, drop all users for E:\Domino\data\xxx.nsf                                         
[1BA8:0004-2388]         Database Name            Refs Mod  FDs  LockWaits/AvgWait #Waiters MaxWaiters Online
[1BA8:0004-2388] E:\Domino\data\xxx.nsf      2   Y   6        0      0        0       4         Y
[0DE4:0009-13D0] drop xxx.nsf
[1BA8:0004-2388] Compact -REPLICA, take database offline for E:\Domino\data\xxx.nsf
[1BA8:0004-2388] Compact -REPLICA, take database offline for E:\Domino\data\xxx.nsf succeeded - offline: 1, references: 2   
[1BA8:0004-2388] Compact -REPLICA, bring database online E:\Domino\data\xxx.nsf
[1BA8:0004-2388]         Database Name            Refs Mod  FDs LockWaits/AvgWait #Waiters MaxWaiters Online
[1BA8:0004-2388] Compact -REPLICA, needs to restart server to complete compaction for DB: E:\Domino\data\xxx.nsf
[1BA8:0004-2388] 20/01/2018 12:29:15 PM  Error compacting E:\Domino\data\xxx.nsf,  compactdb.ind -REPLICA -RESTART -# 4: Program shutdown in progress
[1BA8:0004-2388] 20/01/2018 12:29:15 PM  Database compactor deferring 'restart server' for  E:\Domino\data\xxx.nsf,   compactdb.ind -REPLICA -RESTART -# 4
[1BA8:0002-1A48] 20/01/2018 12:29:16 PM  Database compactor issued a 'restart server', compactdb.ind -REPLICA -RESTART -# 4
[1BA8:0002-1A48] 20/01/2018 12:29:16 PM  Database compactor process shutdown
[0DE4:0009-13D0] restart server
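
In hindsight, the log already shows the danger: an "Error compacting ... Program shutdown in progress" line for a -REPLICA run, followed by "deferring 'restart server'" for the same DB, means the half-synced .REPL will be swapped in at the next restart. Below is a rough Python sketch (my own, not an IBM tool) that scans console log lines for that pair. The function name and regexes are assumptions, matched only against the messages quoted above; other Domino versions may word them differently.

```python
import re

# Patterns copied from the console messages above; they are assumptions and
# may need adjusting for other Domino versions or languages.
ABORT_RE = re.compile(
    r"Error compacting (?P<db>.+?),\s+compactdb\.ind -REPLICA.*Program shutdown in progress"
)
DEFER_RE = re.compile(
    r"Database compactor deferring 'restart server' for\s+(?P<db>.+?),"
)


def find_aborted_replica_compacts(lines):
    """Return DB paths whose compact -REPLICA was stopped mid-run and whose
    file swap was deferred to the next server restart."""
    aborted = set()
    flagged = []
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            # compact -REPLICA was interrupted for this DB
            aborted.add(m.group("db").strip())
            continue
        m = DEFER_RE.search(line)
        if m and m.group("db").strip() in aborted:
            # the interrupted copy is waiting to be renamed over the .nsf
            flagged.append(m.group("db").strip())
    return flagged


if __name__ == "__main__":
    sample = [
        r"[1BA8:0004-2388] 20/01/2018 12:29:15 PM  Error compacting E:\Domino\data\xxx.nsf,  compactdb.ind -REPLICA -RESTART -# 4: Program shutdown in progress",
        r"[1BA8:0004-2388] 20/01/2018 12:29:15 PM  Database compactor deferring 'restart server' for  E:\Domino\data\xxx.nsf,   compactdb.ind -REPLICA -RESTART -# 4",
    ]
    print(find_aborted_replica_compacts(sample))  # ['E:\\Domino\\data\\xxx.nsf']
```

Any DB it returns is one whose views may be incomplete after the restart.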

After the server restarted, Domino went ahead and replaced the DB with the .REPL file! The problem is that Domino does not resume syncing the views it had not finished earlier:

Compact -REPLICA, check rename of NSF file with existing restart flag, DB: E:\Domino\data\xxx.nsf
[249C:0092-1C68] Compact -REPLICA, rename with ORIG file E:\Domino\data\xxx.ORIG, DB: E:\Domino\data\xxx.nsf, deleting ORIG, error: No error
[249C:0092-1C68] Compact -REPLICA, rename of NSF file with existing restart flag, DB: E:\Domino\data\xxx.nsf
[249C:0092-1C68] Compact -REPLICA, shift: E:\Domino\data\xxx.nsf => E:\Domino\data\xxx.ORIG, E:\Domino\data\xxx.REPL => E:\Domino\data\xxx.nsf
[249C:0092-1C68] Compact -REPLICA, rename complete for DB: E:\Domino\data\xxx.nsf
[249C:0092-1C68] Compact -REPLICA, resync of NSF file with existing restart flag, DB: E:\Domino\data\xxx.nsf
[249C:0092-1C68] Compact -REPLICA, take database offline E:\Domino\data\xxx.ORIG
[249C:0092-1C68] Clearing DBIID D581A1A5 for DB E:\Domino\data\xxx.ORIG
[249C:0092-1C68] Compact -REPLICA, no refresh file E:\Domino\data\xxx.nsf, from DB: E:\Domino\data\xxx.ORIG since:  20/01/2018 08:03:28 AM, data: 20/01/2018 07:01:19 AM, nondata: 20/01/2018 01:00:56 AM
[249C:0092-1C68] Compact -REPLICA, syncing unread from DB: E:\Domino\data\xxx.ORIG to DB: E:\Domino\data\xxx.nsf   
[249C:0092-1C68] Compact -REPLICA, NSFDbCompactSyncFolders: Replicating folders  Since time (20/01/2018 08:03:28 AM), Src time (), Dst time ()
[249C:0092-1C68] [249C:0092-1C68] Compact -REPLICA, NSFDbCompactSyncFolders: NSFStartFolderReplSource->0x3AE=Folders in database are up to date
[249C:0092-1C68] Compact -REPLICA, bring database online E:\Domino\data\xxx.ORIG
[249C:0092-1C68] 20/01/2018 12:30:12 PM  Recovery Manager: Assigning new DBIID for E:\Domino\data\xxx.nsf (need new backup for media recovery).
[249C:0092-1C68] 20/01/2018 12:30:12 PM  Compacting E:\Domino\data\xxx.nsf (), restart completing, -REPLICA -RESTART
[249C:0092-1C68]         Database Name            Refs Mod  FDs LockWaits/AvgWait  #Waiters MaxWaiters Online
[249C:0092-1C68] Compact -REPLICA, delete E:\Domino\data\xxx.ORIG                                       
[249C:0092-1C68] Compact -REPLICA, complete for DB: E:\Domino\data\xxx.nsf
[249C:0092-1C68] 20/01/2018 12:30:14 PM  Compacted E:\Domino\data\xxx.nsf (), restart completed, -REPLICA -RESTART
[249C:0092-1C68] 20/01/2018 12:30:14 PM  Compacted E:\Domino\data\xxx.nsf (), increased by 2659328K bytes (<1%), -REPLICA -RESTART

As a result, users got very slow responses when they opened the unsynced views, because those views had not been built yet.

After some back and forth with IBM Support, they finally agreed to create SPR # MJCGAWE9MD: "Compact -replica Does Not Have A Way To Check What Process It Left Off Before Aborted".

There is no telling when IBM will make Domino smart enough to quit compact -REPLICA safely, so I strongly advise against stopping compact -REPLICA while it is still running.
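
Before stopping a compact task or restarting the server, one cheap check is whether any database still has a .REPL file next to it. The Python sketch below is a hypothetical helper of my own (pending_replica_compacts is not something Domino provides; the data directory is taken from the logs above). It assumes that a leftover <name>.REPL beside <name>.nsf means compact -REPLICA has not finished.

```python
from pathlib import Path


def pending_replica_compacts(data_dir):
    """Return .nsf paths under data_dir that have a sibling .REPL file.

    Assumption: as the logs above show, compact -REPLICA builds <name>.REPL
    next to <name>.nsf and only shifts it into place when it finishes (or at
    the next restart with -RESTART), so a leftover .REPL suggests a copy that
    is still being built or is waiting to be swapped in.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    hits = []
    for repl in root.rglob("*"):
        # compare the suffix case-insensitively (Windows names ignore case)
        if repl.is_file() and repl.suffix.upper() == ".REPL":
            nsf = repl.with_suffix(".nsf")
            if nsf.exists():
                hits.append(str(nsf))
    return sorted(hits)


if __name__ == "__main__":
    for path in pending_replica_compacts(r"E:\Domino\data"):
        print("compact -REPLICA not finished for", path)
```

It only inspects file names, so it cannot tell a copy that is still syncing from one that is waiting for a restart; treat it as a quick warning, not a diagnosis.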
