Probably. I've been meaning to go back and read that for the past few nights so I could give you a good answer to this question...
But take today: in our team's IRC channel, a teammate said, "Hey, there are these 2 Oracle slaves, and one is 18 hours behind, the other is caught up, and they're usually the same. What's wrong?"
I offered that it might be a dead disk in the RAID pool. The hardware on the two is different, but if they usually lag the master by the same amount, their RAM/CPU/disk speed/etc. must be similar. So what would cause one of them to catch up slowly? I know very little about Oracle, but I do know that it does block-level replication: the changed blocks are sent to the slaves, and the slaves then apply the changes.
It could also have been a RAM chip gone bad, but a disk is more likely. The "obvious" answer might have been "long-running query on the one slave that's not on the other", etc. Now, if it had been that, I'd have looked pretty silly checking out hardware configs, but there are also a lot of app-level (database, email, whatever) goose chases that turn out to be hardware. Some of it's intuition, some of it's general "how things work", but in a nutshell...
Yes. Teaching debugging skills... or even teaching "how to think". How to question everything without losing all your friends. How to think objectively. You're good at that, and I wish more people were.
Date: 2008-01-09 02:56 pm (UTC)