Forum Outage August 17

Admin1
Site Admin
Posts: 156
Joined: Fri Dec 30, 2016 12:55 pm
Location: Somewhere in your computer....
Has thanked: 2 times
Been thanked: 14 times

Post by Admin1 »

For anyone interested in the official long version from our website hosting provider about yesterday's outage, here it is:

***

Yesterday, beginning at approximately 9:20 am Eastern, one of our Kansas City servers began experiencing an impact from an unknown cause. Services were inaccessible to multiple control panels, and we began resolving the impact by trying to identify the issue. We reached out to the Datacenter Team in Kansas City to access the server directly and check its health. During this time, we were able to reboot the server and verify that one of the hard drives was no longer available to the server, causing the reboot to require a rebuild of its configuration, which is an automated process that runs after we confirm it can proceed. Until that process is triggered, however, the server is locked from booting. Once we were able to identify this problem through the datacenter team, we regained access to the server's control panel and verified with our upstream providers that one of our primary hard drives had failed. This was very unexpected, as the hard drive should have reached its peak reliability. It was showing no signs of wear or potential failure during its regular daily system checks, so unfortunately there was no way we could have predicted this.

As a company, although not required by our SLA and Terms of Use, we back up data at least daily, and often twice or up to 24 times a day, depending on the client. However, we only back up our parent and sub-servers on a weekly to monthly basis, since these are mostly configuration-based and server configurations do not change often. Fortunately, little had changed in the server configuration, and we were quickly able to restore an old snapshot of the server that handles Web-Clients, but this also required that we manually restore clients' data from the backups that occurred earlier that day. This may have resulted in a couple of hours of rollback, but the alternative was starting fresh, and that simply was not an option.

We have since fully recovered from this impact. The entire restoration, including spinning up the replacement server on a new hard drive and restoring client data, took about 2 hours of the 6-hour impact. The impact was officially resolved at 3:13 pm Eastern. While we understand that downtime at any time is undesirable, I can assure you that if this had been preventable, our team would have done everything possible to prevent it. We worked swiftly to restore clients' services, going beyond the scope of our requirements by restoring client data, which is not a requirement of our SLA or Terms of Use.

In the coming weeks, we will initiate a maintenance window of about 15 minutes to replace the failed hard drive. We will be sure to schedule this in advance and base the timing on traffic volumes to have the least impact possible. We have verified that our remaining drives have the capacity to handle at least approximately 4.5 more years of service. We will continue to monitor our hardware, as usual, so that incidents like this are prevented whenever possible.
alenigma
Colossal Race Fan
Posts: 8997
Joined: Sat Dec 31, 2016 11:21 am
Location: W. Columbia, SC.
Has thanked: 689 times
Been thanked: 120 times

Re: Forum Outage August 17

Post by alenigma »

Just don't drop the ball on race weekends.......got it?
Let the Next Generation Emerge! :s_bomb
awsum14
Enormous Race Fan
Posts: 4203
Joined: Fri Dec 30, 2016 6:14 pm
Location: Prince George, VA
Has thanked: 71 times
Been thanked: 48 times

Re: Forum Outage August 17

Post by awsum14 »

Admin1 wrote: Sun Aug 18, 2019 3:38 pm For anyone interested in the official long version from our website hosting provider about yesterday's outage, here it is:

:s_mad :s_mad What are you saying? "I" broke your precious hard drive when I posted at 5:00 in the morning. :boxing: :boxing:
:s_omg :s_omg "I HAVE THE POWER" :laser:
I need to put my worm back in its can and keep it for later :s_rofl :s_rofl
HiddenHollow
Enormous Race Fan
Posts: 7311
Joined: Fri Dec 30, 2016 1:04 pm
Location: Tampa, FL
Has thanked: 244 times
Been thanked: 265 times

Re: Forum Outage August 17

Post by HiddenHollow »

alenigma wrote: Sun Aug 18, 2019 7:19 pm Just don't drop the ball on race weekends.......got it?
The timing is what got me so upset. Why couldn't it have been a Tuesday or Wednesday? Of course I realize that you can't plan/schedule a hardware failure, but still.... :s_mad
Always A Racer - Forever A Champion

Re: Forum Outage August 17

Post by alenigma »

Right, an impending software failure usually sends alerts and whatnot. Hardware problems can come from brownouts, surges, or lightning strikes. Sometimes things simply get old and wear out...we older fellas know about these things.
Let the Next Generation Emerge! :s_bomb

Re: Forum Outage August 17

Post by alenigma »

Anybody...How do I clear out my PM inbox? I have a new message and I can't receive it. Please give me some 411.
Let the Next Generation Emerge! :s_bomb

Re: Forum Outage August 17

Post by HiddenHollow »

alenigma wrote: Sat Aug 24, 2019 12:43 pm Anybody...How do I clear out my PM inbox? I have a new message and I can't receive it. Please give me some 411.
Sure, easy. First go to the inbox and look at the upper right to see how many pages of messages you have. Page 1 will be the newest messages, page 2 the next older messages, etc. I usually go to the last page to delete the oldest messages first. In the screenshot below, I only have 1 page of messages.

Page.jpg (11.28 KiB)

Next, scroll to the bottom of the list and look at the bottom right. There is a check box to the right of each message. Click the check box for each message you want to delete. Then click on the drop-down arrow, select "Delete marked", and click on the "Go" button. You will get a confirmation screen asking if you really want to delete the messages. Click the "Yes" button.

Delete.jpg (14.51 KiB)
Always A Racer - Forever A Champion

Re: Forum Outage August 17

Post by alenigma »

Thank you. You are my island in a storm, HH.
Let the Next Generation Emerge! :s_bomb