Operational procedures | AWS Region outage
Activity | OPERATIONS |
---|---|
Department | Sales & Marketing, Engineering, Customer Support, SAST |
Process | AWS Region outage |
Description
Objective
To respond to an AWS region outage impacting Clinical Brain's infrastructure, ensuring rapid service restoration, minimal operational disruption, and clear communication with stakeholders. The goal is to maintain business continuity during such incidents.
Scope
This procedure exists so that Clinical Brain's systems can recover quickly from an AWS Region outage, avoiding prolonged downtime and keeping operations running. It supports our business goal of providing a reliable and consistent service to our users.
Definitions
N/A
Activity list
Incident identification
Stakeholder communication
Automated recovery process initiation
DNS entry update for Disaster Recovery
Service restoration verification
Post-incident review
Activity descriptions
Activity #1 - Incident identification
Description | Identify and confirm an AWS Region outage affecting Clinical Brain's services |
---|---|
Resources | |
Responsible | Engineering |
Substitute | todo |
Step by step | |
Stakeholders | Sales & Marketing, Customer Support, SAST, Customers |
Communication plan
Information | Frequency | Sender | Recipient | Channel |
---|---|---|---|---|
Confirmation of outage | Once | Engineering | Sales & Marketing, Customer Support, SAST | |
Confirmation of outage | Once | Engineering | Customers | Confluence (https://medicineone.atlassian.net/wiki/spaces/CUSTOMERSUPPORT/pages/377585669) |
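Confirming an outage before triggering disaster recovery usually means ruling out a transient failure first. The sketch below is one possible way to express that rule, not the team's actual tooling: `is_outage_confirmed`, the `probe` callable, and the default of three attempts are all hypothetical names and values chosen for illustration. `probe` is assumed to return `True` when the service in the affected Region responds normally.

```python
import time
from typing import Callable


def is_outage_confirmed(
    probe: Callable[[], bool],
    attempts: int = 3,
    delay_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only when every probe in a row reports the service as down.

    `probe` is a hypothetical health check supplied by the caller; it should
    return True when the service answers normally.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        if probe():
            # One healthy response is enough to rule out a full outage.
            return False
        if attempt < attempts - 1:
            # Wait between probes so a brief blip is not mistaken for an outage.
            sleep(delay_seconds)
    return True
```

Requiring several consecutive failures trades a slightly slower confirmation for fewer false alarms; the right number of attempts and the delay between them are for the Engineering team to decide.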
Activity #2 - Stakeholder communication
Description | Communicate with internal and external stakeholders about the incident and the ongoing response actions |
---|---|
Resources | |
Responsible | Sales & Marketing |
Substitute | todo |
Step by step | todo: migrate the steps from https://medicineone.atlassian.net/wiki/spaces/CUSTOMERSUPPORT/pages/376635393 to this page |
Stakeholders | Customers |
Communication plan
Information | Frequency | Sender | Recipient | Channel |
---|---|---|---|---|
todo | todo | todo | todo | todo |
Activity #3 - Automated recovery process initiation
Description | Initiate the automated disaster recovery processes |
---|---|
Resources | |
Responsible | Engineering |
Substitute | todo |
Step by step | |
Stakeholders | Sales & Marketing, Customer Support, SAST, Customers |
Communication plan
Information | Frequency | Sender | Recipient | Channel |
---|---|---|---|---|
Recovery process completed, including the resulting API Gateway domain name | Once | Engineering | Sales & Marketing, Customer Support, SAST | |
Regular updates | At regular intervals or as new information becomes available | Engineering | Customers | Confluence (https://medicineone.atlassian.net/wiki/spaces/CUSTOMERSUPPORT/pages/377585669) |
Activity #4 - DNS entry update for Disaster Recovery
Description | Update the DNS entry to redirect traffic to the new API Gateway in the disaster recovery Region |
---|---|
Resources | todo |
Responsible | SAST |
Substitute | todo |
Step by step | |
Stakeholders | Sales & Marketing, Customer Support, SAST, Customers |
Communication plan
Information | Frequency | Sender | Recipient | Channel |
---|---|---|---|---|
DNS update completed | Once | SAST | Engineering, Sales & Marketing, Customer Support | |
DNS update completed | Once | SAST | Customers | Confluence (https://medicineone.atlassian.net/wiki/spaces/CUSTOMERSUPPORT/pages/377585669) |
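The DNS update in this activity points an existing record at the API Gateway domain name produced by the recovery process. This procedure does not say which DNS provider is used; the sketch below assumes, purely for illustration, that the zone is hosted in Amazon Route 53 and that the record is a CNAME. The function name `build_dns_failover_change`, the example host names, and the 60-second TTL are hypothetical.

```python
from typing import Any, Dict


def build_dns_failover_change(
    record_name: str,
    api_gateway_domain: str,
    ttl_seconds: int = 60,
) -> Dict[str, Any]:
    """Build a Route 53 change batch that repoints a CNAME record.

    Assumption: the zone lives in Route 53 and the record is a CNAME.
    UPSERT creates the record if it is missing and overwrites it otherwise.
    """
    return {
        "Comment": "Disaster recovery: point record at the DR Region API Gateway",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl_seconds,
                    "ResourceRecords": [{"Value": api_gateway_domain}],
                },
            }
        ],
    }


# Hypothetical values; the real domain name comes from the recovery process.
change = build_dns_failover_change(
    "api.example.com.", "d-abc123.execute-api.eu-west-1.amazonaws.com"
)
```

If Route 53 is indeed the provider, a dictionary of this shape can be passed as the `ChangeBatch` argument of the boto3 Route 53 client's `change_resource_record_sets` call, together with the `HostedZoneId`. A short TTL limits how long clients keep resolving to the failed Region, and a CNAME cannot be used at the zone apex, so an apex record would need a different record type.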
Activity #5 - Service restoration verification
Description | Verify that services have been restored after recovery |
---|---|
Resources | |
Responsible | Engineering |
Substitute | todo |
Step by step | |
Stakeholders | Sales & Marketing, Customer Support, SAST, Customers |
Communication plan
Information | Frequency | Sender | Recipient | Channel |
---|---|---|---|---|
Service verification results | Once | Engineering | Engineering, Sales & Marketing, Customer Support | |
Service verification results | Once | Engineering | Customers | Confluence (https://medicineone.atlassian.net/wiki/spaces/CUSTOMERSUPPORT/pages/377585669) |
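Verification after recovery can be as simple as requesting a known endpoint on each restored service and recording which ones answer successfully. The following is a minimal sketch under stated assumptions: the function names and the idea that each service exposes a URL returning HTTP 200 when healthy are hypothetical, not taken from this procedure.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 503.
        return error.code
    except OSError:
        # Covers URLError, DNS failures, refused connections, and timeouts.
        return 0


def verify_services(
    urls: Iterable[str],
    get_status: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each URL to True if it returned HTTP 200, otherwise False."""
    return {url: get_status(url) == 200 for url in urls}
```

The resulting mapping can feed directly into the "Service verification results" message in the communication plan. Passing `get_status` as a parameter lets the check be exercised without network access, for example during a disaster recovery drill.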
Activity #6 - Post-incident review
Description | Review the incident response and document the lessons learned |
---|---|
Resources | |
Responsible | Engineering |
Substitute | todo |
Step by step | |
Stakeholders | Sales & Marketing, Customer Support, SAST |
Communication plan
Information | Frequency | Sender | Recipient | Channel |
---|---|---|---|---|
todo | todo | todo | todo | todo |