Page Properties |
---|
Activity | OPERATIONS |
---|
Department | ENGINEERING, CUSTOMER SUPPORT |
---|
Process | AWS Region outage |
---|
...
📝 Description
🎯 Objective
To respond to an AWS region outage impacting Clinical Brain's infrastructure, ensuring rapid service restoration, minimal operational disruption, and clear communication with stakeholders. The goal is to maintain business continuity during such incidents.
...
This procedure exists to keep Clinical Brain's services running without interruption. It ensures that our systems can recover quickly from an AWS region outage, avoiding prolonged downtime and keeping operations efficient. This aligns with our business goal of providing a reliable and consistent service to our users.
🧭 Definitions
N/A
...
🗒️ Activity list
Incident identification
Stakeholder communication
Automated recovery process initiation
DNS entry update for Disaster Recovery
Service restoration verification
Post-incident review
...
Activity descriptions
Activity #1 - Incident identification
...
Description | Initiate automated processes for disaster recovery |
---|
Resources | |
---|
Responsible | ENGINEERING |
---|
Substitute | |
---|
Step by step | Go to Clinical Brain tags
Look through the list of tags to find the one with the highest value. This tag represents the version of the infrastructure currently running in production
Go to Clinical Brain branches
Launch the Create a branch wizard by clicking the New branch button
In the Name field, enter disaster-recovery/<major.minor.patch>, replacing <major.minor.patch> with the version number of the highest tag you identified earlier. For example, if the highest tag is 1.0.0, the branch name should be disaster-recovery/1.0.0
In the Based on field, select the "tags" tab and choose the same tag you identified earlier. This ensures that the new branch is based on the current production version
Click the Create button. This creates the new disaster-recovery branch and also starts a pipeline that automatically deploys the infrastructure to the disaster recovery region
After initiating the deployment, go to Clinical Brain pipeline to monitor its progress
Keep an eye on the pipeline, as an error is expected to occur. It is caused by a credentials mismatch. When the RDS instance is restored from a production snapshot into the disaster recovery region, it retains the roles of the original database. As a result, the database still holds those roles' credentials from the production account, while new credentials are generated and stored in the disaster recovery region's AWS Parameter Store. These outdated role credentials prevent the lambdas that interact with the database from authenticating
To fix the error, navigate to Amazon Web Services (AWS)
Log in to the medicineone_clinicalbrain-prod account using the Disaster_Recovery_Permissions role
Select the Paris region from the region selection menu
Access the Parameter Store service
Locate and open the parameter /databases_connection_strings/clinical_brain/clinical_brain_user
Click on Show decrypted value to reveal its content
Note down the Server value, which is needed to connect to the disaster recovery database
Note down the Password value. You will need it to update the database credentials in the SQL script described in the subsequent steps
Return to the Parameter Store service
Open the parameter /databases_connection_strings/clinical_brain/lambda_user
Click on Show decrypted value and note down the Password. This is also required for the SQL script
In the Parameter Store, find and open the parameter /databases_connection_strings/master_user
Click on Show decrypted value and note down the displayed Credentials, which are needed to authenticate against the disaster recovery database
Launch pgAdmin
Right-click on Servers and navigate to Register → Server
In the General tab, enter clinical-brain-dr in the Name field
Switch to the Connection tab
In the Host name/address field, enter the Server value you noted earlier
Use the Credentials from the Parameter Store for the Username and Password fields
Click on Save
Expand the clinical-brain-dr server
Right-click on the clinical_brain database and select Query Tool
Paste the following script:
Code Block |
---|
ALTER ROLE clinical_brain WITH PASSWORD '<replace_by_clinical_brain_password>'; -- password from /databases_connection_strings/clinical_brain/clinical_brain_user
ALTER ROLE lambda WITH PASSWORD '<replace_by_lambda_password>'; -- password from /databases_connection_strings/clinical_brain/lambda_user |
Replace <replace_by_clinical_brain_password> with the password obtained earlier from /databases_connection_strings/clinical_brain/clinical_brain_user
Replace <replace_by_lambda_password> with the password obtained earlier from /databases_connection_strings/clinical_brain/lambda_user
Now that the database credentials are updated, navigate to Clinical Brain pipeline
Click the Run pipeline button
Select the previously created disaster recovery branch in the Branch/tag field and click on Run
Monitor the pipeline and wait for it to complete successfully
Access Amazon Web Services (AWS) again
Navigate to the API Gateway service
In the left menu, select Custom domain names
Find and select the custom domain name clinicalbrain.medicineone.cloud
In the Configurations tab, locate the API Gateway domain name and take note of its value. This value must be passed on for the DNS entry update, which is detailed in the communication plan
|
---|
Stakeholders | CUSTOMER SUPPORT, Customers |
---|
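The tag lookup and branch naming steps of Activity #1 are easy to get wrong by eye, since 1.10.0 is higher than 1.9.0 even though it sorts lower as text. The following is a minimal sketch of that comparison, assuming tags are plain major.minor.patch strings as in the 1.0.0 example. The function names are hypothetical and not part of any Clinical Brain tooling.

```python
import re

# Assumption: production tags look exactly like "1.0.0" (major.minor.patch).
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Return (major, minor, patch) as integers, or None if the tag is not a version."""
    match = _VERSION.match(tag.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def disaster_recovery_branch(tags):
    """Build the disaster-recovery branch name for the highest version tag."""
    versions = [v for v in (parse_version(t) for t in tags) if v is not None]
    if not versions:
        raise ValueError("no major.minor.patch tags found")
    # Integer tuples compare field by field, so (1, 10, 0) > (1, 9, 0).
    major, minor, patch = max(versions)
    return f"disaster-recovery/{major}.{minor}.{patch}"


if __name__ == "__main__":
    print(disaster_recovery_branch(["1.0.0", "1.9.0", "1.10.0"]))
```

Comparing integer tuples rather than strings is the point of the sketch. Tags that do not match the assumed format are ignored rather than guessed at.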
...
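When filling in the ALTER ROLE script from Activity #1, a password containing a single quote would break the statement if pasted as-is. The sketch below renders the two statements with embedded quotes doubled. It assumes PostgreSQL's default standard-conforming string literals. The helper names are hypothetical, and the output is still meant to be pasted into the pgAdmin Query Tool manually.

```python
def sql_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    # Guard against running the script with the template placeholder still in place.
    if not value or value.startswith("<replace_by_"):
        raise ValueError("password placeholder was not replaced")
    return "'" + value.replace("'", "''") + "'"


def render_role_reset(clinical_brain_password, lambda_password):
    """Return the two ALTER ROLE statements from the procedure, ready to paste."""
    return "\n".join([
        f"ALTER ROLE clinical_brain WITH PASSWORD {sql_literal(clinical_brain_password)};",
        f"ALTER ROLE lambda WITH PASSWORD {sql_literal(lambda_password)};",
    ])


if __name__ == "__main__":
    # Example values only; real passwords come from the Parameter Store.
    print(render_role_reset("pa'ss", "secret"))
```

Avoid saving the rendered script to disk or shell history, since it contains the decrypted passwords.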