
When performing chaos experiments, it’s important to correlate the attack with the impact on your system. Webhooks send information about your attack to your monitoring tool of choice. In this way, you can observe from a single pane of glass the timeline of the attack and the reaction of your system. Exporting reports of the results also provides valuable information that can be shared when assessing the reliability of your applications.
In this tutorial, we’ll demonstrate two methods to send Gremlin events to New Relic. The first sets up a Gremlin webhook to send custom events to New Relic to demonstrate how you can highlight the effect of a Gremlin Attack on a system. The second uses New Relic Deployments to mark Gremlin attacks in New Relic’s APM dashboard.
Register a new API key for the Gremlin webhook inside of New Relic’s Insights tool. Go to “Manage data” > “API Keys.” Click the “+” symbol next to “Insert Keys.” Enter a description, such as “Gremlin Attack events”, and click “Save your notes.” Note the API key and the endpoint URL at the end of the example cURL command: https://insights-collector.newrelic.com/v1/accounts/{{account_id}}/events
Log in to your Gremlin account. If you don’t have an account, request a demo.
Click the Settings button to the left of the red “Halt All Attacks” button. Select “Team Settings” then “Webhooks”. Select “New Webhook”. Fill in the New Relic endpoint URL and API key header from Step 1:
https://insights-collector.newrelic.com/v1/accounts/{{account_id}}/events
"X-Insert-Key": "{{API_KEY}}"
New Relic has one required body tag for custom events called “eventType” that we need to include. Add this in the body of the Gremlin webhook:
"eventType": "{{value}}"
This example uses "eventType": "GremlinAttack" as the primary event to filter on inside of New Relic. Make any other customizations you would like to the key-value pairs. New Relic picks up the other fields as tags for filtering in NRQL queries and charts.
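To check the endpoint and insert key before relying on the webhook, you can post a test event yourself. The following is a minimal Python sketch, assuming the endpoint accepts a JSON array of events with the `X-Insert-Key` header shown above. The function names, the sample account ID, and the `attackStatus` value are illustrative placeholders, not part of Gremlin or New Relic.

```python
import json
import urllib.request

EVENTS_URL = "https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"


def build_event_request(account_id, insert_key, event):
    """Return (url, headers, body) for one custom event; nothing is sent."""
    if "eventType" not in event:
        raise ValueError("New Relic custom events require an 'eventType' field")
    url = EVENTS_URL.format(account_id=account_id)
    headers = {"Content-Type": "application/json", "X-Insert-Key": insert_key}
    body = json.dumps([event]).encode("utf-8")
    return url, headers, body


def send_event(account_id, insert_key, event):
    """POST the event to New Relic and return the HTTP status code."""
    url, headers, body = build_event_request(account_id, insert_key, event)
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values: substitute your own account ID and insert key.
    sample = {"eventType": "GremlinAttack", "attackStatus": "RUNNING"}
    url, headers, body = build_event_request("1234567", "YOUR_INSERT_KEY", sample)
    print(url)
    print(body.decode("utf-8"))
```

Running the script only prints the URL and body it would send. Call `send_event` with real credentials to post the event.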
At this stage, all new attacks will send information in the form of custom events to New Relic. From here you can add custom charts and alerting to show when attacks occurred and the corresponding impact on your system, generate reports on how your infrastructure fared against attacks, or warn engineering leaders of the start of a FireDrill. Next, we’ll walk through a custom chart example.
Using NRQL, you can design a variety of charts. For this example, we’ll use charts to highlight the effect of an attack upstream. To add a Gremlin chart, go to your dashboard and click “Query your data”. Enter the query below in the query field, then click “Run”.
SELECT filter(count(*), WHERE attackStatus = 'RUNNING') AS 'Attack Running', filter(count(*), WHERE attackStatus = 'FINISHED') AS 'Attack Finished' FROM GremlinAttack SINCE 30 minutes ago TIMESERIES 1 minute
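To make concrete what this query computes, here is a small Python sketch that groups events into one-minute buckets and counts the RUNNING and FINISHED statuses in each. It is a local illustration only, not how New Relic evaluates NRQL. The `timestamp` field (epoch seconds) and the function name are assumptions, and unlike a real TIMESERIES chart it omits empty buckets.

```python
from collections import Counter


def attack_timeseries(events, bucket_seconds=60):
    """Count RUNNING and FINISHED events per time bucket.

    Each event is a dict with 'timestamp' (epoch seconds, an assumed field
    name) and 'attackStatus'. Returns {bucket_start: (running, finished)}.
    """
    running, finished = Counter(), Counter()
    for event in events:
        # Round the timestamp down to the start of its bucket.
        bucket = int(event["timestamp"]) // bucket_seconds * bucket_seconds
        if event["attackStatus"] == "RUNNING":
            running[bucket] += 1
        elif event["attackStatus"] == "FINISHED":
            finished[bucket] += 1
    buckets = sorted(set(running) | set(finished))
    return {b: (running[b], finished[b]) for b in buckets}


if __name__ == "__main__":
    # Hypothetical events: an attack that starts, then finishes 120 seconds later.
    sample = [
        {"timestamp": 1000, "attackStatus": "RUNNING"},
        {"timestamp": 1120, "attackStatus": "FINISHED"},
    ]
    print(attack_timeseries(sample))
```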
The last step is to run an attack and share your results. The CPU Attack is a great first step and is useful to verify tools such as monitoring, alerting, and autoscaling. We’ll run a CPU Attack and make sure that New Relic reports and informs us of the increased load.
In the Gremlin app, select “Create Attack” and select your hosts. Next, click “Choose a Gremlin”. Select “Resource” for the category and “CPU” for the attack. For this example, we’ll set the “Length” to 120 seconds, the “CPU Capacity” to 80%, and the impact to All Cores. Then click “Unleash Gremlin.”
You can see the start and end of the attack in the chart we created and the increased CPU load in the CPU Utilization chart. Our webhooks and New Relic monitoring tool are working.
At the end of the experiment, you can export the result from New Relic as a PDF to share your findings and to provide metrics for the reliability of your system.
Register a new API key for the Gremlin webhook inside of New Relic. Go to “Account Settings” > “API Keys”. Select the New Relic “Account”. Select “User” as Key Type. Give it a name like `gremlin_key` and click “Create key”. Copy the API Key for Step 3.
Next, we need to grab your Application ID. Navigate to the New Relic APM Overview dashboard. Find your application, open the additional settings menu to its right, and select “View Settings” or, in the new New Relic One view, “See Metadata & Tags”. Copy the App ID from the Settings/Metadata panel.
In this step, we’ll create the webhook to trigger a deployment event. In Gremlin, navigate to “Team Settings” > “Webhooks”. Click on “New Webhook”. Fill in the name New Relic APM Webhook and the Request URL https://api.newrelic.com/v2/applications/{App ID}/deployments.json. Add the custom header X-Api-Key: {New Relic API Key}. Check “Attack Running” and “Attack Finished”. Finally, add the following payload and click “Save”.
{
  "deployment": {
    "revision": "Gremlin ${ATTACK_ID}",
    "changelog": "See the attack details at https://app.gremlin.com/attacks/${ATTACK_ID}",
    "description": "Gremlin ${ATTACK_TYPE} attack ${STATUS}",
    "user": "Gremlin team ${TEAM_ID}",
    "timestamp": "${TIME}"
  }
}
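To preview what the deployment marker will contain, you can fill in the placeholders locally. The sketch below uses Python's `string.Template`, which happens to share the `${NAME}` syntax. It is an approximation of the substitution Gremlin performs, not Gremlin's implementation, and the attack values, function names, and app ID are made up for illustration.

```python
import json
from string import Template

PAYLOAD_TEMPLATE = Template("""{
  "deployment": {
    "revision": "Gremlin ${ATTACK_ID}",
    "changelog": "See the attack details at https://app.gremlin.com/attacks/${ATTACK_ID}",
    "description": "Gremlin ${ATTACK_TYPE} attack ${STATUS}",
    "user": "Gremlin team ${TEAM_ID}",
    "timestamp": "${TIME}"
  }
}""")


def render_payload(values):
    """Substitute the placeholders and parse the result as JSON.

    Raises KeyError if a placeholder has no value. Values containing
    double quotes would produce invalid JSON in this simple sketch.
    """
    return json.loads(PAYLOAD_TEMPLATE.substitute(values))


def build_deployment_request(app_id, api_key, values):
    """Return (url, headers, body) for the deployment marker; nothing is sent."""
    url = f"https://api.newrelic.com/v2/applications/{app_id}/deployments.json"
    headers = {"Content-Type": "application/json", "X-Api-Key": api_key}
    body = json.dumps(render_payload(values)).encode("utf-8")
    return url, headers, body


if __name__ == "__main__":
    # Hypothetical attack values, for illustration only.
    example = {
        "ATTACK_ID": "abc-123",
        "ATTACK_TYPE": "cpu",
        "STATUS": "RUNNING",
        "TEAM_ID": "team-1",
        "TIME": "2020-01-01T00:00:00Z",
    }
    print(json.dumps(render_payload(example), indent=2))
```

Parsing the rendered text with `json.loads` doubles as a quick check that the payload you pasted into Gremlin is valid JSON.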
The last step is to run an attack and share your results. We’ll run a CPU Attack and monitor how transactions are impacted during increased load.
In the Gremlin app, select “Create Attack” and select your hosts. Next, click “Choose a Gremlin”. Select “Resource” for the category and “CPU” for the attack. For this example, we’ll set the “Length” to 120 seconds, the “CPU Capacity” to 80%, and the impact to All Cores. Then click “Unleash Gremlin.”
Head back to New Relic and open up the application with the webhook you created. You can see the start and end of the attack with the markers in the various charts. Our webhooks and New Relic APM are working.
Webhooks open up a plethora of ways to gain insight into the behavior of your systems. You can integrate with any system that takes JSON POST requests and customize the results to be most relevant for your organization. Recreate past incidents with Gremlin and hunt down the problem using your monitoring tool. Or use the integration to verify that you're gathering the right metrics, have visibility into the health of your services, and are meeting your Service Level Objectives (SLOs).
You can learn more about Gremlin webhooks in the Docs. We’re excited to see what you build with this feature!
Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.