Achieving Delivery Confidence with Synthetic Monitoring

Published Apr 18, 2024

Introduction
Application reliability is difficult to validate. Applications tend to behave in unexpected ways under varying conditions, and this is common across most of them. During the initial stages of development, teams test with a limited set of scenarios, and these tests do not guarantee the application's reliability and consistency in live environments where real-time user traffic fluctuates.

However, there is one way to gain confidence in delivery and reliability: subjecting the application to various use-case and functionality-specific scenarios, including scenarios with extreme and unexpected conditions. Synthetic monitoring is the practice of synthesizing such extreme and unexpected conditions and exposing applications to them. Let’s understand this better in the post below.

The What and Why of Synthetic Monitoring

Synthetic monitoring applies simulated scenarios to an application to validate its capabilities and test its behavior. It enables functionality testing and validation with dummy but synthesized and automated inputs. As a result, applications can be tested and validated for performance, security, and reliability. Now that we know what synthetic monitoring is and where it is used, let us understand how to implement it.
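
As a rough illustration of the idea, a synthetic check can be thought of as a scripted probe plus a pass/fail rule. The sketch below is a minimal example under that assumption; `CheckResult`, `run_check`, and the `max_seconds` budget are hypothetical names introduced here, not part of any monitoring product.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    passed: bool
    elapsed_seconds: float
    detail: str


def run_check(name: str, probe: Callable[[], bool], max_seconds: float) -> CheckResult:
    """Run one scripted probe and record whether it passed within the time budget."""
    started = time.perf_counter()
    try:
        ok = bool(probe())
        detail = "probe returned True" if ok else "probe returned False"
    except Exception as exc:  # a crashing probe counts as a failed check
        ok = False
        detail = f"probe raised {type(exc).__name__}: {exc}"
    elapsed = time.perf_counter() - started
    if ok and elapsed > max_seconds:
        # The probe worked, but too slowly to count as healthy
        ok = False
        detail = f"probe succeeded but exceeded the {max_seconds}s budget"
    return CheckResult(name, ok, elapsed, detail)
```

In practice the probe would wrap a real interaction, such as loading a page or submitting a form, and a scheduler would call `run_check` at a fixed interval.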

Simulating to Test Application Behavior

Every user-facing application, irrespective of its nature, must deliver a few capabilities. These capabilities include performance, security, consistency, and scalability. Confidence in application delivery depends on how well the application handles these aspects. Let us simulate scenarios with dummy interactions and understand how they help validate a web application's reliability.

Simulating Performance Checks

Applications that respond with low latency are highly valued. In the development phase, the traffic entering the application is limited. This leaves teams with the impression that the application is functioning as expected. In reality, popular applications can receive millions of user interactions.
The increase in user traffic directly impacts performance. High traffic can degrade response times and cause downtime. Keeping this in mind, teams must simulate dummy user interactions. Through these simulations, dev teams can implement stress and performance tests.

Here is a dummy skeleton built using the Selenium client to simulate user traffic.

from selenium import webdriver
from threading import Thread
import time
import logging

# The root logger defaults to WARNING, so enable INFO messages explicitly
logging.basicConfig(level=logging.INFO)

def dummy_user_interactions(url: str):
    # Each simulated user opens its own Chrome session
    driver = webdriver.Chrome()
    try:
        requested_at = time.time()
        driver.get(url)  # blocks until the page has loaded (default page load strategy)
        response_received_at = time.time()
        page_load_time = response_received_at - requested_at  # in seconds
        if page_load_time < 15:
            logging.info("Load time within limit (15 seconds). Performance OK")
        else:
            logging.warning("Load time exceeded the limit. Performance might be impacted")
    finally:
        driver.quit()

The dummy function loads the given URL and records the time just before the request and just after the page has loaded. If the difference is within the permissible limit (15 seconds, since time.time() measures seconds), the function logs that performance is OK; otherwise, it logs that performance might be impacted.

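dummy_users_count = 10000
threads = []
for _ in range(dummy_users_count):
    # args must be a tuple, hence the trailing comma
    thread = Thread(target=dummy_user_interactions, args=("https://www.checklyhq.com",))
    threads.append(thread)
    thread.start()
# Wait for every simulated user to finish
for thread in threads:
    thread.join()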

With the dummy function in place, we can iterate over the range of dummy users. For this dummy test, let's assume we want to simulate 10,000 user interactions. We generate one thread per user and call the dummy function on every thread so that they run in parallel. As the count of dummy users increases, the logs should show how the application responds to many concurrent requests. Note that each thread starts its own Chrome instance, so a count this high needs substantial resources on the machine running the test.
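
One thread per user is simple but hard to control at large counts. As an alternative sketch, the load can be driven through a bounded thread pool that collects each call's duration and summarizes it as a 95th percentile. The names `timed_call`, `run_load`, and `p95` are hypothetical helpers introduced for this example, and `action` stands in for whatever interaction is being simulated.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles
from typing import Callable, List


def timed_call(action: Callable[[], None]) -> float:
    """Run the action once and return how long it took, in seconds."""
    started = time.perf_counter()
    action()
    return time.perf_counter() - started


def run_load(action: Callable[[], None], total_calls: int, max_workers: int) -> List[float]:
    """Run the action total_calls times with at most max_workers in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda _: timed_call(action), range(total_calls)))


def p95(durations: List[float]) -> float:
    """95th percentile; needs at least two samples."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
    return quantiles(durations, n=20)[-1]
```

A call such as `run_load(lambda: dummy_user_interactions(url), total_calls=200, max_workers=20)` would reuse the function above while capping the number of simultaneous browsers; the specific numbers are illustrative only.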

Simulating Scalability Checks

Cloud-hosted applications route incoming requests to multiple services. These services run on cloud compute instances with pre-defined memory and cores. The defaults are kept small to optimize cost. When traffic increases, a set of rules triggers jobs that increase cores and storage as required.

Let us assume an application is receiving many POST requests with different payloads. Every request submits a code snippet to a cluster, performs the operation, and returns the response. The team needs to ensure that more instances are provisioned when compute and memory are overloaded.

Here is the dummy code to test our scalability check.

import requests
import threading

urls = ["https://www.checklyhq.com/dummy_test", "https://www.checklyhq.com/synthetic_monitoring"]
payloads = [{"dummy": "sample_payload"}, {"dummy1": "sample_payload1"}]

def send_post_request(url, payload):
    # A timeout keeps a stalled request from blocking its thread indefinitely
    response = requests.post(url, json=payload, timeout=10)
    if response.status_code == 200:
        print(f"POST request to {url} with payload {payload} successful.")
    else:
        print(f"POST request to {url} with payload {payload} failed with status {response.status_code}.")

# Pair each URL with its payload and send the requests in parallel
for url, payload in zip(urls, payloads):
    thread = threading.Thread(target=send_post_request, args=(url, payload))
    thread.start()

The code pairs each URL with a payload and submits a POST request for each pair. This is straightforward boilerplate code. Next, we need to check whether the cloud platform provisions or terminates resources as the volume of incoming requests varies.

import boto3
import logging
import time

logging.basicConfig(level=logging.INFO)
ec2 = boto3.resource('ec2')
# Assumption: the compute instances carry the tag Name=checklyhqCompute.
# EC2 filter names must be valid keys such as 'tag:<key>'.
filters = [{'Name': 'tag:Name', 'Values': ['checklyhqCompute']}]
while True:
    instances = list(ec2.instances.filter(Filters=filters))
    instances_with_increased_resources = []
    for instance in instances:
        instance_cores = instance.cpu_options['CoreCount']
        # block_device_mappings holds volume IDs; the size lives on the Volume resource
        volume_id = instance.block_device_mappings[0]['Ebs']['VolumeId']
        instance_volume_size = ec2.Volume(volume_id).size  # in GiB
        if instance_cores > 2 and instance_volume_size > 100:
            instances_with_increased_resources.append(instance.id)
    if instances_with_increased_resources:
        logging.info("Compute instances with more CPU cores and a larger EBS volume were provisioned: {}".format(instances_with_increased_resources))
    else:
        logging.info("No compute instances with increased CPU cores and EBS volume were provisioned.")
    time.sleep(600)  # poll every ten minutes

The sample script will query cloud resources every ten minutes and validate the resource status. This can be streamlined using load balancers and event triggers.
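
The threshold check in the polling script can also be pulled out into a small function that works on plain data, which makes it testable without cloud credentials. This is only a sketch: `InstanceSnapshot` and `scaled_up_instances` are hypothetical names, and the defaults mirror the script's thresholds (more than 2 cores and more than 100 GiB).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InstanceSnapshot:
    instance_id: str
    core_count: int
    volume_size_gib: int


def scaled_up_instances(
    snapshots: Iterable[InstanceSnapshot],
    min_cores: int = 2,
    min_volume_gib: int = 100,
) -> List[str]:
    """Return the IDs of instances that exceed both thresholds."""
    return [
        s.instance_id
        for s in snapshots
        if s.core_count > min_cores and s.volume_size_gib > min_volume_gib
    ]
```

The polling loop would then build one snapshot per instance from the values it reads and pass the list to this function.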

Simulating Security Checks

Security is a major focus area for every application. Attackers probe applications to find a backdoor or a loophole in the system. Applications need to be built with security-first approaches and principles. Every input and every interaction has to undergo standard checks to validate it against policies. A small mistake can lead to hefty fines and reputational damage.

Let us assume a user is attempting to trick the application through a submission form. They attempt to perform an SQL injection using the form. The application should validate the input and accept it only if it is in a safe format.

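from selenium.webdriver.common.by import By

# Assumes `driver` is an open WebDriver session already on the login page.
# Selenium 4 removed find_element_by_id; use find_element(By.ID, ...) instead.
username_input_elem = driver.find_element(By.ID, "username")
username_input_elem.send_keys("' OR 1=1 --")  # classic SQL injection payload
password_input_elem = driver.find_element(By.ID, "password")
password_input_elem.send_keys("12345678")
submit_button = driver.find_element(By.ID, "login_button")
submit_button.click()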

If an organization wants to protect its applications from security risks, it needs to build detection and remediation measures into them.

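# Assumes the application raises a browser alert when it rejects the input;
# Selenium raises NoAlertPresentException if no alert is open.
alert = driver.switch_to.alert
alert_text = alert.text
logging.info(f"Potential SQL injection attempt detected: {alert_text}")
# send_email_to_security_team is a placeholder for your own notification helper
send_email_to_security_team(alert_text)
alert.accept()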

When an insecure pattern is observed, the applications should log and alert the security teams. Remediation measures should be triggered with sub-second latencies to avoid insecure interaction.
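
On the application side, one common way to make the injection attempt above harmless is to pass user input as bound parameters instead of concatenating it into the SQL string. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the `users` table, the `build_demo_db` and `find_user` helpers, and the plaintext password column are assumptions made only to keep the example short (a real system would store password hashes).

```python
import sqlite3
from typing import Optional


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one demo user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "s3cret"))
    return conn


def find_user(conn: sqlite3.Connection, username: str, password: str) -> Optional[str]:
    """Look up a user with bound parameters so input is treated as data, not SQL."""
    row = conn.execute(
        "SELECT username FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row[0] if row else None
```

With this approach, `find_user(conn, "' OR 1=1 --", "12345678")` finds no matching row, because the payload is compared as a literal username rather than executed.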

Conclusion
Synthetic monitoring builds confidence that applications are secure, performant, and scalable by identifying bottlenecks and performance problems so that optimizations or enhancements can be applied. It also evaluates utilization metrics to improve scalability and platform handling capabilities. Anomaly detection, remediation, and security alerting can all be tested and validated using synthetic monitoring.

Kruti Chapaneri

Aspiring Software Engineer and Tech Marketer

I am an aspiring software engineer and tech marketer with a strong interest in the intersection of technology and business. I am excited to use my skills to help businesses grow and succeed.
