Thursday 26 October 2023

Overview of the Pingdom service


Pingdom is a Swedish company which provides a website monitoring service of the same name. It tests a website's availability by pinging it from multiple locations around the globe.

How does Pingdom work?


Pingdom has a global network of so-called probe servers (bots). The full list of their IP addresses can be found here: Pingdom URLs and IP addresses. It is important that these IP addresses are whitelisted in any firewalls or Access Control Lists in front of the monitored site.


When one of our probe servers cannot connect to a site or server, Pingdom's system will first mark the check as Unconfirmed Down and then ask another probe server to try to make the same connection. We call this a Second Opinion. We try to make the second opinion as geographically different as possible, to make it easier to determine where the issue is. Your check (site or server) will only be marked as confirmed Down if the second test fails. It will continue to be marked as Down as long as consecutive probe requests register errors.
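The Second Opinion flow described above can be sketched in a few lines of Python. This is only an illustration, not Pingdom's implementation: the names Probe, pick_second_opinion and run_check are hypothetical, and returning to Up when the second probe succeeds is an assumption (the text only says the check is not confirmed Down in that case).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence


class State(Enum):
    UP = "Up"
    UNCONFIRMED_DOWN = "Unconfirmed Down"
    DOWN = "Down"


@dataclass(frozen=True)
class Probe:
    """Hypothetical probe server: a name, a region and a connection test."""
    name: str
    region: str
    can_connect: Callable[[], bool]


def pick_second_opinion(first: Probe, probes: Sequence[Probe]) -> Probe:
    """Prefer a probe from a different region than the first one."""
    others = [p for p in probes if p is not first]
    if not others:
        raise ValueError("no other probe available for a second opinion")
    different_region = [p for p in others if p.region != first.region]
    return (different_region or others)[0]


def run_check(first: Probe, probes: Sequence[Probe]) -> List[State]:
    """Return the states the check passes through for one test round."""
    if first.can_connect():
        return [State.UP]
    # First failure: the check is only Unconfirmed Down at this point.
    states = [State.UNCONFIRMED_DOWN]
    second = pick_second_opinion(first, probes)
    # Assumption: a successful second opinion puts the check back to Up.
    states.append(State.UP if second.can_connect() else State.DOWN)
    return states
```

In this sketch a failing European probe asks a North American probe for the second opinion whenever one is available, and the check only reaches Down when both fail.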


When does Pingdom flag a website as down (that it has an outage)?



Some common outage reasons and their most common causes:

Timeouts (30 seconds to connect to the site and load its HTML), mostly caused by either our servers being blocked or the site being very slow.

HTTP Error 403: our servers are being forbidden from visiting the site. Again, this is most likely caused by a block against our servers, or by the site showing an error page that forbids people from viewing it.

HTTP Error 500/502/503: something is wrong with the server, or the server is showing Pingdom servers an error page.
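As a rough illustration, the outage reasons listed above can be expressed as a small lookup. The function name classify_outage and its parameters are hypothetical; only the 30-second timeout and the status codes come from the list above.

```python
from typing import Optional

# Timeout mentioned in the list above: 30 seconds to connect and load the HTML.
TIMEOUT_SECONDS = 30

# Status codes named in the list above, with their most likely causes.
LIKELY_CAUSES = {
    403: "probe servers are blocked, or the site serves a 'forbidden' error page",
    500: "something is wrong with the server, or it serves probes an error page",
    502: "something is wrong with the server, or it serves probes an error page",
    503: "something is wrong with the server, or it serves probes an error page",
}


def classify_outage(status: Optional[int], timed_out: bool) -> Optional[str]:
    """Map a probe result to one of the common outage reasons.

    Returns None when the result is not one of the reasons listed above.
    """
    if timed_out:
        return f"timeout after {TIMEOUT_SECONDS}s: probes are blocked or the site is very slow"
    if status in LIKELY_CAUSES:
        return f"HTTP {status}: {LIKELY_CAUSES[status]}"
    return None
```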


Overview of Pingdom products


On its website, Pingdom offers four products:

  • Visitor Insights (Real User Monitoring - RUM) - to better understand website visitors
  • Uptime - monitoring and alerting when a website is down
  • Page Speed - for measuring page performance
  • Transactions - for simulating user interactions

Uptime


To set up a new check, we need to click on the Add new button:



And fill in the Add Uptime Check form:




Check Settings (general):
  • Name of check - arbitrary name of the check
  • Check interval - number (minutes); determines how often the check is run. 1 minute is recommended. Other options are 5, 15, 30 and 60 minutes.
  • There are 3 tabs which represent 3 check targets and we need to pick one:
    • Web
      • Check type:
        • HTTP(S) - Monitor a web page
        • HTTP Custom - Monitor scripts on your web page
      • Required:
        • URL/IP: http:// or https:// + hostname and path
      • Optional
        • Port: You only need to specify this when not using the standard port (80 for HTTP, 443 for HTTPS).
        • User name: The username required to access the page, if any (HTTP Authentication). Case sensitive.
        • Password: The password for the username above, if any. Case sensitive.
        • Check for string: A text string that has to be (or must not be) present in the HTML code of the page. If this text is missing from the page, the site will be considered as down. Leave this field empty if you do not want to require a string to be present. This text is NOT case sensitive.
          • Should contain <string>
          • Should not contain <string>
        • POST data: Data that should be posted to the web page, for example submission data for a sign-up or login form. The data needs to be formatted in the same way as a web browser would send it to the web server.
        • Request headers: Headers that should be sent with the HTTP request. You may set a custom header. It is not possible to remove the user agent header. e.g. User-Agent = Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
        • Monitor SSL/TLS certificate: Monitor the validity of your SSL/TLS certificate. With this enabled Uptime checks will be considered DOWN when the certificate becomes invalid or expires.
          • SSL/TLS certificate monitoring is available for HTTP checks
        • Consider down prior to certificate expiring: number of days. Select the number of days prior to your certificate expiry date that you want to consider the check down. On that day your check will be considered down and, if applicable, a down alert will be sent.
        • Use IPv6: The default is IPv4; select this only if you require IPv6.
    • Network
      • Check type:
        • TCP Port - Monitor a TCP port for a response
        • Ping - Monitor network connectivity
        • DNS - Monitor lookups on a DNS server
        • UDP - Monitor a UDP port for a response
      • Required:
        • Domain/IP: Either the IP address (example: 111.11.11.11) or the name of the site (example: www.mysite.com).
        • Port: Select the TCP port you wish to check. If not one of the standard ones in the list, enter the TCP port number in the box.
      • Optional:
        • String to send: The string to send, required by the TCP protocol. Case sensitive.
        • String to expect: The string to expect back, required by the TCP protocol. Case sensitive.
    • Email:
      • Check type:
        • SMTP - Monitor an SMTP server for a response
        • POP3 - Monitor a POP3 server for a response
        • IMAP - Monitor an IMAP server for a response
      • Required:
        • Domain/IP: Either the IP address (example: 111.11.11.11) or the name of the site (example: www.mysite.com).
      • Optional:
        • Port: You only need to specify this when not using the standard port (25).
        • User name: The username required to access SMTP server, if any. Case sensitive.
        • Password: The password for the username above, if any. Case sensitive.
        • String to expect: The string to expect back, required by the SMTP protocol. Case sensitive. Default is "220".
        • Use STARTTLS: (bool) Encryption using SSL (Secure Sockets Layer)/TLS (Transport Layer Security).
  • Test from: Select from which region your check should be tested. By default we test alternately from Europe and North America.
    • Default (North America and Europe)
    • North America
    • Europe
    • Asia Pacific
    • Latin America
  • Tags: Tags can be used to organize checks. You can add as many tags as you like. Return or space creates a new tag. For example, creating the tag Server 1 and adding it to all checks that run on Server 1 would allow you to search and view all checks for Server 1 in the Uptime list view.
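Two of the optional Web settings above, "Check for string" and "Consider down prior to certificate expiring", are easy to misread, so here is a minimal Python sketch of the behaviour as described in the form. It is not Pingdom's code; the function names and parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def string_check_passes(html: str, text: str, should_contain: bool = True) -> bool:
    """'Check for string': the comparison is NOT case sensitive.

    An empty text means no string is required, so the check passes.
    """
    if not text:
        return True
    found = text.casefold() in html.casefold()
    return found if should_contain else not found


def certificate_considered_down(expires_at: datetime, now: datetime,
                                days_before: int) -> bool:
    """'Consider down prior to certificate expiring'.

    The check counts as down from `days_before` days ahead of expiry onwards.
    """
    return now >= expires_at - timedelta(days=days_before)


if __name__ == "__main__":
    page = "<html><body>Welcome to My Site</body></html>"
    print(string_check_passes(page, "welcome"))                        # should contain
    print(string_check_passes(page, "error", should_contain=False))    # should not contain
    expiry = datetime(2024, 1, 31, tzinfo=timezone.utc)
    today = datetime(2024, 1, 25, tzinfo=timezone.utc)
    print(certificate_considered_down(expiry, today, days_before=7))
```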
Alerting Settings (part of the check settings):
  • Check importance: Allows users to control how they receive alerts for outages (by email, text message or push notification), based on the set importance level of the check. You can manage your alert settings in your User profile.
    • High importance
    • Low importance
  • Who to alert? Select which user, contact or team that should be alerted if the check is considered down.
    • All Users (team of 3)
    • Team A (team of 2)
    • Team B (team of 2)
    • Person 1
    • Person 2
    • ...
  • Consider down after: The timeout threshold at which the site or server is considered unresponsive/down, meaning that Pingdom's servers wait this long for a response before reporting an outage. Lowering this value effectively sets a response time threshold: an alert is sent when the response time exceeds it.
    • value can be in the range from 10ms to 30s
  • When down, alert after: Determines how soon an alert is sent once a check is considered down.
    • value ranges from instantly to 60 minutes
  • Resend alert every:
    • value ranges from never to every 60 down cycles
  • Customized message: Any text here will be added to email and webhook alerts. Great for providing additional context in the case of an outage.
  • Alert when back up: boolean - Decide if you want to be alerted when a check is confirmed to be back up again.
  • Connect Integrations. Use webhooks to send alerts to third party systems.
    • You haven’t set up any integrations with your Pingdom account yet. By using integrations like webhooks you can do more with your favorite apps.
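To see how "When down, alert after" and "Resend alert every" might interact, here is a small Python sketch. It rests on assumptions that the form does not spell out: a down cycle is taken to be one failed check at the configured check interval, the first alert goes out at the first down cycle once the check has been down for at least the "alert after" time, and resends are counted in down cycles from that first alert. The function name and parameters are hypothetical.

```python
from typing import List


def alert_cycles(total_down_cycles: int, check_interval_min: int,
                 alert_after_min: int, resend_every_cycles: int) -> List[int]:
    """Return the 1-based down cycles at which an alert would be sent.

    resend_every_cycles == 0 stands for "never resend".
    """
    alerts: List[int] = []
    first_alert = None
    for cycle in range(1, total_down_cycles + 1):
        # Minutes elapsed since the check was first seen down (assumption).
        minutes_down = (cycle - 1) * check_interval_min
        if first_alert is None:
            if minutes_down >= alert_after_min:
                first_alert = cycle
                alerts.append(cycle)
        elif resend_every_cycles > 0 and (cycle - first_alert) % resend_every_cycles == 0:
            alerts.append(cycle)
    return alerts
```

Under these assumptions, a 1-minute check with "alert after" set to 2 minutes and "resend every" set to 3 down cycles would alert on down cycles 3, 6 and 9 of a 10-cycle outage.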


There are two types of web speed measurements:
  • Response Time
    • Used in Uptime reports
    • For an uptime check (HTTP check) the response time is the time it takes to perform an HTTP GET to the specified URL. It is calculated in three parts:
      • Time to first byte
      • Time to receive headers
      • Time to load HTML of the site
    • Because it skips dynamic content, the check is basically a cURL request. If you want a response time that is just TTFB (time to first byte) you can use a Ping check, as this is almost equivalent.
    • Uptime check doesn't load any other elements on the page, and will thus only give you the Response Time of your website or server.
  • Page Load time (Load Time)
    • Used in the RUM and Page Speed reports
    • Describes how long a specific page took to load in its entirety. This includes all images, scripts, CSS and third party resources (as well as the HTML, of course) that might be found on a website.
    • These reports will combine the load time of each element on the page to give you the total page load time, and this is why the load time of a website often is a lot higher than the response time.
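To get a feel for what Response Time covers, the following Python sketch times a plain HTTP GET with the standard library, without fetching any images, scripts or CSS. It is an approximation under stated assumptions and not how Pingdom measures: the function name measure_response_time is hypothetical, and it reports only two intervals (until the headers are received, and until the HTML is fully read) because http.client does not expose the arrival of the first byte separately.

```python
import http.client
import time
from urllib.parse import urlsplit


def measure_response_time(url: str, timeout: float = 30.0) -> dict:
    """Time a single HTTP GET: headers received, then full body read."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    start = time.perf_counter()
    try:
        conn.request("GET", path)          # connects and sends the request
        response = conn.getresponse()      # returns once status line and headers are read
        headers_done = time.perf_counter()
        body = response.read()             # only the HTML, no other page elements
        body_done = time.perf_counter()
    finally:
        conn.close()

    return {
        "status": response.status,
        "headers_s": headers_done - start,
        "total_s": body_done - start,
        "bytes": len(body),
    }
```

Calling measure_response_time("https://www.example.com/") returns the status code and both timings in seconds. A full page load time would additionally have to fetch and render every resource the HTML refers to, which is why it is usually much higher.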

How to recover a website that is experiencing an outage?







