Sensor Network & Methodology

A high-level overview of how ipinsights.io collects, validates and scores threat intelligence — written for security professionals who want to assess data quality.

Overview

ipinsights.io derives its primary intelligence from a privately operated network of 73 honeypots, tarpits and deception devices. Every IP that interacts with a sensor is recorded, enriched with geolocation, ISP and Autonomous System data, and then cross-referenced against 22+ open-source threat intelligence feeds before a composite threat score is calculated.

The sections below describe each layer of this pipeline at a level of detail intended to help security professionals evaluate the reliability of the data without revealing operational specifics that could be used to evade detection.

By the numbers

Live platform-scale counters: the longer tail of figures moved off the homepage so that it can stay focused on the four counters most useful to a first-time visitor. All figures are refreshed on the same four-hour cadence as the rest of the dataset.

IPv6 addresses analysed
4,913

Distinct IPv6 addresses observed and enriched by the sensor network to date.

In-depth analyses
683,309

Addresses for which the platform holds a sensor-originated record. This is the subset for which the deepest enrichment, scoring and provenance are available.

Tor exit nodes tracked
9,377

Currently flagged Tor exit relays cross-referenced from upstream Tor directories.

I2P nodes tracked
1,349

I2P endpoints observed via the deception layer and routed through anonymity-aware enrichment.

Threat categories
21

Distinct blacklist categories currently in play across all active feeds — spam, malware, brute-force, botnet, scanner, Tor, and the longer tail behind them.

Countries observed
224

Distinct source countries currently represented in the dataset, by geolocation of the originating IP address.

Figures are read from the same pre-computed dashboard_stats table that powers the homepage counter strip, so they tick in lockstep with everything else on the site.

Sensor Types

The 73 sensors fall into three broad categories. Each category captures a different phase of attacker behaviour, giving the platform a more complete picture of malicious activity.

Honeypots

Low- and medium-interaction services that emulate real applications to attract and log exploitation attempts, credential stuffing and vulnerability scanning.

Tarpits

Services that deliberately slow down connections to waste attacker resources, while passively identifying automated scanners and brute-force tools.

Deception Devices

Decoy endpoints and fake services designed to detect lateral movement, reconnaissance and enumeration activity that other sensor types may miss.

Geographic Distribution

Sensors are deployed across multiple geographic regions and hosted on a variety of cloud and colocation providers. This diversity reduces the risk that threat data is biased toward a single country, network or provider.

Multi-Region Coverage

Sensors span Europe, North America and Asia-Pacific, capturing regionally targeted campaigns as well as globally distributed scans.

Diverse ASN Footprint

Sensors sit on different Autonomous Systems to avoid detection by attackers who fingerprint hosting ranges and skip known honeypot networks.

Exact locations and IP ranges are not disclosed to prevent evasion by threat actors.

Emulated Protocols

Sensors emulate a range of commonly targeted protocols. This breadth allows the platform to observe attacks across the most frequently exploited services on the internet.

SSH & Telnet

Capture brute-force login attempts, credential stuffing and post-authentication commands.

HTTP / HTTPS

Detect web vulnerability scanning, path traversal, injection attempts and exploit-kit probes.

SMTP

Identify spam relays, open-relay probes and email-based attack infrastructure.

Database Services

Emulate MySQL, PostgreSQL, Redis and similar services to trap automated database scanners.

SMB / FTP

Observe file-sharing exploits, ransomware propagation and worm-like scanning behaviour.

IoT / ICS Protocols

Emulate Modbus, S7comm, and common IoT interfaces targeted by botnets and nation-state actors.

Blocklist Cross-Referencing

Raw sensor hits alone are not enough to produce a reliable threat score. Every IP observed by the sensor network is cross-referenced against 22+ external threat intelligence feeds to validate findings and reduce false positives.

The cross-referencing process works as follows:

  1. Ingestion — Blocklists are fetched and updated automatically every four hours from well-known open-source feeds (e.g. AbuseIPDB, Spamhaus, Emerging Threats, DShield and others).
  2. Normalisation — Each feed is parsed into a consistent format, de-duplicated and tagged with its source and category (spam, brute-force, malware, scanner, etc.).
  3. Correlation — Sensor-observed IPs are matched against all active blocklists. An IP appearing on multiple independent feeds increases the composite threat score.
  4. Enrichment — Matched IPs are enriched with geolocation, ISP, ASN and reverse-DNS data to provide full context for analysts.
  5. Scoring — A weighted threat score is calculated based on the number of sensor hits, the number and reputation of matching blocklists, recency of activity and diversity of targeted protocols.

Downloadable blocklists generated from this pipeline are available on the Blocklist Downloads page.
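
The normalisation and correlation steps described above can be illustrated with a minimal Python sketch. It is not the platform's implementation: the FeedEntry type, the normalise and correlate helpers, the feed names and the one-address-per-line feed format are all assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedEntry:
    """One normalised blocklist entry (hypothetical structure)."""
    ip: str
    source: str    # feed name, e.g. "feed-a" (placeholder)
    category: str  # e.g. "spam", "brute-force", "scanner"


def normalise(raw_lines, source, category):
    """Parse one feed's raw lines into de-duplicated, tagged entries."""
    entries = set()
    for line in raw_lines:
        text = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not text:
            continue
        try:
            # Canonical string form; anything that is not an IP is skipped.
            ip = str(ipaddress.ip_address(text))
        except ValueError:
            continue
        entries.add(FeedEntry(ip, source, category))
    return entries


def correlate(sensor_ips, feed_entries):
    """Map each sensor-observed IP to the set of feeds that list it."""
    index = {}
    for entry in feed_entries:
        index.setdefault(entry.ip, set()).add(entry.source)
    return {ip: index.get(ip, set()) for ip in sensor_ips}
```

Using a set of frozen dataclass instances makes de-duplication within a feed automatic, and the per-IP set of feed names is the raw input that a consensus signal could count.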

Threat Scoring Model

The composite threat score assigned to each IP takes multiple signals into account:

Feed Consensus

More independent sources listing an IP increases confidence that it is genuinely malicious.

Recency

Recent activity is weighted more heavily than older sightings, so the score reflects current risk.

Protocol Diversity

An IP attacking multiple protocols signals a more capable or persistent threat actor.

Sensor Coverage

Being seen by sensors across multiple regions and ASNs elevates the score beyond single-point observations.
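
As a rough illustration of how four signals like these might be combined, the Python sketch below computes a weighted 0 to 100 score. The weights, the saturation scales, the 72-hour half-life and the function names are all hypothetical; the actual weighting used by the platform is not published.

```python
import math

# Hypothetical weights for the four signals; they sum to 1.0.
WEIGHTS = {"consensus": 0.4, "recency": 0.3, "protocols": 0.15, "coverage": 0.15}


def saturate(count, scale):
    """Map a non-negative count into [0, 1) with diminishing returns."""
    return 1.0 - math.exp(-count / scale)


def threat_score(feed_count, hours_since_seen, protocol_count, region_count,
                 half_life_hours=72.0):
    """Return a 0-100 composite score from the four signals."""
    signals = {
        # More independent feeds listing the IP raises confidence.
        "consensus": saturate(feed_count, 3.0),
        # Exponential decay: the recency signal halves every half_life_hours.
        "recency": 0.5 ** (hours_since_seen / half_life_hours),
        # More distinct protocols attacked raises the score.
        "protocols": saturate(protocol_count, 2.0),
        # More distinct sensor regions or ASNs raises the score.
        "coverage": saturate(region_count, 2.0),
    }
    return round(100.0 * sum(WEIGHTS[k] * v for k, v in signals.items()), 1)
```

The saturating transform is one way to stop a single signal from dominating: the tenth feed listing an IP adds far less than the second.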

Detecting Agentic LLM Traffic

A growing share of inbound attack traffic is driven by large language model agents rather than deterministic scanners. ipinsights.io classifies sources on a four-band scale — unlikely, possible, probable, confirmed — combining signals from four families. Methodology version 1.0 is surfaced on every /api/v1/lookup response and on the public report page.

Prompt-injection susceptibility

Sensors present plausible-looking banners and headers containing instructions a deterministic scanner would ignore. A source that acts on those instructions has demonstrated language understanding.

Behavioural & pacing fingerprint

Inter-request pauses, retry patterns and the relationship between sensor responses and the source’s next action carry a distinctive shape when a reasoning loop sits in the middle.

Payload character

Natural-language artefacts within attack payloads — explanatory comments, polite phrasing, mixed-language fragments, downstream-injection patterns — that a deterministic toolchain would not emit.

Transport-layer fingerprint

TLS, HTTP/2 and header-ordering fingerprints associated with common agent orchestration frameworks. Weighted lightly — helpful corroboration, never a sole basis for classification.

Capability without intent is not grounds for listing: a classification of probable or higher is held back unless the source has also tripped an exploitation-class detection in the existing pipeline. For transparency, the prompt-injection corpus publishes a curated, anonymised catalogue of observed payloads, refreshed every four hours. Specific time-window thresholds, JA4 fingerprints and detection strings are intentionally not disclosed to preserve sensor efficacy.
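
The banding and the hold-back rule can be sketched as follows. This is an illustration under stated assumptions, not the production classifier: the family weights, the band cut-offs and the function names are hypothetical, and "held back" is interpreted here as returning no published band at all.

```python
from enum import IntEnum


class Band(IntEnum):
    UNLIKELY = 0
    POSSIBLE = 1
    PROBABLE = 2
    CONFIRMED = 3


# Hypothetical per-family weights; transport is deliberately the lightest,
# so on its own it can never lift a source out of the lowest band.
FAMILY_WEIGHTS = {
    "prompt_injection": 0.4,
    "behaviour": 0.25,
    "payload": 0.25,
    "transport": 0.1,
}

# Hypothetical cut-offs, highest first; real thresholds are not disclosed.
THRESHOLDS = [(0.75, Band.CONFIRMED), (0.5, Band.PROBABLE), (0.2, Band.POSSIBLE)]


def classify(signals):
    """Combine per-family signal strengths (each 0..1) into a band."""
    total = sum(FAMILY_WEIGHTS[name] * signals.get(name, 0.0)
                for name in FAMILY_WEIGHTS)
    for cutoff, band in THRESHOLDS:
        if total >= cutoff:
            return band
    return Band.UNLIKELY


def published_band(signals, exploitation_detected):
    """Return the band to surface, or None while a high band is held back."""
    band = classify(signals)
    if band >= Band.PROBABLE and not exploitation_detected:
        return None  # capability shown, but no exploitation-class detection yet
    return band
```

For example, a source that scores strongly on prompt-injection and behavioural signals would classify as probable under these placeholder weights, yet published_band would return None until an exploitation-class detection is also recorded.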

Transparency & Data Quality

Trust is asymmetric. Threat intelligence has to clear a higher evidentiary bar than most enterprise data sources before a buyer should rely on it. The numbers below are published deliberately — including the unflattering ones — so that analysts and procurement teams can assess data quality on its own terms rather than on marketing claims.

Platform false-positive rate
< 2%

Estimated share of active listings that, on review, turn out to be incorrect. Measured against a rolling sample of disputed and audited entries. We publish this even when it moves in the wrong direction.

Mean time from first observation to listing
~ 14 minutes

Rolling mean across sensor-originated listings. The full distribution skews longer for borderline classifications that wait for corroborating signal before they are surfaced publicly.

Median dispute acknowledgement
≤ 3 business days

Stated SLA for first response on any delisting request submitted through the delisting & dispute workflow. A full decision follows within 10 business days.

Per-IP provenance
On every report

Every IP page surfaces which feed flagged the address, the sensor category that observed it, when it was first listed and when it was last refreshed — so analysts can sanity-check before they act.

Numbers are intentionally conservative and refreshed periodically as the sensor estate and feed mix evolve. If you believe a listing is wrong, the delisting & dispute workflow is the formal route to challenge it.

Threat data is ingested from sensors continuously, in near real time. External blocklists are refreshed every four hours via automated jobs. Stale entries are aged out and removed so that the platform reflects the current threat landscape rather than historical noise.
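
Ageing out stale entries can be pictured with a small Python sketch. The 30-day retention window and the age_out helper are assumptions for illustration only; the platform's actual retention period is not stated.

```python
from datetime import datetime, timedelta, timezone


def age_out(last_seen, now, max_age=timedelta(days=30)):
    """Keep only entries whose last sighting is within max_age of now.

    last_seen maps an IP string to a timezone-aware datetime.
    The 30-day default is a placeholder, not a published value.
    """
    cutoff = now - max_age
    return {ip: seen for ip, seen in last_seen.items() if seen >= cutoff}
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the ageing rule deterministic and easy to test.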

For more information about the project or to provide feedback on data quality, visit the About or Support page.