Dataset · Vector

Data Overview

Primary Platform: Telegram

Content Type: Public Channels

Output Format: Structured AAT Frames

Access: Open / On Request

What's in the Dataset

The Vector dataset contains two primary layers of data:

Raw Message Data

Message text (original language)
Channel metadata (name, subscriber count, creation date)
Timestamps and message IDs
Forwarding chains and source attribution
View counts and engagement metrics (where available)
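
As a rough illustration of how forwarding chains might be explored, the sketch below counts how often one channel forwards from another. It uses the `channel` and `forwarded_from` field names from the Schema section. The helper `forwarding_edges` is hypothetical, and treating a missing or null `forwarded_from` as an original (non-forwarded) post is an assumption, not documented behavior.

```python
from collections import Counter


def forwarding_edges(records):
    """Count (source_channel, forwarding_channel) pairs across records.

    Hypothetical helper: assumes a missing or null `forwarded_from`
    means the message was not forwarded.
    """
    edges = Counter()
    for record in records:
        source = record.get("forwarded_from")
        # Skip records with no forwarding source (assumed original posts).
        if source:
            edges[(source, record["channel"])] += 1
    return edges


# Made-up sample records that carry only the fields this sketch needs.
sample = [
    {"channel": "example_channel", "forwarded_from": "source_channel"},
    {"channel": "example_channel", "forwarded_from": "source_channel"},
    {"channel": "other_channel", "forwarded_from": None},
]

edges = forwarding_edges(sample)
for (source, forwarder), count in edges.items():
    print(f"{source} -> {forwarder}: {count}")
```

The resulting pair counts can be loaded into a graph library as weighted edges if longer chains need to be traced.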

Extracted Frames

Actor-action-target (AAT) triples per message
Entity types and normalized entity names
Sentiment and stance labels
Narrative cluster assignments
Confidence scores for each extraction

Schema

Each record in the processed dataset follows this structure:

{
  "message_id": "ch_12345_msg_67890",
  "channel": "example_channel",
  "timestamp": "2025-01-15T14:32:00Z",
  "text": "Original message text...",
  "language": "ru",
  "forwarded_from": "source_channel",
  "frames": [
    {
      "actor": "NATO",
      "action": "expanding",
      "target": "Eastern Europe",
      "sentiment": "negative",
      "confidence": 0.87
    }
  ],
  "narrative_cluster": "nato_expansion_threat",
  "views": 15420
}
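
A minimal sketch of reading one such record and keeping only higher-confidence frames is shown below. The `Frame` class, the `high_confidence_frames` helper, and the 0.8 threshold are illustrative assumptions, not part of the dataset tooling. The sample record reuses the values shown above.

```python
import json
from dataclasses import dataclass


@dataclass
class Frame:
    """One actor-action-target triple, mirroring the frame fields shown above."""
    actor: str
    action: str
    target: str
    sentiment: str
    confidence: float


def high_confidence_frames(record, threshold=0.8):
    """Return the record's frames whose confidence is at least `threshold`.

    Hypothetical helper; the default threshold is an arbitrary example value.
    """
    frames = [
        Frame(
            actor=f["actor"],
            action=f["action"],
            target=f["target"],
            sentiment=f["sentiment"],
            confidence=float(f["confidence"]),
        )
        for f in record.get("frames", [])
    ]
    return [frame for frame in frames if frame.confidence >= threshold]


raw = """
{
  "message_id": "ch_12345_msg_67890",
  "channel": "example_channel",
  "timestamp": "2025-01-15T14:32:00Z",
  "text": "Original message text...",
  "language": "ru",
  "forwarded_from": "source_channel",
  "frames": [
    {
      "actor": "NATO",
      "action": "expanding",
      "target": "Eastern Europe",
      "sentiment": "negative",
      "confidence": 0.87
    }
  ],
  "narrative_cluster": "nato_expansion_threat",
  "views": 15420
}
"""

record = json.loads(raw)
kept = high_confidence_frames(record)
for frame in kept:
    print(frame.actor, frame.action, frame.target, frame.confidence)
```

Picking fields by name, rather than unpacking the whole frame object, keeps the sketch from failing if a frame carries extra keys such as entity types.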

Access & Licensing

The Vector dataset is available under a research-use license. Access is provided in two tiers:

Open Access

Aggregated statistics, narrative cluster summaries, and anonymized trend data are freely available through the dashboard and published reports.

Research Access

Full message-level data with extracted frames is available to verified researchers, journalists, and institutions upon request.

To request research access, reach out via the contact page with a brief description of your intended use case.

Responsible Use

The dataset is intended for research, journalism, and counter-disinformation purposes only.
Redistribution of raw data requires prior written consent.
Users must not use the data to target, harass, or dox individuals.
Publications that use the data should cite the Vector project and the dataset version used.