Dataset
Vector maintains an open database of collected media content, extracted narrative frames, and channel metadata. This page describes the data sources, structure, and access methods.
Data Overview
What's in the Dataset
The Vector dataset contains two primary layers of data:
Raw Message Data
- Message text (original language)
- Channel metadata (name, subscriber count, creation date)
- Timestamps and message IDs
- Forwarding chains and source attribution
- View counts and engagement metrics (where available)
Extracted Frames
- Actor-action-target (AAT) triples per message
- Entity types and normalized entity names
- Sentiment and stance labels
- Narrative cluster assignments
- Confidence scores for each extraction
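Because every extracted frame carries a confidence score, a common first step is likely to be dropping low-confidence extractions before analysis. The following is a minimal Python sketch of that idea, not part of any official Vector tooling. The `Frame` class, the `confident_frames` helper, and the 0.8 threshold are hypothetical names and values chosen for illustration; the field names follow the sample record in the Schema section.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One actor-action-target (AAT) triple (hypothetical helper type)."""
    actor: str
    action: str
    target: str
    sentiment: str
    confidence: float


def confident_frames(record: dict, min_confidence: float = 0.8) -> list:
    """Return the record's frames whose confidence meets the threshold.

    The 0.8 default is an illustrative assumption, not a documented cutoff.
    """
    frames = [
        Frame(
            actor=f["actor"],
            action=f["action"],
            target=f["target"],
            sentiment=f["sentiment"],
            confidence=float(f["confidence"]),
        )
        for f in record.get("frames", [])
    ]
    return [f for f in frames if f.confidence >= min_confidence]


# Sample record copied from the Schema section.
raw = """
{
  "message_id": "ch_12345_msg_67890",
  "channel": "example_channel",
  "timestamp": "2025-01-15T14:32:00Z",
  "text": "Original message text...",
  "language": "ru",
  "forwarded_from": "source_channel",
  "frames": [
    {
      "actor": "NATO",
      "action": "expanding",
      "target": "Eastern Europe",
      "sentiment": "negative",
      "confidence": 0.87
    }
  ],
  "narrative_cluster": "nato_expansion_threat",
  "views": 15420
}
"""

record = json.loads(raw)
for frame in confident_frames(record):
    print(frame.actor, frame.action, frame.target, frame.confidence)
# prints: NATO expanding Eastern Europe 0.87
```

Reading fields by explicit key, rather than unpacking the whole frame object, keeps the sketch working if a frame carries additional fields that the sample record does not show.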
Schema
Each record in the processed dataset follows this structure:
{
  "message_id": "ch_12345_msg_67890",
  "channel": "example_channel",
  "timestamp": "2025-01-15T14:32:00Z",
  "text": "Original message text...",
  "language": "ru",
  "forwarded_from": "source_channel",
  "frames": [
    {
      "actor": "NATO",
      "action": "expanding",
      "target": "Eastern Europe",
      "sentiment": "negative",
      "confidence": 0.87
    }
  ],
  "narrative_cluster": "nato_expansion_threat",
  "views": 15420
}
Access & Licensing
The Vector dataset is available under a research-use license. Access is provided in two tiers:
Open Access
Aggregated statistics, narrative cluster summaries, and anonymized trend data are freely available through the dashboard and published reports.
Research Access
Full message-level data with extracted frames is available to verified researchers, journalists, and institutions upon request.
To request research access, reach out via the contact page with a brief description of your intended use case.
Responsible Use
- The dataset is intended for research, journalism, and counter-disinformation purposes only.
- Redistribution of raw data requires prior written consent.
- Users must not use the data to target, harass, or dox individuals.
- Citations should reference the Vector project and dataset version.