Editor Documentation
This document explains how each of the tools in our editors works. Consult the table of contents below to jump to a particular section.
Table of Contents
- General Structure
- Example Manifest
- Filter Section
- 1. Attribute Comparison
- 2. Embed Type
- 3. Entity Match and Exclude
- 4. Regular Expression Matching
- 5. Text Similarity
- 6. Model Probability
- 7. Social Graph
- 8. Social List
- 9. Starter Pack Member
- 10. List Member
- 11. Magic Audience
- 12. Content Moderation & ML Classifiers
- Models Section
- Comparator Reference
- Available Comparators
General Structure
A manifest is defined as a JSON object with two main sections:
- filter: Contains conditions that specify when the algorithm should classify a record as a match. Each condition type has its own syntax.
- models: Defines the machine learning models and associated feature modules used by model_probability.
Example Manifest
{
"filter": {
"and": [
{
"attribute_compare": [
"embed.external.uri",
"==",
"https://www.youtube.com/watch?v=E8Ew6K0W3RY"
]
},
{
"regex_matches": ["text", "\\bthe\\b"]
},
{
"regex_negation_matches": ["text", "\\bunwanted_term\\b"]
},
{
"user_network": ["devingaffney.com", "in", "follows"]
},
{
"text_similarity": [
"text",
{
"anchor_text": "This is an important update",
"model_name": "all-MiniLM-L6-v2"
},
">=",
0.3
]
},
{
"model_probability": [
{
"model_name": "news_without_science"
},
">=",
0.9
]
}
]
},
"models": [
{
"feature_modules": [
{
"type": "time_features"
},
{
"model_name": "all-MiniLM-L6-v2",
"type": "vectorizer"
},
{
"type": "post_metadata"
}
],
"model_name": "news_without_science"
}
]
}
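As a minimal sketch, a manifest of this shape can be assembled and sanity-checked in Python before it is submitted. The helper name check_manifest is hypothetical and not part of the editor; it only verifies the two top-level sections described above, and it assumes that an empty or missing models list is acceptable when no model_probability condition is used.

```python
import json


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of structural problems (empty if none). Hypothetical helper."""
    problems = []
    if not isinstance(manifest.get("filter"), dict):
        problems.append("'filter' must be a JSON object")
    # Assumption: 'models' may be omitted or empty when no model is referenced.
    if not isinstance(manifest.get("models", []), list):
        problems.append("'models' must be a JSON array")
    return problems


manifest = {
    "filter": {
        "and": [
            {"regex_matches": ["text", "\\bthe\\b"]},
            {"embed_type": ["==", "video"]},
        ]
    },
    "models": [],
}

problems = check_manifest(manifest)
serialized = json.dumps(manifest, indent=2)  # JSON text ready to paste or send
```

Building the manifest as a Python dict and serializing it with json.dumps avoids hand-escaping mistakes such as a single backslash in a regex pattern.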
Filter Section
The filter section defines the logical structure of the filtering criteria. Each criterion checks specific attributes, matches patterns, or evaluates machine learning models based on record data.
1. Attribute Comparison
The attribute_compare operation compares an attribute of a record to a target value.
- Syntax:
{
"attribute_compare": [
"<selector>",
"<operator>",
"<target_value>"
]
}
- Fields:
  - selector: Specifies the JSONPath-like path to the attribute in the record.
  - <operator>: Comparison operator (e.g., ==, >, >=, <, <=).
  - <target_value>: The target value to compare the attribute against.
- Example:
{
"attribute_compare": [
"post.blah.foo",
"==",
"bar"
]
}
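The selector lookup and comparison described above can be sketched roughly as follows. This is an illustration, not the editor's implementation: the function names resolve and attribute_compare are hypothetical, the selector is assumed to be a simple dot-separated path, and a missing attribute is assumed to never match.

```python
import operator

# Operators listed in the Fields description above.
OPERATORS = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def resolve(record: dict, selector: str):
    """Walk a dotted selector such as 'embed.external.uri'; None if missing."""
    current = record
    for key in selector.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def attribute_compare(record: dict, selector: str, op: str, target) -> bool:
    value = resolve(record, selector)
    if value is None:
        return False  # assumption: a missing attribute never matches
    return OPERATORS[op](value, target)


record = {"embed": {"external": {"uri": "https://example.com/a"}}}
matched = attribute_compare(record, "embed.external.uri", "==", "https://example.com/a")
```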
2. Embed Type
The embed_type operation is a shorthand for explicitly selecting or rejecting posts based on the type of embedded content.
- Syntax:
{
"embed_type": [
"<equality_operator>",
"<embed_name>"
]
}
- Fields:
  - <equality_operator>: Either != or ==.
  - <embed_name>: The type of embed to select or reject on: any of image, link, post, image_group, video, gif.
- Example:
{
"embed_type": [
"==",
"video"
]
}
3. Entity Match and Exclude
The entity_matches and entity_excludes operations let you select or reject posts based on whether they match a supplied set of values for a given entity type (langs, urls, domains, mentions, or hashtags).
- Syntax:
{
"entity_matches": [
"<entity_type>",
"<entity_values>"
]
}
- Fields:
  - <entity_type>: One of langs, urls, domains, mentions, or hashtags.
  - <entity_values>: A list of values to match against. For langs, use ISO 639-1 codes. For users, use usernames (i.e. not a DID, since we resolve it internally) or starter pack / list URLs. For hashtags, omit the # symbol. For urls, use simple URLs.
- Example:
{
"entity_matches": [
"hashtags",
["foo", "bar", "baz"]
]
}
4. Regular Expression Matching
The regex_matches, regex_negation_matches, regex_any, and regex_none operations match or negate a regular expression pattern in an attribute's value.
- Syntax:
{
"regex_matches": [
"<var>",
"<regex_pattern>",
"<case_insensitivity>"
]
}
- Fields:
  - var: Specifies the JSONPath-like path to the attribute in the record.
  - <regex_pattern>: A regular expression pattern to match against.
  - <case_insensitivity>: Whether the regex operates case-insensitively. When true, it matches regardless of casing.
- Examples:
{
"regex_matches": [
"text",
"\\bthe\\b",
false
]
}
{
"regex_negation_matches": [
"text",
"\\bunwanted_term\\b",
true
]
}
For regex_any or regex_none, we check <var> against a list of terms and return true if any of the terms (regex_any) or none of the terms (regex_none) are present. Think of it as a shorthand that saves you from writing a single regex with the terms joined by or.
- Syntax:
{
"regex_any": [
"<var>",
["<list>", "<of>", "<terms>"],
"<case_insensitivity>"
]
}
- Fields:
  - var: Specifies the JSONPath-like path to the attribute in the record.
  - <term_list>: A set of terms to join together in an or-like check in a regex.
  - <case_insensitivity>: Whether the regex operates case-insensitively. When true, it matches regardless of casing.
- Examples:
{
"regex_any": [
"text",
["apple", "banana", "peach"],
true
]
}
{
"regex_none": [
"text",
["apple", "banana", "peach"],
false
]
}
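The "joined or regex" shorthand can be pictured with the following sketch. It is not the editor's code: the function names mirror the operators only for readability, and treating each term as a literal (via re.escape) is an assumption, since the source does not say whether terms may contain regex syntax.

```python
import re


def regex_any(text: str, terms: list[str], case_insensitive: bool = False) -> bool:
    """True if at least one term occurs in text."""
    if not terms:
        return False  # an empty alternation would otherwise match everything
    # Assumption: terms are literals, so they are escaped before joining with "|".
    pattern = "|".join(re.escape(term) for term in terms)
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(pattern, text, flags) is not None


def regex_none(text: str, terms: list[str], case_insensitive: bool = False) -> bool:
    """True if no term occurs in text."""
    return not regex_any(text, terms, case_insensitive)


fruit = ["apple", "banana", "peach"]
hit = regex_any("I ate a Banana", fruit, case_insensitive=True)
miss = regex_any("I ate a Banana", fruit, case_insensitive=False)
```

With case-insensitivity off, "Banana" does not match the term "banana", which is why the third element of the operation matters for capitalized text.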
5. Text Similarity
The text_similarity operation evaluates the similarity between the text in an attribute and an anchor text using a transformer model.
- Syntax:
{
"text_similarity": [
"<attribute_path>",
{
"anchor_text": "<reference_text>",
"model_name": "<transformer_model_name>"
},
"<operator>",
"<threshold>"
]
}
- Fields:
  - <attribute_path>: Path to the text attribute in the record.
  - anchor_text: The reference text to compare against.
  - model_name: The name of the transformer model used for embeddings.
  - <operator>: Comparison operator, typically >= for similarity.
  - <threshold>: The similarity threshold.
- Example:
{
"text_similarity": [
"text",
{
"anchor_text": "This is an important update",
"model_name": "all-MiniLM-L6-v2"
},
">=",
0.3
]
}
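To give a feel for what a threshold such as 0.3 means, here is a small sketch that scores two embedding vectors and applies the >= check. The source does not state which similarity measure is used; cosine similarity is assumed here because it is the usual choice for sentence embeddings, and the three-dimensional vectors are toy stand-ins for real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the embeddings of the anchor text and a post.
anchor_vec = [1.0, 0.0, 1.0]
post_vec = [1.0, 1.0, 0.0]

score = cosine_similarity(anchor_vec, post_vec)  # approximately 0.5
passes = score >= 0.3  # same check as ["text", {...}, ">=", 0.3]
```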
6. Model Probability
The model_probability operation evaluates the likelihood that a record matches a specific classification using an XGBoost model.
- Syntax:
{
"model_probability": [
{
"model_name": "<xgboost_model_name>"
},
"<operator>",
"<threshold>"
]
}
- Fields:
  - model_name: The name of the XGBoost model used for classification.
  - <operator>: Comparison operator (e.g., >= for probability).
  - <threshold>: Probability threshold to determine if the record meets the condition.
- Example:
{
"model_probability": [
{
"model_name": "news_without_science"
},
">=",
0.9
]
}
7. Social Graph
The social_graph operation evaluates the inclusion or exclusion of user DIDs based on a source actor and a direction. Note that if you do not specify an author to act upon, we will use API requests from your signed-in account.
- Syntax:
{
"social_graph": [
"<username>",
"<operator>",
"<direction>"
]
}
- Fields:
  - username: The username to pull followers/follows from.
  - <operator>: Either in or not_in.
  - <direction>: Either follows (i.e. users that username follows) or followers (i.e. users that follow username).
- Example:
{
"social_graph": [
"devingaffney.com",
"in",
"follows"
]
}
8. Social List
The social_list operation lets you specify the DIDs of a set of users explicitly and select or reject based on that list (i.e. if you don't want to use a user account as a shorthand).
- Syntax:
{
"social_list": [
"<did_list>",
"<operator>"
]
}
- Fields:
  - did_list: The list of user DIDs to pull from.
  - <operator>: Either in or not_in.
- Example:
{
"social_list": [
["did:plc:ngokl2gnmpbvuvrfckja3g7p"],
"in"
]
}
9. Starter Pack Member
The starter_pack_member operation lets you specify the URL of a starter pack of users and select or reject based on that list.
- Syntax:
{
"starter_pack_member": [
"<starter_pack_url>",
"<operator>"
]
}
- Fields:
  - starter_pack_url: The starter pack URL.
  - <operator>: Either in or not_in.
- Example:
{
"starter_pack_member": [
"https://bsky.app/starter-pack/propublica.org/3l6iflmcj322n",
"in"
]
}
10. List Member
The list_member operation lets you specify the URL of a list of users and select or reject based on that list.
- Syntax:
{
"list_member": [
"<list_url>",
"<operator>"
]
}
- Fields:
  - list_url: The list URL.
  - <operator>: Either in or not_in.
- Example:
{
"list_member": [
"https://bsky.app/profile/numb.comfortab.ly/lists/3lam62tvlqz2l",
"in"
]
}
11. Magic Audience
The magic_audience operation lets you specify the ID of one of your magic audiences and select or reject based on that list.
- Syntax:
{
"magic_audience": [
"<audience_id>",
"<operator>"
]
}
- Fields:
  - audience_id: The Magic Audience ID.
  - <operator>: Either in or not_in.
- Example:
{
"magic_audience": [
"42",
"in"
]
}
12. Content Moderation & ML Classifiers
The content_moderation filter lets you accept or reject content based on how its score for a given prediction compares to your filter level. We use KoalaAI/Text-Moderation to scan post texts and classify them with our content moderation pipeline. It is currently English-only, but we intend to allow for multilingual predictions in the future. The current moderation categories we support are as follows:

To use a moderation filter, provide the category, an operator, and a target_value as follows:
- Syntax:
{
"content_moderation": [
"<moderation_category>",
"<operator>",
"<target_value>"
]
}
- Fields:
  - moderation_category: The KoalaAI/Text-Moderation moderation category.
  - <operator>: Comparison operator (e.g., ==, >, >=, <, <=).
  - <target_value>: The target value to compare the attribute against.
- Example:
{
"content_moderation": [
"OK",
">=",
0.9
]
}
Language Analysis
- Operator Name: language_analysis
- Operator Type: machine_learning
- Operator Description: Advanced language detection for a wide range of languages.
- Supported Languages: 'Japanese', 'Dutch', 'Arabic', 'Polish', 'German', 'Italian', 'Portuguese', 'Turkish', 'Spanish', 'Hindi', 'Greek', 'Urdu', 'Bulgarian', 'English', 'French', 'Chinese', 'Russian', 'Thai', 'Swahili', 'Vietnamese'
- Operator Parameters:
  - Fields: ["language_name", "comparator", "threshold"]
  - Example:
{
"language_analysis": [
"Dutch",
"<=",
0.5
]
}
- Model Link: XLM-RoBERTa Base Language Detection
Sentiment Analysis
- Operator Name: sentiment_analysis
- Operator Type: machine_learning
- Operator Description: Sentiment analysis for classifying content into Positive, Negative, or Neutral categories.
- Supported Categories: 'Positive', 'Negative', 'Neutral'
- Operator Parameters:
  - Fields: ["sentiment_category", "comparator", "threshold"]
  - Example:
{
"sentiment_analysis": [
"Negative",
"<=",
0.5
]
}
- Model Link: Twitter RoBERTa Base Sentiment
Financial Sentiment Analysis
- Operator Name: financial_sentiment_analysis
- Operator Type: machine_learning
- Operator Description: Fine-tuned financial sentiment model for detecting Positive, Negative, or Neutral sentiment specific to financial content.
- Supported Categories: 'Positive', 'Negative', 'Neutral'
- Operator Parameters:
  - Fields: ["financial_category", "comparator", "threshold"]
  - Example:
{
"financial_sentiment_analysis": [
"Positive",
">=",
0.5
]
}
- Model Link: FinBERT
Emotional Sentiment Analysis
- Operator Name: emotion_sentiment_analysis
- Operator Type: machine_learning
- Operator Description: Tests for emotional score across a range of dimensions.
- Supported Categories: 'Admiration', 'Amusement', 'Anger', 'Annoyance', 'Approval', 'Caring', 'Confusion', 'Curiosity', 'Desire', 'Disappointment', 'Disapproval', 'Disgust', 'Embarrassment', 'Excitement', 'Fear', 'Gratitude', 'Grief', 'Joy', 'Love', 'Nervousness', 'Optimism', 'Pride', 'Realization', 'Relief', 'Remorse', 'Sadness', 'Surprise', 'Neutral'
- Operator Parameters:
  - Fields: ["emotion_category", "comparator", "threshold"]
  - Example:
{
"emotion_sentiment_analysis": [
"Curiosity",
">=",
0.5
]
}
- Model Link: GoEmotions RoBERTa Base
Toxicity Analysis
- Operator Name: toxicity_analysis
- Operator Type: machine_learning
- Operator Description: Tests for toxicity score across a range of dimensions.
- Supported Categories: 'Toxic', 'Severe Toxicity', 'Obscene', 'Threat', 'Insult', 'Identity Hate'
- Operator Parameters:
  - Fields: ["toxic_category", "comparator", "threshold"]
  - Example:
{
"toxicity_analysis": [
"Obscene",
"<=",
0.5
]
}
- Model Link: ToxicBert
Topic Analysis
- Operator Name: topic_analysis
- Operator Type: machine_learning
- Operator Description: Tests for topic-based scores across a range of dimensions.
- Supported Categories: 'Arts & Culture', 'Business & Entrepreneurs', 'Celebrity & Pop Culture', 'Diaries & Daily Life', 'Family', 'Fashion & Style', 'Film, TV & Video', 'Fitness & Health', 'Food & Dining', 'Gaming', 'Learning & Educational', 'Music', 'News & Social Concern', 'Other Hobbies', 'Relationships', 'Science & Technology', 'Sports', 'Travel & Adventure', 'Youth & Student Life'
- Operator Parameters:
  - Fields: ["topic_label", "comparator", "threshold"]
  - Example:
{
"topic_analysis": [
"Gaming",
"<=",
0.5
]
}
Arbitrary Text Analysis
- Operator Name: text_arbitrary
- Operator Type: machine_learning
- Operator Description: Arbitrary analysis is a special node in that you can define *any label you want*, and the model will assess the applicability of that label to that particular item (e.g. if you set the tag to "scotland" and the text is "I just went to Edinburgh, it was great", it will rate highly).
- Supported Categories: Anything you want!
- Operator Parameters:
  - Fields: ["tag", "comparator", "threshold"]
  - Example:
{
"text_arbitrary": [
"Gaming",
"<=",
0.5
]
}
- Model Link: mDeBERTa-v3-base-xnli-multilingual-nli-2mil7
SFW/NSFW Image Analysis
- Operator Name: image_nsfw
- Operator Type: machine_learning
- Operator Description:
- Supported Categories: NSFW, SFW
- Operator Parameters:
  - Fields: ["tag", "comparator", "threshold"]
  - Example:
{
"image_nsfw": [
"NSFW",
"<=",
0.5
]
}
- Model Link: Marqo/nsfw-image-detection-384
Arbitrary Image Analysis
- Operator Name: image_arbitrary
- Operator Type: machine_learning
- Operator Description: Filter images based on the probability of any arbitrary label. Behaves similarly to the arbitrary text analysis node.
- Supported Categories: Anything you want!
- Operator Parameters:
  - Fields: ["tag", "comparator", "threshold"]
  - Example:
{
"image_arbitrary": [
"scotland",
"<=",
0.5,
0.0
]
}
- Model Link: openai/clip-vit-base-patch16
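The ML classifier operators above all take the same three-element parameter list (a label, a comparator, and a threshold), so their conditions can be generated with one small helper. The sketch below is only illustrative: ml_condition is a hypothetical name, and restricting the threshold to the range 0 to 1 is an assumption based on the example values, not something the source states.

```python
# Comparators documented in the Comparator Reference that apply to numeric scores.
COMPARATORS = {"==", "!=", ">=", "<=", ">", "<"}


def ml_condition(operator_name: str, label: str, comparator: str, threshold: float) -> dict:
    """Build {operator_name: [label, comparator, threshold]} with basic checks."""
    if comparator not in COMPARATORS:
        raise ValueError(f"unsupported comparator: {comparator!r}")
    # Assumption: classifier scores are probabilities in [0, 1].
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold is expected to lie between 0 and 1")
    return {operator_name: [label, comparator, threshold]}


# Keep gaming posts that are unlikely to be toxic.
filter_section = {
    "and": [
        ml_condition("toxicity_analysis", "Toxic", "<=", 0.5),
        ml_condition("topic_analysis", "Gaming", ">=", 0.5),
    ]
}
```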
Models Section
The models section defines the machine learning models used in model_probability. Each model entry specifies the model name, feature modules, and configuration. Currently, the only model provided is news_without_science, an XGBoost classifier trained on ≈100 news article skeets and ≈100 science-based skeets. The codebase already includes the ability to train new models, but it's very early. Expect (a) lots of ML modules to be made available over time and (b) the ability to easily train and deploy modules yourself via the site.
- Fields:
  - model_name: The unique name of the model, referenced in model_probability.
  - feature_modules: An array defining the feature extraction modules for the model.
    - type: The type of feature (e.g., "time_features", "post_metadata").
    - model_name: (Optional) Model used for vectorizing, typically with type "vectorizer".
- Example:
{
"models": [
{
"feature_modules": [
{
"type": "time_features"
},
{
"model_name": "all-MiniLM-L6-v2",
"type": "vectorizer"
},
{
"type": "post_metadata"
}
],
"model_name": "news_without_science"
}
]
}
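Because each model_probability condition refers to a model by name, it can be useful to confirm that every referenced name has an entry in the models section. The following is a rough sketch of such a check; both function names are hypothetical, and whether the editor performs an equivalent validation is not stated in the source.

```python
def referenced_models(node) -> set[str]:
    """Collect model names used by model_probability anywhere in a filter."""
    names: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "model_probability":
                # The first element holds {"model_name": ...} per the syntax above.
                names.add(value[0]["model_name"])
            else:
                names |= referenced_models(value)
    elif isinstance(node, list):
        for item in node:
            names |= referenced_models(item)
    return names


def undefined_models(manifest: dict) -> set[str]:
    """Names referenced in the filter but missing from the models section."""
    defined = {model["model_name"] for model in manifest.get("models", [])}
    return referenced_models(manifest.get("filter", {})) - defined


manifest = {
    "filter": {
        "and": [
            {"model_probability": [{"model_name": "news_without_science"}, ">=", 0.9]}
        ]
    },
    "models": [],
}
missing = undefined_models(manifest)
```

Here missing contains news_without_science because the models list is empty; adding the entry shown in the example above makes the set empty.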
Comparator Reference
In the LogicEvaluator class, comparisons between values are handled by the compare method, which supports several common operators. Each operator compares a given value to a specified threshold. Here's a breakdown of how each comparator works:
Available Comparators
1. Equality (==)
- Description: Checks if value is equal to threshold.
- Usage: Use this comparator when you want an exact match.
- Example: If value == 10 and threshold == 10, the result is True.
- Code: value == threshold
2. Inequality (!=)
- Description: Checks if value is not equal to threshold.
- Usage: Use this comparator when you want an exact non-match.
- Example: If value == 10 and threshold == 10, the result is False.
- Code: value != threshold
3. Greater Than or Equal (>=)
- Description: Checks if value is greater than or equal to threshold.
- Usage: Use this comparator to ensure value meets or exceeds a minimum requirement.
- Example: If value == 10 and threshold == 5, the result is True. If value == 5 and threshold == 5, the result is also True.
- Code: value >= threshold
4. Less Than or Equal (<=)
- Description: Checks if value is less than or equal to threshold.
- Usage: Use this comparator to ensure value does not exceed a maximum limit.
- Example: If value == 3 and threshold == 5, the result is True. If value == 5 and threshold == 5, the result is also True.
- Code: value <= threshold
5. Greater Than (>)
- Description: Checks if value is strictly greater than threshold.
- Usage: Use this comparator when value must be higher than threshold.
- Example: If value == 10 and threshold == 5, the result is True. If value == 5 and threshold == 5, the result is False.
- Code: value > threshold
6. Less Than (<)
- Description: Checks if value is strictly less than threshold.
- Usage: Use this comparator when value must be lower than threshold.
- Example: If value == 3 and threshold == 5, the result is True. If value == 5 and threshold == 5, the result is False.
- Code: value < threshold
7. Inclusion (in)
- Description: Checks if value is contained within threshold.
- Usage: Use this comparator when value must be present in threshold.
- Example: If value == 3 and threshold == [3,4], the result is True.
- Code: value in threshold
8. Exclusion (not_in)
- Description: Checks if value is not contained within threshold.
- Usage: Use this comparator when value must not be present in threshold.
- Example: If value == 3 and threshold == [2,4], the result is True.
- Code: value not in threshold
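Taken together, the comparators above amount to a lookup from operator string to a Python comparison. The sketch below shows one way such a dispatch could look; it is not the actual LogicEvaluator.compare method, whose signature is not given in this document, and raising ValueError for an unknown operator is an assumption.

```python
import operator

# One entry per comparator described above.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "in": lambda value, threshold: value in threshold,
    "not_in": lambda value, threshold: value not in threshold,
}


def compare(value, comparator: str, threshold) -> bool:
    """Apply the named comparator to value and threshold (illustrative only)."""
    if comparator not in COMPARATORS:
        raise ValueError(f"unsupported comparator: {comparator!r}")
    return COMPARATORS[comparator](value, threshold)


# The examples from the reference above.
results = [
    compare(10, "==", 10),
    compare(10, "!=", 10),
    compare(5, ">=", 5),
    compare(5, ">", 5),
    compare(3, "in", [3, 4]),
    compare(3, "not_in", [2, 4]),
]
```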