Editor Documentation

A detailed review of how to configure any of the nodes in our editor

This document explains how each of the tools in our editor works.

General Structure

A manifest is defined as a JSON object with two main sections:

  • filter: Contains conditions that specify when the algorithm should classify a record as a match. Each condition type has its own syntax.
  • models: Defines the machine learning models and associated feature modules used by model_probability.

Example Manifest

{
  "filter": {
    "and": [
      {
        "attribute_compare": [
          "embed.external.uri",
          "==",
          "https://www.youtube.com/watch?v=E8Ew6K0W3RY"
        ]
      },
      {
        "regex_matches": ["text", "\\bthe\\b"]
      },
      {
        "regex_negation_matches": ["text", "\\bunwanted_term\\b"]
      },
      {
        "user_network": ["devingaffney.com", "in", "follows"]
      },
      {
        "text_similarity": [
          "text",
          {
            "anchor_text": "This is an important update",
            "model_name": "all-MiniLM-L6-v2"
          },
          ">=",
          0.3
        ]
      },
      {
        "model_probability": [
          {
            "model_name": "news_without_science"
          },
          ">=",
          0.9
        ]
      }
    ]
  },
  "models": [
    {
      "feature_modules": [
        {
          "type": "time_features"
        },
        {
          "model_name": "all-MiniLM-L6-v2",
          "type": "vectorizer"
        },
        {
          "type": "post_metadata"
        }
      ],
      "model_name": "news_without_science"
    }
  ]
}
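As a rough illustration of the two-section structure described above, the following Python sketch parses a manifest and checks that both top-level keys are present. The helper name check_manifest_shape is hypothetical and not part of the editor, and treating both sections as required is an assumption. The real validation rules may differ.

```python
import json


def check_manifest_shape(raw: str) -> dict:
    """Parse a manifest and confirm both top-level sections are present.

    Hypothetical helper for illustration; the editor's real validation
    may be stricter or differ in detail.
    """
    manifest = json.loads(raw)
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    # Assumption: both sections are required, even if "models" is empty.
    for section in ("filter", "models"):
        if section not in manifest:
            raise ValueError(f"manifest is missing the '{section}' section")
    return manifest


# A raw string keeps the JSON escapes (\\b) intact for json.loads.
raw_manifest = r'''
{
  "filter": {"and": [{"regex_matches": ["text", "\\bthe\\b"]}]},
  "models": []
}
'''
manifest = check_manifest_shape(raw_manifest)
print(sorted(manifest.keys()))
```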

Filter Section

The filter section defines the logical structure of the filtering criteria. Each criterion checks specific attributes, matches patterns, or evaluates machine learning models based on record data.

1. Attribute Comparison

The attribute_compare operation allows comparing an attribute of a record to a target value.

  • Syntax:
{
    "attribute_compare": [
        "<selector>",
        "<operator>",
        "<target_value>"
    ]
}
  • Fields:
    • selector: Specifies the JSONPath-like path to the attribute in the record.
    • <operator>: Comparison operator (e.g., ==, >, >=, <, <=).
    • <target_value>: The target value to compare the attribute against.
  • Example:
{
    "attribute_compare": [
        "post.blah.foo",
        "==",
        "bar"
    ]
}
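To make the selector lookup concrete, here is a minimal Python sketch of how an attribute_compare condition could be evaluated. It assumes the "JSONPath-like" selector is a simple dot-separated list of keys and that a missing attribute never matches. Both are assumptions for illustration, and the names resolve_selector and attribute_compare are hypothetical rather than the editor's actual functions.

```python
import operator
from typing import Any

# The operators listed for attribute_compare, mapped to Python functions.
OPERATORS = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

_MISSING = object()  # sentinel meaning "path not found in the record"


def resolve_selector(record: dict, selector: str) -> Any:
    """Walk a dotted path such as 'post.blah.foo' through nested dicts."""
    current: Any = record
    for key in selector.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def attribute_compare(record: dict, selector: str, op: str, target: Any) -> bool:
    value = resolve_selector(record, selector)
    if value is _MISSING:
        return False  # assumption: a missing attribute never matches
    return bool(OPERATORS[op](value, target))


record = {"post": {"blah": {"foo": "bar"}}}
print(attribute_compare(record, "post.blah.foo", "==", "bar"))
```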

2. Embed Type

The embed_type is a shorthand to explicitly select or reject posts based on the nature of embedded content.

  • Syntax:
{
    "embed_type": [
        "<equality_operator>",
        "<embed_name>"
    ]
}
  • Fields:
    • <equality_operator>: Either != or ==.
    • <embed_name>: The type of embed to select or reject: one of image, link, post, image_group, video, or gif.
  • Example:
{
    "embed_type": [
        "==",
        "video"
    ]
}

3. Entity Match and Exclude

The entity_matches and entity_excludes operations allow you to select or reject posts based on whether they match a set of supplied values for a given entity type (langs, urls, domains, mentions, or hashtags).

  • Syntax:
{
    "entity_matches": [
        "<entity_type>",
        "<entity_values>"
    ]
}
  • Fields:
    • <entity_type>: One of langs, urls, domains, mentions, or hashtags.
    • <entity_values>: A list of values to match against: for langs, ISO 639-1 codes; for mentions, usernames (not DIDs, which we resolve internally) or starter pack / list URLs; for hashtags, the tag without the # symbol; for urls, simple URLs.
  • Example:
{
    "entity_matches": [
        "hashtags",
        ["foo", "bar", "baz"]
    ]
}
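The matching logic can be pictured with the short Python sketch below. It assumes a post's entities have already been extracted into a dict of lists, that a single overlapping value is enough for a match, and that comparison is case-insensitive. None of these details are confirmed by this document, and the function names simply mirror the operation names for readability.

```python
def entity_matches(entities: dict, entity_type: str, values: list) -> bool:
    """True if the post has at least one entity of this type in `values`."""
    wanted = {v.lower() for v in values}
    found = {v.lower() for v in entities.get(entity_type, [])}
    return not wanted.isdisjoint(found)


def entity_excludes(entities: dict, entity_type: str, values: list) -> bool:
    """True if the post has no entity of this type in `values`."""
    return not entity_matches(entities, entity_type, values)


# Hypothetical pre-extracted entities for one post.
post_entities = {"hashtags": ["Foo", "qux"], "langs": ["en"]}
print(entity_matches(post_entities, "hashtags", ["foo", "bar", "baz"]))
print(entity_excludes(post_entities, "hashtags", ["bar", "baz"]))
```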

4. Regular Expression Matching

The regex_matches, regex_negation_matches, regex_any, and regex_none operations match or negate a regular expression pattern in an attribute's value.

  • Syntax:
{
    "regex_matches": [
        "<var>",
        "<regex_pattern>",
        "<case_insensitivity>"
    ]
}
  • Fields:
    • var: Specifies the JSONPath-like path to the attribute in the record.
    • <regex_pattern>: A regular expression pattern to match against.
    • <case_insensitivity>: Whether or not the regex will operate as case-insensitive - when true it will check any casing.
  • Examples:
{
    "regex_matches": [
        "text",
        "\\bthe\\b",
        false
    ]
}

{
    "regex_negation_matches": [
        "text",
        "\\bunwanted_term\\b",
        true
    ]
}

For regex_any and regex_none, we check <var> against a list of terms and return true if any (regex_any) or none (regex_none) of the terms are present. Think of it as a shorthand that saves you from writing a single regex that joins the terms with "or".

  • Syntax:
{
    "regex_any": [
        "<var>",
        ["<list>", "<of>", "<terms>"],
        "<case_insensitivity>"
    ]
}
  • Fields:
    • var: Specifies the JSONPath-like path to the attribute in the record.
    • <term_list>: A set of terms to join together in an or-like check in a regex.
    • <case_insensitivity>: Whether or not the regex will operate as case-insensitive - when true it will check any casing.
  • Examples:
{
    "regex_any": [
        "text",
        ["apple", "banana", "peach"],
        true
    ]
}

{
    "regex_none": [
        "text",
        ["apple", "banana", "peach"],
        false
    ]
}
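The four regex operations can be approximated in Python as follows. This is a sketch under assumptions, not the editor's implementation: it uses re.search (a match anywhere in the text), and it escapes each term in regex_any so that terms are treated as literal strings, which may not be how the real operation handles them.

```python
import re


def regex_matches(text: str, pattern: str, case_insensitive: bool = False) -> bool:
    """True if the pattern is found anywhere in the text."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(pattern, text, flags) is not None


def regex_negation_matches(text: str, pattern: str, case_insensitive: bool = False) -> bool:
    """True if the pattern is NOT found in the text."""
    return not regex_matches(text, pattern, case_insensitive)


def regex_any(text: str, terms: list, case_insensitive: bool = False) -> bool:
    """True if at least one term is present (terms joined with '|')."""
    if not terms:
        return False  # an empty pattern would otherwise match everything
    joined = "|".join(re.escape(term) for term in terms)  # assumption: literal terms
    return regex_matches(text, joined, case_insensitive)


def regex_none(text: str, terms: list, case_insensitive: bool = False) -> bool:
    """True if none of the terms are present."""
    return not regex_any(text, terms, case_insensitive)


fruit = ["apple", "banana", "peach"]
print(regex_any("Banana bread recipe", fruit, True))
print(regex_none("Banana bread recipe", fruit, False))
```

In the usage lines, the same text passes regex_any when case is ignored and passes regex_none when case matters, because "Banana" only matches "banana" case-insensitively.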

5. Text Similarity

The text_similarity operation evaluates the similarity between the text in an attribute and an anchor text using a transformer model.

  • Syntax:
{
    "text_similarity": [
        "<attribute_path>",
        {
            "anchor_text": "<reference_text>",
            "model_name": "<transformer_model_name>"
        },
        "<operator>",
        "<threshold>"
    ]
}
  • Fields:
    • <attribute_path>: Path to the text attribute in the record.
    • anchor_text: The reference text to compare.
    • model_name: The name of the transformer model used for embeddings.
    • <operator>: Comparison operator, typically >= for similarity.
    • <threshold>: The similarity threshold.
  • Example:
{
    "text_similarity": [
        "text",
        {
            "anchor_text": "This is an important update",
            "model_name": "all-MiniLM-L6-v2"
        },
        ">=",
        0.3
    ]
}
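The thresholding step can be sketched in Python as below. This document does not say which similarity measure is used, so cosine similarity is an assumption here, as are the toy three-dimensional vectors standing in for real transformer embeddings of the anchor text and the post text.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def passes_similarity(anchor_vec: list, post_vec: list, threshold: float) -> bool:
    """Mirror of a text_similarity condition that uses the '>=' operator."""
    return cosine_similarity(anchor_vec, post_vec) >= threshold


# Toy vectors; real embeddings would come from the configured model.
anchor_vec = [0.2, 0.1, 0.7]
post_vec = [0.1, 0.3, 0.6]
print(passes_similarity(anchor_vec, post_vec, 0.3))
```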

6. Model Probability

The model_probability operation evaluates the likelihood that a record matches a specific classification using an XGBoost model.

  • Syntax:
{
    "model_probability": [
        {
            "model_name": "<xgboost_model_name>"
        },
        "<operator>",
        "<threshold>"
    ]
}
  • Fields:
    • model_name: The name of the XGBoost model used for classification.
    • <operator>: Comparison operator (e.g., >= for probability).
    • <threshold>: Probability threshold to determine if the record meets the condition.
  • Example:
{
    "model_probability": [
        {
            "model_name": "news_without_science"
        },
        ">=",
        0.9
    ]
}

7. Social Graph

The social_graph operation includes or excludes users, identified by DID, based on a source actor and a direction. Note that if you do not specify an author to act upon, we will use API requests from your signed-in account.

  • Syntax:
{
    "social_graph": [
        "<username>",
        "<operator>",
        "<direction>"
    ]
}
  • Fields:
    • username: The username to pull followers/follows from.
    • <operator>: either in or not_in.
    • <direction>: either follows (i.e. users that username follows) or followers (i.e. users that username is followed by)
  • Example:
{
    "social_graph": [
        "devingaffney.com",
        "in",
        "follows"
    ]
}

8. Social List

The social_list operation allows you to specify the DIDs for a set of users and select or reject based on that list explicitly (i.e. if you don't want to use a user account as a shorthand).

  • Syntax:
{
    "social_list": [
        "<did_list>",
        "<operator>"
    ]
}
  • Fields:
    • did_list: The list of user DIDs to check against.
    • <operator>: either in or not_in.
  • Example:
{
    "social_list": [
        ["did:plc:ngokl2gnmpbvuvrfckja3g7p"],
        "in"
    ]
}
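A membership check of this kind can be sketched in a few lines of Python. The sketch assumes the condition is tested against the DID of the post's author, which this document does not state explicitly, and the second DID in the usage lines is a made-up placeholder.

```python
def social_list(author_did: str, did_list: list, op: str) -> bool:
    """Evaluate an 'in' / 'not_in' check of one DID against a DID list."""
    members = set(did_list)  # set lookup keeps the check fast for long lists
    if op == "in":
        return author_did in members
    if op == "not_in":
        return author_did not in members
    raise ValueError(f"unsupported operator: {op}")


allowed = ["did:plc:ngokl2gnmpbvuvrfckja3g7p"]
print(social_list("did:plc:ngokl2gnmpbvuvrfckja3g7p", allowed, "in"))
print(social_list("did:plc:placeholder", allowed, "not_in"))
```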

9. Starter Pack Member

The starter_pack_member allows you to specify the URL for a starter pack of users to select/reject based on that list.

  • Syntax:


{
    "starter_pack_member": [
        "<starter_pack_url>",
        "<operator>"
    ]
}
  • Fields:
    • starter_pack_url: The starter pack URL
    • <operator>: either in or not_in.
  • Example:
{
    "starter_pack_member": [
        "https://bsky.app/starter-pack/propublica.org/3l6iflmcj322n",
        "in"
    ]
}

10. List Member

The list_member allows you to specify the URL for a list of users to select/reject based on that list.

  • Syntax:
{
    "list_member": [
        "<list_url>",
        "<operator>"
    ]
}
  • Fields:
    • list_url: The list URL
    • <operator>: either in or not_in.
  • Example:
{
    "list_member": [
        "https://bsky.app/profile/numb.comfortab.ly/lists/3lam62tvlqz2l",
        "in"
    ]
}

11. Magic Audience

The magic_audience allows you to specify the ID of one of your magic audiences to select/reject based on that list.

  • Syntax:
{
    "magic_audience": [
        "<audience_id>",
        "<operator>"
    ]
}
  • Fields:
    • audience_id: The Magic Audience ID
    • <operator>: either in or not_in.
  • Example:
{
    "magic_audience": [
        "42",
        "in"
    ]
}

12. Content Moderation

The content_moderation filter allows you to accept or reject only the content whose prediction for a given category meets your filter level. We use KoalaAI/Text-Moderation to scan post text and classify it in our content moderation pipeline. It is currently English-only, but we intend to support multilingual predictions in the future. The moderation categories we currently support are as follows:

  • sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
  • hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
  • violence: Content that promotes or glorifies violence or celebrates the suffering or humiliation of others.
  • harassment: Content that may be used to torment or annoy individuals in real life, or make harassment more likely to occur.
  • self-harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
  • sexual/minors: Sexual content that includes an individual who is under 18 years old.
  • hate/threatening: Hateful content that also includes violence or serious harm towards the targeted group.
  • violence/graphic: Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.
  • OK: Not offensive.

To use a moderation filter, provide the category, an operator, and a target_value as follows:

  • Syntax:
{
    "content_moderation": [
        "<moderation_category>",
        "<operator>",
        "<target_value>"
    ]
}
  • Fields:
    • moderation_category: The KoalaAI/Text-Moderation moderation category.
    • <operator>: Comparison operator (e.g., ==, >, >=, <, <=).
    • <target_value>: The target value to compare the category's prediction against.
  • Example:
{
    "content_moderation": [
        "OK",
        ">=",
        0.9
    ]
}

Models Section

The models section defines the machine learning models used in model_probability. Each model entry specifies the model name, feature modules, and configuration. Currently, the only model provided is news_without_science, an XGBoost classifier trained on ≈100 news article skeets and ≈100 science-based skeets. In the guts of this codebase is the ability to train new models, but it's very early. Expect (a) lots of ML modules to be made available over time and (b) the ability to easily train and deploy modules yourself via the site.

  • Fields:
    • model_name: The unique name of the model, referenced in model_probability.
    • feature_modules: An array defining the feature extraction modules for the model.
      • type: The type of feature (e.g., "time_features", "post_metadata").
      • model_name: (Optional) Model used for vectorizing, typically with type "vectorizer".
  • Example:
{
  "models": [
    {
      "feature_modules": [
        {
          "type": "time_features"
        },
        {
          "model_name": "all-MiniLM-L6-v2",
          "type": "vectorizer"
        },
        {
          "type": "post_metadata"
        }
      ],
      "model_name": "news_without_science"
    }
  ]
}
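Because model_probability conditions refer to models by name, it can help to check that every referenced name is actually defined in the models section. The Python sketch below walks a manifest's filter tree to do that. The helper names are hypothetical, and whether the editor performs such a check itself is not stated here.

```python
def referenced_models(node) -> set:
    """Collect model names used by model_probability anywhere in a filter."""
    names = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "model_probability":
                # The first element holds {"model_name": ...}.
                names.add(value[0]["model_name"])
            else:
                names |= referenced_models(value)
    elif isinstance(node, list):
        for item in node:
            names |= referenced_models(item)
    return names


def undefined_models(manifest: dict) -> set:
    """Model names the filter refers to that the models section lacks."""
    defined = {model["model_name"] for model in manifest.get("models", [])}
    return referenced_models(manifest.get("filter", {})) - defined


manifest = {
    "filter": {
        "and": [
            {"regex_matches": ["text", "\\bthe\\b"]},
            {"model_probability": [{"model_name": "news_without_science"}, ">=", 0.9]},
        ]
    },
    "models": [
        {"feature_modules": [{"type": "time_features"}], "model_name": "news_without_science"}
    ],
}
print(undefined_models(manifest))
```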

Comparator Reference

In the LogicEvaluator class, comparisons between values are handled by the compare method, which supports several common operators. Each operator is used to compare a given value to a specified threshold. Here's a breakdown of how each comparator works:

Available Comparators

1. Equality (==)

  • Description: Checks if value is equal to threshold.
  • Usage: Use this comparator when you want an exact match.
  • Example: If value == 10 and threshold == 10, the result is True.
  • Code: value == threshold

2. Inequality (!=)

  • Description: Checks if value is not equal to threshold.
  • Usage: Use this comparator when you want an exact non-match.
  • Example: If value == 10 and threshold == 10, the result is False.
  • Code: value != threshold

3. Greater Than or Equal (>=)

  • Description: Checks if value is greater than or equal to threshold.
  • Usage: Use this comparator to ensure value meets or exceeds a minimum requirement.
  • Example: If value == 10 and threshold == 5, the result is True. If value == 5 and threshold == 5, the result is also True.
  • Code: value >= threshold

4. Less Than or Equal (<=)

  • Description: Checks if value is less than or equal to threshold.
  • Usage: Use this comparator to ensure value does not exceed a maximum limit.
  • Example: If value == 3 and threshold == 5, the result is True. If value == 5 and threshold == 5, the result is also True.
  • Code: value <= threshold

5. Greater Than (>)

  • Description: Checks if value is strictly greater than threshold.
  • Usage: Use this comparator when value must be higher than threshold.
  • Example: If value == 10 and threshold == 5, the result is True. If value == 5 and threshold == 5, the result is False.
  • Code: value > threshold

6. Less Than (<)

  • Description: Checks if value is strictly less than threshold.
  • Usage: Use this comparator when value must be lower than threshold.
  • Example: If value == 3 and threshold == 5, the result is True. If value == 5 and threshold == 5, the result is False.
  • Code: value < threshold

7. Inclusion (in)

  • Description: Checks if value is contained within threshold.
  • Usage: Use this comparator when value must be present in threshold.
  • Example: If value == 3 and threshold == [3,4], the result is True.
  • Code: value in threshold

8. Exclusion (not_in)

  • Description: Checks if value is not contained within threshold.
  • Usage: Use this comparator when value must be absent from threshold.
  • Example: If value == 3 and threshold == [2,4], the result is True.
  • Code: value not in threshold
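
Taken together, the comparators above amount to a small dispatch on the operator string. The Python sketch below shows one way such a compare function could look. It is written as a standalone function for illustration and is not the actual LogicEvaluator.compare source; raising ValueError on an unknown operator is an assumption.

```python
def compare(value, operator_name: str, threshold) -> bool:
    """Apply one of the documented comparators to value and threshold."""
    if operator_name == "==":
        return value == threshold
    if operator_name == "!=":
        return value != threshold
    if operator_name == ">=":
        return value >= threshold
    if operator_name == "<=":
        return value <= threshold
    if operator_name == ">":
        return value > threshold
    if operator_name == "<":
        return value < threshold
    if operator_name == "in":
        return value in threshold
    if operator_name == "not_in":
        return value not in threshold
    raise ValueError(f"unknown comparator: {operator_name}")


print(compare(10, ">=", 5))
print(compare(3, "not_in", [2, 4]))
```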