Version: Beta

Molecules Overview

Molecules are Devgraph's core integration mechanism: they connect your development ecosystem to the knowledge graph. Each molecule discovers entities and relationships from a specific platform or service.

What are Molecules?

Molecules are modular providers that:

  • Discover entities from external systems (GitHub repos, Argo apps, Grafana dashboards, etc.)
  • Create relationships between discovered entities
  • Reconcile state periodically to keep the graph up-to-date
  • Normalize data from different sources into a unified graph model

Available Molecules

| Molecule | Purpose | Entities |
|----------|---------|----------|
| GitHub | Source code repositories | Repositories, Hosting Services |
| GitLab | GitLab project management | Projects, Hosting Services |
| Argo CD | Kubernetes deployments | Applications, Projects, Instances |
| Vercel | Frontend deployments | Projects, Deployments, Teams |
| LDAP | Identity & directory | Users, Groups, Organizational Units |
| Grafana | Observability dashboards | Dashboards, Datasources, Folders, Instances |
| Jira | Issue tracking | Projects, Issues, Users |
| Docker | Container registries | Repositories, Images |
| FOSSA | License compliance | Projects, Issues |

How Molecules Work

Discovery Phase

Each molecule connects to its target system's API and discovers resources:

GitHub API → Repositories → GitHub entities in Devgraph
Argo API → Applications → Argo entities in Devgraph

Relationship Creation

Molecules create relationships between entities, often using field-selected relations:

# Example: Argo app uses GitHub repo
ArgoApplication --USES--> GitHubRepository
(matched by spec.repoUrl field)

Reconciliation

Molecules run on a schedule (e.g., every 5 minutes) to:

  • Discover the current state of the external system
  • Compare it with the entities already in the graph
  • Create new entities and relationships
  • Remove entities that no longer exist in the external system
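
The reconciliation steps above amount to a set difference between what the external system reports and what the graph already holds. The following is a minimal sketch of that comparison, not Devgraph's actual implementation; the names ReconcilePlan and plan_reconcile, and the use of plain string keys to identify entities, are assumptions made for illustration.

```python
from typing import Iterable, NamedTuple, Set


class ReconcilePlan(NamedTuple):
    """Changes needed to bring the graph in line with the external system."""
    to_create: Set[str]
    to_remove: Set[str]


def plan_reconcile(discovered: Iterable[str], existing: Iterable[str]) -> ReconcilePlan:
    """Compare discovered entity keys with the keys already in the graph.

    Both arguments are hypothetical entity identifiers such as
    "default/my-repo"; a real molecule would compare richer objects.
    """
    discovered_keys = set(discovered)
    existing_keys = set(existing)
    return ReconcilePlan(
        to_create=discovered_keys - existing_keys,  # present upstream, missing in graph
        to_remove=existing_keys - discovered_keys,  # present in graph, gone upstream
    )


plan = plan_reconcile(
    discovered=["default/api", "default/web"],
    existing=["default/web", "default/legacy"],
)
print(sorted(plan.to_create))  # ['default/api']
print(sorted(plan.to_remove))  # ['default/legacy']
```

Entities present on both sides are left alone in this sketch; whether Devgraph also updates changed entities during reconciliation is not stated on this page.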

Configuration

All molecules share a common configuration structure:

providers:
  - name: my-github-provider
    type: github          # Molecule type
    every: 300            # Reconciliation interval (seconds)
    config:
      namespace: default  # Entity namespace
      # Molecule-specific configuration...

Common Configuration Options

Every molecule supports:

  • namespace: Logical grouping for entities (default: "default")
  • every: How often to reconcile (in seconds)
  • name: Unique identifier for the provider instance

Most molecules include:

  • Authentication credentials (tokens, API keys)
  • API endpoints/URLs
  • Selectors (filtering which resources to discover)
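
To make the shared structure concrete, here is a hedged sketch of how a provider entry could be modeled and checked before use. It mirrors the field names from the configuration example (name, type, every, config), but the ProviderConfig class, the load_providers helper, and the validation rules are illustrative assumptions, not part of Devgraph.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ProviderConfig:
    """One entry under `providers:` (field names follow the example above)."""
    name: str   # unique identifier for the provider instance
    type: str   # molecule type, e.g. "github"
    every: int  # reconciliation interval in seconds
    config: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("provider name must not be empty")
        if self.every <= 0:
            raise ValueError("every must be a positive number of seconds")
        # Apply the documented default namespace without mutating the input dict.
        self.config = {"namespace": "default", **self.config}


def load_providers(raw: List[Dict[str, Any]]) -> List[ProviderConfig]:
    """Build provider configs and reject duplicate names (hypothetical helper)."""
    providers = [ProviderConfig(**entry) for entry in raw]
    names = [p.name for p in providers]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate provider names: {duplicates}")
    return providers


providers = load_providers([
    {"name": "my-github-provider", "type": "github", "every": 300},
])
print(providers[0].config["namespace"])  # default
```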

Authentication

Molecules support various authentication methods:

| Molecule | Auth Method | Configuration |
|----------|-------------|---------------|
| GitHub | Personal Access Token | token |
| GitHub | GitHub App | app_id, app_private_key, installation_id |
| GitLab | Private Token | token |
| Argo CD | Auth Token | token |
| Grafana | API Key | api_key |
| LDAP | Bind DN + Password | bind_dn, bind_password |

Selectors

Many molecules use selectors to filter which resources to discover:

# GitHub: Select repos by organization and name pattern
selectors:
  - organization: myorg
    repo_name: "backend-.*"
    graph_files: [".devgraph.yaml"]

# Grafana: Select dashboards by tags and folders
selectors:
  - tags: ["production"]
    folder_ids: [1, 5]
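
As a rough illustration of how a GitHub-style selector narrows discovery, the sketch below filters a list of repositories by organization and name pattern. It rests on two assumptions this page does not confirm: that a resource is kept when it matches at least one selector, and that repo_name must match the whole repository name. The functions repo_matches and select_repos are hypothetical.

```python
import re
from typing import Any, Dict, List


def repo_matches(repo: Dict[str, Any], selector: Dict[str, Any]) -> bool:
    """Return True if a repository satisfies one GitHub-style selector."""
    if repo["organization"] != selector["organization"]:
        return False
    # Assumption: the pattern must match the entire repository name.
    pattern = selector.get("repo_name", ".*")
    return re.fullmatch(pattern, repo["name"]) is not None


def select_repos(repos: List[Dict[str, Any]],
                 selectors: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep repositories that match at least one selector (assumed OR semantics)."""
    return [r for r in repos if any(repo_matches(r, s) for s in selectors)]


repos = [
    {"organization": "myorg", "name": "backend-api"},
    {"organization": "myorg", "name": "frontend-web"},
    {"organization": "other", "name": "backend-tools"},
]
selectors = [{"organization": "myorg", "repo_name": "backend-.*"}]
print([r["name"] for r in select_repos(repos, selectors)])  # ['backend-api']
```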

Entity Definitions

Each molecule defines entity types it can create. For example, the GitHub molecule creates:

  • GithubRepository (kind: GithubRepository)
  • GithubHostingService (kind: GithubHostingService)

Entities follow a consistent structure:

apiVersion: entities.devgraph.ai/v1
kind: GithubRepository
metadata:
  name: my-repo
  namespace: default
  labels:
    owner: myorg
spec:
  owner: myorg
  name: my-repo
  url: https://github.com/myorg/my-repo
  description: "Repository description"

Relations

Molecules create typed relationships between entities:

| Relation | Description | Example |
|----------|-------------|---------|
| HOSTED_BY | Entity hosted by service | GitHubRepository HOSTED_BY GithubHostingService |
| USES | Entity uses another | ArgoApplication USES GitHubRepository |
| MEMBER_OF | User belongs to group | LdapUser MEMBER_OF LdapGroup |
| DEPLOYS | Deployment of project | VercelDeployment DEPLOYS VercelProject |

Field-Selected Relations

Many relations use field selectors to match entities dynamically:

# Match Argo app to GitHub repo by URL
target_selector = "spec.url=https://github.com/org/repo"

This allows relationships to be created even when the target entity is managed by a different molecule.
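
One way to picture a field selector is as a dotted path plus an expected value, evaluated against every candidate entity. The sketch below resolves a selector in the "path=value" form shown above against entities laid out like the earlier example; get_field and find_targets are hypothetical helpers, and Devgraph's real matching logic may differ.

```python
from typing import Any, Dict, List, Optional


def get_field(entity: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as "spec.url" through nested dicts."""
    current: Any = entity
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find_targets(entities: List[Dict[str, Any]], target_selector: str) -> List[Dict[str, Any]]:
    """Return the entities whose field equals the selector's value."""
    # Split on the first "=" only, because URLs can contain "=" themselves.
    path, _, expected = target_selector.partition("=")
    return [e for e in entities if get_field(e, path) == expected]


entities = [
    {"kind": "GithubRepository", "metadata": {"name": "repo"},
     "spec": {"url": "https://github.com/org/repo"}},
    {"kind": "GithubRepository", "metadata": {"name": "other"},
     "spec": {"url": "https://github.com/org/other"}},
]
matches = find_targets(entities, "spec.url=https://github.com/org/repo")
print([e["metadata"]["name"] for e in matches])  # ['repo']
```

Because the lookup only depends on field values, the source molecule needs no knowledge of which molecule created the target entity.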

Best Practices

1. Use Appropriate Namespaces

Separate environments or tenants using namespaces:

# Production
namespace: production

# Staging
namespace: staging

2. Set Reasonable Intervals

Balance freshness vs. API rate limits:

  • Fast-changing data: 60-300 seconds
  • Slow-changing data: 600-3600 seconds
  • Rate-limited APIs: Consider longer intervals
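
A quick calculation can turn these ranges into a concrete value. The sketch below estimates the shortest interval that keeps a provider inside a chosen share of an hourly API quota. The function min_interval_seconds, the figure of 200 calls per run, and the 50% budget are illustrative assumptions; the 5,000 requests/hour limit is the PAT figure quoted in the Rate Limiting section of this page.

```python
import math


def min_interval_seconds(calls_per_run: int, hourly_limit: int,
                         budget_fraction: float = 0.5) -> int:
    """Smallest `every` value that stays within a share of the hourly limit.

    calls_per_run:   API requests one reconciliation makes (measure this).
    hourly_limit:    requests per hour the API allows.
    budget_fraction: share of the limit this provider may consume.
    """
    if calls_per_run <= 0 or hourly_limit <= 0:
        raise ValueError("calls_per_run and hourly_limit must be positive")
    if not 0 < budget_fraction <= 1:
        raise ValueError("budget_fraction must be in (0, 1]")
    runs_per_hour = (hourly_limit * budget_fraction) / calls_per_run
    return math.ceil(3600 / runs_per_hour)


# 200 calls per run against a 5,000/hour limit, using half the quota:
# 2,500 / 200 = 12.5 runs per hour, i.e. one run every 288 seconds.
print(min_interval_seconds(calls_per_run=200, hourly_limit=5000))  # 288
```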

3. Use Selectors Wisely

Filter at the molecule level to reduce noise:

# Skip repos whose names start with "archive-"
selectors:
  - organization: myorg
    repo_name: "^(?!archive-).*"  # Negative lookahead on the name prefix; does not check archived or fork status
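
It can help to try a pattern against sample names before deploying it. The sketch below does this with Python's re module, under the assumption that the pattern is applied to the whole repository name. Negative lookahead is supported by Python but rejected by RE2-based engines, and this page does not state which regex engine Devgraph uses, so treat the result as indicative only. The helper matching_names is hypothetical.

```python
import re
from typing import Iterable, List


def matching_names(pattern: str, names: Iterable[str]) -> List[str]:
    """Return the names the pattern would select (whole-name match assumed)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Pattern copied from the selector example above.
candidates = ["backend-api", "archive-2019", "my-archive-tool"]
print(matching_names(r"^(?!archive-).*", candidates))
# ['backend-api', 'my-archive-tool']
```

Only the name prefix is tested, so "my-archive-tool" is still selected.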

4. Monitor API Rate Limits

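Many APIs have rate limits. For GitHub, prefer a GitHub App over a PAT: installations on GitHub Enterprise Cloud organizations get up to 15,000 requests per hour, three times the 5,000 per hour a PAT allows.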

5. Secure Credentials

Use environment variables for sensitive data:

token: ${GITHUB_TOKEN}
api_key: ${GRAFANA_API_KEY}
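
The placeholders above imply a substitution step at load time. The sketch below shows one way such ${NAME} expansion could work, failing loudly when a variable is missing. It is an assumption about the behavior, not Devgraph's implementation; expand_env is a hypothetical helper, and the optional env argument exists only to make the example reproducible.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME} placeholders with values from the environment.

    Raises KeyError for an unset variable, so a missing secret fails at
    load time instead of sending a literal "${GITHUB_TOKEN}" to the API.
    """
    source = os.environ if env is None else env

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _PLACEHOLDER.sub(substitute, value)


print(expand_env("${GITHUB_TOKEN}", {"GITHUB_TOKEN": "example-token"}))  # example-token
```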

Troubleshooting

Authentication Failures

Symptom: "401 Unauthorized" or "403 Forbidden"

Solutions:

  • Verify token/credential is correct
  • Check token has required scopes/permissions
  • For GitHub: a 403 can also mean the rate limit is exhausted; app authentication can provide higher limits

Rate Limiting

Symptom: "429 Too Many Requests"

Solutions:

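  • Increase the every interval so the molecule reconciles less often
  • Use GitHub App authentication (up to 15,000 requests/hour for installations on GitHub Enterprise Cloud organizations, versus 5,000/hour for a personal access token)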
  • Implement caching where possible

Missing Entities

Symptom: Expected entities not appearing in graph

Solutions:

  • Check selector patterns match resources
  • Review molecule logs for errors
  • Verify API credentials have access to resources

Connection Timeouts

Symptom: "Connection timeout" or "Network unreachable"

Solutions:

  • Check network connectivity to API endpoint
  • Verify firewall rules allow outbound connections
  • Test API endpoint with curl/httpie

Creating Custom Molecules

Contact the Devgraph team for information on creating your own molecules to integrate additional systems.
