Version: Beta

GitHub

Discovery, MCP

Discover and manage GitHub repositories and hosting services in your Devgraph knowledge graph.

Overview

The GitHub molecule connects to GitHub's API to automatically discover repositories within your organizations. It creates entities for repos and the GitHub hosting service itself, along with relationships between them.

Key Features:

  • Discover repositories from multiple organizations
  • Filter repos by name patterns (regex)
  • Support for GitHub App authentication (up to 3x the rate limit of a PAT)
  • Read .devgraph.yaml files from repos for additional entities
  • Track repository metadata (languages, description, etc.)

Quick Start

providers:
  - name: github
    type: github
    every: 300 # Reconcile every 5 minutes
    config:
      namespace: default
      token: ${GITHUB_TOKEN}
      selectors:
        - organization: myorg
          repo_name: ".*" # All repos

Configuration

Basic Configuration

providers:
  - name: github-prod
    type: github
    every: 300
    config:
      namespace: production
      base_url: https://github.com
      api_url: https://api.github.com
      token: ${GITHUB_TOKEN}
      selectors:
        - organization: mycompany
          repo_name: "backend-.*"
          graph_files:
            - .devgraph.yaml
        - organization: mycompany
          repo_name: "frontend-.*"

GitHub App Authentication

GitHub Apps provide up to 3x the rate limit of a PAT (up to 15,000 requests/hour vs. 5,000 requests/hour):

providers:
  - name: github-app
    type: github
    every: 300
    config:
      namespace: default
      app_id: 123456
      app_private_key: ${GITHUB_APP_PRIVATE_KEY}
      installation_id: 789012
      selectors:
        - organization: myorg

Configuration Options

Option | Type | Required | Default | Description
namespace | string | No | default | Namespace for created entities
base_url | string | No | https://github.com | GitHub web interface URL
api_url | string | No | https://api.github.com | GitHub API endpoint
token | string | Yes* | - | Personal Access Token
app_id | integer | Yes* | - | GitHub App ID
app_private_key | string | Yes* | - | GitHub App private key
installation_id | integer | Yes* | - | GitHub App installation ID
selectors | list | Yes | [] | Repository selectors

*Either token OR (app_id + app_private_key + installation_id) is required.
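
The either/or rule above can be checked before a provider starts. The following is a minimal sketch, not Devgraph's own validation: `auth_mode` is a hypothetical helper, and rejecting a config that sets both a token and GitHub App fields is an assumption that is stricter than the rule as stated.

```python
APP_FIELDS = ("app_id", "app_private_key", "installation_id")


def auth_mode(config: dict) -> str:
    """Return "token" or "app" for a provider config dict, or raise ValueError."""
    has_token = bool(config.get("token"))
    present = [field for field in APP_FIELDS if config.get(field)]

    if has_token and not present:
        return "token"
    if not has_token and len(present) == len(APP_FIELDS):
        return "app"
    if not has_token and not present:
        raise ValueError(
            "set either token or app_id + app_private_key + installation_id"
        )
    if has_token:
        # Assumption: mixing both styles is treated as a mistake here.
        raise ValueError("both token and GitHub App fields are set; choose one")
    missing = [field for field in APP_FIELDS if field not in present]
    raise ValueError("incomplete GitHub App config, missing: " + ", ".join(missing))


print(auth_mode({"token": "ghp_example"}))
print(auth_mode({"app_id": 123456, "app_private_key": "PEM", "installation_id": 789012}))
```

A partial GitHub App config such as `{"app_id": 123456}` raises a `ValueError` that names the missing fields.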

Selector Options

Each selector specifies which repos to discover:

Option | Type | Required | Default | Description
organization | string | Yes | - | GitHub organization name
repo_name | string | No | .* | Regex pattern for repo names
graph_files | list | No | [.devgraph.yaml] | Files to parse for entities

Authentication

Personal Access Token (PAT)

Rate Limit: 5,000 requests/hour

  1. Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
  2. Click "Generate new token"
  3. Select scopes:
    • repo - For private repositories
    • public_repo - For public repositories only
    • read:org - To read organization data
  4. Copy the token and reference it in the config:
token: ${GITHUB_TOKEN}

Set the environment variable:

export GITHUB_TOKEN="ghp_xxxxxxxxxxxx"
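
A config that references `${GITHUB_TOKEN}` is only useful if the variable is actually set in the environment of the process that reads it. The sketch below lists placeholders that have no value. `missing_env_vars` is a hypothetical pre-flight helper, and the `${NAME}` placeholder syntax is inferred from the examples in this page rather than from a documented grammar.

```python
import os
import re

# Matches ${NAME} placeholders as they appear in the examples on this page.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text: str, env=None) -> list:
    """Return placeholder names in config_text that are unset or empty in env."""
    env = os.environ if env is None else env
    missing = []
    for name in PLACEHOLDER.findall(config_text):
        if name not in missing and not env.get(name):
            missing.append(name)
    return missing


config = "token: ${GITHUB_TOKEN}\napp_private_key: ${GITHUB_APP_PRIVATE_KEY}\n"
print(missing_env_vars(config, env={"GITHUB_TOKEN": "ghp_example"}))
```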

GitHub App

Rate Limit: up to 15,000 requests/hour

  1. Create GitHub App:
    • Go to Organization Settings → Developer settings → GitHub Apps
    • Click "New GitHub App"
    • Set permissions: Repository: Contents (Read-only), Organization: Members (Read-only)
    • Generate a private key and download it
  2. Install the app on your organization
  3. Note the App ID and Installation ID
  4. Configure:
app_id: 123456
app_private_key: |
  -----BEGIN RSA PRIVATE KEY-----
  ... (your private key) ...
  -----END RSA PRIVATE KEY-----
installation_id: 789012

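Or keep the values in environment variables and reference them from the config (for example, app_private_key: ${GITHUB_APP_PRIVATE_KEY}):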

export GITHUB_APP_ID="123456"
export GITHUB_APP_PRIVATE_KEY="$(cat private-key.pem)"
export GITHUB_APP_INSTALLATION_ID="789012"

Entities Created

GithubHostingService

Represents the GitHub platform instance.

Entity Structure:

apiVersion: entities.devgraph.ai/v1
kind: GithubHostingService
metadata:
  name: github
  namespace: default
  labels:
    organization: github
spec:
  api_url: https://api.github.com

One hosting service entity is created per provider.

GithubRepository

Represents a GitHub repository.

Entity Structure:

apiVersion: entities.devgraph.ai/v1
kind: GithubRepository
metadata:
  name: my-service
  namespace: default
  labels:
    owner: myorg
spec:
  owner: myorg
  name: my-service
  url: https://github.com/myorg/my-service
  description: "My awesome service"
  languages:
    Python: 15234
    JavaScript: 8432

Fields:

  • owner: Organization or user that owns the repo
  • name: Repository name
  • url: Web URL to the repository
  • description: Repository description
  • languages: Map of language names to bytes of code
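
To make the field list concrete, the sketch below builds the entity structure shown above from two GitHub REST API payloads: a repository object (which carries `name`, `owner.login`, `html_url`, and `description`) and the response of the repository languages endpoint (a map of language name to bytes). The mapping itself is an assumption about how the molecule fills these fields, and `to_repository_entity` is a hypothetical name.

```python
def to_repository_entity(repo: dict, languages: dict, namespace: str = "default") -> dict:
    """Map GitHub API payloads onto the GithubRepository structure shown above."""
    owner = repo["owner"]["login"]
    return {
        "apiVersion": "entities.devgraph.ai/v1",
        "kind": "GithubRepository",
        "metadata": {
            "name": repo["name"],
            "namespace": namespace,
            "labels": {"owner": owner},
        },
        "spec": {
            "owner": owner,
            "name": repo["name"],
            "url": repo["html_url"],
            "description": repo.get("description"),
            # Language name -> bytes of code, as returned by the languages endpoint.
            "languages": dict(languages),
        },
    }


repo = {
    "name": "my-service",
    "owner": {"login": "myorg"},
    "html_url": "https://github.com/myorg/my-service",
    "description": "My awesome service",
}
entity = to_repository_entity(repo, {"Python": 15234, "JavaScript": 8432})
print(entity["spec"]["url"])
```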

Relationships

HOSTED_BY

Every repository is linked to the hosting service.

GithubRepository --HOSTED_BY--> GithubHostingService

Example:

myorg/backend-api --HOSTED_BY--> github

Graph Files

The GitHub molecule can read .devgraph.yaml files from repositories to discover additional entities and relationships. This allows repos to declare their own metadata.

Example .devgraph.yaml:

entities:
  - apiVersion: entities.devgraph.ai/v1
    kind: Service
    metadata:
      name: backend-api
      labels:
        team: platform
    spec:
      type: rest-api
      port: 8080

relations:
  - source:
      kind: Service
      name: backend-api
    target:
      kind: GithubRepository
      name: backend-api
    relation: IMPLEMENTED_BY

Configure which files to read:

selectors:
  - organization: myorg
    graph_files:
      - .devgraph.yaml
      - .devgraph/entities.yaml
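
When reviewing a graph file, it can help to print its relations in the arrow notation used in the Relationships section. This is a minimal sketch that assumes the file has already been parsed into a Python dict (for example with a YAML parser); `describe_relations` is a hypothetical helper and not part of Devgraph.

```python
def describe_relations(graph: dict) -> list:
    """Render each relation as 'Kind/name --RELATION--> Kind/name'."""
    lines = []
    for rel in graph.get("relations", []):
        src, dst = rel["source"], rel["target"]
        lines.append(
            f"{src['kind']}/{src['name']} --{rel['relation']}--> "
            f"{dst['kind']}/{dst['name']}"
        )
    return lines


# Parsed form of the relations section from the example .devgraph.yaml.
graph = {
    "relations": [
        {
            "source": {"kind": "Service", "name": "backend-api"},
            "target": {"kind": "GithubRepository", "name": "backend-api"},
            "relation": "IMPLEMENTED_BY",
        }
    ]
}
for line in describe_relations(graph):
    print(line)
```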

Filtering Repositories

Use regex patterns to filter repos:

Match All Repos

repo_name: ".*"

Match Prefix

repo_name: "backend-.*"  # backend-api, backend-worker, etc.

Match Suffix

repo_name: ".*-service"  # api-service, auth-service, etc.

Exclude Pattern

repo_name: "^(?!archive-).*"  # Exclude repos starting with "archive-"

Multiple Patterns

Use multiple selectors:

selectors:
  - organization: myorg
    repo_name: "backend-.*"
  - organization: myorg
    repo_name: "frontend-.*"
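
Patterns are easiest to verify against a list of real repository names before they go into a selector. The sketch below uses Python's `re` module and assumes a pattern must match the whole repository name; both the regex engine and the full-match semantics are assumptions about the molecule. `select_repos` is a hypothetical helper. Note that the negative lookahead in the exclude pattern is not supported by every regex engine (RE2-based engines reject it).

```python
import re


def select_repos(repo_names: list, patterns: list) -> list:
    """Return names matched in full by at least one pattern, in input order."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        name
        for name in repo_names
        if any(rx.fullmatch(name) for rx in compiled)
    ]


repos = ["backend-api", "frontend-web", "archive-old", "auth-service"]

# Multiple selectors behave like a union of their patterns.
print(select_repos(repos, ["backend-.*", "frontend-.*"]))
# Exclude pattern from above: everything not starting with "archive-".
print(select_repos(repos, ["^(?!archive-).*"]))
```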

Use Cases

Multi-Organization Discovery

selectors:
  - organization: company-backend
  - organization: company-frontend
  - organization: company-platform

Environment Separation

# Production
- name: github-prod
  config:
    namespace: production
    selectors:
      - organization: myorg
        repo_name: "prod-.*"

# Staging
- name: github-staging
  config:
    namespace: staging
    selectors:
      - organization: myorg
        repo_name: "staging-.*"

Team-Based Organization

selectors:
  - organization: myorg
    repo_name: "platform-.*"
    graph_files: [.devgraph.yaml]
  - organization: myorg
    repo_name: "product-.*"
    graph_files: [.devgraph.yaml]

Troubleshooting

Rate Limiting

Symptom: Logs show "GitHub API rate limit low" or 403 errors

Solutions:

  1. Switch to GitHub App authentication (up to 3x the limit)
  2. Increase the reconciliation interval (for example, every: 600 for 10 minutes)
  3. Reduce the number of organizations or repos

Check rate limit:

curl -H "Authorization: Bearer $GITHUB_TOKEN" \
  https://api.github.com/rate_limit
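
The endpoint returns JSON in which `resources.core` carries `limit`, `remaining`, and `reset` (a Unix timestamp). The sketch below summarizes a saved response body; `summarize_rate_limit` is a hypothetical helper and the sample values are made up for illustration.

```python
import json
from datetime import datetime, timezone


def summarize_rate_limit(body: str) -> dict:
    """Summarize the core REST quota from a /rate_limit response body."""
    core = json.loads(body)["resources"]["core"]
    resets_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return {
        "limit": core["limit"],
        "remaining": core["remaining"],
        "used_fraction": 1 - core["remaining"] / core["limit"],
        "resets_at": resets_at.isoformat(),
    }


# Illustrative response body, trimmed to the fields used here.
sample = '{"resources": {"core": {"limit": 5000, "remaining": 1250, "reset": 1700000000}}}'
print(summarize_rate_limit(sample))
```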

Authentication Errors

Symptom: "401 Unauthorized" or "403 Forbidden"

Solutions:

  1. Verify token is correct and not expired
  2. Check token has required scopes (repo, read:org)
  3. For GitHub App: Verify app is installed on organization
  4. For GitHub App: Check private key format (must include BEGIN/END markers)

Missing Repositories

Symptom: Expected repos not appearing

Solutions:

  1. Check repo_name pattern matches repo names
  2. Verify token/app has access to repos (private repos need repo scope)
  3. Check organization name is correct
  4. Review logs for errors

Private Key Format Issues

Symptom: "Invalid private key format" error

Solution: Ensure private key has proper formatting:

app_private_key: |
  -----BEGIN RSA PRIVATE KEY-----
  MIIEpAIBAAKCAQEA...
  ...
  -----END RSA PRIVATE KEY-----
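
A quick layout check can catch the most common formatting mistakes before the key reaches the provider, such as missing markers or a key pasted with literal `\n` sequences instead of real line breaks. This sketch inspects the text layout only and does not verify that the key is cryptographically valid; `private_key_problems` is a hypothetical helper.

```python
import re

BEGIN_RE = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")
END_RE = re.compile(r"-----END [A-Z ]*PRIVATE KEY-----")


def private_key_problems(key: str) -> list:
    """Return a list of layout problems found in a PEM private key string."""
    if "\\n" in key:
        # A backslash followed by "n" means the line breaks were escaped.
        return ["contains literal \\n sequences instead of real line breaks"]
    lines = [line.strip() for line in key.strip().splitlines()]
    problems = []
    if not lines or not BEGIN_RE.fullmatch(lines[0]):
        problems.append("missing BEGIN marker on the first line")
    if not lines or not END_RE.fullmatch(lines[-1]):
        problems.append("missing END marker on the last line")
    if len(lines) < 3:
        problems.append("no key material between the markers")
    return problems


key = "-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAKCAQEA\n-----END RSA PRIVATE KEY-----\n"
print(private_key_problems(key))
```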

Performance Tips

  1. Use GitHub Apps: up to 3x the rate limit of a PAT
  2. Optimize selectors: Use specific organization and repo_name patterns so fewer repositories are processed
  3. Adjust interval: Balance freshness vs. API usage
  4. Minimize graph_files: Only parse files you need

Integration Examples

The Argo molecule can automatically link applications to GitHub repos:

# In Argo app, repoURL matches GitHub repo spec.url
ArgoApplication --USES--> GithubRepository

The Vercel molecule creates relations based on Git URLs:

VercelProject --USES--> GithubRepository
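
Both integrations depend on a Git URL from another system lining up with a repository's `spec.url`. The exact comparison these molecules perform is not described here, so the following is only a sketch of one plausible normalization, useful when checking why a link was not created: it reduces HTTPS and SSH-style URLs to a lowercase `host/owner/repo` key. `normalize_repo_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def normalize_repo_url(url: str) -> str:
    """Reduce a Git URL to a lowercase 'host/owner/repo' key for comparison."""
    url = url.strip()
    if url.startswith("git@"):
        # SSH shorthand: git@github.com:owner/repo.git
        host, _, path = url[len("git@"):].partition(":")
    else:
        parts = urlsplit(url)
        host, path = parts.hostname or "", parts.path
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host.lower()}/{path.lower()}"


spec_url = "https://github.com/myorg/my-service"
for candidate in (
    "https://github.com/myorg/my-service.git",
    "git@github.com:MyOrg/my-service.git",
):
    print(normalize_repo_url(candidate) == normalize_repo_url(spec_url))
```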

Next Steps