netstar-categorizer/claude.md
2026-04-23 17:41:18 -03:00


Claude Development Guide - NetStar Categorizer

Project Overview

NetStar Categorizer is a Node.js microservice that provides domain name/FQDN categorization using the inCompass NetStar SDK. It exposes both UDP and HTTP interfaces for real-time content classification with automatic daily database updates.

Core Purpose

  • Categorize domains into standardized categories using NetStar's content database
  • Map raw NetStar category IDs to organization-specific category codes
  • Provide both simple and detailed categorization results with reputation/age ratings
  • Maintain updated categorization databases through automated cron jobs

Key Architecture

Technology Stack

  • Runtime: Node.js v14+
  • Web Framework: Express.js v4
  • External SDK: inCompass NetStar v3.1.0-2 (C++ library for categorization)
  • Containerization: Docker + Kubernetes
  • CI/CD: CircleCI
  • Infrastructure: Cloud-agnostic (DigitalOcean, Google Cloud, etc.)

Core Services

  1. HTTP Server (Port 3333) - REST API for categorization requests
  2. UDP Server (Port 33333) - Legacy UDP interface for categorization
  3. Cron Service - Daily database updates via Kubernetes CronJob
  4. Category Mapping - NetStar ID → Organization Code conversion (singleton pattern)

Project Structure

src/
├── server.js                              # Entry point: initializes UDP + HTTP servers
├── app.js                                 # Core logic: orchestrates categorization flow
├── client.js                              # UDP test client
├── cron.js                                # Scheduled database update logic
├── use-cases/
│   ├── get-category-use-case.js          # Executes NetStar gcf1check command
│   ├── category-converter-use-case.js    # Maps NetStar IDs → org codes (singleton)
│   ├── parse-detailed-category-use-case.js # Parses detailed output with ratings
│   └── update-categories-use-case.js     # Updates NetStar databases
└── etc/
    └── categories-mapping.json           # Mapping table: NetStar ID → Zvelo codes

deployment/
├── deployment.yaml                       # Kubernetes deployment manifest
└── staging/deployment.yaml               # Staging-specific config

.circleci/config.yml                      # CI/CD pipeline (builds, tests, deploys)
Dockerfile                                # Container build specification
package.json                              # Node dependencies + scripts
makefile                                  # Convenience commands
test-detailed.http                        # HTTP endpoint test file

Language Standards

Comments and Error Messages - ENGLISH ONLY

All code comments, error messages, log statements, and documentation must be written in English.

This includes:

  • Code comments explaining logic
  • Error messages and exception messages
  • Console.log, console.error, and logging statements
  • Variable and function names
  • Commit messages
  • Code review feedback
  • Documentation strings (JSDoc, etc.)

Examples:

// Correct: English comment
function mapCategoryId(netstarId) {
  if (!netstarId) {
    throw new Error('NetStar ID is required')
  }
  // Map to organization category code
  return categoryConverter.convert(netstarId)
}

// Incorrect: Portuguese comment
function mapCategoryId(netstarId) {
  if (!netstarId) {
    throw new Error('ID do NetStar é obrigatório')  // ❌ ERROR IN PORTUGUESE
  }
  // Mapear para código de categoria da organização  // ❌ COMMENT IN PORTUGUESE
  return categoryConverter.convert(netstarId)
}

Coding Conventions

Code Style

  • Use ES6 syntax (const/let, arrow functions, template literals)
  • No semicolons in new code (the established pattern in the codebase)
  • Functional/modular design - keep files focused on single responsibility
  • Use singleton pattern for shared state (see: CategoryConverterUseCase)
  • All comments in English - see Language Standards section

Use Case Pattern

  • Each business operation gets a dedicated use case class in src/use-cases/
  • Use case classes should have a clear, single responsibility
  • Example:
    class GetCategoryUseCase {
      async execute(fqdn) {
        // implementation
      }
    }
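To make the pattern concrete, here is a minimal sketch of a use case that receives its collaborators through the constructor, which keeps execute() easy to test with stubs. The categorizer.lookup(fqdn) and converter.convertAll(ids) interfaces are hypothetical placeholders, not the project's actual APIs; per the project structure, the real GetCategoryUseCase runs gcf1check itself.

```javascript
// Sketch of a use case with constructor-injected collaborators.
// Hypothetical interfaces: categorizer.lookup(fqdn) resolves to NetStar IDs,
// converter.convertAll(ids) returns organization category codes.
class GetCategoryUseCase {
  constructor({ categorizer, converter } = {}) {
    if (!categorizer || !converter) {
      throw new Error('GetCategoryUseCase requires a categorizer and a converter')
    }
    this.categorizer = categorizer
    this.converter = converter
  }

  // Single responsibility: FQDN in, mapped category codes out
  async execute(fqdn) {
    if (!fqdn) {
      throw new Error('FQDN is required')
    }
    const netstarIds = await this.categorizer.lookup(fqdn)
    return this.converter.convertAll(netstarIds)
  }
}

module.exports = GetCategoryUseCase
```

Because both collaborators are injected, a test can pass plain stub objects instead of invoking the NetStar SDK.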
    

Environment Variables

  • Defined in .env (copy .env.example to .env and adjust the values)
  • UDP_PORT=33333 - UDP server listen port
  • HTTP_PORT=3333 - HTTP server listen port
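A minimal sketch of reading these variables with validation. The loadConfig helper and its fallback behavior are hypothetical (this guide does not say what happens when a variable is unset); the fallback values simply mirror the ports listed above, and the sketch assumes .env has already been loaded into process.env.

```javascript
// Sketch: read ports from process.env, falling back to the documented values.
// Assumes .env was already loaded into process.env by the app's own mechanism.
const DEFAULT_UDP_PORT = 33333
const DEFAULT_HTTP_PORT = 3333

const parsePort = (name, value, fallback) => {
  if (value === undefined || value === '') {
    return fallback
  }
  const port = Number(value)
  // Reject non-integers and values outside the valid port range
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid ${name}: ${value}`)
  }
  return port
}

const loadConfig = (env = process.env) => ({
  udpPort: parsePort('UDP_PORT', env.UDP_PORT, DEFAULT_UDP_PORT),
  httpPort: parsePort('HTTP_PORT', env.HTTP_PORT, DEFAULT_HTTP_PORT)
})

module.exports = { loadConfig }
```

Failing fast on an invalid value surfaces a misconfigured Kubernetes manifest at startup rather than as a bind error later.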

HTTP API Endpoints

POST / - Basic Categorization

  • Input: {"fqdn": "example.com"}
  • Output: {"result": [10009, 10010]} (array of mapped organization category codes)
  • Use Case: Quick lookups when detailed information is not needed

POST /detailed - Detailed Categorization

  • Input: {"fqdn": "example.com"}
  • Output: Full category info with reputation score, age rating, primary/secondary categories, and human-readable names
  • Use Case: Comprehensive categorization for security decisions
  • Also includes: a result array in the same format as the POST / response

POST /full - Complete Raw Categorization Output

  • Input: {"fqdn": "example.com"}
  • Output: The complete structure with all 35 NetStar fields, including:
    • Primary, secondary, and security categories (with IDs, names, and mapped IDs)
    • Reputation score and name
    • Matching flags and their descriptions
    • Age rating score and name
    • All 9 category group classifications (Internet/Infrastructure, Malware/Security, Dangerous/Harmful, Adult, Business/Government, Personal, Computing/Technology, Social Media, Miscellaneous)
    • Volume index
    • Submitted URL
  • Use Case: Complete diagnostic and analysis when all categorization data is needed
  • Also includes: a result array in the same format as the POST / response
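All three endpoints accept the same {"fqdn": ...} body, so input validation and the response shape can be shared. A minimal sketch of an Express-compatible handler for POST /, assuming a use case whose execute(fqdn) resolves to the result array; createCategorizeHandler and the 400/500 error bodies are hypothetical, not the service's documented behavior.

```javascript
// Sketch of an Express-compatible handler for POST /.
// Hypothetical: getCategoryUseCase.execute(fqdn) resolves to an array of
// mapped category codes; the error response bodies are illustrative.
const createCategorizeHandler = (getCategoryUseCase) => async (req, res) => {
  const fqdn = req.body && req.body.fqdn
  if (typeof fqdn !== 'string' || fqdn.trim() === '') {
    return res.status(400).json({ error: 'Field "fqdn" is required' })
  }
  try {
    const result = await getCategoryUseCase.execute(fqdn.trim())
    return res.json({ result })
  } catch (err) {
    // Log in English, per the Language Standards section
    console.error(`Categorization failed for ${fqdn}: ${err.message}`)
    return res.status(500).json({ error: 'Categorization failed' })
  }
}

module.exports = { createCategorizeHandler }
```

With Express 4.16+, this would be registered as app.use(express.json()) followed by app.post('/', createCategorizeHandler(getCategoryUseCase)); /detailed and /full could follow the same shape with their own use cases.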

UDP Server (Port 33333)

  • Input: Raw domain string (e.g., "example.com")
  • Output: JSON-formatted result (same as HTTP / endpoint)
  • Legacy Interface: Maintained for backward compatibility
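A minimal UDP client sketch using Node's built-in dgram module, in the spirit of a test client (the actual contents of src/client.js are not shown here). It sends the raw domain string and parses the JSON reply. The queryCategoryUdp name, the 127.0.0.1 default host, and the 2-second timeout are assumptions. UDP gives no delivery guarantee, so the timeout keeps a lost datagram from hanging the caller.

```javascript
const dgram = require('dgram')

// Send a raw domain string and resolve with the parsed JSON reply.
// Hypothetical helper; host, port and timeout defaults are assumptions.
const queryCategoryUdp = (fqdn, { host = '127.0.0.1', port = 33333, timeoutMs = 2000 } = {}) =>
  new Promise((resolve, reject) => {
    const socket = dgram.createSocket('udp4')

    // Give up if no reply arrives in time (the datagram may have been lost)
    const timer = setTimeout(() => {
      socket.close()
      reject(new Error(`No UDP response from ${host}:${port} within ${timeoutMs} ms`))
    }, timeoutMs)

    socket.once('message', (msg) => {
      clearTimeout(timer)
      socket.close()
      try {
        resolve(JSON.parse(msg.toString('utf8')))
      } catch (err) {
        reject(new Error(`Invalid JSON in UDP response: ${err.message}`))
      }
    })

    socket.once('error', (err) => {
      clearTimeout(timer)
      socket.close()
      reject(err)
    })

    // Sending without an explicit bind lets the OS pick an ephemeral local port
    socket.send(Buffer.from(fqdn, 'utf8'), port, host)
  })

module.exports = { queryCategoryUdp }
```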

Development Workflow

Setup & Running Locally

# Install dependencies
npm install

# Development with auto-reload
npm run dev:server    # Watch mode for server changes
npm run dev:client    # Run UDP client for testing

# Production
npm start             # Start both servers

# NetStar service commands (Linux system)
make gcf1-start       # Start NetStar service
make gcf1-download    # Download category databases
make gcf1-update      # Update category databases

Testing APIs

Use test-detailed.http with the VS Code REST Client extension:

  1. Open the file
  2. Click "Send Request" above each request
  3. View the response in the side panel

Git Workflow

  • Main Branch: main - production stable code
  • Development Branch: development - feature integration
  • Feature Branches: Create from development, merge back via PR
  • Recent commits added the HTTP server implementation and the cron job
  • Commit Messages: Must be in English

Deployment

Docker

  • Base Image: Ubuntu 22.04
  • Includes: Boost libraries, Node.js, NetStar SDK
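  • Exposes: Port 3000 (UDP). This does not match the ports documented elsewhere in this guide (HTTP_PORT=3333, UDP_PORT=33333), so verify the EXPOSE directive and container port mappings against the ports the servers actually bind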
  • Build: docker build -t netstar-categorizer .

Kubernetes (Production)

  • Namespace: blackdice
  • Deployment: Single replica per cluster; the target cluster is selected by branch
  • CronJob: Daily database updates at 00:00 UTC
  • Ingress: netstar-cat-dev.blackdice.ai for development (the hostname varies by environment)
  • Branches to Environments:
    • development → development cluster
    • qa → QA cluster
    • staging → staging cluster
    • production → production cluster
    • gke-staging, gke-pov → specific GKE clusters

CI/CD Pipeline (CircleCI)

  • Automatically builds Docker image on push
  • Tags image with commit SHA
  • Deploys to appropriate Kubernetes cluster based on branch
  • Release deployments via git tags

Key Technical Details

Category Mapping System

  • Source: NetStar SDK returns numeric category IDs (e.g., 101, 102)
  • Mapping File: src/etc/categories-mapping.json
  • Target: Maps to the organization's Zvelo-pattern codes (e.g., 10075, 10078)
  • Singleton Implementation: CategoryConverterUseCase keeps a single instance (and a single copy of the mapping table) for the whole app
  • Example Mapping:
    • NetStar 101 (Illegal Activities) → Zvelo 10075
    • NetStar 201 (Terrorism/Extremists) → Zvelo 10018

NetStar SDK Integration

  • Command: gcf1check - queries the NetStar database for domain categorization
  • Child Process: Executed via Node.js child_process module
  • Output Parsing: Raw output parsed into JSON structure
  • Detailed Mode: Includes reputation scores and age ratings in output
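A sketch of invoking the binary safely from Node.js. It uses child_process.execFile rather than exec, so the domain is passed as a single argument and never interpreted by a shell, and it rejects input that does not look like a hostname (including strings starting with "-" that could be read as flags). The guide does not document gcf1check's arguments or output format, so args defaults to empty and the function returns raw stdout for a separate parser; runGcf1check and the execFileImpl injection point are hypothetical.

```javascript
const { execFile } = require('child_process')

// Conservative hostname check: dot-separated labels of letters, digits and
// hyphens, no leading/trailing hyphen per label, at most 253 characters
const FQDN_PATTERN = /^(?=.{1,253}$)[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?(?:\.[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?)*\.?$/i

// Hypothetical runner; the real gcf1check invocation may need extra arguments
const runGcf1check = (fqdn, { binary = 'gcf1check', args = [], execFileImpl = execFile, timeoutMs = 5000 } = {}) =>
  new Promise((resolve, reject) => {
    if (typeof fqdn !== 'string' || !FQDN_PATTERN.test(fqdn)) {
      reject(new Error(`Invalid FQDN: ${fqdn}`))
      return
    }
    // execFile runs the binary directly (no shell), with a hard timeout
    execFileImpl(binary, [...args, fqdn], { timeout: timeoutMs }, (err, stdout, stderr) => {
      if (err) {
        reject(new Error(`gcf1check failed: ${stderr || err.message}`))
        return
      }
      resolve(stdout)
    })
  })

module.exports = { runGcf1check }
```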

Automatic Database Updates

  • Mechanism: Kubernetes CronJob at 0 0 * * * (daily at midnight UTC)
  • Fallback: Manual update via make gcf1-update
  • Purpose: Keeps categorization database current with latest NetStar classifications
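Kubernetes marks a CronJob run as failed when its container exits with a non-zero status, so a one-shot update runner should translate errors into an exit code rather than swallow them. A minimal sketch, assuming an update use case that exposes execute(); runUpdateJob is a hypothetical name and not necessarily how src/cron.js is structured.

```javascript
// Sketch of a one-shot update runner for a Kubernetes CronJob container.
// Hypothetical: updateCategoriesUseCase.execute() performs the database update.
const runUpdateJob = async (updateCategoriesUseCase, logger = console) => {
  const startedAt = Date.now()
  try {
    await updateCategoriesUseCase.execute()
    logger.log(`Category database update finished in ${Date.now() - startedAt} ms`)
    return 0
  } catch (err) {
    logger.error(`Category database update failed: ${err.message}`)
    return 1
  }
}

module.exports = { runUpdateJob }
```

An entry script could call runUpdateJob(new UpdateCategoriesUseCase()).then((code) => { process.exitCode = code }), letting Node exit with that status once pending work finishes.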

Common Tasks

Adding a New Endpoint

  1. Create a corresponding use case in src/use-cases/
  2. Add route in src/app.js that calls the use case
  3. Add a request for the new endpoint to test-detailed.http and run it
  4. Update this guide if it's a significant feature
  5. Ensure all error messages and comments are in English

Updating Category Mappings

  1. Modify src/etc/categories-mapping.json with new ID mappings
  2. Restart the service: the singleton keeps the mapping in memory, so edits take effect only after a restart, when the mapping is loaded again on first use
  3. Test with both HTTP and UDP interfaces
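A quick pre-restart check can catch malformed edits before they reach the running service. This sketch assumes the file is a flat object from numeric NetStar ID keys to positive integer codes; the actual layout may differ, and validateMapping / validateMappingFile are hypothetical helpers. Note that JSON.parse keeps only the last of any duplicated keys, so duplicates cannot be detected after parsing.

```javascript
const fs = require('fs')

// Returns a list of English error messages; an empty list means the mapping looks valid.
// Assumed format: { "<NetStar ID>": <Zvelo code>, ... }
const validateMapping = (mapping) => {
  if (mapping === null || typeof mapping !== 'object' || Array.isArray(mapping)) {
    return ['Mapping must be a JSON object']
  }
  const errors = []
  for (const [netstarId, code] of Object.entries(mapping)) {
    if (!/^\d+$/.test(netstarId)) {
      errors.push(`Key "${netstarId}" is not a numeric NetStar ID`)
    }
    if (!Number.isInteger(code) || code <= 0) {
      errors.push(`Value for NetStar ID ${netstarId} is not a positive integer code: ${JSON.stringify(code)}`)
    }
  }
  return errors
}

// Parses the file from disk and validates it (JSON syntax errors are thrown)
const validateMappingFile = (filePath) => validateMapping(JSON.parse(fs.readFileSync(filePath, 'utf8')))

module.exports = { validateMapping, validateMappingFile }
```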

Debugging

  • Server Logs: Check Docker/Kubernetes logs for errors
  • Cron Logs: View Kubernetes CronJob logs for database update issues
  • UDP Testing: Use npm run dev:client to test directly
  • HTTP Testing: Use test-detailed.http with VS Code REST Client
  • Error Messages: All error logs must be in English

Troubleshooting

  • NetStar Service Not Running: Run make gcf1-start
  • Stale Categories: Manually run make gcf1-update or wait for cron job
  • Port Conflicts: Ensure ports 3333 (HTTP) and 33333 (UDP) are available
  • Docker Build Issues: Check that Boost C++ libraries are installed correctly
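To confirm the ports are free before starting the servers, a small script can simply try to bind them. This sketch uses Node's built-in net and dgram modules; the helper names are hypothetical and 0.0.0.0 is an assumed bind address.

```javascript
const net = require('net')
const dgram = require('dgram')

// Resolves true if a TCP port can be bound, false if it is already in use
const isTcpPortFree = (port, host = '0.0.0.0') =>
  new Promise((resolve, reject) => {
    const server = net.createServer()
    server.once('error', (err) => {
      if (err.code === 'EADDRINUSE') {
        resolve(false)
      } else {
        reject(err)
      }
    })
    // Release the port immediately after a successful bind
    server.once('listening', () => server.close(() => resolve(true)))
    server.listen(port, host)
  })

// Resolves true if a UDP port can be bound, false if it is already in use
const isUdpPortFree = (port, host = '0.0.0.0') =>
  new Promise((resolve, reject) => {
    const socket = dgram.createSocket('udp4')
    socket.once('error', (err) => {
      socket.close()
      if (err.code === 'EADDRINUSE') {
        resolve(false)
      } else {
        reject(err)
      }
    })
    socket.bind(port, host, () => socket.close(() => resolve(true)))
  })

module.exports = { isTcpPortFree, isUdpPortFree }
```

For example, isTcpPortFree(3333) and isUdpPortFree(33333) resolve to false while another process holds those ports.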

Current Development Status

Recent Work

  • HTTP server implementation (alongside UDP)
  • Detailed categorization with reputation/age ratings
  • Cron job for automated daily updates
  • Singleton category converter pattern
  • 🔄 Work in Progress:
    • playground.js - experimental/testing code
    • parse-detailed-category-use-case.js - new detailed parsing feature
    • Enhanced server.js - expanded server capabilities

Known Modified Files

  • playground.js - development/testing (can be cleaned up)
  • src/server.js - recent enhancements
  • makefile - new convenience commands
  • test-detailed.http - expanded test coverage

Guidelines for Contributions

  1. Follow Existing Patterns: Use use-case classes and follow the existing module structure
  2. Test Before Committing: Use test-detailed.http for API changes
  3. Update Mappings Properly: Edit categories-mapping.json instead of hardcoding values
  4. Document Breaking Changes: Update this guide if architecture changes
  5. Keep CircleCI Happy: Ensure Docker build succeeds and K8s deployment configs are valid
  6. Don't Skip Steps: Always test UDP and HTTP interfaces for categorization changes
  7. Language Standards: All comments, error messages, and logs must be in English

Resources & External Documentation