netstar-categorizer/claude.md
2026-04-23 17:41:18 -03:00


# Claude Development Guide - NetStar Categorizer
## Project Overview
**NetStar Categorizer** is a Node.js microservice that provides domain name/FQDN categorization using the inCompass NetStar SDK. It exposes both UDP and HTTP interfaces for real-time content classification with automatic daily database updates.
### Core Purpose
- Categorize domains into standardized categories using NetStar's content database
- Map raw NetStar category IDs to organization-specific category codes
- Provide both simple and detailed categorization results with reputation/age ratings
- Maintain updated categorization databases through automated cron jobs
## Key Architecture
### Technology Stack
- **Runtime**: Node.js v14+
- **Web Framework**: Express.js v4
- **External SDK**: inCompass NetStar v3.1.0-2 (C++ library for categorization)
- **Containerization**: Docker + Kubernetes
- **CI/CD**: CircleCI
- **Infrastructure**: Cloud-agnostic (DigitalOcean, Google Cloud, etc.)
### Core Services
1. **HTTP Server** (Port 3333) - REST API for categorization requests
2. **UDP Server** (Port 33333) - Legacy UDP interface for categorization
3. **Cron Service** - Daily database updates via Kubernetes CronJob
4. **Category Mapping** - NetStar ID → Organization Code conversion (singleton pattern)
## Project Structure
```
src/
├── server.js # Entry point: initializes UDP + HTTP servers
├── app.js # Core logic: orchestrates categorization flow
├── client.js # UDP test client
├── cron.js # Scheduled database update logic
├── use-cases/
│ ├── get-category-use-case.js # Executes NetStar gcf1check command
│ ├── category-converter-use-case.js # Maps NetStar IDs → org codes (singleton)
│ ├── parse-detailed-category-use-case.js # Parses detailed output with ratings
│ └── update-categories-use-case.js # Updates NetStar databases
└── etc/
    └── categories-mapping.json # Mapping table: NetStar ID → Zvelo codes
deployment/
├── deployment.yaml # Kubernetes deployment manifest
└── staging/deployment.yaml # Staging-specific config
.circleci/config.yml # CI/CD pipeline (builds, tests, deploys)
Dockerfile # Container build specification
package.json # Node dependencies + scripts
makefile # Convenience commands
test-detailed.http # HTTP endpoint test file
```
## Language Standards
### Comments and Error Messages - ENGLISH ONLY
**All code comments, error messages, log statements, and documentation must be written in English.**
This includes:
- ✅ Code comments explaining logic
- ✅ Error messages and exception messages
- ✅ Console.log, console.error, and logging statements
- ✅ Variable and function names
- ✅ Commit messages
- ✅ Code review feedback
- ✅ Documentation strings (JSDoc, etc.)
**Examples:**
```javascript
// Correct: English comment
function mapCategoryId(netstarId) {
  if (!netstarId) {
    throw new Error('NetStar ID is required')
  }
  // Map to organization category code
  return categoryConverter.convert(netstarId)
}

// Incorrect: Portuguese comment
function mapCategoryId(netstarId) {
  if (!netstarId) {
    throw new Error('ID do NetStar é obrigatório') // ❌ ERROR IN PORTUGUESE
  }
  // Mapear para código de categoria da organização // ❌ COMMENT IN PORTUGUESE
  return categoryConverter.convert(netstarId)
}
```
## Coding Conventions
### Code Style
- Use **ES6 syntax** (const/let, arrow functions, template literals)
- **No semicolons** in new code (already established pattern)
- Functional/modular design - keep files focused on single responsibility
- Use **singleton pattern** for shared state (see: `CategoryConverterUseCase`)
- **All comments in English** - see Language Standards section
### Use Case Pattern
- Each business operation gets a dedicated use case class in `src/use-cases/`
- Use case classes should have a clear, single responsibility
- Example:
```javascript
class GetCategoryUseCase {
  async execute(fqdn) {
    // implementation
  }
}
```
### Environment Variables
- Defined in `.env` (create from `.env.example`)
- `UDP_PORT=33333` - UDP server listen port
- `HTTP_PORT=3333` - HTTP server listen port
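
A minimal sketch of how the two port variables might be resolved at startup, assuming the documented values act as defaults when a variable is unset or invalid. `readPorts` and `parsePort` are hypothetical helpers, not functions from the codebase, and loading the `.env` file itself is left out.

```javascript
// Hypothetical helper: resolves listen ports from the environment.
// Falls back to the documented defaults when a variable is unset or invalid.
const DEFAULT_UDP_PORT = 33333
const DEFAULT_HTTP_PORT = 3333

function parsePort(value, fallback) {
  const port = Number.parseInt(value, 10)
  // Valid TCP/UDP ports are 1..65535
  return Number.isInteger(port) && port > 0 && port <= 65535 ? port : fallback
}

function readPorts(env = process.env) {
  return {
    udpPort: parsePort(env.UDP_PORT, DEFAULT_UDP_PORT),
    httpPort: parsePort(env.HTTP_PORT, DEFAULT_HTTP_PORT)
  }
}

console.log(readPorts({ UDP_PORT: '40000' }))
```

Passing the environment in as an argument keeps the helper easy to test without touching `process.env`.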
## HTTP API Endpoints
### `POST /` - Basic Categorization
- **Input**: `{"fqdn": "example.com"}`
- **Output**: `{"result": [10009, 10010]}` (array of category IDs)
- **Use Case**: Quick lookups when detailed info is not needed
### `POST /detailed` - Detailed Categorization
- **Input**: `{"fqdn": "example.com"}`
- **Output**: Full category info with reputation score, age rating, primary/secondary categories, and human-readable names
- **Use Case**: Comprehensive categorization for security decisions
- **Includes**: `result` array matching the `/` endpoint format
### `POST /full` - Complete Raw Categorization Output
- **Input**: `{"fqdn": "example.com"}`
- **Output**: Complete detailed structure with all 35 fields from NetStar including:
  - Primary, secondary, and security categories (with IDs, names, and mapped IDs)
  - Reputation score and name
  - Matching flags and their descriptions
  - Age rating score and name
  - All 9 category group classifications (Internet/Infrastructure, Malware/Security, Dangerous/Harmful, Adult, Business/Government, Personal, Computing/Technology, Social Media, Miscellaneous)
  - Volume index
  - Submitted URL
- **Use Case**: Complete diagnostic and analysis when all categorization data is needed
- **Includes**: `result` array matching the `/` endpoint format
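
All three endpoints accept the same request body, so a client can be sketched once. The example below uses Node's built-in `http` module. `buildRequest` and `categorize` are hypothetical names, the host and port are assumptions based on the defaults in this guide, and sending a JSON `Content-Type` header is assumed to be what the server expects.

```javascript
const http = require('http')

// Paths documented in this guide
const ENDPOINTS = ['/', '/detailed', '/full']

// Builds the request options and JSON body for one categorization call
function buildRequest(path, fqdn, host = 'localhost', port = 3333) {
  if (!ENDPOINTS.includes(path)) {
    throw new Error(`Unknown endpoint: ${path}`)
  }
  const body = JSON.stringify({ fqdn })
  return {
    options: {
      host,
      port,
      path,
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body
  }
}

// Sends the request and resolves with the parsed JSON response
function categorize(path, fqdn) {
  const { options, body } = buildRequest(path, fqdn)
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      let raw = ''
      res.setEncoding('utf8')
      res.on('data', (chunk) => { raw += chunk })
      res.on('end', () => {
        try {
          resolve(JSON.parse(raw))
        } catch (err) {
          reject(new Error(`Response is not valid JSON: ${err.message}`))
        }
      })
    })
    req.on('error', reject)
    req.end(body)
  })
}

// Usage, with the service running locally:
// categorize('/', 'example.com').then((data) => console.log(data.result))
```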
### UDP Server (Port 33333)
- **Input**: Raw domain string (e.g., `"example.com"`)
- **Output**: JSON-formatted result (same as HTTP `/` endpoint)
- **Legacy Interface**: Maintained for backward compatibility
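
A minimal sketch of a UDP client for this interface, assuming one datagram carries the raw domain and one datagram carries the JSON reply. `queryUdp` and `parseReply` are hypothetical names and are not taken from `src/client.js`; the timeout is an added assumption, since UDP gives no delivery guarantee.

```javascript
const dgram = require('dgram')

// Parses one reply datagram into an object; throws if it is not valid JSON
function parseReply(msg) {
  try {
    return JSON.parse(msg.toString('utf8'))
  } catch (err) {
    throw new Error(`Reply is not valid JSON: ${err.message}`)
  }
}

// Sends one raw domain string and resolves with the parsed reply.
// Rejects if no reply arrives within timeoutMs.
function queryUdp(fqdn, { host = 'localhost', port = 33333, timeoutMs = 2000 } = {}) {
  return new Promise((resolve, reject) => {
    const socket = dgram.createSocket('udp4')
    const timer = setTimeout(() => {
      socket.close()
      reject(new Error(`No UDP reply within ${timeoutMs} ms`))
    }, timeoutMs)

    socket.once('message', (msg) => {
      clearTimeout(timer)
      socket.close()
      try {
        resolve(parseReply(msg))
      } catch (err) {
        reject(err)
      }
    })

    socket.once('error', (err) => {
      clearTimeout(timer)
      socket.close()
      reject(err)
    })

    socket.send(Buffer.from(fqdn, 'utf8'), port, host)
  })
}

// Usage, with the service running locally:
// queryUdp('example.com').then((data) => console.log(data))
```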
## Development Workflow
### Setup & Running Locally
```bash
# Install dependencies
npm install
# Development with auto-reload
npm run dev:server # Watch mode for server changes
npm run dev:client # Run UDP client for testing
# Production
npm start # Start both servers
# NetStar service commands (Linux system)
make gcf1-start # Start NetStar service
make gcf1-download # Download category databases
make gcf1-update # Update category databases
```
### Testing APIs
Use `test-detailed.http` with the VS Code REST Client extension:
1. Open the file
2. Click "Send Request" on each endpoint
3. View responses in the side panel
### Git Workflow
- **Main Branch**: `main` - production stable code
- **Development Branch**: `development` - feature integration
- **Feature Branches**: Create from `development`, merge back via PR
- **Recent History**: Recent commits add the HTTP interface and the cron job
- **Commit Messages**: Must be in English
## Deployment
### Docker
- **Base Image**: Ubuntu 22.04
- **Includes**: Boost libraries, Node.js, NetStar SDK
- **Exposes**: Port 3000 (UDP)
- Build: `docker build -t netstar-categorizer .`
### Kubernetes (Production)
- **Namespace**: `blackdice`
- **Deployment**: Single replica in appropriate cluster
- **CronJob**: Daily database updates at 00:00 UTC
- **Ingress**: `netstar-cat-dev.blackdice.ai` (DNS varies by environment)
- **Branches to Environments**:
  - `development` → development cluster
  - `qa` → QA cluster
  - `staging` → staging cluster
  - `production` → production cluster
  - `gke-staging`, `gke-pov` → specific GKE clusters
### CI/CD Pipeline (CircleCI)
- Automatically builds Docker image on push
- Tags image with commit SHA
- Deploys to appropriate Kubernetes cluster based on branch
- Release deployments via git tags
## Key Technical Details
### Category Mapping System
- **Source**: NetStar SDK returns numeric category IDs (e.g., 101, 102)
- **Mapping File**: `src/etc/categories-mapping.json`
- **Target**: Maps to organization's Zvelo pattern codes (e.g., 10075, 10078)
- **Singleton Implementation**: `CategoryConverterUseCase` maintains single instance across app
- **Example Mapping**:
  - NetStar 101 (Illegal Activities) → Zvelo 10075
  - NetStar 201 (Terrorism/Extremists) → Zvelo 10018
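
The mapping and singleton behaviour described above can be sketched as follows. This is an illustration only: `CategoryConverterSketch` is a hypothetical class, not the real `CategoryConverterUseCase`; the flat object keyed by NetStar ID is an assumed shape for `categories-mapping.json`; and silently dropping unmapped IDs is an assumption about the desired behaviour. The two entries come from the example mapping in this section.

```javascript
// ASSUMED shape: a flat object keyed by NetStar ID.
// The real categories-mapping.json may be structured differently.
const SAMPLE_MAPPING = {
  101: 10075, // Illegal Activities
  201: 10018 // Terrorism/Extremists
}

class CategoryConverterSketch {
  constructor(mapping) {
    this.mapping = mapping
  }

  // Returns the shared instance, creating it on first use
  static getInstance() {
    if (!CategoryConverterSketch.instance) {
      CategoryConverterSketch.instance = new CategoryConverterSketch(SAMPLE_MAPPING)
    }
    return CategoryConverterSketch.instance
  }

  // Converts NetStar IDs to organization codes, skipping unmapped IDs
  convert(netstarIds) {
    return netstarIds
      .map((id) => this.mapping[id])
      .filter((code) => code !== undefined)
  }
}

const converter = CategoryConverterSketch.getInstance()
// 999 has no entry in the sample mapping and is dropped
console.log(converter.convert([101, 201, 999]))
// Every call to getInstance() returns the same object
console.log(converter === CategoryConverterSketch.getInstance())
```

Because the instance is created once and reused, the mapping is held in memory for the lifetime of the process.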
### NetStar SDK Integration
- **Command**: `gcf1check` - queries the NetStar database for domain categorization
- **Child Process**: Executed via Node.js `child_process` module
- **Output Parsing**: Raw output parsed into JSON structure
- **Detailed Mode**: Includes reputation scores and age ratings in output
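
A minimal sketch of the child-process call, under stated assumptions: the FQDN is passed to `gcf1check` as its only argument, and raw stdout is returned unparsed. Neither detail is confirmed by this guide, so check the SDK documentation for the real invocation and output format. `runGcf1check` and `isValidFqdn` are hypothetical names.

```javascript
const { execFile } = require('child_process')

// Conservative hostname check so that only plausible FQDNs reach the command line
function isValidFqdn(fqdn) {
  if (typeof fqdn !== 'string' || fqdn.length === 0 || fqdn.length > 253) {
    return false
  }
  const label = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i
  return fqdn.split('.').every((part) => label.test(part))
}

// Runs gcf1check and resolves with its raw stdout.
// ASSUMPTION: the FQDN is the only argument; the real flags may differ.
// The runner is injectable so the function can be exercised without the SDK.
function runGcf1check(fqdn, run = execFile) {
  return new Promise((resolve, reject) => {
    if (!isValidFqdn(fqdn)) {
      reject(new Error(`Invalid FQDN: ${fqdn}`))
      return
    }
    // execFile does not spawn a shell, so the FQDN is never shell-interpreted
    run('gcf1check', [fqdn], { timeout: 5000 }, (err, stdout) => {
      if (err) {
        reject(err)
        return
      }
      resolve(stdout)
    })
  })
}

// Usage, on a host where the NetStar SDK is installed:
// runGcf1check('example.com').then((raw) => console.log(raw))
```

Using `execFile` rather than `exec` avoids shell interpolation of user-supplied domains, and the validation step rejects input containing spaces or shell metacharacters before any process is started.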
### Automatic Database Updates
- **Mechanism**: Kubernetes CronJob scheduled at `0 0 * * *` (daily at midnight UTC)
- **Fallback**: Manual update via `make gcf1-update`
- **Purpose**: Keeps categorization database current with latest NetStar classifications
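
To reason about how stale the database can get, the `0 0 * * *` schedule can be worked out in code. The sketch below assumes the schedule is evaluated in UTC, as stated above; `nextDailyUpdate` and `hoursUntilNextUpdate` are hypothetical helpers and are not part of `src/cron.js`.

```javascript
// Returns the next time a "0 0 * * *" schedule fires, evaluated in UTC
function nextDailyUpdate(now = new Date()) {
  return new Date(Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1 // Date.UTC rolls over month and year boundaries
  ))
}

// Hours until that run, useful when deciding whether to update manually
function hoursUntilNextUpdate(now = new Date()) {
  return (nextDailyUpdate(now) - now) / 3600000
}

console.log(nextDailyUpdate().toISOString())
console.log(hoursUntilNextUpdate().toFixed(1))
```

If the wait is too long for a given investigation, the manual fallback listed above applies.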
## Common Tasks
### Adding a New Endpoint
1. Create a corresponding use case in `src/use-cases/`
2. Add route in `src/app.js` that calls the use case
3. Export the use case and add a request for the new endpoint to `test-detailed.http`
4. Update this guide if it's a significant feature
5. Ensure all error messages and comments are in English
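
The steps above can be sketched as a use case plus an Express-style handler. Everything here is illustrative: `EchoCategoryUseCase`, `makeHandler`, the `/example` path, the HTTP status codes, and the error payload shape are assumptions, not the conventions actually used in `src/app.js`.

```javascript
// Hypothetical use case, following the pattern in src/use-cases/
class EchoCategoryUseCase {
  async execute(fqdn) {
    // Placeholder result; a real use case would call into NetStar
    return { fqdn, result: [] }
  }
}

// Builds an Express-style handler around any use case with an execute(fqdn) method
function makeHandler(useCase) {
  return async (req, res) => {
    const fqdn = req.body && req.body.fqdn
    if (typeof fqdn !== 'string' || fqdn.length === 0) {
      // ASSUMPTION: status code and error shape are illustrative
      res.status(400).json({ error: 'fqdn is required' })
      return
    }
    try {
      res.json(await useCase.execute(fqdn))
    } catch (err) {
      console.error(`Categorization failed for ${fqdn}: ${err.message}`)
      res.status(500).json({ error: 'Categorization failed' })
    }
  }
}

// Registration in src/app.js might then look like:
// app.post('/example', makeHandler(new EchoCategoryUseCase()))
```

Keeping the handler as a thin wrapper leaves the business logic in the use case class, which matches the single-responsibility convention described earlier in this guide.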
### Updating Category Mappings
1. Modify `src/etc/categories-mapping.json` with new ID mappings
2. Restart the service (singleton will reload on next request)
3. Test with both HTTP and UDP interfaces
### Debugging
- **Server Logs**: Check Docker/Kubernetes logs for errors
- **Cron Logs**: View Kubernetes CronJob logs for database update issues
- **UDP Testing**: Use `npm run dev:client` to test directly
- **HTTP Testing**: Use `test-detailed.http` with VS Code REST Client
- **Error Messages**: All error logs must be in English
### Troubleshooting
- **NetStar Service Not Running**: Run `make gcf1-start`
- **Stale Categories**: Manually run `make gcf1-update` or wait for cron job
- **Port Conflicts**: Ensure ports 3333 (HTTP) and 33333 (UDP) are available
- **Docker Build Issues**: Check that Boost C++ libraries are installed correctly
## Current Development Status
### Recent Work
- ✅ HTTP server implementation (alongside UDP)
- ✅ Detailed categorization with reputation/age ratings
- ✅ Cron job for automated daily updates
- ✅ Singleton category converter pattern
- 🔄 Work in Progress:
  - `playground.js` - experimental/testing code
  - `parse-detailed-category-use-case.js` - new detailed parsing feature
  - Enhanced `server.js` - expanded server capabilities
### Known Modified Files
- `playground.js` - development/testing (can be cleaned up)
- `src/server.js` - recent enhancements
- `makefile` - new convenience commands
- `test-detailed.http` - expanded test coverage
## Guidelines for Contributions
1. **Follow Existing Patterns**: Use use-case classes, follow module structure
2. **Test Before Committing**: Use `test-detailed.http` for API changes
3. **Update Mappings Properly**: Edit `categories-mapping.json`; do not hardcode values
4. **Document Breaking Changes**: Update this guide if architecture changes
5. **Keep CircleCI Happy**: Ensure Docker build succeeds and K8s deployment configs are valid
6. **Don't Skip Steps**: Always test UDP and HTTP interfaces for categorization changes
7. **Language Standards**: All comments, error messages, and logs must be in English
## Resources & External Documentation
- **NetStar SDK**: Installed in Docker, documentation in inCompass SDK v3.1.0-2
- **Express.js**: https://expressjs.com
- **Node.js Child Process**: https://nodejs.org/api/child_process.html
- **Kubernetes**: https://kubernetes.io/docs
- **CircleCI**: Configuration at `.circleci/config.yml`