# Claude Development Guide - NetStar Categorizer

## Project Overview

**NetStar Categorizer** is a Node.js microservice that provides domain name/FQDN categorization using the inCompass NetStar SDK. It exposes both UDP and HTTP interfaces for real-time content classification with automatic daily database updates.

### Core Purpose
- Categorize domains into standardized categories using NetStar's content database
- Map raw NetStar category IDs to organization-specific category codes
- Provide both simple and detailed categorization results with reputation/age ratings
- Maintain updated categorization databases through automated cron jobs

## Key Architecture

### Technology Stack
- **Runtime**: Node.js v14+
- **Web Framework**: Express.js v4
- **External SDK**: inCompass NetStar v3.1.0-2 (C++ library for categorization)
- **Containerization**: Docker + Kubernetes
- **CI/CD**: CircleCI
- **Infrastructure**: Cloud-agnostic (DigitalOcean, Google Cloud, etc.)

### Core Services
1. **HTTP Server** (Port 3333) - REST API for categorization requests
2. **UDP Server** (Port 33333) - Legacy UDP interface for categorization
3. **Cron Service** - Daily database updates via Kubernetes CronJob
4. **Category Mapping** - NetStar ID → Organization Code conversion (singleton pattern)

## Project Structure

```
src/
├── server.js  # Entry point: initializes UDP + HTTP servers
├── app.js  # Core logic: orchestrates categorization flow
├── client.js  # UDP test client
├── cron.js  # Scheduled database update logic
├── use-cases/
│   ├── get-category-use-case.js  # Executes NetStar gcf1check command
│   ├── category-converter-use-case.js  # Maps NetStar IDs → org codes (singleton)
│   ├── parse-detailed-category-use-case.js  # Parses detailed output with ratings
│   └── update-categories-use-case.js  # Updates NetStar databases
└── etc/
    └── categories-mapping.json  # Mapping table: NetStar ID → Zvelo codes

deployment/
├── deployment.yaml  # Kubernetes deployment manifest
└── staging/deployment.yaml  # Staging-specific config

.circleci/config.yml  # CI/CD pipeline (builds, tests, deploys)
Dockerfile  # Container build specification
package.json  # Node dependencies + scripts
makefile  # Convenience commands
test-detailed.http  # HTTP endpoint test file
```

## Language Standards

### Comments and Error Messages - ENGLISH ONLY
**All code comments, error messages, log statements, and documentation must be written in English.**

This includes:
- ✅ Code comments explaining logic
- ✅ Error messages and exception messages
- ✅ Console.log, console.error, and logging statements
- ✅ Variable and function names
- ✅ Commit messages
- ✅ Code review feedback
- ✅ Documentation strings (JSDoc, etc.)

**Examples:**
```javascript
// Correct: English comment
function mapCategoryId(netstarId) {
  if (!netstarId) {
    throw new Error('NetStar ID is required')
  }
  // Map to organization category code
  return categoryConverter.convert(netstarId)
}

// Incorrect: Portuguese comment
function mapCategoryId(netstarId) {
  if (!netstarId) {
    throw new Error('ID do NetStar é obrigatório') // ❌ ERROR IN PORTUGUESE
  }
  // Mapear para código de categoria da organização // ❌ COMMENT IN PORTUGUESE
  return categoryConverter.convert(netstarId)
}
```

## Coding Conventions

### Code Style
- Use **ES6 syntax** (const/let, arrow functions, template literals)
- **No semicolons** in new code (already the established pattern)
- Functional/modular design - keep each file focused on a single responsibility
- Use the **singleton pattern** for shared state (see: `CategoryConverterUseCase`)
- **All comments in English** - see the Language Standards section

### Use Case Pattern
- Each business operation gets a dedicated use case class in `src/use-cases/`
- Use case classes should have a clear, single responsibility
- Example:
  ```javascript
  class GetCategoryUseCase {
    async execute(fqdn) {
      // implementation
    }
  }
  ```

### Environment Variables
- Defined in `.env` (create from `.env.example`)
- `UDP_PORT=33333` - UDP server listen port
- `HTTP_PORT=3333` - HTTP server listen port

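A minimal sketch of how these variables might be read at startup, assuming the defaults listed above apply when a variable is missing or invalid. `parsePort` and `loadConfig` are hypothetical helper names, not taken from the project source:

```javascript
// Hypothetical helper: parse a port from an environment variable,
// falling back to a default when the value is missing or not a valid port.
const parsePort = (value, fallback) => {
  const port = Number.parseInt(value, 10)
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback
}

// Hypothetical config loader; the defaults mirror the documented values.
const loadConfig = (env = process.env) => ({
  udpPort: parsePort(env.UDP_PORT, 33333),
  httpPort: parsePort(env.HTTP_PORT, 3333)
})
```

Passing `env` as a parameter keeps the loader easy to test without touching `process.env`.
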
## HTTP API Endpoints

### `POST /` - Basic Categorization
- **Input**: `{"fqdn": "example.com"}`
- **Output**: `{"result": [10009, 10010]}` (array of category IDs)
- **Use Case**: Quick lookups when detailed information is not needed

### `POST /detailed` - Detailed Categorization
- **Input**: `{"fqdn": "example.com"}`
- **Output**: Full category info with reputation score, age rating, primary/secondary categories, and human-readable names
- **Use Case**: Comprehensive categorization for security decisions
- **Includes**: `result` array matching the `/` endpoint format

### `POST /full` - Complete Raw Categorization Output
- **Input**: `{"fqdn": "example.com"}`
- **Output**: Complete detailed structure with all 35 fields from NetStar, including:
  - Primary, secondary, and security categories (with IDs, names, and mapped IDs)
  - Reputation score and name
  - Matching flags and their descriptions
  - Age rating score and name
  - All 9 category group classifications (Internet/Infrastructure, Malware/Security, Dangerous/Harmful, Adult, Business/Government, Personal, Computing/Technology, Social Media, Miscellaneous)
  - Volume index
  - Submitted URL
- **Use Case**: Full diagnostics and analysis when all categorization data is needed
- **Includes**: `result` array matching the `/` endpoint format

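The three HTTP endpoints above accept the same request body, so one small client can exercise all of them. The sketch below uses only Node's built-in `http` module. The `localhost` host, the helper names `buildRequest` and `categorize`, and the assumption that every response is JSON are illustrative, not taken from the project source:

```javascript
const http = require('http')

// Build the request options and JSON body for one of the documented endpoints.
// The host and port defaults are assumptions for a local run.
const buildRequest = (fqdn, path = '/', host = 'localhost', port = 3333) => {
  const body = JSON.stringify({ fqdn })
  const options = {
    host,
    port,
    path,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  }
  return { options, body }
}

// Send the request and resolve with the parsed JSON response.
const categorize = (fqdn, path = '/') =>
  new Promise((resolve, reject) => {
    const { options, body } = buildRequest(fqdn, path)
    const req = http.request(options, (res) => {
      let raw = ''
      res.setEncoding('utf8')
      res.on('data', (chunk) => { raw += chunk })
      res.on('end', () => {
        try {
          resolve(JSON.parse(raw))
        } catch (error) {
          reject(error)
        }
      })
    })
    req.on('error', reject)
    req.end(body)
  })
```

With the service running locally, `categorize('example.com', '/detailed').then(console.log).catch(console.error)` would print the detailed response.
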
### UDP Server (Port 33333)
- **Input**: Raw domain string (e.g., `"example.com"`)
- **Output**: JSON-formatted result (same as HTTP `/` endpoint)
- **Legacy Interface**: Maintained for backward compatibility

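A rough sketch of a UDP client for this interface, using Node's built-in `dgram` module. It assumes the server replies with a single UTF-8 JSON datagram. The `localhost` host, the 2-second timeout, and the helper names are assumptions, and the project's own `src/client.js` may work differently:

```javascript
const dgram = require('dgram')

// Decode a reply datagram; assumes the server answers with UTF-8 JSON.
const parseReply = (message) => JSON.parse(message.toString('utf8'))

// Send one domain to the UDP server and resolve with the parsed reply.
// The host default and the timeout are assumptions.
const queryUdp = (fqdn, port = 33333, host = 'localhost', timeoutMs = 2000) =>
  new Promise((resolve, reject) => {
    const socket = dgram.createSocket('udp4')

    // UDP gives no delivery guarantee, so give up after a fixed wait.
    const timer = setTimeout(() => {
      socket.close()
      reject(new Error(`No UDP reply within ${timeoutMs} ms`))
    }, timeoutMs)

    socket.once('message', (message) => {
      clearTimeout(timer)
      socket.close()
      try {
        resolve(parseReply(message))
      } catch (error) {
        reject(error)
      }
    })

    socket.once('error', (error) => {
      clearTimeout(timer)
      socket.close()
      reject(error)
    })

    socket.send(Buffer.from(fqdn), port, host)
  })
```

Because a lost datagram never produces an error, the timeout is what keeps a caller from waiting forever.
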
## Development Workflow

### Setup & Running Locally
```bash
# Install dependencies
npm install

# Development with auto-reload
npm run dev:server  # Watch mode for server changes
npm run dev:client  # Run UDP client for testing

# Production
npm start  # Start both servers

# NetStar service commands (Linux system)
make gcf1-start  # Start NetStar service
make gcf1-download  # Download category databases
make gcf1-update  # Update category databases
```

### Testing APIs
Use `test-detailed.http` with the VS Code REST Client extension:
1. Open the file
2. Click "Send Request" on each endpoint
3. View responses in the side panel

### Git Workflow
- **Main Branch**: `main` - production stable code
- **Development Branch**: `development` - feature integration
- **Feature Branches**: Create from `development`, merge back via PR
- Recent commits cover the HTTP interface implementation and the cron job additions
- **Commit Messages**: Must be in English

## Deployment

### Docker
- **Base Image**: Ubuntu 22.04
- **Includes**: Boost libraries, Node.js, NetStar SDK
- **Exposes**: Port 3000 (UDP)
- Build: `docker build -t netstar-categorizer .`

### Kubernetes (Production)
- **Namespace**: `blackdice`
- **Deployment**: Single replica in appropriate cluster
- **CronJob**: Daily database updates at 00:00 UTC
- **Ingress**: `netstar-cat-dev.blackdice.ai` (DNS varies by environment)
- **Branches to Environments**:
  - `development` → development cluster
  - `qa` → QA cluster
  - `staging` → staging cluster
  - `production` → production cluster
  - `gke-staging`, `gke-pov` → specific GKE clusters

### CI/CD Pipeline (CircleCI)
- Automatically builds the Docker image on push
- Tags the image with the commit SHA
- Deploys to the appropriate Kubernetes cluster based on branch
- Release deployments are triggered via git tags

## Key Technical Details

### Category Mapping System
- **Source**: NetStar SDK returns numeric category IDs (e.g., 101, 102)
- **Mapping File**: `src/etc/categories-mapping.json`
- **Target**: Maps to organization's Zvelo pattern codes (e.g., 10075, 10078)
- **Singleton Implementation**: `CategoryConverterUseCase` maintains single instance across app
- **Example Mapping**:
  - NetStar 101 (Illegal Activities) → Zvelo 10075
  - NetStar 201 (Terrorism/Extremists) → Zvelo 10018

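The mapping and singleton behavior described above could be sketched as follows. This is not the project's `CategoryConverterUseCase`: the class name `CategoryConverterSketch`, its methods, and the flat `{ "netstarId": code }` mapping shape are assumptions, and the real layout of `categories-mapping.json` may differ. Only the two example pairs come from this guide:

```javascript
// Hypothetical mapping shape: NetStar ID (as an object key) -> organization code.
// The real layout of categories-mapping.json may differ.
const sampleMapping = { 101: 10075, 201: 10018 }

class CategoryConverterSketch {
  constructor(mapping) {
    // Object keys are strings, so normalize them to numbers once.
    this.mapping = new Map(
      Object.entries(mapping).map(([id, code]) => [Number(id), code])
    )
  }

  // Return the shared instance, creating it on first use.
  static getInstance(mapping = sampleMapping) {
    if (!CategoryConverterSketch.instance) {
      CategoryConverterSketch.instance = new CategoryConverterSketch(mapping)
    }
    return CategoryConverterSketch.instance
  }

  // Map one NetStar ID to its organization code, or null when unmapped.
  convert(netstarId) {
    const code = this.mapping.get(Number(netstarId))
    return code === undefined ? null : code
  }

  // Map a list of IDs, dropping any that have no mapping.
  convertAll(netstarIds) {
    return netstarIds
      .map((id) => this.convert(id))
      .filter((code) => code !== null)
  }
}
```

Because the instance is created once and then reused, edits to the mapping only take effect after the process restarts.
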
### NetStar SDK Integration
- **Command**: `gcf1check` - queries the NetStar database for domain categorization
- **Child Process**: Executed via the Node.js `child_process` module
- **Output Parsing**: Raw command output is parsed into a JSON structure
- **Detailed Mode**: Includes reputation scores and age ratings in the output

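A hedged sketch of invoking the command through `child_process`. The guide does not document the arguments `gcf1check` takes, so passing the FQDN as the only argument is an assumption, as are the helper names and the 5-second timeout. The sketch uses `execFile` rather than `exec` so the input never passes through a shell, and it validates the FQDN first:

```javascript
const { execFile } = require('child_process')

// Accept only plain hostnames: dot-separated labels of letters, digits, and hyphens.
const LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i
const isValidFqdn = (value) => {
  if (typeof value !== 'string' || value.length === 0 || value.length > 253) {
    return false
  }
  // Allow a single trailing dot (fully qualified form).
  const name = value.endsWith('.') ? value.slice(0, -1) : value
  return name.split('.').every((label) => LABEL.test(label))
}

// Run gcf1check for one domain and resolve with its raw stdout.
// ASSUMPTION: the FQDN is the only argument; check the SDK docs for the real flags.
const runGcf1check = (fqdn, timeoutMs = 5000) =>
  new Promise((resolve, reject) => {
    if (!isValidFqdn(fqdn)) {
      reject(new Error(`Invalid FQDN: ${fqdn}`))
      return
    }
    execFile('gcf1check', [fqdn], { timeout: timeoutMs }, (error, stdout) => {
      if (error) {
        reject(error)
        return
      }
      resolve(stdout)
    })
  })
```

Parsing the raw output is left out here because its format is specific to the SDK.
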
### Automatic Database Updates
- **Mechanism**: Kubernetes CronJob at `0 0 * * *` (daily at midnight UTC)
- **Fallback**: Manual update via `make gcf1-update`
- **Purpose**: Keeps categorization database current with latest NetStar classifications

## Common Tasks

### Adding a New Endpoint
1. Create a corresponding use case in `src/use-cases/`
2. Add a route in `src/app.js` that calls the use case
3. Export the use case and add a request for the new endpoint to `test-detailed.http` to test it
4. Update this guide if it's a significant feature
5. Ensure all error messages and comments are in English

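The steps above can be sketched as follows. The use case `EchoFqdnUseCase`, the helpers `parseFqdn` and `makeHandler`, and the `/echo` route are all hypothetical and exist only to show the shape. The handler follows Express's `(req, res)` signature but does not import Express, and it assumes JSON body parsing (for example `express.json()`) is already enabled:

```javascript
// Hypothetical use case, following the one-class-per-operation pattern.
class EchoFqdnUseCase {
  async execute(fqdn) {
    return { fqdn: fqdn.toLowerCase() }
  }
}

// Validate the request body; throws with an English error message.
const parseFqdn = (body) => {
  if (!body || typeof body.fqdn !== 'string' || body.fqdn.trim() === '') {
    throw new Error('fqdn is required')
  }
  return body.fqdn.trim()
}

// Build an Express-style (req, res) handler around any use case.
const makeHandler = (useCase) => async (req, res) => {
  let fqdn
  try {
    fqdn = parseFqdn(req.body)
  } catch (error) {
    // Bad input from the caller
    return res.status(400).json({ error: error.message })
  }
  try {
    res.status(200).json(await useCase.execute(fqdn))
  } catch (error) {
    // Failure inside the use case
    res.status(500).json({ error: error.message })
  }
}

// Wiring in src/app.js would then look roughly like:
//   app.post('/echo', makeHandler(new EchoFqdnUseCase()))
```

Keeping validation in its own function lets it be tested without starting a server.
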
### Updating Category Mappings
1. Modify `src/etc/categories-mapping.json` with the new ID mappings
2. Restart the service (the singleton reloads the mapping on the next request)
3. Test with both the HTTP and UDP interfaces

### Debugging
- **Server Logs**: Check Docker/Kubernetes logs for errors
- **Cron Logs**: View Kubernetes CronJob logs for database update issues
- **UDP Testing**: Use `npm run dev:client` to test directly
- **HTTP Testing**: Use `test-detailed.http` with VS Code REST Client
- **Error Messages**: All error logs must be in English

### Troubleshooting
- **NetStar Service Not Running**: Run `make gcf1-start`
- **Stale Categories**: Manually run `make gcf1-update` or wait for the next cron job run
- **Port Conflicts**: Ensure ports 3333 (HTTP) and 33333 (UDP) are available
- **Docker Build Issues**: Check that the Boost C++ libraries are installed correctly

## Current Development Status

### Recent Work
- ✅ HTTP server implementation (alongside UDP)
- ✅ Detailed categorization with reputation/age ratings
- ✅ Cron job for automated daily updates
- ✅ Singleton category converter pattern
- 🔄 Work in Progress:
  - `playground.js` - experimental/testing code
  - `parse-detailed-category-use-case.js` - new detailed parsing feature
  - Enhanced `server.js` - expanded server capabilities

### Known Modified Files
- `playground.js` - development/testing (can be cleaned up)
- `src/server.js` - recent enhancements
- `makefile` - new convenience commands
- `test-detailed.http` - expanded test coverage

## Guidelines for Contributions

1. **Follow Existing Patterns**: Use use-case classes and follow the module structure
2. **Test Before Committing**: Use `test-detailed.http` for API changes
3. **Update Mappings Properly**: Edit `categories-mapping.json` rather than hardcoding values
4. **Document Breaking Changes**: Update this guide if the architecture changes
5. **Keep CircleCI Happy**: Ensure the Docker build succeeds and the K8s deployment configs are valid
6. **Don't Skip Steps**: Always test both the UDP and HTTP interfaces for categorization changes
7. **Language Standards**: All comments, error messages, and logs must be in English

## Resources & External Documentation

- **NetStar SDK**: Installed in the Docker image; documentation is part of the inCompass SDK v3.1.0-2
- **Express.js**: https://expressjs.com
- **Node.js Child Process**: https://nodejs.org/api/child_process.html
- **Kubernetes**: https://kubernetes.io/docs
- **CircleCI**: Configuration at `.circleci/config.yml`