Performance Tuning

Purpose

Optimization strategies for high-throughput and low-latency Rcon usage.

Overview

Rcon is designed for simplicity and correctness, but there are several strategies for optimizing performance in demanding scenarios.

Baseline Performance

Default configuration provides:

  • Latency: 1-2 round-trips per command (ACTIVE_PROBE strategy)

  • Throughput: ~100-1000 commands/second (depending on network)

  • Memory: ~10KB per client instance

  • CPU: Minimal, mostly idle waiting for I/O

Bottleneck Analysis

Typical Command Flow

Client App → RconClient → Rcon → Socket → Network → Server
    ↓           ↓          ↓        ↓         ↓       ↓
  [CPU]      [CPU]     [CPU]    [I/O]    [I/O]   [I/O]
               ↓          ↓        ↓
           [Logger]  [Sync]   [Buffer]

Bottlenecks (in order of impact):

  1. Network latency (unavoidable)
  2. Synchronized methods (serializes commands)
  3. Logging (especially DEBUG level)
  4. Buffer allocation (mitigated by double-buffering)

Optimization Strategies

Strategy 1: Connection Pooling

Problem: Single client serializes all commands via synchronized methods.

Solution: Use multiple clients to parallelize network I/O.

public class RconConnectionPool {
    private final Queue<RconClient> pool = new LinkedList<>();
    private final int maxSize;

    public RconConnectionPool(String host, int port, String password, int size) {
        this.maxSize = size;
        for (int i = 0; i < size; i++) {
            pool.add(RconClient.builder()
                .host(host)
                .port(port)
                .password(password)
                .build());
        }
    }

    public RconClient acquire() {
        synchronized (pool) {
            while (pool.isEmpty()) {
                try {
                    pool.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                }
            }
            return pool.remove();
        }
    }

    public void release(RconClient client) {
        synchronized (pool) {
            if (pool.size() < maxSize) {
                pool.add(client);
            }
            pool.notifyAll();
        }
    }
}

Usage:

RconConnectionPool pool = new RconConnectionPool("localhost", 25575, "password", 5);

ExecutorService executor = Executors.newFixedThreadPool(10);

for (int i = 0; i < 100; i++) {
    final int cmdNum = i;
    executor.submit(() -> {
        RconClient client = pool.acquire();
        try {
            client.sendCommand("say Command " + cmdNum);
        } finally {
            pool.release(client);  // Always return the client, even on failure
        }
    });
}

executor.shutdown();  // Stop accepting tasks; let queued commands drain

Throughput improvement: 5-10x (depending on pool size)
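One design tradeoff in the pool above is that callers must remember to call release() in a finally block. A variant worth considering is a try-with-resources lease, sketched here as a generic, self-contained pool (in practice the type parameter would be RconClient; the Pool and Lease names are illustrative, not part of the library):

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class PooledLease {
    // Generic blocking pool whose acquire() returns an AutoCloseable lease,
    // so release() cannot be forgotten.
    static class Pool<T> {
        private final Queue<T> items = new ArrayDeque<>();

        Pool(Iterable<T> initial) {
            initial.forEach(items::add);
        }

        synchronized Lease<T> acquire() throws InterruptedException {
            while (items.isEmpty()) {
                wait();  // Block until release() returns an item
            }
            return new Lease<>(this, items.remove());
        }

        synchronized void release(T item) {
            items.add(item);
            notifyAll();  // Wake any threads blocked in acquire()
        }
    }

    static class Lease<T> implements AutoCloseable {
        private final Pool<T> pool;
        final T value;

        Lease(Pool<T> pool, T value) {
            this.pool = pool;
            this.value = value;
        }

        @Override
        public void close() {
            pool.release(value);  // Returning to the pool is automatic
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Pool<String> pool = new Pool<>(List.of("c1", "c2"));
        try (Lease<String> lease = pool.acquire()) {
            System.out.println("using " + lease.value);
        }
        System.out.println("returned");
    }
}
```

With this shape, a leaked client is a compile-visible missing try block rather than a silent runtime hang.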

Strategy 2: Async API

Problem: Synchronous API blocks threads during network I/O.

Solution: Use sendCommandAsync() with non-blocking futures.

List<CompletableFuture<RconResponse>> futures = commands.stream()
    .map(client::sendCommandAsync)
    .toList();

CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
    .join();

Thread usage improvement: 10-100x (depends on command count)
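With allOf(), a single failed command fails the whole join(). When partial results are acceptable, attach a per-command fallback before collecting. The sketch below is self-contained: sendAsync is a stand-in for client::sendCommandAsync so it runs without a server.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AsyncFanOut {
    // Stand-in for client.sendCommandAsync(cmd); echoes the command back.
    static CompletableFuture<String> sendAsync(String cmd) {
        return CompletableFuture.supplyAsync(() -> "ok: " + cmd);
    }

    public static void main(String[] args) {
        List<CompletableFuture<String>> futures = IntStream.range(0, 10)
            .mapToObj(i -> sendAsync("say " + i)
                // Per-command fallback: one failure does not sink the batch
                .exceptionally(ex -> "failed: " + ex.getMessage()))
            .collect(Collectors.toList());

        // Safe to join: every future completes normally after exceptionally()
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        futures.forEach(f -> System.out.println(f.join()));
    }
}
```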

Strategy 3: Reduce Logging

Problem: DEBUG logging adds overhead on every command.

Solution: Use INFO or WARN level in production.

<!-- Logback production config -->
<configuration>
    <logger name="io.cavarest.rcon" level="INFO"/>
</configuration>

Or disable logging entirely:

<configuration>
    <logger name="io.cavarest.rcon" level="OFF"/>
</configuration>

Latency improvement: 5-10%

Strategy 4: Optimize Fragment Resolution

Problem: TIMEOUT strategy adds 100ms latency to every command.

Solution: Use ACTIVE_PROBE (default) for better latency.

// Explicitly use ACTIVE_PROBE (default)
RconClient client = RconClient.builder()
    .host("localhost")
    .port(25575)
    .password("password")
    .fragmentStrategy(FragmentResolutionStrategy.ACTIVE_PROBE)
    .build();

Latency improvement: 100ms per command (for TIMEOUT users)

Strategy 5: Tune Timeouts

Problem: Default 10-second timeout is too long for fast-fail scenarios.

Solution: Reduce timeout for faster failure detection.

// Fast-fail configuration
RconClient client = RconClient.builder()
    .host("localhost")
    .port(25575)
    .password("password")
    .timeout(Duration.ofSeconds(2))  // 2 seconds
    .build();

Failure detection improvement: 8 seconds faster

Memory Optimization

Buffer Sizing

Default buffers are sized for typical RCON usage:

// In PacketReader
private static final int RECEIVE_BUFFER = 4096;  // 4KB

// In PacketWriter
private static final int SEND_BUFFER = 1460;     // Typical TCP MSS (1500-byte MTU minus headers)

For large responses, increase receive buffer:

// Custom buffer sizes (requires PacketReader/PacketWriter modification)
private static final int RECEIVE_BUFFER = 65536;  // 64KB

Memory tradeoff: 16x more memory for potentially fewer syscalls

Network Optimization

TCP_NODELAY

Disable Nagle’s algorithm for lower latency:

Socket socket = SocketChannel.open().socket();
socket.setTcpNoDelay(true);  // Disable Nagle's algorithm

Latency improvement: 10-40ms on high-latency networks

Keep-Alive

Enable TCP keep-alive for long-lived connections:

socket.setKeepAlive(true);
socket.setSoTimeout(60000);  // 60 second timeout

Profiling

JMX Monitoring

Enable JMX for runtime monitoring:

java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9010 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar rcon.jar

Connect with JConsole or VisualVM to monitor:

  • Thread count

  • Memory usage

  • GC activity

  • CPU usage
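The same MXBeans that JConsole reads are also available in-process via the standard java.lang.management API, which is handy for logging periodic snapshots without an external tool:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

public class JmxSnapshot {
    public static void main(String[] args) {
        // Platform MXBeans: the same data JConsole/VisualVM display
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        System.out.println("threads=" + threads.getThreadCount());
        System.out.println("heapUsedBytes=" + memory.getHeapMemoryUsage().getUsed());
    }
}
```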

Custom Metrics

Add metrics collection:

public class MetricsRconClient implements RconClient {
    private final RconClient delegate;
    private final AtomicLong commandCount = new AtomicLong(0);
    private final AtomicLong totalLatencyNanos = new AtomicLong(0);

    public MetricsRconClient(RconClient delegate) {
        this.delegate = delegate;
    }

    @Override
    public RconResponse sendCommand(String command) {
        long start = System.nanoTime();
        try {
            return delegate.sendCommand(command);
        } finally {
            // Count every attempt (including failures) so the average
            // covers the same commands as the recorded latency
            commandCount.incrementAndGet();
            totalLatencyNanos.addAndGet(System.nanoTime() - start);
        }
    }

    public double getAverageLatencyMillis() {
        long count = commandCount.get();
        return count == 0 ? 0 :
            totalLatencyNanos.get() / (double) count / 1_000_000.0;
    }
}
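The averaging arithmetic can be checked in isolation. This snippet reproduces only the getAverageLatencyMillis() computation with fixed inputs (the values are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LatencyAverage {
    public static void main(String[] args) {
        // Same arithmetic as getAverageLatencyMillis(): total nanos / count -> ms
        AtomicLong commandCount = new AtomicLong(4);
        AtomicLong totalLatencyNanos = new AtomicLong(8_000_000L);  // 8ms total

        double avgMs = totalLatencyNanos.get() / (double) commandCount.get() / 1_000_000.0;
        System.out.println(avgMs);  // 8ms over 4 commands -> 2.0ms average
    }
}
```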

Performance Benchmarks

Benchmark Setup

@State(Scope.Benchmark)
public class RconBenchmark {
    private RconClient client;

    @Setup
    public void setup() {
        client = RconClient.builder()
            .host("localhost")
            .port(25575)
            .password("password")
            .build();
    }

    @Benchmark
    public RconResponse sendCommand() {
        return client.sendCommand("seed");
    }

    @TearDown
    public void tearDown() {
        client.close();
    }
}

Run with JMH:

java -jar benchmarks/target/benchmarks.jar

Expected Results

Typical performance on local network (1ms RTT):

Metric          Single Client   Pool of 5       Pool of 10
Throughput      ~500 cmd/s      ~2,000 cmd/s    ~3,500 cmd/s
Avg Latency     2ms             2ms             2ms
P95 Latency     5ms             10ms            15ms
P99 Latency     10ms            25ms            40ms
Memory          10MB            50MB            100MB

Production Checklist

Before deploying to production:

  • Use connection pooling for high-throughput scenarios

  • Enable async API for non-blocking operations

  • Set appropriate timeouts (2-5 seconds)

  • Use ACTIVE_PROBE fragment strategy

  • Set logging to INFO level

  • Enable metrics collection

  • Configure circuit breaker for fault tolerance

  • Test with realistic load

  • Monitor memory usage

  • Profile before optimizing
