Building a Bitcoin Wallet from Scratch: BIP-32/39/44, UTXO Management, and RGB Protocol

Abhishek Chauhan · 12 min read

Building a Bitcoin wallet is one of the best ways to deeply understand how Bitcoin works. Not a wrapper around an API, but an actual wallet that derives keys, constructs transactions, manages UTXOs, and signs. Here's what I learned building Invebit, a cross-platform Bitcoin wallet with a React Native app, a Chrome extension, and a Node.js backend.

HD Key Derivation: BIP-32/39/44

BIP-39: Entropy to Mnemonic to Seed

Everything starts with random entropy. BIP-39 defines a precise algorithm to convert raw randomness into human-readable words, then into a cryptographic seed:

Step 1 — Generate entropy (128-256 bits, must be a multiple of 32):

| Entropy (bits) | Checksum (bits) | Total (bits) | Words |
|---|---|---|---|
| 128 | 4 | 132 | 12 |
| 160 | 5 | 165 | 15 |
| 192 | 6 | 198 | 18 |
| 224 | 7 | 231 | 21 |
| 256 | 8 | 264 | 24 |

Step 2 — Compute checksum: Take the first ENT/32 bits of SHA-256(entropy) and append to the entropy.

Step 3 — Split into 11-bit groups: Each 11-bit segment maps to a word in a 2048-word wordlist (index 0-2047). The wordlist is sorted to enable binary search and trie compression.
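Steps 2 and 3 can be sketched in a few lines. This is a minimal illustration, assuming Node.js's built-in crypto module; the function name `entropyToWordIndices` is hypothetical, and it returns word indices rather than words so that no wordlist has to be bundled.

```typescript
import { createHash } from "node:crypto";

// Convert raw entropy into BIP-39 word indices (0..2047).
// Looking each index up in a 2048-word list yields the mnemonic.
function entropyToWordIndices(entropy: Uint8Array): number[] {
  const entBits = entropy.length * 8;
  if (entBits < 128 || entBits > 256 || entBits % 32 !== 0) {
    throw new Error("Entropy must be 128-256 bits and a multiple of 32");
  }
  const checksumBits = entBits / 32;
  const hash = createHash("sha256").update(entropy).digest();

  // Build one bit string: entropy followed by the first ENT/32 bits of the hash.
  const toBits = (bytes: Uint8Array): string =>
    Array.from(bytes, (b) => b.toString(2).padStart(8, "0")).join("");
  const bits = toBits(entropy) + toBits(hash).slice(0, checksumBits);

  // Split into 11-bit groups; each group is one word index.
  const indices: number[] = [];
  for (let i = 0; i < bits.length; i += 11) {
    indices.push(parseInt(bits.slice(i, i + 11), 2));
  }
  return indices;
}

// 128 bits of all-zero entropy: eleven words at index 0 plus a final word at index 3,
// which is "abandon" x11 followed by "about" in the English wordlist.
const zeroIndices = entropyToWordIndices(new Uint8Array(16));
console.log(zeroIndices);
```

Only the last word depends on the checksum, which is why a mnemonic with one mistyped word usually fails validation.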

Step 4 — Derive seed via PBKDF2:

PBKDF2(
  password: mnemonic_sentence (UTF-8 NFKD normalized),
  salt: "mnemonic" + passphrase (UTF-8 NFKD normalized),
  iterations: 2048,
  PRF: HMAC-SHA512,
  output: 512 bits
)
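The same parameters map directly onto Node.js's `pbkdf2Sync`. A minimal sketch, assuming a Node.js runtime; `mnemonicToSeed` is a hypothetical name, and the mnemonic is the all-zero-entropy phrase from the BIP-39 test vectors.

```typescript
import { pbkdf2Sync } from "node:crypto";

// Mnemonic sentence (plus optional passphrase) to a 64-byte seed.
function mnemonicToSeed(mnemonic: string, passphrase = ""): Buffer {
  const password = Buffer.from(mnemonic.normalize("NFKD"), "utf8");
  const salt = Buffer.from("mnemonic" + passphrase.normalize("NFKD"), "utf8");
  return pbkdf2Sync(password, salt, 2048, 64, "sha512");
}

const phrase =
  "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about";

const seedNoPassphrase = mnemonicToSeed(phrase);
const seedWithPassphrase = mnemonicToSeed(phrase, "TREZOR");

// Same words, different passphrase: two unrelated seeds.
console.log(seedNoPassphrase.toString("hex").slice(0, 16));
console.log(seedWithPassphrase.toString("hex").slice(0, 16));
```

Note that nothing here checks the wordlist or the checksum, which is exactly the independence described in the next paragraph.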

The passphrase acts as a "25th word." Same mnemonic with different passphrase = completely different wallet. Every passphrase produces a valid seed — there's no way to tell if a passphrase is "correct" without deriving keys and checking for funds. This enables plausible deniability: a coerced user can reveal a decoy passphrase that opens an empty wallet.

Critical detail: The mnemonic-to-seed conversion is independent from mnemonic generation. This means you can use any wordlist or even raw sentences — the seed derivation doesn't care. The checksum only matters for validating that a mnemonic was generated correctly.

BIP-32: Hierarchical Deterministic Derivation

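From the 512-bit seed, BIP-32 derives a tree of key pairs using HMAC-SHA512. Each node can have up to 2^32 children (2^31 normal and 2^31 hardened), so the tree is effectively unbounded.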

Master key generation:

I = HMAC-SHA512(key = "Bitcoin seed", data = seed)
I_L (left 256 bits) = master private key (must be valid: non-zero and less than curve order n)
I_R (right 256 bits) = master chain code
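As a rough sketch of master key generation, assuming Node.js's crypto module (the names `masterKeyFromSeed` and `ExtendedPrivateKey` are illustrative), using the seed from BIP-32 test vector 1:

```typescript
import { createHmac } from "node:crypto";

// Order n of the secp256k1 curve.
const CURVE_ORDER = BigInt(
  "0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141"
);

interface ExtendedPrivateKey {
  privateKey: Buffer; // I_L
  chainCode: Buffer;  // I_R
}

function masterKeyFromSeed(seed: Uint8Array): ExtendedPrivateKey {
  const I = createHmac("sha512", "Bitcoin seed").update(seed).digest();
  const privateKey = I.subarray(0, 32);
  const chainCode = I.subarray(32);

  // I_L must be a valid secp256k1 scalar: non-zero and below n.
  const k = BigInt("0x" + privateKey.toString("hex"));
  if (k === BigInt(0) || k >= CURVE_ORDER) {
    throw new Error("Invalid master key; use a different seed");
  }
  return { privateKey, chainCode };
}

const master = masterKeyFromSeed(Buffer.from("000102030405060708090a0b0c0d0e0f", "hex"));
console.log(master.privateKey.toString("hex"));
console.log(master.chainCode.toString("hex"));
```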

An extended key is a key + chain code pair. The chain code adds 256 bits of extra entropy to derivation, so knowing a key on its own is not enough to derive its children.

Child key derivation (CKD) — the core algorithm:

For normal (non-hardened) child i (where i < 2^31):

I = HMAC-SHA512(
  key: parent_chain_code,
  data: serialize_point(parent_public_key) || serialize_32(i)
)
child_private_key = parse_256(I_L) + parent_private_key (mod n)
child_chain_code = I_R

For hardened child i (where i >= 2^31, written as i'):

I = HMAC-SHA512(
  key: parent_chain_code,
  data: 0x00 || serialize_256(parent_private_key) || serialize_32(i)
)
child_private_key = parse_256(I_L) + parent_private_key (mod n)
child_chain_code = I_R
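Both branches fit in one function. The sketch below assumes Node.js's crypto module and uses its secp256k1 ECDH object only as a convenient way to compute a compressed public key; `deriveChild`, `ExtKey`, and the other names are hypothetical. It derives m/0' and m/0'/1 from the seed in BIP-32 test vector 1.

```typescript
import { createECDH, createHmac } from "node:crypto";

const N = BigInt("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141");
const HARDENED = 0x80000000;

interface ExtKey {
  priv: Buffer;
  chain: Buffer;
}

// Compressed (33-byte) public key for a private key.
function compressedPubKey(priv: Buffer): Buffer {
  const ecdh = createECDH("secp256k1");
  ecdh.setPrivateKey(priv);
  return Buffer.from(ecdh.getPublicKey("hex", "compressed"), "hex");
}

function ser32(i: number): Buffer {
  const b = Buffer.alloc(4);
  b.writeUInt32BE(i);
  return b;
}

function deriveChild(parent: ExtKey, index: number): ExtKey {
  // Hardened: 0x00 || private key || index. Normal: public key || index.
  const data =
    index >= HARDENED
      ? Buffer.concat([Buffer.from([0]), parent.priv, ser32(index)])
      : Buffer.concat([compressedPubKey(parent.priv), ser32(index)]);
  const I = createHmac("sha512", parent.chain).update(data).digest();

  const tweak = BigInt("0x" + I.subarray(0, 32).toString("hex"));
  const child = (tweak + BigInt("0x" + parent.priv.toString("hex"))) % N;
  if (tweak >= N || child === BigInt(0)) {
    throw new Error("Invalid child key; skip to the next index");
  }
  return {
    priv: Buffer.from(child.toString(16).padStart(64, "0"), "hex"),
    chain: I.subarray(32),
  };
}

// Master key from the BIP-32 test vector 1 seed.
const seedBytes = Buffer.from("000102030405060708090a0b0c0d0e0f", "hex");
const masterI = createHmac("sha512", "Bitcoin seed").update(seedBytes).digest();
const m: ExtKey = { priv: masterI.subarray(0, 32), chain: masterI.subarray(32) };

const m0h = deriveChild(m, HARDENED + 0); // m/0'
const m0h1 = deriveChild(m0h, 1);         // m/0'/1
console.log(m0h.priv.toString("hex"), m0h1.priv.toString("hex"));
```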

Why hardened derivation matters: Normal derivation uses the parent public key as input. This means anyone with an extended public key (xpub) can derive all non-hardened child public keys — useful for watch-only wallets. But it also means: if a single non-hardened child private key leaks alongside the parent xpub, an attacker can compute the parent private key and derive every child key.

Hardened derivation uses the parent private key as input, breaking this chain. Rule: always use hardened derivation for account-level separation.
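To make the leak concrete, here is a small demonstration, assuming Node.js's crypto module. The key material is arbitrary fixed bytes chosen for the demo, and all names are hypothetical. The attacker recomputes I_L from public data and subtracts it from the leaked child key.

```typescript
import { createECDH, createHmac } from "node:crypto";

const N = BigInt("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141");
const toBig = (b: Buffer): bigint => BigInt("0x" + b.toString("hex"));
const toBuf = (x: bigint): Buffer => Buffer.from(x.toString(16).padStart(64, "0"), "hex");

function publicKeyOf(priv: Buffer): Buffer {
  const ecdh = createECDH("secp256k1");
  ecdh.setPrivateKey(priv);
  return Buffer.from(ecdh.getPublicKey("hex", "compressed"), "hex");
}

// I_L for a non-hardened child: needs only the xpub (public key + chain code).
function normalTweak(parentPub: Buffer, chainCode: Buffer, index: number): bigint {
  const idx = Buffer.alloc(4);
  idx.writeUInt32BE(index);
  const I = createHmac("sha512", chainCode).update(Buffer.concat([parentPub, idx])).digest();
  return toBig(I.subarray(0, 32));
}

// Demo parent key material (arbitrary bytes, not a real wallet).
const parentPriv = Buffer.alloc(32, 0x11);
const chainCode = Buffer.alloc(32, 0x22);
const parentPub = publicKeyOf(parentPriv); // the xpub is parentPub + chainCode

// Wallet side: derive non-hardened child 7.
const childPriv = toBuf((normalTweak(parentPub, chainCode, 7) + toBig(parentPriv)) % N);

// Attacker side: knows parentPub, chainCode, and the leaked childPriv.
const tweak = normalTweak(parentPub, chainCode, 7);
const recovered = toBuf((((toBig(childPriv) - tweak) % N) + N) % N);

console.log(recovered.equals(parentPriv)); // the parent private key is exposed
```

With a hardened index the attacker cannot compute the tweak, because the HMAC input includes the parent private key itself.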

Extended key serialization (78 bytes, Base58Check encoded):

4 bytes:  version (xpub: 0x0488B21E, xprv: 0x0488ADE4)
1 byte:   depth (0x00 for master)
4 bytes:  parent fingerprint (first 4 bytes of Hash160(parent_pubkey))
4 bytes:  child number
32 bytes: chain code
33 bytes: key data (compressed public key OR 0x00 || private key)

This produces the xpub/xprv strings (111 characters) you see in wallet software.
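A sketch of the serialization for a private extended key, assuming Node.js's crypto module. Base58Check is implemented inline with BigInt so the example has no dependencies; `base58check` and `serializeXprv` are hypothetical names.

```typescript
import { createHash, createHmac } from "node:crypto";

const BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
const sha256 = (b: Buffer): Buffer => createHash("sha256").update(b).digest();

// Base58Check: payload || first 4 bytes of double SHA-256, then Base58.
function base58check(payload: Buffer): string {
  const full = Buffer.concat([payload, sha256(sha256(payload)).subarray(0, 4)]);
  let value = BigInt("0x" + full.toString("hex"));
  let encoded = "";
  while (value > BigInt(0)) {
    encoded = BASE58[Number(value % BigInt(58))] + encoded;
    value = value / BigInt(58);
  }
  // Each leading zero byte is encoded as a leading "1".
  for (let i = 0; i < full.length && full[i] === 0; i++) encoded = "1" + encoded;
  return encoded;
}

function serializeXprv(
  depth: number,
  parentFingerprint: Buffer,
  childNumber: number,
  chainCode: Buffer,
  privateKey: Buffer
): string {
  const child = Buffer.alloc(4);
  child.writeUInt32BE(childNumber);
  const payload = Buffer.concat([
    Buffer.from("0488ade4", "hex"), // version: mainnet xprv
    Buffer.from([depth]),
    parentFingerprint,
    child,
    chainCode,
    Buffer.from([0x00]), // private keys are padded to 33 bytes
    privateKey,
  ]);
  if (payload.length !== 78) throw new Error("Extended key payload must be 78 bytes");
  return base58check(payload);
}

// Master key for the seed used in BIP-32 test vector 1.
const vectorSeed = Buffer.from("000102030405060708090a0b0c0d0e0f", "hex");
const I = createHmac("sha512", "Bitcoin seed").update(vectorSeed).digest();
const masterXprv = serializeXprv(0, Buffer.alloc(4), 0, I.subarray(32), I.subarray(0, 32));
console.log(masterXprv);
```

The master key uses depth 0, an all-zero parent fingerprint, and child number 0, which is why every mainnet master xprv starts with the same characters.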

BIP-44: Multi-Account Hierarchy

BIP-44 standardizes the derivation path into 5 levels:

m / purpose' / coin_type' / account' / change / address_index
m / 44'      / 0'         / 0'       / 0      / 0

| Level | Value | Hardened | Purpose |
|---|---|---|---|
| purpose | 44' | Yes | BIP-44 compliance marker |
| coin_type | 0' (Bitcoin), 1' (testnet) | Yes | Prevents cross-chain address reuse |
| account | 0', 1', ... | Yes | Independent user accounts |
| change | 0 (external), 1 (internal) | No | Receiving vs change addresses |
| address_index | 0, 1, ... | No | Sequential address generation |
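In code, a path is just a list of 32-bit indices where hardened levels have 2^31 added. A small sketch with hypothetical helper names (`parsePath`, `bip44Path`):

```typescript
const HARDENED_OFFSET = 0x80000000; // 2^31

// "m/44'/0'/0'/0/0" -> [44 + 2^31, 0 + 2^31, 0 + 2^31, 0, 0]
function parsePath(path: string): number[] {
  const parts = path.split("/");
  if (parts[0] !== "m") throw new Error("Path must start with m");
  return parts.slice(1).map((part) => {
    if (!/^\d+'?$/.test(part)) throw new Error(`Invalid path segment: ${part}`);
    const hardened = part.endsWith("'");
    const index = Number(hardened ? part.slice(0, -1) : part);
    if (index >= HARDENED_OFFSET) throw new Error(`Index out of range: ${part}`);
    return hardened ? index + HARDENED_OFFSET : index;
  });
}

// Build a BIP-44 path; the first three levels are always hardened.
function bip44Path(account: number, change: 0 | 1, addressIndex: number, coinType = 0): string {
  return `m/44'/${coinType}'/${account}'/${change}/${addressIndex}`;
}

const firstReceive = bip44Path(0, 0, 0);
console.log(firstReceive, parsePath(firstReceive));
```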

Account discovery algorithm: When importing a seed, scan account 0 first. Check its external chain addresses up to the gap limit (20 consecutive unused addresses). If account 0 has transactions, increment to account 1 and repeat. Stop when an account has no transaction history.

The gap limit of 20 means: never show a user more than 20 unused receiving addresses at once. If a user receives funds on an address that sits beyond a run of 20 unused ones, a later import that stops scanning at the gap limit will not find those funds.
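The discovery loop can be sketched as follows. `HasHistory` stands in for whatever lookup the backend provides (an indexer or Electrum-style query); it and the function names are assumptions for illustration.

```typescript
const GAP_LIMIT = 20;

// Hypothetical lookup: does the external-chain address at (account, index) have any history?
type HasHistory = (account: number, index: number) => boolean;

// Highest used external index for an account, or -1 if nothing is found within the gap limit.
function scanExternalChain(account: number, hasHistory: HasHistory): number {
  let lastUsed = -1;
  let unusedRun = 0;
  for (let index = 0; unusedRun < GAP_LIMIT; index++) {
    if (hasHistory(account, index)) {
      lastUsed = index;
      unusedRun = 0;
    } else {
      unusedRun++;
    }
  }
  return lastUsed;
}

// Scan accounts in order and stop at the first one with no history.
function discoverAccounts(hasHistory: HasHistory): number[] {
  const accounts: number[] = [];
  for (let account = 0; ; account++) {
    if (scanExternalChain(account, hasHistory) === -1) break;
    accounts.push(account);
  }
  return accounts;
}

// Example: account 3 is never found because account 2 is empty.
const used = new Set(["0:0", "0:15", "1:3", "3:0"]);
const lookup: HasHistory = (account, index) => used.has(`${account}:${index}`);
console.log(discoverAccounts(lookup)); // accounts 0 and 1
```

The same early exit is what hides funds received past a 20-address gap.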

Key insight: Always generate a new address for each transaction. Address reuse hurts privacy (observers can link your transactions) and complicates UTXO management.

UTXO Management

This is where most wallet tutorials fall short. Bitcoin doesn't have "balances" — it has Unspent Transaction Outputs (UTXOs). When your wallet shows "0.5 BTC", it's actually summing up discrete UTXOs you control.

How UTXOs Work

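Every Bitcoin transaction consumes inputs (references to previous UTXOs) and creates outputs (new UTXOs). Each output has a value in satoshis and a locking script (scriptPubKey) that defines the conditions for spending it.

An input provides a reference to the output being spent (its txid and output index) along with the unlocking data (scriptSig or witness) that satisfies that output's locking script.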

UTXOs are atomic: you can't spend half a UTXO. If you have a 1 BTC UTXO and want to send 0.3 BTC, the transaction consumes the entire 1 BTC UTXO and creates two outputs: 0.3 BTC to the recipient and ~0.6999 BTC back to yourself as change. The remaining ~0.0001 BTC is not assigned to any output and goes to the miner as the fee.
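The arithmetic, worked in satoshis to avoid floating-point BTC amounts. The 10,000 sat fee is an assumed figure for illustration, and the names are hypothetical.

```typescript
const SATS_PER_BTC = 100000000;

interface TxOutput {
  to: "recipient" | "change";
  sats: number;
}

// Spend one whole UTXO: whatever is not assigned to an output becomes the fee.
function spendSingleUtxo(utxoSats: number, sendSats: number, feeSats: number): TxOutput[] {
  const changeSats = utxoSats - sendSats - feeSats;
  if (changeSats < 0) throw new Error("UTXO too small for amount plus fee");
  const outputs: TxOutput[] = [{ to: "recipient", sats: sendSats }];
  if (changeSats > 0) outputs.push({ to: "change", sats: changeSats });
  return outputs;
}

const outputs = spendSingleUtxo(1 * SATS_PER_BTC, 0.3 * SATS_PER_BTC, 10000);
console.log(outputs); // 30,000,000 sats to the recipient, 69,990,000 sats of change
```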

Output Types We Supported

| Type | Script Pattern | vBytes per Input | Use Case |
|---|---|---|---|
| P2PKH | OP_DUP OP_HASH160 <hash> OP_EQUALVERIFY OP_CHECKSIG | ~148 | Legacy: widest compatibility |
| P2SH-P2WPKH | OP_HASH160 <hash> OP_EQUAL (wrapping SegWit) | ~91 | Nested SegWit: backward compatible |
| P2WPKH | OP_0 <20-byte-hash> | ~68 | Native SegWit: cheap, widely supported |
| P2TR | OP_1 <32-byte-key> | ~57.5 | Taproot: cheapest inputs, newest |

We defaulted to P2WPKH (native SegWit) — ~54% cheaper than P2PKH and supported by all modern wallets. P2TR (Taproot) is even cheaper but had limited ecosystem support when we launched.
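The savings figure comes straight from the per-input sizes in the table. A quick check, with hypothetical names:

```typescript
// Approximate vBytes needed to spend one input of each type (from the table above).
const INPUT_VBYTES = { P2PKH: 148, "P2SH-P2WPKH": 91, P2WPKH: 68, P2TR: 57.5 } as const;
type ScriptType = keyof typeof INPUT_VBYTES;

// Per-input saving relative to legacy P2PKH, as a whole percentage.
function savingsVsLegacy(type: ScriptType): number {
  return Math.round((1 - INPUT_VBYTES[type] / INPUT_VBYTES.P2PKH) * 100);
}

console.log(savingsVsLegacy("P2WPKH")); // 54
console.log(savingsVsLegacy("P2TR"));   // 61
```

These are per-input numbers only; a full transaction also pays for outputs and fixed overhead, so the end-to-end saving is smaller.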

Coin Selection

When constructing a transaction, you need to select which UTXOs to spend. This is a constrained optimization problem: minimize fees while avoiding dust creation.

The three algorithms I evaluated:

1. Largest-first (what we shipped):

interface UTXO {
  txid: string;
  vout: number;
  value: number; // satoshis
}

const DUST_LIMIT = 294; // satoshis, for a P2WPKH change output

function selectUtxos(
  utxos: UTXO[],
  targetAmount: number, // satoshis
  feeRate: number // sat/vByte
): { selected: UTXO[]; fee: number; change: number } {
  // Largest first: spend the biggest UTXOs to keep the input count low
  const sorted = [...utxos].sort((a, b) => b.value - a.value);
  const selected: UTXO[] = [];
  let total = 0;

  for (const utxo of sorted) {
    selected.push(utxo);
    total += utxo.value;

    // Fee assumes two outputs: recipient + change (estimateFee is defined under Fee Estimation)
    const fee = estimateFee(selected.length, 2, feeRate);
    if (total >= targetAmount + fee) {
      const change = total - targetAmount - fee;
      if (change < DUST_LIMIT) {
        // Donate dust to the miner: cheaper than creating a change output
        return { selected, fee: fee + change, change: 0 };
      }
      return { selected, fee, change };
    }
  }
  throw new Error("Insufficient funds");
}

Pros: Few inputs (lower fees), simple. Cons: Creates small change UTXOs that accumulate over time.

2. Branch and bound (Bitcoin Core's approach): Searches for a combination that exactly matches target + fee, eliminating the change output entirely. Falls back to random selection if no exact match exists. Better long-term UTXO health, but computationally expensive for wallets with thousands of UTXOs.

3. Random selection: Pick UTXOs randomly until the target is met. Better privacy (no predictable spending pattern), but unpredictable fees and change sizes.
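To give a feel for option 2, here is a simplified depth-first branch-and-bound search. It is a sketch of the idea, not Bitcoin Core's implementation: it assumes each coin's `value` is already its effective value (amount minus the cost of spending it), and `tolerance` is how far above the target a match may land (roughly the cost of creating and later spending a change output). All names are hypothetical.

```typescript
interface Coin {
  id: string;
  value: number; // effective value in satoshis
}

// Find a subset whose sum lands in [target, target + tolerance], or null if none is found.
function branchAndBound(
  coins: Coin[],
  target: number,
  tolerance: number,
  maxTries = 100000
): Coin[] | null {
  const sorted = [...coins].sort((a, b) => b.value - a.value);

  // suffixSum[i] = total value of sorted[i..]; used to prune unreachable branches.
  const suffixSum: number[] = new Array(sorted.length + 1).fill(0);
  for (let i = sorted.length - 1; i >= 0; i--) {
    suffixSum[i] = suffixSum[i + 1] + sorted[i].value;
  }

  let tries = 0;
  const picked: Coin[] = [];

  function search(i: number, sum: number): boolean {
    if (sum >= target && sum <= target + tolerance) return true; // changeless match
    if (sum > target + tolerance) return false;                  // overshoot: backtrack
    if (i === sorted.length || sum + suffixSum[i] < target) return false; // cannot reach target
    if (++tries > maxTries) return false;                        // give up, caller falls back

    picked.push(sorted[i]); // branch 1: include coin i
    if (search(i + 1, sum + sorted[i].value)) return true;
    picked.pop();           // branch 2: exclude coin i
    return search(i + 1, sum);
  }

  return search(0, 0) ? [...picked] : null;
}

const wallet: Coin[] = [
  { id: "a", value: 50000 },
  { id: "b", value: 30000 },
  { id: "c", value: 20000 },
  { id: "d", value: 7000 },
];
console.log(branchAndBound(wallet, 27000, 0)); // coins c and d: exact match, no change output
console.log(branchAndBound(wallet, 26000, 0)); // null: caller falls back to another strategy
```

The `maxTries` cap is what keeps the search bounded on wallets with thousands of UTXOs.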

We shipped largest-first + periodic dust consolidation — a background job that combines small UTXOs into larger ones during low-fee periods (weekends, typically 1-5 sat/vByte).

Fee Estimation

Fee estimation is critical for UX. Too low = stuck transaction (can be unconfirmed for days). Too high = wasted money.

function estimateFee(
  numInputs: number,
  numOutputs: number,
  feeRate: number // sat/vByte
): number {
  // Virtual size (vBytes) of a P2WPKH transaction
  const overhead = 10.5; // version (4) + marker/flag (0.5) + input count (1) + output count (1) + locktime (4)
  const inputVBytes = numInputs * 68;   // P2WPKH: ~68 vBytes per input
  const outputVBytes = numOutputs * 31; // P2WPKH: 31 vBytes per output
  const vSize = overhead + inputVBytes + outputVBytes;
  return Math.ceil(vSize * feeRate);
}

Fee sources: We pull estimates from multiple providers and take the median.
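Taking the median makes the result robust to one provider returning an outlier or failing. A minimal sketch; `medianFeeRate` is a hypothetical name, and how the estimates are fetched is left out.

```typescript
// Median of the usable fee-rate estimates (sat/vByte).
function medianFeeRate(estimates: number[]): number {
  // Drop failed lookups (NaN, Infinity) and nonsensical values before sorting.
  const valid = estimates
    .filter((rate) => Number.isFinite(rate) && rate > 0)
    .sort((a, b) => a - b);
  if (valid.length === 0) throw new Error("No usable fee estimates");

  const mid = Math.floor(valid.length / 2);
  return valid.length % 2 === 1 ? valid[mid] : (valid[mid - 1] + valid[mid]) / 2;
}

console.log(medianFeeRate([12, 8, 50]));          // 12: the 50 sat/vByte outlier is ignored
console.log(medianFeeRate([10, 20, NaN, 14, 16])); // 15: the failed lookup is dropped
```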

Always show the user an estimated fee before signing. For Invebit, we offered three tiers: "Fast" (next block, ~10 min), "Normal" (within 3 blocks, ~30 min), "Economy" (within 6 blocks, ~1 hour).

Dust Limit

A UTXO is "dust" if it costs more to spend than it's worth. Under Bitcoin Core's default relay policy (a dust relay fee of 3 sat/vByte), the threshold is 546 satoshis for P2PKH outputs, 540 for P2SH, and 294 for P2WPKH. Creating dust outputs wastes block space and burdens the UTXO set.

In our coin selection, if the change would be below the dust limit, we add it to the miner fee instead of creating a tiny UTXO that's uneconomical to spend later.
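The thresholds are not arbitrary: they follow from the size of the output plus the size of the input needed to spend it later, priced at a dust relay fee rate. The sketch below mirrors the approach in Bitcoin Core's default policy (3 sat/vByte); the names are hypothetical.

```typescript
const DUST_RELAY_FEE = 3; // sat/vByte, Bitcoin Core's default dust relay fee

// Serialized output sizes in bytes (8-byte value + script length + script).
const OUTPUT_BYTES = { P2PKH: 34, P2SH: 32, P2WPKH: 31, P2TR: 43 } as const;
type OutputType = keyof typeof OUTPUT_BYTES;

function dustThreshold(type: OutputType, dustRelayFee: number = DUST_RELAY_FEE): number {
  const isWitness = type === "P2WPKH" || type === "P2TR";
  // Assumed cost of the future spend: 148 bytes for legacy inputs, 67 vBytes for witness inputs.
  const spendVBytes = isWitness ? 67 : 148;
  return (OUTPUT_BYTES[type] + spendVBytes) * dustRelayFee;
}

console.log(dustThreshold("P2PKH"));  // 546
console.log(dustThreshold("P2WPKH")); // 294
```

These are relay-policy minimums. At real fee rates above 3 sat/vByte, a UTXO can be uneconomical to spend well before it hits this threshold.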

RGB Protocol: Assets on Bitcoin

RGB is a protocol for issuing and transferring assets on Bitcoin using client-side validation. Unlike Ethereum's ERC-20 tokens where all state lives on-chain, RGB stores state off-chain and commits only cryptographic proofs to Bitcoin transactions.

Three Core Concepts

1. Client-Side Validation: Parties validate only their relevant transaction history, not the entire global state. When Bob receives RGB tokens, he validates the chain of ownership from genesis to his transfer — he doesn't need to validate every transfer that ever happened in the contract.

2. Single-Use Seals: A cryptographic primitive that proves a message was published exactly once. In RGB, Bitcoin UTXOs serve as seals — spending a UTXO "closes" the seal and anchors the state change to Bitcoin's blockchain. This prevents double-spending without requiring a global ledger.

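3. Deterministic Bitcoin Commitments (DBC): RGB data is committed to Bitcoin transactions via two schemes: Opret, which places the commitment in an OP_RETURN output, and Tapret, which places it inside a Taproot script tree.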

How Transfers Work

Alice wants to send 200 RGB tokens to Bob:

  1. Alice creates a state transition: New state assigns 200 tokens to Bob's UTXO (concealed — the UTXO is blinded so observers can't link it)
  2. Alice builds a witness transaction: Spends her UTXO (closing the single-use seal), embedding the RGB commitment via Opret or Tapret
  3. Alice creates a consignment: A package containing the full history from genesis to Bob's transfer — every state transition in the chain
  4. Alice sends the consignment to Bob (off-chain, directly)
  5. Bob validates: Verifies every transition in the consignment against Bitcoin's blockchain, confirming seals were closed correctly
  6. Bob's wallet stores the validated state: He now owns the tokens, anchored to his UTXO

Genesis (1000 tokens → Alice's UTXO_A)
  │
  ├── Transition 1: 200 tokens → Bob's UTXO_B (committed via Tapret in tx spending UTXO_A)
  │   └── Bob validates: genesis → transition 1 (checks UTXO_A was spent, commitment is valid)
  │
  └── Transition 2: 800 tokens → Alice's UTXO_C (change, same transaction)
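The validation step can be illustrated with a toy model. This is not RGB's data model or API: real consignments carry schemas, blinded seals, and commitment proofs. The sketch only captures the two invariants described above, that each seal is closed exactly once by a known witness transaction and that a transition neither creates nor destroys tokens. Every type and function name here is hypothetical.

```typescript
type Seal = string; // stand-in label for a UTXO, e.g. "UTXO_A"

interface Assignment { seal: Seal; amount: number }
interface Transition { closes: Seal; witnessTxid: string; assigns: Assignment[] }
interface Consignment { genesis: Assignment[]; transitions: Transition[] }

// Hypothetical chain lookup: was `seal` spent by transaction `txid`?
type SpentBy = (seal: Seal, txid: string) => boolean;

// Replays the history from genesis and returns the final open seals with their amounts.
function validateConsignment(c: Consignment, spentBy: SpentBy): Map<Seal, number> {
  const openSeals = new Map<Seal, number>();
  for (const a of c.genesis) openSeals.set(a.seal, a.amount);

  for (const t of c.transitions) {
    const inputAmount = openSeals.get(t.closes);
    if (inputAmount === undefined) {
      throw new Error(`Seal ${t.closes} is unknown or already closed`);
    }
    if (!spentBy(t.closes, t.witnessTxid)) {
      throw new Error(`Seal ${t.closes} was not closed by ${t.witnessTxid}`);
    }
    const assigned = t.assigns.reduce((sum, a) => sum + a.amount, 0);
    if (assigned !== inputAmount) throw new Error("Transition creates or destroys tokens");

    openSeals.delete(t.closes); // a seal can be closed only once
    for (const a of t.assigns) openSeals.set(a.seal, a.amount);
  }
  return openSeals;
}

// The Alice-to-Bob example: 1000 tokens split into 200 for Bob and 800 change.
const consignment: Consignment = {
  genesis: [{ seal: "UTXO_A", amount: 1000 }],
  transitions: [
    {
      closes: "UTXO_A",
      witnessTxid: "tx1",
      assigns: [
        { seal: "UTXO_B", amount: 200 },
        { seal: "UTXO_C", amount: 800 },
      ],
    },
  ],
};
const onChain: SpentBy = (seal, txid) => seal === "UTXO_A" && txid === "tx1";
const finalState = validateConsignment(consignment, onChain);
console.log(finalState.get("UTXO_B")); // 200
```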

Key difference from ERC-20: On Ethereum, the token contract maintains a global balanceOf mapping visible to everyone. In RGB, there's no on-chain state — an observer sees a normal Bitcoin transaction with no indication that an asset transfer occurred. The privacy comes from the fact that contract data exists only between the parties involved.

Multi-Protocol Commitments (MPC)

Multiple RGB contracts can batch their state transitions into a single Merkle tree, committed in one Bitcoin transaction. This means one transaction can carry state changes for hundreds of different contracts — massive scalability compared to one-contract-per-transaction models.
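The batching idea can be shown with a plain binary Merkle tree, assuming Node.js's crypto module. This is a simplification: RGB's actual MPC tree has its own layout and hashing rules, and `merkleRoot` is a hypothetical helper. The point is only that any number of per-contract commitments collapse into a single 32-byte value.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): Buffer => createHash("sha256").update(data).digest();

// Collapse any number of per-contract commitments into one 32-byte root.
function merkleRoot(commitments: Buffer[]): Buffer {
  if (commitments.length === 0) throw new Error("No commitments to batch");
  let level = commitments.map((c) => sha256(c));
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i]; // duplicate the last node on odd-sized levels
      next.push(sha256(Buffer.concat([level[i], right])));
    }
    level = next;
  }
  return level[0];
}

// Three contracts, one 32-byte commitment for the Bitcoin transaction.
const batchRoot = merkleRoot([
  Buffer.from("contract-A state transition"),
  Buffer.from("contract-B state transition"),
  Buffer.from("contract-C state transition"),
]);
console.log(batchRoot.toString("hex"));
```

Each contract's participants then only need the Merkle path for their own leaf, not the other contracts' data.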

Why RGB Over Alternatives

| Property | Omni Layer | Liquid | RGB |
|---|---|---|---|
| State storage | On-chain (OP_RETURN) | Sidechain | Client-side |
| Consensus | Bitcoin miners | Federation (11/15 multisig) | Individual validation |
| Privacy | Transparent | Confidential transactions | Private by default |
| Scalability | Limited by block space | Sidechain throughput | Unbounded (off-chain) |
| Trust model | Bitcoin security | Federation trust | Bitcoin security + individual validation |
| Smart contracts | Limited | Simplicity (limited) | AluVM (Turing-equivalent) |

RGB's smart contracts run on AluVM — a register-based virtual machine that's Turing-equivalent but designed for formal verification. Simple contracts use Rust macros; complex ones use AluAssembly (with a higher-level language called Contractum in development).

Security Architecture

Building a wallet means handling private keys. The attack surface includes the device, the network, the supply chain, and the user.

Key Storage

| Platform | Storage | Encryption | Biometric Unlock |
|---|---|---|---|
| iOS | Keychain (Secure Enclave) | Hardware-backed AES-256 | Face ID / Touch ID |
| Android | Keystore (TEE/StrongBox) | Hardware-backed | Fingerprint / Face |
| Chrome Extension | chrome.storage.local | AES-256-GCM (app-level) | N/A |

Never store keys in plaintext, localStorage, or SharedPreferences. For the Chrome Extension, we implemented app-level encryption since Chrome's storage API doesn't provide hardware-backed encryption.
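What app-level AES-256-GCM encryption can look like, sketched with Node.js's crypto module so it runs standalone. Inside an extension you would more likely use the WebCrypto API. The choice of scrypt as the password KDF, the blob layout, and all names are assumptions for illustration, not Invebit's actual scheme.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Everything needed to decrypt later, hex-encoded so it can be stored as JSON.
interface EncryptedBlob {
  salt: string;
  iv: string;
  tag: string;
  data: string;
}

function encryptSecret(secret: Buffer, password: string): EncryptedBlob {
  const salt = randomBytes(16);
  const iv = randomBytes(12); // 96-bit nonce; must never repeat under the same key
  const key = scryptSync(password, salt, 32);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(secret), cipher.final()]);
  return {
    salt: salt.toString("hex"),
    iv: iv.toString("hex"),
    tag: cipher.getAuthTag().toString("hex"),
    data: data.toString("hex"),
  };
}

function decryptSecret(blob: EncryptedBlob, password: string): Buffer {
  const key = scryptSync(password, Buffer.from(blob.salt, "hex"), 32);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(blob.iv, "hex"));
  decipher.setAuthTag(Buffer.from(blob.tag, "hex"));
  // final() throws if the password is wrong or the ciphertext was tampered with.
  return Buffer.concat([decipher.update(Buffer.from(blob.data, "hex")), decipher.final()]);
}

const stored = encryptSecret(Buffer.from("seed bytes go here"), "correct horse battery staple");
console.log(decryptSecret(stored, "correct horse battery staple").toString());
```

GCM's authentication tag matters here: a wrong password or a modified blob fails loudly instead of yielding garbage key material.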

Signing Isolation

Transaction signing happens in an isolated context:

  1. Load private key from secure storage
  2. Deserialize the PSBT (Partially Signed Bitcoin Transaction)
  3. Sign all inputs
  4. Wipe the private key from memory immediately
  5. Return the signed transaction
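A sketch of the load, sign, wipe pattern. `Signer` and `loadKey` are hypothetical stand-ins for the real PSBT signer and secure-storage read. The `finally` block ensures the key buffer is zeroed even when signing throws.

```typescript
// Hypothetical signer: takes raw key bytes and a serialized PSBT, returns the signed PSBT.
type Signer = (privateKey: Uint8Array, psbt: string) => string;

function signIsolated(loadKey: () => Uint8Array, psbt: string, sign: Signer): string {
  const key = loadKey(); // step 1: load from secure storage
  try {
    return sign(key, psbt); // steps 2-3: deserialize and sign
  } finally {
    key.fill(0); // step 4: overwrite the key bytes, even if signing failed
  }
}

// Demo with a fake key and a fake signer.
const demoKey = new Uint8Array([1, 2, 3, 4]);
const signedPsbt = signIsolated(
  () => demoKey,
  "psbt-data",
  (key, psbt) => `${psbt}:signed-with-${key.length}-byte-key`
);
console.log(signedPsbt, demoKey); // the key buffer is all zeros afterwards
```

One caveat: in a JavaScript runtime this only clears the buffer you hold. Keep keys in typed arrays rather than strings, since strings are immutable and cannot be wiped.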

We used PSBT (BIP-174) throughout — it separates transaction construction from signing, enabling hardware wallet support and multi-sig flows.

Supply Chain Security

Every dependency is a potential attack vector.

The 2024 xz backdoor and multiple npm supply chain attacks validated this paranoia.

Transaction Malleability

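Before SegWit, the txid hash covered the entire transaction, including each input's signature script (scriptSig), but the signatures could not cover the scriptSig that contained them. A third party could therefore re-encode a signature or alter the script without invalidating it, which changed the txid. This breaks any system that tracks unconfirmed transactions by txid.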

Mitigation: We use SegWit (P2WPKH) exclusively, which moves signature data to the witness and computes txid without it. For any remaining edge cases, we track transactions by the UTXOs they spend (inputs), not by txid.
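Tracking by inputs can be as simple as keying pending transactions on the set of outpoints they spend. A sketch with hypothetical names:

```typescript
interface Outpoint {
  txid: string;
  vout: number;
}

// Order-independent key for the set of outpoints a transaction spends.
function spendKey(inputs: Outpoint[]): string {
  return inputs.map((i) => `${i.txid}:${i.vout}`).sort().join("|");
}

class PendingTracker {
  private pending = new Map<string, string>(); // spendKey -> txid we broadcast

  track(txid: string, inputs: Outpoint[]): void {
    this.pending.set(spendKey(inputs), txid);
  }

  // True if a confirmed transaction spends the same inputs as one we were
  // waiting on, regardless of the txid it confirmed under.
  confirm(inputs: Outpoint[]): boolean {
    return this.pending.delete(spendKey(inputs));
  }

  get size(): number {
    return this.pending.size;
  }
}

const tracker = new PendingTracker();
const spent: Outpoint[] = [
  { txid: "aa11", vout: 0 },
  { txid: "bb22", vout: 1 },
];
tracker.track("original-txid", spent);

// The transaction confirms under a different txid, with inputs listed in a different order.
const matched = tracker.confirm([spent[1], spent[0]]);
console.log(matched); // true
```

Because a UTXO can be spent only once, the input set identifies the payment even when the txid changes.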

Address Verification

Clipboard-hijacking malware can silently replace a copied Bitcoin address with one the attacker controls, and a user who pastes without checking sends funds to the wrong place.

Testing Strategy

Bitcoin transactions are irreversible. Our testing pipeline:

  1. Unit tests: Key derivation against BIP-32/39/44 test vectors (from the specs)
  2. Integration tests on testnet: 500+ automated transaction tests — send, receive, UTXO consolidation, fee estimation accuracy, edge cases (dust, max UTXO count)
  3. Regtest environment: Private Bitcoin network for deterministic testing — we control block production
  4. Fuzzing: Random inputs to transaction construction, PSBT parsing, and address validation
  5. Security audit: Two independent audits before mainnet launch, third audit 6 months after

Key Takeaways

  1. UTXOs are the hard part. Key derivation is well-documented with test vectors. UTXO management — coin selection, consolidation, fee estimation, dust handling, change output strategy — is where wallet quality differs.

  2. Hardened derivation isn't optional. The xpub + leaked child private key = full compromise attack is real. Use hardened derivation for account separation, always.

  3. Fee estimation is a product decision. The algorithm is simple; the UX is hard. Users don't understand sat/vByte — they understand "fast", "normal", "economy."

  4. RGB is production-ready for issuance, early for trading. The protocol is sound and the privacy properties are exceptional. But the tooling ecosystem (wallets, explorers, DEXes) is still maturing compared to EVM.

  5. Security is never "done." We did two independent security audits and still found issues in the third review. Budget for continuous security work, especially around dependency updates.

  6. Test on testnet obsessively. Mainnet bugs cost real money. We required 100% testnet pass rate for 48 hours before any mainnet deployment.

