Building a Bitcoin Wallet from Scratch: BIP-32/39/44, UTXO Management, and RGB Protocol

Building a Bitcoin wallet is one of the best ways to understand how Bitcoin works. I don't mean a wrapper around an API, but an actual wallet that derives keys, constructs transactions, manages UTXOs, and signs. Here's what I learned building Invebit, a cross-platform Bitcoin wallet with a React Native app, a Chrome extension, and a Node.js backend.

HD Key Derivation: BIP-32/39/44

BIP-39: Entropy to Mnemonic to Seed

Everything starts with random entropy. BIP-39 defines a precise algorithm to convert raw randomness into human-readable words, then into a cryptographic seed:

Step 1 — Generate entropy (128-256 bits, must be a multiple of 32):

Entropy Bits	Checksum Bits	Total Bits	Words
128	4	132	12
160	5	165	15
192	6	198	18
224	7	231	21
256	8	264	24

Step 2 — Compute checksum: Take the first ENT/32 bits of SHA-256(entropy) and append to the entropy.

Step 3 — Split into 11-bit groups: Each 11-bit segment maps to a word in a 2048-word wordlist (index 0-2047). The wordlist is sorted to enable binary search and trie compression.
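
Steps 1 to 3 fit in a few lines of TypeScript using Node's built-in crypto module. This is a minimal sketch rather than Invebit's code: the name `entropyToWordIndices` is made up for illustration, and it returns wordlist indices instead of words so the 2048-word list doesn't have to be embedded.

```typescript
import { createHash } from "node:crypto";

// Render bytes as a string of "0"/"1" characters, 8 per byte.
const toBits = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(2).padStart(8, "0")).join("");

function entropyToWordIndices(entropy: Uint8Array): number[] {
  const entBits = entropy.length * 8;
  if (entBits < 128 || entBits > 256 || entBits % 32 !== 0) {
    throw new Error("Entropy must be 128-256 bits in multiples of 32");
  }
  // Step 2: checksum = first ENT/32 bits of SHA-256(entropy)
  const hash = createHash("sha256").update(entropy).digest();
  const bits = toBits(entropy) + toBits(hash).slice(0, entBits / 32);
  // Step 3: each 11-bit group is an index into the 2048-word list
  const indices: number[] = [];
  for (let i = 0; i < bits.length; i += 11) {
    indices.push(parseInt(bits.slice(i, i + 11), 2));
  }
  return indices;
}
```

For 16 zero bytes this yields eleven 0s followed by a 3, which in the English wordlist is the well-known "abandon ... about" test mnemonic.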

Step 4 — Derive seed via PBKDF2:

PBKDF2(
  password: mnemonic_sentence (UTF-8 NFKD normalized),
  salt: "mnemonic" + passphrase (UTF-8 NFKD normalized),
  iterations: 2048,
  PRF: HMAC-SHA512,
  output: 512 bits
)

The passphrase acts as a "25th word." Same mnemonic with different passphrase = completely different wallet. Every passphrase produces a valid seed — there's no way to tell if a passphrase is "correct" without deriving keys and checking for funds. This enables plausible deniability: a coerced user can reveal a decoy passphrase that opens an empty wallet.
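
The PBKDF2 call in Step 4 maps directly onto Node's `pbkdf2Sync`. A minimal sketch, assuming a Node.js environment (`mnemonicToSeed` is an illustrative name, not necessarily what a given library exposes):

```typescript
import { pbkdf2Sync } from "node:crypto";

function mnemonicToSeed(mnemonic: string, passphrase = ""): Buffer {
  // Both inputs are NFKD-normalized and UTF-8 encoded, as BIP-39 requires.
  const password = Buffer.from(mnemonic.normalize("NFKD"), "utf8");
  const salt = Buffer.from("mnemonic" + passphrase.normalize("NFKD"), "utf8");
  // 2048 rounds of HMAC-SHA512, 64-byte (512-bit) output
  return pbkdf2Sync(password, salt, 2048, 64, "sha512");
}
```

Calling it twice with the same mnemonic and two different passphrases returns two unrelated 64-byte seeds, which is exactly the "25th word" behavior described above.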

Critical detail: The mnemonic-to-seed conversion is independent from mnemonic generation. This means you can use any wordlist or even raw sentences — the seed derivation doesn't care. The checksum only matters for validating that a mnemonic was generated correctly.

BIP-32: Hierarchical Deterministic Derivation

From the 512-bit seed, BIP-32 derives an infinite tree of key pairs using HMAC-SHA512.

Master key generation:

I = HMAC-SHA512(key = "Bitcoin seed", data = seed)
I_L (left 256 bits) = master private key (must be valid: non-zero and less than curve order n)
I_R (right 256 bits) = master chain code
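
As a rough sketch of the master key step in TypeScript (Node's crypto module assumed; `masterKeyFromSeed` and `MasterKey` are names I picked for illustration):

```typescript
import { createHmac } from "node:crypto";

// Order n of the secp256k1 group.
const CURVE_ORDER = BigInt(
  "0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141"
);

interface MasterKey {
  privateKey: Buffer; // I_L, 32 bytes
  chainCode: Buffer;  // I_R, 32 bytes
}

function masterKeyFromSeed(seed: Buffer): MasterKey {
  const I = createHmac("sha512", "Bitcoin seed").update(seed).digest();
  const privateKey = I.subarray(0, 32);
  const chainCode = I.subarray(32);
  // The key must be non-zero and below the curve order.
  const k = BigInt("0x" + privateKey.toString("hex"));
  if (k === BigInt(0) || k >= CURVE_ORDER) {
    throw new Error("Invalid master key; use a different seed");
  }
  return { privateKey, chainCode };
}
```

Feeding it the seed from BIP-32 test vector 1 (`000102030405060708090a0b0c0d0e0f`) is a quick way to check an implementation against the spec.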

An extended key is a key + chain code pair. The chain code adds 256 bits of entropy to derivation. Without it, child keys would depend on the parent key alone, so anyone who saw a public key could compute all of its child public keys.

Child key derivation (CKD) — the core algorithm:

For normal (non-hardened) child i (where i < 2^31):

I = HMAC-SHA512(
  key: parent_chain_code,
  data: serialize_point(parent_public_key) || serialize_32(i)
)
child_private_key = parse_256(I_L) + parent_private_key (mod n)
child_chain_code = I_R

For hardened child i (where i >= 2^31, written as i'):

I = HMAC-SHA512(
  key: parent_chain_code,
  data: 0x00 || serialize_256(parent_private_key) || serialize_32(i)
)
child_private_key = parse_256(I_L) + parent_private_key (mod n)
child_chain_code = I_R
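
Both derivation branches can be written as one function. The sketch below assumes Node.js, whose `createECDH("secp256k1")` is used only to compute the compressed public key; the names `deriveChildPrivateKey` and `ExtendedPrivateKey` are illustrative. A production wallet would more likely rely on an audited library for this.

```typescript
import { createECDH, createHmac } from "node:crypto";

// Order n of the secp256k1 group.
const N = BigInt("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141");
const HARDENED_OFFSET = 0x80000000; // 2^31

interface ExtendedPrivateKey {
  privateKey: Buffer; // 32 bytes
  chainCode: Buffer;  // 32 bytes
}

// serialize_point: 33-byte compressed public key for a private key.
function compressedPublicKey(privateKey: Buffer): Buffer {
  const ecdh = createECDH("secp256k1");
  ecdh.setPrivateKey(privateKey);
  return ecdh.getPublicKey(null, "compressed");
}

function deriveChildPrivateKey(
  parent: ExtendedPrivateKey,
  index: number
): ExtendedPrivateKey {
  const indexBytes = Buffer.alloc(4);
  indexBytes.writeUInt32BE(index, 0); // serialize_32(i)
  // Hardened: 0x00 || parent private key. Normal: parent public key.
  const data =
    index >= HARDENED_OFFSET
      ? Buffer.concat([Buffer.from([0x00]), parent.privateKey, indexBytes])
      : Buffer.concat([compressedPublicKey(parent.privateKey), indexBytes]);
  const I = createHmac("sha512", parent.chainCode).update(data).digest();
  const IL = BigInt("0x" + I.subarray(0, 32).toString("hex"));
  const child = (IL + BigInt("0x" + parent.privateKey.toString("hex"))) % N;
  // BIP-32: if I_L >= n or the result is zero, this index is invalid.
  if (IL >= N || child === BigInt(0)) {
    throw new Error("Invalid child key; try the next index");
  }
  return {
    privateKey: Buffer.from(child.toString(16).padStart(64, "0"), "hex"),
    chainCode: I.subarray(32),
  };
}
```

Deriving `m/0'` is then `deriveChildPrivateKey(master, HARDENED_OFFSET + 0)`, and `m/0'/1` is one more call with index `1` on the result.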

Why hardened derivation matters: Normal derivation uses the parent public key as input. This means anyone with an extended public key (xpub) can derive all non-hardened child public keys — useful for watch-only wallets. But it also means: if a single non-hardened child private key leaks alongside the parent xpub, an attacker can compute the parent private key and derive every child key.

Hardened derivation uses the parent private key as input, breaking this chain. Rule: always use hardened derivation for account-level separation.
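
To make the leak concrete, here is a small sketch of the recovery an attacker would run. It assumes Node.js and uses illustrative names (`recoverParentPrivateKey`, `publicKeyOf`). The attacker needs only what an xpub contains (parent public key and chain code) plus one non-hardened child private key:

```typescript
import { createECDH, createHmac } from "node:crypto";

// Order n of the secp256k1 group.
const N = BigInt("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141");
const toBig = (b: Buffer): bigint => BigInt("0x" + b.toString("hex"));
const toBuf = (x: bigint): Buffer =>
  Buffer.from(x.toString(16).padStart(64, "0"), "hex");

// 33-byte compressed public key for a private key.
function publicKeyOf(privateKey: Buffer): Buffer {
  const ecdh = createECDH("secp256k1");
  ecdh.setPrivateKey(privateKey);
  return ecdh.getPublicKey(null, "compressed");
}

function recoverParentPrivateKey(
  parentPublicKey: Buffer,  // from the xpub
  parentChainCode: Buffer,  // from the xpub
  childIndex: number,       // must be non-hardened (< 2^31)
  childPrivateKey: Buffer   // the leaked key
): Buffer {
  const indexBytes = Buffer.alloc(4);
  indexBytes.writeUInt32BE(childIndex, 0);
  // Everything needed to recompute I_L is public for normal derivation.
  const I = createHmac("sha512", parentChainCode)
    .update(Buffer.concat([parentPublicKey, indexBytes]))
    .digest();
  const IL = toBig(I.subarray(0, 32));
  // child = I_L + parent (mod n)  =>  parent = child - I_L (mod n)
  return toBuf((((toBig(childPrivateKey) - IL) % N) + N) % N);
}
```

With a hardened index the HMAC input contains the parent private key itself, so the attacker cannot recompute `I_L` and the subtraction is impossible.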

Extended key serialization (78 bytes, Base58Check encoded):

4 bytes:  version (xpub: 0x0488B21E, xprv: 0x0488ADE4)
1 byte:   depth (0x00 for master)
4 bytes:  parent fingerprint (first 4 bytes of Hash160(parent_pubkey))
4 bytes:  child number
32 bytes: chain code
33 bytes: key data (compressed public key OR 0x00 || private key)

This produces the xpub/xprv strings (111 characters) you see in wallet software.
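
Here's a minimal sketch of that serialization for a private extended key, assuming Node.js. The Base58Check encoder is hand-rolled for illustration, and `serializeXprv` is a hypothetical name. The parent fingerprint is passed in as a parameter (all zeros for a master key), so the Hash160 step is left out.

```typescript
import { createHash } from "node:crypto";

const BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
const sha256 = (data: Uint8Array): Buffer =>
  createHash("sha256").update(data).digest();

// Base58Check: payload + first 4 bytes of double SHA-256, in base 58.
function base58Check(payload: Buffer): string {
  const bytes = Buffer.concat([payload, sha256(sha256(payload)).subarray(0, 4)]);
  let n = BigInt("0x" + bytes.toString("hex"));
  let out = "";
  while (n > BigInt(0)) {
    out = BASE58[Number(n % BigInt(58))] + out;
    n = n / BigInt(58);
  }
  // Each leading zero byte is encoded as a leading "1".
  for (let i = 0; i < bytes.length && bytes[i] === 0; i++) out = "1" + out;
  return out;
}

function serializeXprv(
  depth: number,
  parentFingerprint: Buffer, // 4 bytes, zeros for the master key
  childNumber: number,
  chainCode: Buffer,         // 32 bytes
  privateKey: Buffer         // 32 bytes
): string {
  const header = Buffer.alloc(13);
  header.writeUInt32BE(0x0488ade4, 0); // xprv version
  header.writeUInt8(depth, 4);
  parentFingerprint.copy(header, 5);
  header.writeUInt32BE(childNumber, 9);
  // Private keys are stored as 0x00 || key to fill the 33-byte field.
  const payload = Buffer.concat([header, chainCode, Buffer.from([0x00]), privateKey]);
  if (payload.length !== 78) throw new Error("Extended key must be 78 bytes");
  return base58Check(payload);
}
```

Because the version, depth, fingerprint, and child number occupy the leading bytes, every mainnet master private key serializes to a string starting with `xprv9s21ZrQH143K`.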

BIP-44: Multi-Account Hierarchy

BIP-44 standardizes the derivation path into 5 levels:

m / purpose' / coin_type' / account' / change / address_index
m / 44'      / 0'         / 0'       / 0      / 0

Level	Value	Hardened	Purpose
`purpose`	`44'`	Yes	BIP-44 compliance marker
`coin_type`	`0'` (Bitcoin), `1'` (testnet)	Yes	Prevents cross-chain address reuse
`account`	`0'`, `1'`, ...	Yes	Independent user accounts
`change`	`0` (external), `1` (internal)	No	Receiving vs change addresses
`address_index`	`0`, `1`, ...	No	Sequential address generation

Account discovery algorithm: When importing a seed, scan account 0 first. Check its external chain addresses up to the gap limit (20 consecutive unused addresses). If account 0 has transactions, increment to account 1 and repeat. Stop when an account has no transaction history.

The gap limit of 20 means: never hand out more than 20 consecutive unused receiving addresses. If funds arrive beyond a gap of 20 unused addresses, a wallet restoring from the seed stops scanning before it reaches them and won't find those funds.
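
The discovery loop is short enough to sketch. The `hasHistory` callback is a stand-in for whatever backend lookup a wallet uses (Electrum server, Esplora, own node). It is synchronous here to keep the example small, whereas a real lookup would be asynchronous.

```typescript
const GAP_LIMIT = 20;

// Hypothetical lookup: does the address at
// m/44'/0'/account'/0/index have any transactions?
type HasHistory = (account: number, index: number) => boolean;

// Scans one external chain; returns the highest used index, or -1 if none.
function scanExternalChain(account: number, hasHistory: HasHistory): number {
  let lastUsed = -1;
  let unusedRun = 0;
  for (let index = 0; unusedRun < GAP_LIMIT; index++) {
    if (hasHistory(account, index)) {
      lastUsed = index;
      unusedRun = 0; // a used address resets the gap counter
    } else {
      unusedRun++;
    }
  }
  return lastUsed;
}

// Returns the highest used external index for each discovered account.
function discoverAccounts(hasHistory: HasHistory): number[] {
  const lastUsedPerAccount: number[] = [];
  for (let account = 0; ; account++) {
    const lastUsed = scanExternalChain(account, hasHistory);
    // The first account without history ends discovery.
    if (lastUsed < 0) return lastUsedPerAccount;
    lastUsedPerAccount.push(lastUsed);
  }
}
```

If account 0 has used addresses at indexes 0, 5, 24, and 50, the scan stops after index 44 (20 unused in a row after 24) and never sees index 50. That is the failure mode the gap limit rule is meant to prevent.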

Key insight: Always generate a new address for each transaction. Address reuse hurts privacy (observers can link your transactions) and complicates UTXO management.

UTXO Management

This is where most wallet tutorials fall short. Bitcoin doesn't have "balances" — it has Unspent Transaction Outputs (UTXOs). When your wallet shows "0.5 BTC", it's actually summing up discrete UTXOs you control.

How UTXOs Work

Every Bitcoin transaction consumes inputs (references to previous UTXOs) and creates outputs (new UTXOs). Each output has:

An amount in satoshis
A locking script (scriptPubKey) that defines spending conditions

An input provides:

A reference to a previous output (txid + vout index)
An unlocking script (scriptSig/witness) proving authorization to spend

UTXOs are atomic: you can't spend half a UTXO. If you have a 1 BTC UTXO and want to send 0.3 BTC, the transaction consumes the entire 1 BTC UTXO and creates two outputs: 0.3 BTC to the recipient and about 0.6999 BTC back to yourself as change. The remaining ~0.0001 BTC is not assigned to any output and becomes the miner fee.

Output Types We Supported

Type	Script Pattern	vBytes per Input	Use Case
P2PKH	`OP_DUP OP_HASH160 <hash> OP_EQUALVERIFY OP_CHECKSIG`	~148	Legacy — widest compatibility
P2SH-P2WPKH	`OP_HASH160 <hash> OP_EQUAL` (wrapping SegWit)	~91	Nested SegWit — backward compatible
P2WPKH	`OP_0 <20-byte-hash>`	~68	Native SegWit: cheap, widely supported
P2TR	`OP_1 <32-byte-key>`	~57.5	Taproot: cheapest input, latest

We defaulted to P2WPKH (native SegWit) — ~54% cheaper than P2PKH and supported by all modern wallets. P2TR (Taproot) is even cheaper but had limited ecosystem support when we launched.
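
The savings figure follows directly from the per-input sizes in the table. A tiny sketch (the constant and function names are mine) that compares input cost only and ignores output sizes:

```typescript
// Approximate input sizes in vBytes, taken from the table above.
const INPUT_VBYTES = {
  P2PKH: 148,
  "P2SH-P2WPKH": 91,
  P2WPKH: 68,
  P2TR: 57.5,
} as const;

type ScriptType = keyof typeof INPUT_VBYTES;

// Percentage saved per input compared with a legacy P2PKH input.
function inputSavingsVsLegacy(type: ScriptType): number {
  return Math.round((1 - INPUT_VBYTES[type] / INPUT_VBYTES.P2PKH) * 100);
}
```

`inputSavingsVsLegacy("P2WPKH")` gives 54, matching the ~54% quoted above.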

Coin Selection

When constructing a transaction, you need to select which UTXOs to spend. This is a constrained optimization problem: minimize fees while avoiding dust creation.

The three algorithms I evaluated:

1. Largest-first (what we shipped):

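// Assumed UTXO shape (the wallet's real type carries more fields)
interface UTXO {
  txid: string;
  vout: number;
  value: number; // satoshis
}

// Assumed value: P2WPKH dust threshold in satoshis (see "Dust Limit")
const DUST_LIMIT = 294;

function selectUtxos(
  utxos: UTXO[],
  targetAmount: number, // satoshis
  feeRate: number // sat/vByte
): { selected: UTXO[]; fee: number; change: number } {
  // Largest first: reaches the target with the fewest inputs
  const sorted = [...utxos].sort((a, b) => b.value - a.value);
  const selected: UTXO[] = [];
  let total = 0;

  for (const utxo of sorted) {
    selected.push(utxo);
    total += utxo.value;

    // Fee is re-estimated per added input, assuming two outputs (recipient + change)
    const fee = estimateFee(selected.length, 2, feeRate);
    if (total >= targetAmount + fee) {
      const change = total - targetAmount - fee;
      if (change < DUST_LIMIT) {
        // Donate dust to the miner: cheaper than creating a change output
        return { selected, fee: fee + change, change: 0 };
      }
      return { selected, fee, change };
    }
  }
  throw new Error("Insufficient funds");
}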

Pros: Few inputs (lower fees), simple. Cons: Creates small change UTXOs that accumulate over time.

2. Branch and bound (Bitcoin Core's approach): Searches for a combination of UTXOs whose total matches target + fee closely enough (within roughly the cost of creating and later spending a change output) that no change output is needed. If no such match exists, the wallet falls back to another strategy such as knapsack or random selection. Better long-term UTXO health, but computationally expensive for wallets with thousands of UTXOs.

3. Random selection: Pick UTXOs randomly until the target is met. Better privacy (no predictable spending pattern), but unpredictable fees and change sizes.
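
To give a feel for option 2, here is a simplified depth-first branch-and-bound search. It is a sketch of the idea, not Bitcoin Core's implementation. It works on plain amounts, treats `tolerance` as the acceptable overshoot in satoshis, and caps the search with `maxTries` (all names are mine).

```typescript
// Returns a subset of `values` whose sum lies in [target, target + tolerance],
// or null if no such subset is found within the try budget.
function branchAndBound(
  values: number[],
  target: number,
  tolerance: number,
  maxTries = 100000
): number[] | null {
  const sorted = [...values].sort((a, b) => b - a);
  // remaining[i] = sum of sorted[i..]; used to prune hopeless branches
  const remaining: number[] = new Array(sorted.length + 1).fill(0);
  for (let i = sorted.length - 1; i >= 0; i--) {
    remaining[i] = remaining[i + 1] + sorted[i];
  }

  const picked: number[] = [];
  let tries = 0;

  const search = (i: number, sum: number): boolean => {
    if (++tries > maxTries) return false;        // budget exhausted
    if (sum > target + tolerance) return false;  // overshoot: cut this branch
    if (sum >= target) return true;              // inside the window: done
    if (i === sorted.length || sum + remaining[i] < target) return false;
    picked.push(sorted[i]);                      // branch 1: include sorted[i]
    if (search(i + 1, sum + sorted[i])) return true;
    picked.pop();                                // branch 2: skip sorted[i]
    return search(i + 1, sum);
  };

  return search(0, 0) ? [...picked] : null;
}
```

For amounts `[5, 4, 3, 1]` and a target of 8 with zero tolerance, the search tries 5 + 4 (overshoot), backtracks, and returns `[5, 3]`. With only `[5, 4]` available it returns `null`, which is where a fallback strategy would take over.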

We shipped largest-first + periodic dust consolidation — a background job that combines small UTXOs into larger ones during low-fee periods (weekends, typically 1-5 sat/vByte).

Fee Estimation

Fee estimation is critical for UX. Too low = stuck transaction (can be unconfirmed for days). Too high = wasted money.

function estimateFee(
  numInputs: number,
  numOutputs: number,
  feeRate: number // sat/vByte
): number {
  // P2WPKH virtual size estimate (all figures in vBytes, not weight units)
  const overhead = 10.5; // version (4) + marker/flag (0.5) + locktime (4) + input count (1) + output count (1)
  const inputVBytes = numInputs * 68;   // P2WPKH: ~68 vBytes per input
  const outputVBytes = numOutputs * 31; // P2WPKH: ~31 vBytes per output
  const vSize = overhead + inputVBytes + outputVBytes;
  return Math.ceil(vSize * feeRate);
}

Fee sources: We pull estimates from multiple providers and take the median:

mempool.space API: Real-time mempool-based estimates (fastest, economy, minimum)
Bitcoin Core estimatesmartfee: Historical block-based estimates
Blockstream/Esplora API: Backup source
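
Taking the median is a few lines once each provider's answer has been reduced to a sat/vByte number. A minimal sketch (`medianFeeRate` is an illustrative name; fetching from the providers is left out) that also drops missing or nonsensical values, so one failing source doesn't skew the result:

```typescript
// Combines fee-rate estimates (sat/vByte) from several sources.
function medianFeeRate(estimates: number[]): number {
  // Ignore failed lookups (NaN, Infinity) and non-positive values.
  const valid = estimates
    .filter((e) => Number.isFinite(e) && e > 0)
    .sort((a, b) => a - b);
  if (valid.length === 0) throw new Error("No fee estimates available");
  const mid = Math.floor(valid.length / 2);
  return valid.length % 2 === 1
    ? valid[mid]
    : (valid[mid - 1] + valid[mid]) / 2;
}
```

With three sources reporting 12, 10, and 30 sat/vByte the result is 12, so a single outlier cannot drag the estimate up the way an average would.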

Always show the user an estimated fee before signing. For Invebit, we offered three tiers: "Fast" (next block, ~10 min), "Normal" (within 3 blocks, ~30 min), "Economy" (within 6 blocks, ~1 hour).

Dust Limit

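A UTXO is "dust" if it costs more to spend than it's worth. Under Bitcoin Core's default relay policy (a dust relay fee of 3 sat/vByte), the threshold is 546 satoshis for P2PKH outputs, 540 for P2SH, and 294 for P2WPKH. This is relay policy rather than a consensus rule, but nodes with default settings won't relay transactions that create dust outputs, and such outputs waste block space and burden the UTXO set.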

In our coin selection, if the change would be below the dust limit, we add it to the miner fee instead of creating a tiny UTXO that's uneconomical to spend later.
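
The 546 and 294 figures aren't magic numbers. They come from multiplying the size of the output plus the size of a typical input spending it by the default dust relay fee of 3 sat/vByte. A small sketch of that arithmetic, mirroring the calculation in Bitcoin Core's policy code as I understand it (function and constant names are mine):

```typescript
const DUST_RELAY_FEE = 3; // sat/vByte, Bitcoin Core's default

function dustThreshold(scriptPubKeyLength: number, isWitnessProgram: boolean): number {
  // Serialized output: 8-byte amount + 1-byte script length + script
  const outputSize = 8 + 1 + scriptPubKeyLength;
  // Assumed input spending it: outpoint (32 + 4) + script length (1) +
  // signature data (107, discounted by 4 for witness) + sequence (4)
  const spendSize = isWitnessProgram
    ? 32 + 4 + 1 + Math.floor(107 / 4) + 4
    : 32 + 4 + 1 + 107 + 4;
  return (outputSize + spendSize) * DUST_RELAY_FEE;
}
```

`dustThreshold(25, false)` gives 546 for P2PKH (25-byte script), and `dustThreshold(22, true)` gives 294 for P2WPKH (22-byte script).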

RGB Protocol: Assets on Bitcoin

RGB is a protocol for issuing and transferring assets on Bitcoin using client-side validation. Unlike Ethereum's ERC-20 tokens where all state lives on-chain, RGB stores state off-chain and commits only cryptographic proofs to Bitcoin transactions.

Three Core Concepts

1. Client-Side Validation: Parties validate only their relevant transaction history, not the entire global state. When Bob receives RGB tokens, he validates the chain of ownership from genesis to his transfer — he doesn't need to validate every transfer that ever happened in the contract.

2. Single-Use Seals: A cryptographic primitive that proves a message was published exactly once. In RGB, Bitcoin UTXOs serve as seals — spending a UTXO "closes" the seal and anchors the state change to Bitcoin's blockchain. This prevents double-spending without requiring a global ledger.

3. Deterministic Bitcoin Commitments (DBC): RGB data is committed to Bitcoin transactions via two schemes:

Opret: 34-byte commitment in an OP_RETURN output — simple but visible
Tapret: 64-byte commitment hidden in a taproot script path — invisible to chain analysis, no additional blockchain footprint

How Transfers Work

Alice wants to send 200 RGB tokens to Bob:

1. Alice creates a state transition: the new state assigns 200 tokens to Bob's UTXO (concealed: the UTXO is blinded so observers can't link it)
2. Alice builds a witness transaction: it spends her UTXO (closing the single-use seal), embedding the RGB commitment via Opret or Tapret
3. Alice creates a consignment: a package containing the full history from genesis to Bob's transfer, that is, every state transition in the chain
4. Alice sends the consignment to Bob (off-chain, directly)
5. Bob validates: he verifies every transition in the consignment against Bitcoin's blockchain, confirming seals were closed correctly
6. Bob's wallet stores the validated state: he now owns the tokens, anchored to his UTXO

Genesis (1000 tokens → Alice's UTXO_A)
  │
  ├── Transition 1: 200 tokens → Bob's UTXO_B (committed via Tapret in tx spending UTXO_A)
  │   └── Bob validates: genesis → transition 1 (checks UTXO_A was spent, commitment is valid)
  │
  └── Transition 2: 800 tokens → Alice's UTXO_C (change, same transaction)
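
The core of Bob's validation can be illustrated with a toy model. To be clear, this is not the RGB library's API. Every type and function name below is invented for illustration, and real validation also checks schemas, AluVM scripts, blinded seals, and the commitment inside each witness transaction. The sketch only captures two rules: a seal can be closed once, and a transition cannot create or destroy tokens.

```typescript
// Toy model only: seals are "txid:vout" strings, state is a token amount.
interface Assignment { seal: string; amount: number }
interface Operation {
  closes: string[];      // seals spent by the witness transaction (empty for genesis)
  assigns: Assignment[]; // new state bound to fresh seals
}

// Hypothetical chain lookup: was this outpoint spent by a transaction
// that carries the commitment to `op`?
type SealClosedOnChain = (seal: string, op: Operation) => boolean;

// Replays the history from genesis; returns the still-open seals and amounts.
function validateConsignment(
  history: Operation[],
  isClosed: SealClosedOnChain
): Map<string, number> {
  const open = new Map<string, number>();
  history.forEach((op, i) => {
    if (i === 0 && op.closes.length > 0) throw new Error("Genesis closes no seals");
    let input = 0;
    for (const seal of op.closes) {
      const amount = open.get(seal);
      if (amount === undefined) throw new Error(`Seal ${seal} unknown or already closed`);
      if (!isClosed(seal, op)) throw new Error(`Seal ${seal} not closed on-chain`);
      input += amount;
      open.delete(seal); // single use: a closed seal can never be closed again
    }
    const output = op.assigns.reduce((sum, a) => sum + a.amount, 0);
    if (i > 0 && input !== output) throw new Error("Transition creates or destroys tokens");
    for (const a of op.assigns) open.set(a.seal, a.amount);
  });
  return open;
}
```

Replaying the example above (genesis assigning 1000 to UTXO_A, then a transition closing UTXO_A and assigning 200 to UTXO_B and 800 to UTXO_C) leaves exactly two open seals, and a second transition that tries to close UTXO_A again is rejected.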

Key difference from ERC-20: On Ethereum, the token contract maintains a global balanceOf mapping visible to everyone. In RGB, there's no on-chain state — an observer sees a normal Bitcoin transaction with no indication that an asset transfer occurred. The privacy comes from the fact that contract data exists only between the parties involved.

Multi-Protocol Commitments (MPC)

Multiple RGB contracts can batch their state transitions into a single Merkle tree, committed in one Bitcoin transaction. This means one transaction can carry state changes for hundreds of different contracts — massive scalability compared to one-contract-per-transaction models.

Why RGB Over Alternatives

	Omni Layer	Liquid	RGB
State storage	On-chain (OP_RETURN)	Sidechain	Client-side
Consensus	Bitcoin miners	Federation (11/15 multisig)	Individual validation
Privacy	Transparent	Confidential transactions	Private by default
Scalability	Limited by block space	Sidechain throughput	Unbounded (off-chain)
Trust model	Bitcoin security	Federation trust	Bitcoin security + individual validation
Smart contracts	Limited	Simplicity (limited)	AluVM (Turing-equivalent)

RGB's smart contracts run on AluVM — a register-based virtual machine that's Turing-equivalent but designed for formal verification. Simple contracts use Rust macros; complex ones use AluAssembly (with a higher-level language called Contractum in development).

Security Architecture

Building a wallet means handling private keys. The attack surface includes the device, the network, the supply chain, and the user.

Key Storage

Platform	Storage	Encryption	Biometric Unlock
iOS	Keychain (Secure Enclave)	Hardware-backed AES-256	Face ID / Touch ID
Android	Keystore (TEE/StrongBox)	Hardware-backed	Fingerprint / Face
Chrome Extension	`chrome.storage.local`	AES-256-GCM (app-level)	N/A

Never store keys in plaintext, localStorage, or SharedPreferences. For the Chrome Extension, we implemented app-level encryption since Chrome's storage API doesn't provide hardware-backed encryption.
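
As an illustration of what app-level encryption can look like, here is a sketch using Node's crypto module: a password-derived key (scrypt) and AES-256-GCM. Inside an extension the same construction would go through the WebCrypto API instead. The function names and the blob layout are assumptions for this example, not Invebit's actual storage format.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// All fields are base64 strings so the blob can be stored as JSON.
interface EncryptedBlob { salt: string; iv: string; tag: string; ciphertext: string }

function encryptSecret(plaintext: string, password: string): EncryptedBlob {
  const salt = randomBytes(16);
  const key = scryptSync(password, salt, 32); // 256-bit key from the password
  const iv = randomBytes(12);                 // fresh 96-bit nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const blob: EncryptedBlob = {
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
  key.fill(0); // best-effort wipe of the derived key
  return blob;
}

function decryptSecret(blob: EncryptedBlob, password: string): string {
  const key = scryptSync(password, Buffer.from(blob.salt, "base64"), 32);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(blob.iv, "base64"));
  decipher.setAuthTag(Buffer.from(blob.tag, "base64"));
  // final() throws if the password is wrong or the data was tampered with
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(blob.ciphertext, "base64")),
    decipher.final(),
  ]);
  key.fill(0);
  return plaintext.toString("utf8");
}
```

GCM's authentication tag is what makes a wrong password fail loudly instead of returning garbage, which matters when the plaintext is a seed phrase.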

Signing Isolation

Transaction signing happens in an isolated context:

1. Load private key from secure storage
2. Deserialize the PSBT (Partially Signed Bitcoin Transaction)
3. Sign all inputs
4. Wipe the private key from memory immediately
5. Return the signed transaction

We used PSBT (BIP-174) throughout — it separates transaction construction from signing, enabling hardware wallet support and multi-sig flows.
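
The "load, use, wipe" part of that flow can be wrapped in a small helper so the wipe cannot be forgotten. This is a sketch with hypothetical names. The `loadKey` and `use` callbacks stand in for secure-storage access and PSBT signing. Keep in mind that zeroing a buffer in JavaScript is best effort only: the runtime may have made copies that this code cannot reach.

```typescript
// Runs `use` with the key and zeroes the key bytes afterwards,
// whether `use` returns normally or throws.
function withPrivateKey<T>(
  loadKey: () => Uint8Array,
  use: (key: Uint8Array) => T
): T {
  const key = loadKey();
  try {
    return use(key);
  } finally {
    key.fill(0);
  }
}
```

A call site then looks like `withPrivateKey(loadFromKeychain, (key) => signPsbt(psbt, key))`, where both function names are placeholders for the wallet's own code.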

Supply Chain Security

Every dependency is a potential attack vector. Our approach:

Pinned all dependency versions — no ^ or ~ in package.json for crypto packages
Audited critical packages: bip39, bitcoinjs-lib, tiny-secp256k1, @noble/secp256k1
Reproducible builds — same source always produces the same binary
No post-install scripts for crypto dependencies
Subresource Integrity (SRI) for the Chrome Extension

The 2024 xz backdoor and multiple npm supply chain attacks validated this paranoia.

Transaction Malleability

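Before SegWit, the txid was a hash of the entire transaction, including the signature scripts, yet a signature could not cover the script that contains it. A third party could therefore re-encode a signature script without invalidating it, and the same payment would confirm under a different txid. This breaks any system that tracks unconfirmed transactions by txid.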

Mitigation: We use SegWit (P2WPKH) exclusively, which moves signature data to the witness and computes txid without it. For any remaining edge cases, we track transactions by the UTXOs they spend (inputs), not by txid.
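
Tracking by inputs boils down to deriving a stable key from the set of outpoints a transaction spends. A minimal sketch (the `Outpoint` shape and `spendKey` name are mine):

```typescript
interface Outpoint { txid: string; vout: number }

// Identifies a pending transaction by what it spends rather than by its txid.
// Sorting makes the key independent of input order.
function spendKey(inputs: Outpoint[]): string {
  return inputs
    .map((input) => `${input.txid}:${input.vout}`)
    .sort()
    .join("|");
}
```

Two versions of a transaction that differ only in their signature encoding spend the same outpoints, so they map to the same key even though their txids differ.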

Address Verification

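A clipboard hijacker that swaps a copied Bitcoin address for an attacker's address can redirect funds. Address checksums catch accidental typos, but not a swap to another valid address. Our approach: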

Display the full address (never truncate in the send flow)
Show a QR code for cross-device verification
For large amounts: require the user to send a small test transaction first

Testing Strategy

Bitcoin transactions are irreversible. Our testing pipeline:

Unit tests: Key derivation against BIP-32/39/44 test vectors (from the specs)
Integration tests on testnet: 500+ automated transaction tests — send, receive, UTXO consolidation, fee estimation accuracy, edge cases (dust, max UTXO count)
Regtest environment: Private Bitcoin network for deterministic testing — we control block production
Fuzzing: Random inputs to transaction construction, PSBT parsing, and address validation
Security audit: Two independent audits before mainnet launch, third audit 6 months after

Key Takeaways

UTXOs are the hard part. Key derivation is well-documented with test vectors. UTXO management — coin selection, consolidation, fee estimation, dust handling, change output strategy — is where wallet quality differs.
Hardened derivation isn't optional. The xpub + leaked child private key = full compromise attack is real. Use hardened derivation for account separation, always.
Fee estimation is a product decision. The algorithm is simple; the UX is hard. Users don't understand sat/vByte — they understand "fast", "normal", "economy."
RGB is production-ready for issuance, early for trading. The protocol is sound and the privacy properties are exceptional. But the tooling ecosystem (wallets, explorers, DEXes) is still maturing compared to EVM.
Security is never "done." We did two independent security audits and still found issues in the third review. Budget for continuous security work, especially around dependency updates.
Test on testnet obsessively. Mainnet bugs cost real money. We required 100% testnet pass rate for 48 hours before any mainnet deployment.

