URL Shortener
Base62 encode a counter and store the mapping. Reads dominate writes 100:1, so optimize your lookup path. The strategy pattern lets you swap encoding schemes without rewiring anything.
Key Abstractions
Orchestrator that coordinates code generation, storage, and analytics
Strategy interface for generating short codes: base62, hash-based, or random
Persistence interface for storing and retrieving URL mappings
Tracks click counts, timestamps, and referrer data per short URL
Class Diagram
The Key Insight
A URL shortener is really two separate problems glued together. Problem one: generate a unique short code for a given URL. Problem two: look it up fast when someone clicks it. That second problem dominates your design because reads outnumber writes by a factor of 100x or more.
For code generation, base62 encoding a monotonic counter is the simplest approach that actually works. You get short, readable codes with zero collision risk. No need for retry loops or uniqueness checks. Hash-based approaches seem clever until you realize truncated hashes collide way more often than people expect, and now you need collision resolution logic cluttering up your write path.
The Strategy pattern on the code generator is not just academic tidiness. It is genuinely useful here. You might start with base62 counters, then switch to random codes for vanity URLs, or hash-based codes for idempotent shortening. The shortener itself stays identical.
Requirements
Functional
shorten(longUrl)accepts a valid URL and returns a shortened URLresolve(shortCode)redirects a short code to the original URL- Same long URL submitted twice returns the same short URL (deduplication)
- Basic click analytics: total clicks per short URL
Non-Functional
- Resolve must be fast since reads vastly outnumber writes
- Short codes should be compact, ideally 7 characters or fewer
- Thread-safe for concurrent access
- Code generation strategy should be swappable without modifying core logic
Design Decisions
Why base62 encoding over MD5 hashing?
Base62 on a counter gives you guaranteed uniqueness. Every increment produces a different code. MD5 hashing requires you to truncate the output to keep it short, which raises collision probability significantly. With truncated hashes, you need a check-and-retry loop on every write. Base62 avoids all of that complexity.
Why maintain a reverse mapping (long to short)?
Without it, submitting the same URL twice creates two different short codes pointing to the same destination. That wastes storage and confuses users who expect a stable mapping. The reverse lookup adds some memory overhead, but it keeps the system predictable and idempotent.
Why separate URLStore behind an interface?
Today it is an in-memory HashMap. Tomorrow it could be Redis, DynamoDB, or Postgres. The shortener logic does not care. This separation also makes testing trivial since you can inject a mock store that simulates failures or latency.
Why track analytics separately from the store?
Analytics and URL resolution have different access patterns and scaling needs. Bundling them together means your redirect latency pays for analytics overhead. Keeping them separate lets you make the redirect path as lean as possible and process analytics asynchronously if needed.
Interview Follow-ups
- "How would you handle custom aliases?" Add a
customShorten(alias, longUrl)method that checks if the alias is taken, then stores it directly. Skip the code generator entirely for custom requests. - "What happens when the counter overflows?" With a 7-character base62 code, you have 3.5 trillion unique values. If you somehow exhaust that, increase the code length by one character and you get 218 trillion more.
- "How do you prevent abuse?" Add a Rate Limiter per IP or API key. Validate URLs against a blocklist. Set expiration policies so abandoned short URLs get cleaned up.
- "How would you scale this for billions of URLs?" Partition the counter range across multiple servers. Each server owns a non-overlapping range, so they generate codes independently without coordination.
Code Implementation
1 from __future__ import annotations
2 from abc import ABC, abstractmethod
3 from threading import Lock
4 from datetime import datetime
5 import hashlib
6 import string
7
8
9 class CodeGenerator(ABC):
10 """Strategy interface for short code generation."""
11
12 @abstractmethod
13 def generate(self, url: str) -> str: ...
14
15
16 class Base62Generator(CodeGenerator):
17 """Counter-based base62 encoding. No collisions possible."""
18
19 CHARS = string.ascii_letters + string.digits # a-zA-Z0-9
20
21 def __init__(self, start: int = 100000):
22 self._counter = start
23 self._lock = Lock()
24
25 def generate(self, url: str) -> str:
26 with self._lock:
27 self._counter += 1
28 return self._encode(self._counter)
29
30 def _encode(self, num: int) -> str:
31 if num == 0:
32 return self.CHARS[0]
33 result = []
34 while num > 0:
35 result.append(self.CHARS[num % 62])
36 num //= 62
37 return "".join(reversed(result))
38
39
40 class HashGenerator(CodeGenerator):
41 """MD5-based generation. Needs collision handling."""
42
43 def __init__(self, length: int = 7):
44 self._length = length
45
46 def generate(self, url: str) -> str:
47 digest = hashlib.md5(url.encode()).hexdigest()
48 num = int(digest[:12], 16)
49 chars = string.ascii_letters + string.digits
50 result = []
51 for _ in range(self._length):
52 result.append(chars[num % 62])
53 num //= 62
54 return "".join(result)
55
56
57 class ClickStats:
58 def __init__(self, short_code: str):
59 self.short_code = short_code
60 self.total_clicks = 0
61 self.created_at = datetime.now()
62
63 def __repr__(self) -> str:
64 return f"ClickStats({self.short_code}, clicks={self.total_clicks})"
65
66
67 class Analytics:
68 """Tracks click data per short code."""
69
70 def __init__(self):
71 self._data: dict[str, ClickStats] = {}
72
73 def init_tracking(self, short_code: str) -> None:
74 self._data[short_code] = ClickStats(short_code)
75
76 def record_click(self, short_code: str) -> None:
77 if short_code in self._data:
78 self._data[short_code].total_clicks += 1
79
80 def get_stats(self, short_code: str) -> ClickStats | None:
81 return self._data.get(short_code)
82
83
84 class URLStore(ABC):
85 """Persistence interface for URL mappings."""
86
87 @abstractmethod
88 def save(self, short_code: str, long_url: str) -> None: ...
89
90 @abstractmethod
91 def find(self, short_code: str) -> str | None: ...
92
93 @abstractmethod
94 def exists_by_long_url(self, long_url: str) -> str | None: ...
95
96
97 class InMemoryURLStore(URLStore):
98 def __init__(self):
99 self._short_to_long: dict[str, str] = {}
100 self._long_to_short: dict[str, str] = {}
101
102 def save(self, short_code: str, long_url: str) -> None:
103 self._short_to_long[short_code] = long_url
104 self._long_to_short[long_url] = short_code
105
106 def find(self, short_code: str) -> str | None:
107 return self._short_to_long.get(short_code)
108
109 def exists_by_long_url(self, long_url: str) -> str | None:
110 return self._long_to_short.get(long_url)
111
112
113 class URLShortener:
114 """Orchestrates URL shortening with pluggable code generation."""
115
116 def __init__(
117 self,
118 base_url: str = "https://short.ly/",
119 generator: CodeGenerator | None = None,
120 store: URLStore | None = None,
121 ):
122 self._base_url = base_url
123 self._generator = generator or Base62Generator()
124 self._store = store or InMemoryURLStore()
125 self._analytics = Analytics()
126
127 def shorten(self, long_url: str) -> str:
128 if not long_url or not long_url.startswith(("http://", "https://")):
129 raise ValueError("Invalid URL")
130
131 existing = self._store.exists_by_long_url(long_url)
132 if existing:
133 return self._base_url + existing
134
135 code = self._generator.generate(long_url)
136 self._store.save(code, long_url)
137 self._analytics.init_tracking(code)
138 return self._base_url + code
139
140 def resolve(self, short_code: str) -> str | None:
141 long_url = self._store.find(short_code)
142 if long_url:
143 self._analytics.record_click(short_code)
144 return long_url
145
146 def get_stats(self, short_code: str) -> ClickStats | None:
147 return self._analytics.get_stats(short_code)
148
149
150 if __name__ == "__main__":
151 shortener = URLShortener()
152
153 # Shorten some URLs
154 s1 = shortener.shorten("https://example.com/very/long/path/to/resource")
155 s2 = shortener.shorten("https://another.com/page")
156 print(f"Shortened URL 1: {s1}")
157 print(f"Shortened URL 2: {s2}")
158
159 # Same URL returns same short link
160 s1_again = shortener.shorten("https://example.com/very/long/path/to/resource")
161 print(f"Same URL again: {s1_again}")
162 assert s1 == s1_again, "Same URL should return same short link"
163
164 # Resolve short codes
165 code1 = s1.replace("https://short.ly/", "")
166 resolved = shortener.resolve(code1)
167 print(f"Resolved: {resolved}")
168 assert resolved == "https://example.com/very/long/path/to/resource"
169
170 # Click tracking
171 shortener.resolve(code1)
172 shortener.resolve(code1)
173 stats = shortener.get_stats(code1)
174 print(f"Stats: {stats}")
175 assert stats.total_clicks == 3
176
177 # Invalid URL handling
178 try:
179 shortener.shorten("not-a-url")
180 except ValueError as e:
181 print(f"Caught expected error: {e}")
182
183 print("All assertions passed.")Common Mistakes
- ✗Using a full MD5/SHA hash and truncating it. Truncation dramatically increases collision probability.
- ✗Not handling the same long URL submitted twice. Decide upfront: same short code or different ones.
- ✗Skipping input validation. Malformed URLs, empty strings, and excessively long inputs all need guardrails.
- ✗Ignoring the read path. If you only optimize writes, your redirect latency will suffer under real traffic.
Key Points
- ✓Base62 encoding gives you [a-zA-Z0-9] characters. A 7-character code supports 62^7 = 3.5 trillion unique URLs.
- ✓Counter-based generation guarantees uniqueness without collision checks. Hash-based needs collision handling.
- ✓Read-to-write ratio is roughly 100:1. Cache the short-to-long mapping aggressively.
- ✓Strategy pattern on CodeGenerator means you can swap base62 for MD5-based hashing without touching the shortener.