FoundationalProblem 6 of 20

FoundationalStarter

URL Shortener

Strategy PatternFactory Pattern

Base62 encode a counter and store the mapping. Reads dominate writes 100:1, so optimize your lookup path. The strategy pattern lets you swap encoding schemes without rewiring anything.

Key Abstractions

URLShortener

Orchestrator that coordinates code generation, storage, and analytics

CodeGenerator

Strategy interface for generating short codes: base62, hash-based, or random

URLStore

Persistence interface for storing and retrieving URL mappings

Analytics

Tracks click counts, timestamps, and referrer data per short URL

Class Diagram

The Key Insight

A URL shortener is really two separate problems glued together. Problem one: generate a unique short code for a given URL. Problem two: look it up fast when someone clicks it. That second problem dominates your design because reads outnumber writes by a factor of 100x or more.

For code generation, base62 encoding a monotonic counter is the simplest approach that actually works. You get short, readable codes with zero collision risk. No need for retry loops or uniqueness checks. Hash-based approaches seem clever until you realize truncated hashes collide way more often than people expect, and now you need collision resolution logic cluttering up your write path.

The Strategy pattern on the code generator is not just academic tidiness. It is genuinely useful here. You might start with base62 counters, then switch to random codes for vanity URLs, or hash-based codes for idempotent shortening. The shortener itself stays identical.

Requirements

Functional

shorten(longUrl) accepts a valid URL and returns a shortened URL
resolve(shortCode) redirects a short code to the original URL
Same long URL submitted twice returns the same short URL (deduplication)
Basic click analytics: total clicks per short URL

Non-Functional

Resolve must be fast since reads vastly outnumber writes
Short codes should be compact, ideally 7 characters or fewer
Thread-safe for concurrent access
Code generation strategy should be swappable without modifying core logic

Design Decisions

Why base62 encoding over MD5 hashing?

Base62 on a counter gives you guaranteed uniqueness. Every increment produces a different code. MD5 hashing requires you to truncate the output to keep it short, which raises collision probability significantly. With truncated hashes, you need a check-and-retry loop on every write. Base62 avoids all of that complexity.

Why maintain a reverse mapping (long to short)?

Without it, submitting the same URL twice creates two different short codes pointing to the same destination. That wastes storage and confuses users who expect a stable mapping. The reverse lookup adds some memory overhead, but it keeps the system predictable and idempotent.

Why separate URLStore behind an interface?

Today it is an in-memory HashMap. Tomorrow it could be Redis, DynamoDB, or Postgres. The shortener logic does not care. This separation also makes testing trivial since you can inject a mock store that simulates failures or latency.

Why track analytics separately from the store?

Analytics and URL resolution have different access patterns and scaling needs. Bundling them together means your redirect latency pays for analytics overhead. Keeping them separate lets you make the redirect path as lean as possible and process analytics asynchronously if needed.

Interview Follow-ups

"How would you handle custom aliases?" Add a customShorten(alias, longUrl) method that checks if the alias is taken, then stores it directly. Skip the code generator entirely for custom requests.
"What happens when the counter overflows?" With a 7-character base62 code, you have 3.5 trillion unique values. If you somehow exhaust that, increase the code length by one character and you get 218 trillion more.
"How do you prevent abuse?" Add a Rate Limiter per IP or API key. Validate URLs against a blocklist. Set expiration policies so abandoned short URLs get cleaned up.
"How would you scale this for billions of URLs?" Partition the counter range across multiple servers. Each server owns a non-overlapping range, so they generate codes independently without coordination.

Code Implementation

  1  from __future__ import annotations
  2  from abc import ABC, abstractmethod
  3  from threading import Lock
  4  from datetime import datetime
  5  import hashlib
  6  import string
  7  
  8  
  9  class CodeGenerator(ABC):
 10      """Strategy interface for short code generation."""
 11  
 12      @abstractmethod
 13      def generate(self, url: str) -> str: ...
 14  
 15  
 16  class Base62Generator(CodeGenerator):
 17      """Counter-based base62 encoding. No collisions possible."""
 18  
 19      CHARS = string.ascii_letters + string.digits  # a-zA-Z0-9
 20  
 21      def __init__(self, start: int = 100000):
 22          self._counter = start
 23          self._lock = Lock()
 24  
 25      def generate(self, url: str) -> str:
 26          with self._lock:
 27              self._counter += 1
 28              return self._encode(self._counter)
 29  
 30      def _encode(self, num: int) -> str:
 31          if num == 0:
 32              return self.CHARS[0]
 33          result = []
 34          while num > 0:
 35              result.append(self.CHARS[num % 62])
 36              num //= 62
 37          return "".join(reversed(result))
 38  
 39  
 40  class HashGenerator(CodeGenerator):
 41      """MD5-based generation. Needs collision handling."""
 42  
 43      def __init__(self, length: int = 7):
 44          self._length = length
 45  
 46      def generate(self, url: str) -> str:
 47          digest = hashlib.md5(url.encode()).hexdigest()
 48          num = int(digest[:12], 16)
 49          chars = string.ascii_letters + string.digits
 50          result = []
 51          for _ in range(self._length):
 52              result.append(chars[num % 62])
 53              num //= 62
 54          return "".join(result)
 55  
 56  
 57  class ClickStats:
 58      def __init__(self, short_code: str):
 59          self.short_code = short_code
 60          self.total_clicks = 0
 61          self.created_at = datetime.now()
 62  
 63      def __repr__(self) -> str:
 64          return f"ClickStats({self.short_code}, clicks={self.total_clicks})"
 65  
 66  
 67  class Analytics:
 68      """Tracks click data per short code."""
 69  
 70      def __init__(self):
 71          self._data: dict[str, ClickStats] = {}
 72  
 73      def init_tracking(self, short_code: str) -> None:
 74          self._data[short_code] = ClickStats(short_code)
 75  
 76      def record_click(self, short_code: str) -> None:
 77          if short_code in self._data:
 78              self._data[short_code].total_clicks += 1
 79  
 80      def get_stats(self, short_code: str) -> ClickStats | None:
 81          return self._data.get(short_code)
 82  
 83  
 84  class URLStore(ABC):
 85      """Persistence interface for URL mappings."""
 86  
 87      @abstractmethod
 88      def save(self, short_code: str, long_url: str) -> None: ...
 89  
 90      @abstractmethod
 91      def find(self, short_code: str) -> str | None: ...
 92  
 93      @abstractmethod
 94      def exists_by_long_url(self, long_url: str) -> str | None: ...
 95  
 96  
 97  class InMemoryURLStore(URLStore):
 98      def __init__(self):
 99          self._short_to_long: dict[str, str] = {}
100          self._long_to_short: dict[str, str] = {}
101  
102      def save(self, short_code: str, long_url: str) -> None:
103          self._short_to_long[short_code] = long_url
104          self._long_to_short[long_url] = short_code
105  
106      def find(self, short_code: str) -> str | None:
107          return self._short_to_long.get(short_code)
108  
109      def exists_by_long_url(self, long_url: str) -> str | None:
110          return self._long_to_short.get(long_url)
111  
112  
113  class URLShortener:
114      """Orchestrates URL shortening with pluggable code generation."""
115  
116      def __init__(
117          self,
118          base_url: str = "https://short.ly/",
119          generator: CodeGenerator | None = None,
120          store: URLStore | None = None,
121      ):
122          self._base_url = base_url
123          self._generator = generator or Base62Generator()
124          self._store = store or InMemoryURLStore()
125          self._analytics = Analytics()
126  
127      def shorten(self, long_url: str) -> str:
128          if not long_url or not long_url.startswith(("http://", "https://")):
129              raise ValueError("Invalid URL")
130  
131          existing = self._store.exists_by_long_url(long_url)
132          if existing:
133              return self._base_url + existing
134  
135          code = self._generator.generate(long_url)
136          self._store.save(code, long_url)
137          self._analytics.init_tracking(code)
138          return self._base_url + code
139  
140      def resolve(self, short_code: str) -> str | None:
141          long_url = self._store.find(short_code)
142          if long_url:
143              self._analytics.record_click(short_code)
144          return long_url
145  
146      def get_stats(self, short_code: str) -> ClickStats | None:
147          return self._analytics.get_stats(short_code)
148  
149  
150  if __name__ == "__main__":
151      shortener = URLShortener()
152  
153      # Shorten some URLs
154      s1 = shortener.shorten("https://example.com/very/long/path/to/resource")
155      s2 = shortener.shorten("https://another.com/page")
156      print(f"Shortened URL 1: {s1}")
157      print(f"Shortened URL 2: {s2}")
158  
159      # Same URL returns same short link
160      s1_again = shortener.shorten("https://example.com/very/long/path/to/resource")
161      print(f"Same URL again:  {s1_again}")
162      assert s1 == s1_again, "Same URL should return same short link"
163  
164      # Resolve short codes
165      code1 = s1.replace("https://short.ly/", "")
166      resolved = shortener.resolve(code1)
167      print(f"Resolved: {resolved}")
168      assert resolved == "https://example.com/very/long/path/to/resource"
169  
170      # Click tracking
171      shortener.resolve(code1)
172      shortener.resolve(code1)
173      stats = shortener.get_stats(code1)
174      print(f"Stats: {stats}")
175      assert stats.total_clicks == 3
176  
177      # Invalid URL handling
178      try:
179          shortener.shorten("not-a-url")
180      except ValueError as e:
181          print(f"Caught expected error: {e}")
182  
183      print("All assertions passed.")

Common Mistakes

✗Using a full MD5/SHA hash and truncating it. Truncation dramatically increases collision probability.
✗Not handling the same long URL submitted twice. Decide upfront: same short code or different ones.
✗Skipping input validation. Malformed URLs, empty strings, and excessively long inputs all need guardrails.
✗Ignoring the read path. If you only optimize writes, your redirect latency will suffer under real traffic.

Key Points

✓Base62 encoding gives you [a-zA-Z0-9] characters. A 7-character code supports 62^7 = 3.5 trillion unique URLs.
✓Counter-based generation guarantees uniqueness without collision checks. Hash-based needs collision handling.
✓Read-to-write ratio is roughly 100:1. Cache the short-to-long mapping aggressively.
✓Strategy pattern on CodeGenerator means you can swap base62 for MD5-based hashing without touching the shortener.

Related Designs