Following is a tutorial that I constructed with the help of Claude on vector databases. The tutorial walks you through building a demonstration vector database step by step. It also shows you what lies inside the seeming magic of converting plain text into numbers in a vector database, using an example sentence such as 'The Eiffel Tower is in Paris'.
Building a Vector Database from Scratch and Peeping Inside It
A Beginner's Complete Tutorial
1. What is a Vector Database?
A vector database is a specialised database designed to store, index, and query high-dimensional vectors — arrays of numbers. These vectors are called embeddings: mathematical representations of data (text, images, audio) where similar items are close to each other in vector space.

Example: The words king and queen would have vectors that are close together, while king and pizza would be far apart.
Key Operations
• Store vectors (e.g. [0.12, 0.87, -0.34, ...] with hundreds of dimensions)
• Similarity search: find the N nearest vectors to a query vector
• Filter by metadata alongside vector similarity
How is it Different from a Regular Database?

| | SQL Database | Vector Database |
| Query type | Exact match (WHERE age = 30) | Similarity match (find nearest) |
| Question asked | Give me all rows where X equals Y | Give me the most similar items to this |
| Use case | Structured data, reports, transactions | Semantic search, AI, recommendations |
| Understands meaning? | No | Yes |
Popular Vector Databases
• Pinecone, Weaviate, Qdrant, Chroma, FAISS
• We will build our own from scratch in Python!
2. Can You Build One from Scratch?

Yes, absolutely! It is a great learning project. Here is the spectrum of complexity:

Minimal Viable Version (very doable solo)
A basic vector database needs just three things:
• Storage: save vectors and metadata to disk
• Index: a data structure for fast nearest-neighbour search
• Query engine: take a query vector, return top-K similar vectors

The core algorithm is brute-force cosine similarity to start; later you can upgrade to smarter indexing like HNSW (Hierarchical Navigable Small World graphs), which is what production databases use.
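Here is a minimal sketch of that core: cosine similarity plus a brute-force top-K scan. This is illustrative code, not the tutorial's full script:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 = same direction (same meaning), 0 = unrelated, -1 = opposite
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def brute_force_search(query_vec, stored_vecs, top_k=3):
    # Compare the query against every stored vector, keep the best top_k
    scores = [cosine_similarity(query_vec, v) for v in stored_vecs]
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in best[:top_k]]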
Windows, 8 GB RAM — Is That Enough?

| Concern | Reality |
| 8 GB RAM | Fine for a personal or learning vector database |
| Windows | Python works perfectly. Use Command Prompt or PowerShell |
| Scale | Can handle 1 to 5 million vectors depending on dimensions |
A 1,536-dimension OpenAI embedding at float32 takes 1,536 × 4 bytes = 6,144 bytes, about 6 KB per vector. So 8 GB gives you room for roughly 1 million vectors in memory — which is substantial.
Four-Step Build Plan
• Step 1: Core engine — text to vectors, cosine similarity, brute-force search
• Step 2: Persistence — save to SQLite so data survives restarts
• Step 3: Fast search — HNSW indexing for large databases
• Step 4: REST API — FastAPI so any app can query the database
Note: Step 3 (HNSW via hnswlib) is optional for a toy or learning database. Steps 1 and 2 alone are fully functional for thousands of documents. hnswlib requires a C++ compiler to install on Windows, which can be tricky. Steps 1 and 2 only need: pip install sentence-transformers numpy
3. What Data Are We Turning into Vectors?

This is the most important conceptual piece. You need two things:
• An embedding model: converts text into vector numbers
• Some text data: your knowledge base

We use a completely free, runs-on-your-laptop embedding model loaded through the sentence-transformers library. No API key, no internet needed after the initial download.
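Turning a sentence into a vector takes three lines with this library (the sentence is just an example):

from sentence_transformers import SentenceTransformer

# Downloads ~80 MB on the first run, then cached locally
model = SentenceTransformer('all-MiniLM-L6-v2')
vector = model.encode("The Eiffel Tower is in Paris.")
print(vector.shape)  # (384,) -- one number per dimension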
Our Sample Knowledge Base

We built a database of 20 plain-text facts across 4 categories:
| Category | Topics | Sample Fact |
| Science | biology, astronomy, physics | The human body contains approximately 37 trillion cells... |
| Geography | landmarks, rivers, mountains, cities | The Eiffel Tower is an iron lattice tower located in Paris... |
| History | wars, inventions, space exploration | Neil Armstrong became the first human to walk on the Moon... |
| Technology | programming, AI, networking, hardware | Python is a high-level language known for its simple syntax... |
Each sentence is converted to a vector of 384 numbers. When you search 'Where is the Eiffel Tower?', your question also becomes a vector, and the database finds the closest matching document.

The good thing: you can swap the knowledge base with anything: PDF notes, a product catalogue, customer support FAQs, diary entries, a recipe book. The system works the same regardless.
4. Demonstration: Semantic Search in Action

Below is actual output from running a Python script. The AI model is loaded, 20 documents are converted to vectors, and 6 different search queries are run.

This is our knowledge base, hard-coded inside the Python script. On this knowledge base we will do our semantic search and report relevance as percentages:
knowledge_base = [
    # ── SCIENCE ──
    ("Photosynthesis is the process by which plants convert sunlight, water, and carbon dioxide into glucose and oxygen.",
     {"category": "science", "topic": "biology"}),
    ("The human body contains approximately 37 trillion cells, each performing specific functions to keep us alive.",
     {"category": "science", "topic": "biology"}),
    ("DNA (deoxyribonucleic acid) is the molecule that carries the genetic instructions for the development of all living organisms.",
     {"category": "science", "topic": "biology"}),
    ("Black holes are regions of space where gravity is so strong that nothing — not even light — can escape once it passes the event horizon.",
     {"category": "science", "topic": "astronomy"}),
    ("The speed of light in a vacuum is approximately 299,792 kilometres per second, often denoted as 'c' in physics equations.",
     {"category": "science", "topic": "physics"}),
    # ── GEOGRAPHY ──
    ("The Eiffel Tower is an iron lattice tower located on the Champ de Mars in Paris, France. It was constructed in 1889 for the World's Fair.",
     {"category": "geography", "topic": "landmarks"}),
    ("The Amazon River in South America is the largest river in the world by water discharge, carrying about 20% of all freshwater that flows into the oceans.",
     {"category": "geography", "topic": "rivers"}),
    ("Mount Everest, located in the Himalayas, is the tallest mountain on Earth at 8,848.86 meters above sea level.",
     {"category": "geography", "topic": "mountains"}),
    ("The Sahara Desert in North Africa is the largest hot desert in the world, covering approximately 9 million square kilometers.",
     {"category": "geography", "topic": "deserts"}),
    ("Tokyo is the capital city of Japan and the most populous metropolitan area in the world, with over 37 million people in the greater metro region.",
     {"category": "geography", "topic": "cities"}),
    # ── HISTORY ──
    ("The Great Wall of China was built over many centuries, primarily during the Ming Dynasty (1368–1644), to protect against invasions from the north.",
     {"category": "history", "topic": "ancient structures"}),
    ("World War II lasted from 1939 to 1945 and involved most of the world's nations, making it the deadliest and most widespread war in human history.",
     {"category": "history", "topic": "wars"}),
    ("Neil Armstrong became the first human to walk on the Moon on July 20, 1969, during NASA's Apollo 11 mission.",
     {"category": "history", "topic": "space exploration"}),
    ("The printing press was invented by Johannes Gutenberg around 1440 in Germany, revolutionizing the spread of knowledge across Europe.",
     {"category": "history", "topic": "inventions"}),
    ("The French Revolution, which began in 1789, transformed France from a monarchy to a republic and had a lasting influence on modern democracy.",
     {"category": "history", "topic": "politics"}),
    # ── TECHNOLOGY ──
    ("Python is a high-level, general-purpose programming language known for its simple, readable syntax. It was created by Guido van Rossum in 1991.",
     {"category": "technology", "topic": "programming"}),
    ("Artificial intelligence (AI) refers to the simulation of human intelligence in machines, enabling them to learn, reason, and solve problems.",
     {"category": "technology", "topic": "AI"}),
    ("The internet is a global network of interconnected computers that communicate using standardized protocols, enabling information sharing worldwide.",
     {"category": "technology", "topic": "networking"}),
    ("A CPU (Central Processing Unit) is the primary component of a computer that executes instructions from programs by performing arithmetic and logic operations.",
     {"category": "technology", "topic": "hardware"}),
    ("Machine learning is a subset of AI where systems learn from data to improve their performance on tasks without being explicitly programmed for each one.",
     {"category": "technology", "topic": "AI"}),
]
Loading and Storing 20 Documents

Loading the AI embedding model... (this takes ~5 seconds the first time)
Model loaded successfully!

Loading 20 documents into the database...
(This takes ~10-20 seconds while the AI model processes each text)

Converting to vector: 'Photosynthesis is the process...'
Document #0 added. Total documents: 1
Converting to vector: 'The human body contains approximately...'
Document #1 added. Total documents: 2
... (18 more documents) ...
Document #19 added. Total documents: 20
Search Results and What They Mean

Query 1: Where is the Eiffel Tower located?

#1 Score: 79.3% [geography]
   The Eiffel Tower is an iron lattice tower located on the Champ de Mars in Paris, France.
#2 Score: 23.8% [history]
   The Great Wall of China was built over many centuries...
#3 Score: 23.5% [geography]
   Mount Everest, located in the Himalayas...
Result #1 scores 79.3% — a strong, confident match. Results #2 and #3 score around 23% — the database is saying these are the next closest things, but they are not really related. The big gap between 79% and 23% shows the database is confident in its top answer.
Query 2: How many cells are in the human body?

#1 Score: 81.4% [science]
   The human body contains approximately 37 trillion cells...
#2 Score: 24.9% [science]
   DNA (deoxyribonucleic acid) is the molecule that carries...
#3 Score: 19.6% [geography]
   Tokyo is the capital city of Japan and the most populous metropolitan area in the world, with over 37 million people...
Notice the Tokyo result. The query mentions 37 and so does the Tokyo fact (37 million people). The AI noticed a weak numerical similarity. At only 19.6% this is a weak match and can be safely ignored. A practical rule of thumb: trust results above 75%, consider results between 40% and 75%, ignore results below 40%.
Filtered Search: How do plants make food? (Science only)

Applying filter {'category': 'science'} -> 5 candidates
#1 Score: 49.4% [science]
   Photosynthesis is the process by which plants convert sunlight...

This is the SQL-like WHERE clause equivalent. Only science documents were searched, narrowing from 20 to 5 candidates before the similarity comparison ran.
5. Can You Write SQL Queries Like SELECT...WHERE...?

Partially yes, but with an important twist. Vector databases are not replacements for SQL databases; they serve a different purpose.

What You CAN Do: Metadata Filtering

Most vector databases, including ours, support metadata filtering alongside vector search. This is the closest equivalent to a SQL WHERE clause:
# This is equivalent to:
#   SELECT * FROM docs WHERE category = 'science'
#   ORDER BY similarity DESC LIMIT 3
results = db.search(
    query_text='How do plants make food?',
    top_k=3,
    filter_metadata={'category': 'science'}  # the WHERE clause
)
What You CANNOT Do

# SQL has no concept of semantic meaning:
SELECT * FROM docs WHERE meaning = 'something about taxes'
# This is impossible in SQL.
# It is EXACTLY what vector search does instead.

You convert your question into an embedding vector, then search for nearest neighbours. The metadata filter (category, date, source) is applied in Python as a pre-filter, not in SQL.
6. Peeping Inside: What Does a Vector Actually Look Like?

We wrote a script called peek_at_vectors.py that shows the actual numbers a sentence becomes. Here is real output from that script:
Part 1: The 384 Numbers of One Sentence

Sentence : "The Eiffel Tower is in Paris."
Vector   : 384 numbers (showing first 20)

[ 0]  +0.071102  (positive)
[ 1]  +0.037129
[ 3]  -0.010988  (negative)
[ 6]  -0.074099
[13]  -0.101622  (largest negative)
...
Smallest: -0.187900   Largest: +0.134785   Average: -0.000225
Why Negative Numbers?

The 384 numbers are not attributes like 'how French is this sentence?'. They are abstract mathematical coordinates that the AI learned on its own. Negative means towards the other end of that dimension, just like a city west of a map's centre gets a negative longitude. It does not mean wrong or missing.

Think of it like mood on a scale from -10 to +10. Hunger at -9 means not hungry at all, not that hunger is broken. In vectors it is the same idea across 384 dimensions simultaneously.
Part 2: Similar Sentences Produce Similar Vectors

Anchor sentence: "The Eiffel Tower is in Paris."

 94.3%  Paris is home to the famous Eiffel Tower.
 36.1%  France is a country in Western Europe.
 -0.9%  Machine learning is a subset of artificial intelligence.
 -0.3%  The Amazon river flows through South America.

The same meaning expressed in different words scores 94.3%. A related topic (France) scores 36.1%. Completely unrelated topics score near 0% or slightly negative. This is cosine similarity working correctly.
Part 3: Two Similar Sentences Side by Side

Sentence A: "The Eiffel Tower is in Paris."
Sentence B: "Paris is home to the famous Eiffel Tower."

Index    Vector A    Vector B    Difference
[ 0]    +0.071102   +0.084144    0.013043
[ 1]    +0.037129   +0.048660    0.011532
[ 6]    -0.074099   -0.055071    0.019028
[ 9]    -0.024027   -0.025116    0.001089

The numbers are not identical, but they follow a similar pattern. This is why cosine similarity correctly identifies them as close.

The numbers differ because the word order is different and different words are present. But the pattern of which dimensions are positive and which are negative is similar. Cosine similarity measures this pattern alignment, not the exact values.
7. Step 2: Persistence with SQLite

In Step 1, everything is stored in RAM. When the program stops, RAM is wiped clean and all your data is lost. Persistence means saving data to a file on your hard drive so it survives restarts.

We use SQLite, a lightweight database that lives in a single .db file on your computer. No server needed, no installation, no passwords. Python includes SQLite built in.
Hybrid Storage Model

Vectors are numpy arrays (lists of 384 floats). SQL does not natively understand numpy arrays, so we serialise them (a sketch follows this list):
• Saving: numpy array to bytes (binary blob) stored in SQLite
• Loading: bytes from SQLite back to numpy array used in Python
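A minimal sketch of the round trip, assuming a 384-dimension float32 vector:

import numpy as np

vec = np.random.rand(384).astype(np.float32)      # stand-in for a real embedding
blob = vec.tobytes()                               # numpy array -> binary blob (BLOB column)
restored = np.frombuffer(blob, dtype=np.float32)   # binary blob -> numpy array
assert np.array_equal(vec, restored)               # nothing is lost in the round trip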
The SQLite table looks like this:

| Column | Type | Contains |
| id | INTEGER | Unique number for each document |
| text | TEXT | The original sentence or paragraph |
| vector | BLOB | The 384 numbers stored as binary bytes |
| metadata | TEXT | JSON string like {"category": "science"} |
| timestamp | REAL | When the document was added |
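Creating the table needs only Python's built-in sqlite3 module; a sketch (the file name vectors.db is illustrative):

import sqlite3

conn = sqlite3.connect("vectors.db")   # creates the file if it does not exist
conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id        INTEGER PRIMARY KEY,
        text      TEXT,
        vector    BLOB,
        metadata  TEXT,
        timestamp REAL
    )
""")
conn.commit()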
Metadata Filtering — the WHERE Clause

The metadata column stores any extra information you choose. When searching you can filter by any metadata field:

db.search(
    query_text='How do plants make food?',
    filter_metadata={'category': 'science', 'topic': 'biology'}
)

# Under the hood this runs:
# for doc in self.documents:
#     if doc['metadata']['category'] == 'science':
#         calculate_similarity(query, doc['vector'])
8. Step 3: Fast Search with HNSW

Brute-force search compares your query vector against every single stored vector, one by one. With 1 million vectors that is slow. HNSW (Hierarchical Navigable Small World) is a smarter graph-based index.

The Library Analogy
Imagine you are in a huge library looking for books similar to one you like:
• Brute force: you read every single book description. Exhaustive but slow.
• HNSW: the library has a smart card catalogue system with 3 levels:
   ◦ Level 2 (top): a few landmark books — big jumps across the library
   ◦ Level 1 (middle): more books — medium-range navigation
   ◦ Level 0 (bottom): all books — fine-grained comparison with neighbours

You start at the top, zoom in roughly, then refine at lower levels. Instead of checking 1,000,000 books you might only check 100 to 200. Accuracy is typically 95 to 99%.
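A sketch of how the hnswlib index is typically used, assuming vectors is an (N, 384) float32 numpy array and query is a single 384-dimension vector:

import hnswlib
import numpy as np

index = hnswlib.Index(space="cosine", dim=384)
index.init_index(max_elements=100_000, ef_construction=200, M=16)
index.add_items(vectors, np.arange(len(vectors)))   # build the graph
index.set_ef(50)                                    # search-time accuracy/speed trade-off
labels, distances = index.knn_query(query, k=3)     # distance = 1 - cosine similarity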
When Do You Need HNSW?

| Documents | RAM Used | Recommendation |
| Under 5,000 | ~30 MB | Brute force is fast enough. No HNSW needed. |
| 5,000 to 100,000 | ~600 MB | HNSW starts helping. Optional. |
| Over 100,000 | ~1+ GB | HNSW is strongly recommended. |
| 1,000,000+ | ~6 GB | HNSW is essential. |
For a toy or learning database, hnswlib is not required. Steps 1 and 2 work perfectly for thousands of documents. hnswlib requires a C++ compiler to build on Windows, which can be difficult to install.
9. Step 4: REST API with FastAPI

Right now the vector database is just a Python script you run manually. A REST API turns it into a service that any app, language, or tool can talk to over HTTP.

What is a REST API?

Think of it like a waiter at a restaurant. You (the client app) give your order to the waiter (the API). The waiter takes it to the kitchen (the vector database). The waiter brings back your food (the results).
HTTP uses verbs to describe what you want to do:
• GET: retrieve information (like asking what is on the menu?)
• POST: send data to create or search (like placing an order)
• DELETE: remove something (like cancelling an order)
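A minimal sketch of what Step 4 looks like, assuming db is the database object built in Steps 1 and 2 (the names are illustrative):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SearchRequest(BaseModel):
    query_text: str
    top_k: int = 3

@app.post("/search")
def search(req: SearchRequest):
    # db is assumed to be the vector database from Steps 1 and 2
    return db.search(query_text=req.query_text, top_k=req.top_k)

# Run with: uvicorn api:app --reload   (if this file is saved as api.py)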
10. Peeping Inside: The Model Weights

The model that converts text to vectors (all-MiniLM-L6-v2) has 22.7 million learned numbers called weights. Here is actual output from our explore_and_idioms.py script showing these weights:

Layer : embeddings.word_embeddings.weight
Shape : (30522, 384)   (11,720,448 weights)
Sample values: [-0.0200, -0.0034, -0.0147, 0.0117, -0.0032, ...]

Layer : encoder.layer.0.attention.self.query.weight
Shape : (384, 384)   (147,456 weights)
Sample values: [-0.1486, 0.0436, 0.0856, 0.0242, -0.0399, ...]

Layer : encoder.layer.0.intermediate.dense.weight
Shape : (1536, 384)   (589,824 weights)
Sample values: [-0.0287, -0.0777, 0.0871, -0.0553, -0.1051, ...]

TOTAL weights in this model: 22,713,216
(That is 22.7 million numbers the AI learned!)
Where Are They Stored on Your Disk?

C:\Users\xxxx\.cache\huggingface\hub\
  models--sentence-transformers--all-MiniLM-L6-v2\
    snapshots\<hash>\
      pytorch_model.bin   <- the binary weight file
      config.json         <- model configuration
      tokenizer.json      <- vocabulary
Converting Weights to Human-Readable Form

The .bin file is pure binary and cannot be opened directly in Notepad or Excel. A Python script can translate it into two human-readable files (a sketch of such a script appears after the examples below):
• weights_summary.json: a summary of all layers with statistics. The contents of the JSON file will look something like this:
[
  {
    "layer_name": "embeddings.word_embeddings.weight",
    "shape": [30522, 384],
    "total_weights": 11720448,
    "minimum": -0.50732421875,
    "maximum": 1.049399971961975,
    "mean": 1.1778635780501645e-05,
    "std_deviation": 0.055623315274715424,
    "first_10_values": [
      -0.019989013671875,
      -0.0034027099609375,
      -0.014678955078125,
      0.0117034912109375,
      -0.0032482147216796875,
      0.012664794921875,
      0.015899658203125,
      0.007541656494140625,
      -0.0021839141845703125,
      -0.0034465789794921875
    ]
  },
  {
    "layer_name": "embeddings.position_embeddings.weight",
    "shape": [512, 384],
    "total_weights": 196608,
    "minimum": -0.20261888206005096,
    "maximum": 1.8740946054458618,
    "mean": 0.00021643542277161032,
    "std_deviation": 0.024651674553751945,
    "first_10_values": [
      -0.08555416762828827,
      -0.03291567042469978,
      -0.0170280858874321,
      0.10415853559970856,
      0.019368130713701248,
      -0.000353822426404804,
      0.028837377205491066,
      -0.009048297069966793,
      -0.0034039390739053488,
      0.017565004527568817
    ]
  },
  {
    "layer_name": "embeddings.token_type_embeddings.weight",
    "shape": [2, 384],
    "total_weights": 768,
    "minimum": -0.0997314453125,
    "maximum": 0.3629935383796692,
    "mean": 0.0006497848662547767,
    "std_deviation": 0.024186691269278526,
    "first_10_values": [
      0.014616478234529495,
      0.0037614137399941683,
      -0.012041018344461918,
      -0.017752939835190773,
      0.004878282081335783,
      0.013683986850082874,
      -0.015841402113437653,
      0.0014262089971452951,
      -0.014348766766488552,
      0.033464301377534866
    ]
  },
  ...
]

And so on…
• one_layer_full.csv: every single weight value of one small layer (384 rows), which can be opened in Excel. The contents would look like this:
| index | weight_value |
| 0 | 0.169540539 |
| 1 | 0.111124329 |
| 2 | 0.006444646 |
| 3 | 0.021633515 |
| 4 | 0.042796902 |
| 5 | -0.03980178 |
| 6 | -0.029967835 |
| 7 | 0.039178368 |
| 8 | 0.016793983 |
And so on, down to index 383 (384 rows in total, counting from index 0).
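Here is a sketch of the kind of script that produces the JSON summary above, assuming the weights sit in pytorch_model.bin as a PyTorch state dict:

import json
import torch

state = torch.load("pytorch_model.bin", map_location="cpu")
summary = []
for name, tensor in state.items():
    w = tensor.float().flatten()
    summary.append({
        "layer_name": name,
        "shape": list(tensor.shape),
        "total_weights": int(w.numel()),
        "minimum": w.min().item(),
        "maximum": w.max().item(),
        "mean": w.mean().item(),
        "std_deviation": w.std().item(),
        "first_10_values": w[:10].tolist(),
    })
with open("weights_summary.json", "w") as f:
    json.dump(summary, f, indent=2)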
Editing weights is possible in Python, but changing them randomly breaks the model. Deliberately changing thousands of weights in a mathematically guided way is called fine-tuning, a whole field of AI research.
11. What Are the Six Transformer Layers?

The model we are using for this tutorial, all-MiniLM-L6-v2, has 'L6' in its name because it contains 6 transformer layers. Each layer is a stack of mathematical operations that progressively refines the meaning of the sentence.
The Multi-Floor Building Analogy

Think of the model as a six-floor building that a sentence travels through from bottom to top:
| Layer | Name | What it does | Analogy |
| 0 | Embedding | Converts each word to a starting vector by looking it up in a table of 30,522 words | The lobby: every visitor gets a name badge |
| 1 | Transformer 1 | Notices word order and basic grammar patterns | Floor 1: sort visitors by department |
| 2 | Transformer 2 | Nearby words start influencing each other's meaning | Floor 2: departments talk to neighbours |
| 3 | Transformer 3 | Subject, verb, object relationships become clear | Floor 3: who did what to whom |
| 4 | Transformer 4 | Meaning of each word shaped by the full sentence | Floor 4: everyone reads the full memo |
| 5 | Transformer 5 | High-level meaning structures emerge | Floor 5: teams form around concepts |
| 6 | Transformer 6 | Richest, most context-aware final representation | Floor 6: final decision, exit as one idea |
The Three Sub-Layers Inside Each Transformer Layer

Inside every transformer layer, three operations happen in sequence:

Attention (Query, Key, Value matrices)

Every word looks at every other word and asks: how much should I pay attention to you?

For "The Eiffel Tower is in Paris":
  Tower -> pays HIGH attention to Eiffel  (they belong together)
  Tower -> pays LOW attention to is       (grammatical filler)
  Paris -> pays HIGH attention to in      (location relationship)
Feed Forward (expansion then compression)

After attention, each word vector expands from 384 to 1,536 dimensions to 'think more', then compresses back to 384. This is the processing step.
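In numpy terms this is just two matrix multiplications with a nonlinearity in between. A toy sketch with random stand-in weights (the real model uses learned weights and a GELU activation, not ReLU):

import numpy as np

W1 = np.random.randn(384, 1536) * 0.02   # expansion: 384 -> 1536
W2 = np.random.randn(1536, 384) * 0.02   # compression: 1536 -> 384

def feed_forward(x):                      # x: one word vector of shape (384,)
    return np.maximum(0, x @ W1) @ W2     # ReLU used here for simplicity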
Layer Normalisation

Rescales values so the mean stays near 0 and the standard deviation near 1. Without this, values would explode (become huge) or vanish (become near zero) after passing through many layers.
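A simplified numpy version (the real layer also applies a learned scale and shift):

import numpy as np

def layer_norm(x, eps=1e-6):
    # Rescale so the mean is ~0 and the standard deviation is ~1
    return (x - x.mean()) / (x.std() + eps)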
Why 6 Layers?

| Model | Layers | Weights | File Size |
| Our model (all-MiniLM-L6-v2) | 6 | 22 million | 80 MB |
| BERT-base | 12 | 110 million | 440 MB |
| GPT-3 | 96 | 175 billion | 700 GB |
More layers mean more capacity to understand nuance, longer context, and subtler meaning, but also more memory, computation, and electricity. Our 6-layer model is deliberately small and fast: good enough for sentence similarity and light enough to run on a laptop.
12. Semantic Search on English and Hindi Idioms

We tested semantic search on two types of idiom databases. The results revealed important lessons about what the model does and does not understand.

English Idioms: Search by Word
Query: "stitch"
#1 63.5%
A stitch in time saves nine
#2 26.3%
Bite the bullet
#3 22.1%
Every cloud has a silver lining
Query: "rolling stone"
#1 63.8%
A rolling stone gathers no moss
#2
23.7% Don't judge a book by its
cover
Searching by a word that appears in the idiom works well because the model has seen those words in training and associates them with the idiom.

English Idioms: Search by Meaning
Query: "don't worry about things that already
happened"
#1 29.1%
Don't cry over spilled milk
Meaning:
Don't waste time worrying about past mistakes
Query: "she looks rough but is actually a kind person"
#1
16.2% Don't judge a book by its
cover
The scores are lower (16% to 30%) when searching by meaning rather than by words. The model finds the right idiom, but less confidently. This is because idioms are deliberately indirect: their literal words do not match their meaning.
Hindi Idioms: English Model vs Multilingual Model

[English model on Hindi idioms]
Query: "actions have consequences"
#1  14.7%  Aam ke aam, gutliyon ke daam   <- WRONG
    Meaning: Getting double benefit from one thing

[Multilingual model on Hindi idioms]
Query: "actions have consequences"
#1  33.0%  Sau sunaar ke, ek lohaar ka    <- BETTER
    Meaning: One strong decisive action is worth a hundred weak ones
The English model sees Hindi words as unfamiliar character sequences and groups them by character patterns rather than meaning. The multilingual model (paraphrase-multilingual-MiniLM-L12-v2, a 470 MB download) was trained on 50+ languages including Hindi and produces meaningfully better results. The model choice matters as much as the database design.
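Switching models is a one-line change; a sketch:

from sentence_transformers import SentenceTransformer

# English-only model used throughout this tutorial:
# model = SentenceTransformer('all-MiniLM-L6-v2')
# Multilingual replacement (50+ languages including Hindi, ~470 MB download):
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')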
13. Security and Privacy for a Banking Deployment

For a real banking application, the model weight files and customer data need protection. Here is the complete picture:

Encrypting the Model Weight Binary Files
Encrypting
the Model Weight Binary Files
The
.bin file can be encrypted. However, raw RSA cannot encrypt large files
directly:
•
RSA is designed for small data: maximum 245 bytes per operation at
2048-bit key strength
•
A model weight file is 80 MB. RSA alone would take hours to
encrypt it
The
correct approach is Hybrid Encryption:
Step 1: Generate a random AES-256 key (just 32 bytes)
Step 2: Encrypt the MODEL FILE with AES-256 <- fast, handles large files
Step 3: Encrypt the AES KEY with RSA <- small, RSA handles this
perfectly
Step 4: Store encrypted file + encrypted key together
At runtime:
Step 1: Decrypt the AES key using RSA private key <- milliseconds
Step 2: Decrypt the model file using AES key <- seconds
Step 3: Load weights into
RAM, wipe the decrypted key
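A sketch of the hybrid scheme using the Python cryptography library (file names are illustrative, and a real deployment would keep keys in an HSM rather than in script variables):

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Encrypt: AES-256 for the big file, RSA for the small AES key
aes_key = AESGCM.generate_key(bit_length=256)                     # Step 1
nonce = os.urandom(12)
with open("pytorch_model.bin", "rb") as f:
    ciphertext = AESGCM(aes_key).encrypt(nonce, f.read(), None)   # Step 2
rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
wrapped_key = rsa_key.public_key().encrypt(aes_key, oaep)         # Step 3

# Decrypt at runtime
recovered_key = rsa_key.decrypt(wrapped_key, oaep)                # milliseconds
weights = AESGCM(recovered_key).decrypt(nonce, ciphertext, None)  # seconds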
The RAM Exposure Problem

Encryption on disk does not protect data once it is loaded into RAM for inference. The decrypted weights sit in memory and are readable by anyone with OS-level access. Banks address this with:
• Hardware Security Modules (HSM): a physical tamper-proof chip. The decryption key never leaves the chip.
• Trusted Execution Environments (TEE): Intel SGX or AMD SEV create encrypted RAM enclaves that even the OS cannot read
• Confidential Computing: cloud providers offer VMs where even the hypervisor cannot read memory
Customer Data Protection

| Concern | Solution |
| Database file on disk | SQLCipher: a drop-in SQLite replacement with AES-256 encryption |
| Data in transit | HTTPS/TLS on the FastAPI server. Never plain HTTP. |
| Access control | OAuth2 tokens or mTLS certificates on the /search endpoint |
| Audit trail | Log every query, result, timestamp, and requester identity |
| Data minimisation | Store only vectors if raw text is not needed |
[Figure: visualisation of a text string's journey through all 6 layers]
[Figure: the travel of the first 20 values through the layers]
End of Tutorial