25 April 2026

Tutorial on (and Demystifying) Vector Databases

 Following is a tutorial that I constructed with the help of Claude on vector databases. The tutorial makes you a witness to the building of a demonstration vector database, step by step. It also shows you what is inside the magic of converting plain text into the numbers a vector database stores, using, for example, a text like 'The Eiffel Tower is in Paris'.

Building a Vector Database from Scratch

and Peeping Inside It

 

A Beginner's Complete Tutorial

1. What is a Vector Database?

A vector database is a specialised database designed to store, index, and query high-dimensional vectors — arrays of numbers. These vectors are called embeddings: mathematical representations of data (text, images, audio) where similar items are close to each other in vector space.

Example: The words king and queen would have vectors that are close together, while king and pizza would be far apart.

Key Operations

         Store vectors (e.g. [0.12, 0.87, -0.34, ...] with hundreds of dimensions)

         Similarity search: find the N nearest vectors to a query vector

         Filter by metadata alongside vector similarity

How is it Different from a Regular Database?

 

 

                       SQL Database                              Vector Database
Query type             Exact match (WHERE age = 30)              Similarity match (find nearest)
Question asked         Give me all rows where X equals Y         Give me the most similar items to this
Use case               Structured data, reports, transactions    Semantic search, AI, recommendations
Understands meaning?   No                                        Yes

Popular Vector Databases

         Pinecone, Weaviate, Qdrant, Chroma, FAISS

         We will build our own from scratch in Python!

2. Can You Build One from Scratch?

Yes, absolutely! It is a great learning project. Here is the spectrum of complexity:

Minimal Viable Version (very doable solo)

A basic vector database needs just three things:

         Storage: save vectors and metadata to disk

         Index: a data structure for fast nearest-neighbour search

         Query engine: take a query vector, return top-K similar vectors

 

The core algorithm to start with is brute-force cosine similarity; later you can upgrade to smarter indexing such as HNSW (Hierarchical Navigable Small World graphs), which is what production databases use.
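As a minimal sketch of that starting point (illustrative only; the function and variable names are invented for this example):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two vectors: 1.0 = same direction, ~0 = unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def brute_force_search(query_vec, stored_vecs, top_k=3):
    # Compare the query against every stored vector, keep the top_k highest scores
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(stored_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]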

Windows, 8 GB RAM — Is That Enough?

 

Concern     Reality
8 GB RAM    Fine for a personal or learning vector database
Windows     Python works perfectly. Use Command Prompt or PowerShell
Scale       Can handle 1 to 5 million vectors depending on dimensions

 

A 1,536-dimension OpenAI embedding at float32 is about 6 KB per vector. So 8 GB gives you room for roughly 1 million vectors in memory — which is substantial.
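The arithmetic behind that estimate, as a quick sanity check:

dims = 1536                     # OpenAI embedding dimensions
bytes_per_vector = dims * 4     # float32 = 4 bytes each -> 6,144 bytes (~6 KB)
ram = 8 * 1024**3               # 8 GB in bytes
print(ram // bytes_per_vector)  # ~1.4 million vectors, before OS and index overhead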

Four-Step Build Plan

         Step 1: Core engine — text to vectors, cosine similarity, brute-force search

         Step 2: Persistence — save to SQLite so data survives restarts

         Step 3: Fast search — HNSW indexing for large databases

         Step 4: REST API — FastAPI so any app can query the database

 

Note: Step 3 (HNSW via hnswlib) is optional for a toy or learning database. Steps 1 and 2 alone are fully functional for thousands of documents. hnswlib requires a C++ compiler to install on Windows, which can be tricky. Steps 1 and 2 only need: pip install sentence-transformers numpy

3. What Data Are We Turning into Vectors?

This is the most important conceptual piece. You need two things:

         An embedding model: converts text into vector numbers

         Some text data: your knowledge base

We use a completely free, runs-on-your-laptop embedding model (all-MiniLM-L6-v2, loaded through the sentence-transformers library). No API key, no internet needed after the initial download.
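A minimal sketch of using it (the model file, roughly 80 MB, is downloaded on the first run and cached locally):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # free, runs locally
vector = model.encode("The Eiffel Tower is in Paris.")
print(vector.shape)    # (384,) -> one 384-dimensional vector
print(vector[:5])      # the first five of the 384 numbers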

 

Our Sample Knowledge Base

We built a database of 20 plain text facts across 4 categories:

 

Category     Topics                                  Sample Fact
Science      biology, astronomy, physics             The human body contains approximately 37 trillion cells...
Geography    landmarks, rivers, mountains, cities    The Eiffel Tower is an iron lattice tower located in Paris...
History      wars, inventions, space exploration     Neil Armstrong became the first human to walk on the Moon...
Technology   programming, AI, networking, hardware   Python is a high-level language known for its simple syntax...

 

Each sentence is converted to a vector of 384 numbers. When you search 'Where is the Eiffel Tower?', your question also becomes a vector, and the database finds the closest matching document.

The good thing: you can swap the knowledge base with anything. PDF notes, a product catalogue, customer support FAQs, diary entries, a recipe book. The system works the same regardless.

4. Demonstration: Semantic Search in Action

Below is actual output from running a Python script. The AI model is loaded, 20 documents are converted to vectors, and 6 different search queries are run.

This is our knowledge base, hard-coded inside a Python script. On this knowledge base we will run our semantic searches and report relevance as percentages:

knowledge_base = [
    # ── SCIENCE ───
    ("Photosynthesis is the process by which plants convert sunlight, "
     "water, and carbon dioxide into glucose and oxygen.",
     {"category": "science", "topic": "biology"}),

    ("The human body contains approximately 37 trillion cells, "
     "each performing specific functions to keep us alive.",
     {"category": "science", "topic": "biology"}),

    ("DNA (deoxyribonucleic acid) is the molecule that carries the "
     "genetic instructions for the development of all living organisms.",
     {"category": "science", "topic": "biology"}),

    ("Black holes are regions of space where gravity is so strong "
     "that nothing — not even light — can escape once it passes the event horizon.",
     {"category": "science", "topic": "astronomy"}),

    ("The speed of light in a vacuum is approximately 299,792 kilometres "
     "per second, often denoted as 'c' in physics equations.",
     {"category": "science", "topic": "physics"}),

    # ── GEOGRAPHY ───
    ("The Eiffel Tower is an iron lattice tower located on the Champ de Mars "
     "in Paris, France. It was constructed in 1889 for the World's Fair.",
     {"category": "geography", "topic": "landmarks"}),

    ("The Amazon River in South America is the largest river in the world "
     "by water discharge, carrying about 20% of all freshwater that flows into the oceans.",
     {"category": "geography", "topic": "rivers"}),

    ("Mount Everest, located in the Himalayas, "
     "is the tallest mountain on Earth at 8,848.86 meters above sea level.",
     {"category": "geography", "topic": "mountains"}),

    ("The Sahara Desert in North Africa is the largest hot desert in the world, "
     "covering approximately 9 million square kilometers.",
     {"category": "geography", "topic": "deserts"}),

    ("Tokyo is the capital city of Japan and the most populous metropolitan area "
     "in the world, with over 37 million people in the greater metro region.",
     {"category": "geography", "topic": "cities"}),

    # ── HISTORY ───
    ("The Great Wall of China was built over many centuries, primarily during "
     "the Ming Dynasty (1368–1644), to protect against invasions from the north.",
     {"category": "history", "topic": "ancient structures"}),

    ("World War II lasted from 1939 to 1945 and involved most of the world's nations, "
     "making it the deadliest and most widespread war in human history.",
     {"category": "history", "topic": "wars"}),

    ("Neil Armstrong became the first human to walk on the Moon on July 20, 1969, "
     "during NASA's Apollo 11 mission.",
     {"category": "history", "topic": "space exploration"}),

    ("The printing press was invented by Johannes Gutenberg around 1440 in Germany, "
     "revolutionizing the spread of knowledge across Europe.",
     {"category": "history", "topic": "inventions"}),

    ("The French Revolution, which began in 1789, transformed France from a monarchy "
     "to a republic and had a lasting influence on modern democracy.",
     {"category": "history", "topic": "politics"}),

    # ── TECHNOLOGY ───
    ("Python is a high-level, general-purpose programming language known for its "
     "simple, readable syntax. It was created by Guido van Rossum in 1991.",
     {"category": "technology", "topic": "programming"}),

    ("Artificial intelligence (AI) refers to the simulation of human intelligence "
     "in machines, enabling them to learn, reason, and solve problems.",
     {"category": "technology", "topic": "AI"}),

    ("The internet is a global network of interconnected computers that communicate "
     "using standardized protocols, enabling information sharing worldwide.",
     {"category": "technology", "topic": "networking"}),

    ("A CPU (Central Processing Unit) is the primary component of a computer "
     "that executes instructions from programs by performing arithmetic and logic operations.",
     {"category": "technology", "topic": "hardware"}),

    ("Machine learning is a subset of AI where systems learn from data to improve "
     "their performance on tasks without being explicitly programmed for each one.",
     {"category": "technology", "topic": "AI"}),
]

Loading and Storing 20 Documents

Loading the AI embedding model... (this takes ~5 seconds the first time)

Model loaded successfully!

 

Loading 20 documents into the database...

(This takes ~10-20 seconds while the AI model processes each text)

 

  Converting to vector: 'Photosynthesis is the process...'

  Document #0 added. Total documents: 1

  Converting to vector: 'The human body contains approximately...'

  Document #1 added. Total documents: 2

  ... (18 more documents) ...

  Document #19 added. Total documents: 20

 

Search Results and What They Mean

Query 1: Where is the Eiffel Tower located?

  #1  Score: 79.3%  [geography]

       The Eiffel Tower is an iron lattice tower located on the Champ

       de Mars in Paris, France.

  #2  Score: 23.8%  [history]

       The Great Wall of China was built over many centuries...

  #3  Score: 23.5%  [geography]

       Mount Everest, located in the Himalayas...

 

Result #1 scores 79.3% — a strong, confident match. Results #2 and #3 score around 23% — the database is saying these are the next closest things, but they are not really related. The big gap between 79% and 23% shows the database is confident in its top answer.

 

Query 2: How many cells are in the human body?

  #1  Score: 81.4%  [science]

       The human body contains approximately 37 trillion cells...

  #2  Score: 24.9%  [science]

       DNA (deoxyribonucleic acid) is the molecule that carries...

  #3  Score: 19.6%  [geography]

       Tokyo is the capital city of Japan and the most populous

       metropolitan area in the world, with over 37 million people...

 

Notice the Tokyo result. The query mentions 37 and so does the Tokyo fact (37 million people). The AI noticed a weak numerical similarity. At only 19.6% this is a weak match and can be safely ignored — a practical rule of thumb: trust results above 75%, consider results between 40% and 75%, ignore results below 40%.

 

Filtered Search: How do plants make food? (Science only)

  Applying filter {'category': 'science'} -> 5 candidates

  #1  Score: 49.4%  [science]

       Photosynthesis is the process by which plants convert sunlight...

 

This is the SQL-like WHERE clause equivalent. Only science documents were searched, narrowing from 20 to 5 candidates before the similarity comparison ran.

5. Can You Write SQL Queries Like SELECT...WHERE...?

Partially yes, but with an important twist. Vector databases are not replacements for SQL databases. They serve a different purpose.

What You CAN Do: Metadata Filtering

Most vector databases, including ours, support metadata filtering alongside vector search. This is the closest equivalent to a SQL WHERE clause:

 

# This is equivalent to:

# SELECT * FROM docs WHERE category = 'science'

# ORDER BY similarity DESC LIMIT 3

 

results = db.search(

    query_text      = 'How do plants make food?',

    top_k           = 3,

    filter_metadata = {'category': 'science'}   # the WHERE clause

)

 

What You CANNOT Do

-- SQL has no concept of semantic meaning:

SELECT * FROM docs WHERE meaning = 'something about taxes'

-- This query is impossible in SQL.

-- Searching by meaning is EXACTLY what vector search does instead.

You convert your question into an embedding vector, then search for nearest neighbours. The metadata filter (category, date, source) is done in Python as a pre-filter, not in SQL.

6. Peeping Inside: What Does a Vector Actually Look Like?

We wrote a script called peek_at_vectors.py that shows the actual numbers a sentence becomes. Here is real output from that script:

Part 1: The 384 Numbers of One Sentence

Sentence : "The Eiffel Tower is in Paris."

Vector   : 384 numbers (showing first 20)

 

  [  0]  +0.071102  (positive)

  [  1]  +0.037129

  [  3]  -0.010988  (negative)

  [  6]  -0.074099

  [ 13]  -0.101622  (largest negative)

  ...

Smallest: -0.187900    Largest: +0.134785    Average: -0.000225

Why Negative Numbers?

The 384 numbers are not attributes like 'how French is this sentence?'. They are abstract mathematical coordinates that the AI learned on its own. Negative simply means towards the other end of that dimension, just as a city west of the prime meridian gets a negative longitude. It does not mean wrong or missing.

Think of it like rating hunger on a scale from -10 to +10. A score of -9 means not hungry at all, not that hunger is broken. In vectors it is the same idea across 384 dimensions simultaneously.

Part 2: Similar Sentences Produce Similar Vectors

Anchor sentence: "The Eiffel Tower is in Paris."

 

  94.3%  Paris is home to the famous Eiffel Tower.

  36.1%  France is a country in Western Europe.

  -0.9%  Machine learning is a subset of artificial intelligence.

  -0.3%  The Amazon river flows through South America.

The same meaning expressed in different words scores 94.3%. A related topic (France) scores 36.1%. Completely unrelated topics score near 0% or slightly negative. This is cosine similarity working correctly.
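You can reproduce this kind of comparison in a few lines; here is a minimal sketch along the same lines (not the full peek_at_vectors.py script):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
anchor = model.encode("The Eiffel Tower is in Paris.")
candidates = [
    "Paris is home to the famous Eiffel Tower.",
    "France is a country in Western Europe.",
    "Machine learning is a subset of artificial intelligence.",
]
for text in candidates:
    score = util.cos_sim(anchor, model.encode(text)).item()   # cosine similarity, -1 to 1
    print(f"{score:6.1%}  {text}")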

Part 3: Two Similar Sentences Side by Side

  Sentence A: "The Eiffel Tower is in Paris."

  Sentence B: "Paris is home to the famous Eiffel Tower."

 

  Index    Vector A     Vector B    Difference

  [  0]  +0.071102  +0.084144    0.013043

  [  1]  +0.037129  +0.048660    0.011532

  [  6]  -0.074099  -0.055071    0.019028

  [  9]  -0.024027  -0.025116    0.001089

 

Numbers are not identical, but follow a similar pattern. This is why cosine similarity correctly identifies them as close.

The numbers are not identical because the word order is different and different words are present. But the pattern of which dimensions are positive and which are negative is similar. Cosine similarity measures this pattern alignment, not the exact values.

7. Step 2: Persistence with SQLite

In Step 1, everything is stored in RAM. When the program stops, RAM is wiped clean and all your data is lost. Persistence means saving data to a file on your hard drive so it survives restarts.

We use SQLite, a lightweight database that lives in a single .db file on your computer. No server needed, no installation, no passwords. Python includes SQLite built-in.

Hybrid Storage Model

Vectors are numpy arrays (lists of 384 floats). SQL does not natively understand numpy arrays, so we serialise them:

         Saving: numpy array to bytes (binary blob) stored in SQLite

         Loading: bytes from SQLite back to numpy array used in Python
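
A minimal sketch of that round trip with Python's built-in sqlite3 module (the table here is trimmed to three columns for illustration):

import sqlite3
import numpy as np

conn = sqlite3.connect("vectors.db")
conn.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, text TEXT, vector BLOB)")

vec = np.random.rand(384).astype(np.float32)   # stand-in for a real embedding

# Saving: numpy array -> raw bytes -> BLOB column
conn.execute("INSERT INTO docs (text, vector) VALUES (?, ?)",
             ("The Eiffel Tower is in Paris.", vec.tobytes()))
conn.commit()

# Loading: BLOB -> bytes -> numpy array again
blob = conn.execute("SELECT vector FROM docs").fetchone()[0]
restored = np.frombuffer(blob, dtype=np.float32)
print(np.array_equal(vec, restored))   # True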

 

Column      Type      Contains
id          INTEGER   Unique number for each document
text        TEXT      The original sentence or paragraph
vector      BLOB      The 384 numbers stored as binary bytes
metadata    TEXT      JSON string like {"category": "science"}
timestamp   REAL      When the document was added

Metadata Filtering — the WHERE Clause

The metadata column stores any extra information you choose. When searching you can filter by any metadata field:

db.search(

    query_text      = 'How do plants make food?',

    filter_metadata = {'category': 'science', 'topic': 'biology'}

)

 

# Under the hood this runs:

# for doc in self.documents:

#     if doc['metadata']['category'] == 'science':

#         calculate_similarity(query, doc['vector'])

8. Step 3: Fast Search with HNSW

Brute force search compares your query vector against every single stored vector one by one. With 1 million vectors that is slow. HNSW (Hierarchical Navigable Small World) is a smarter graph-based index.

The Library Analogy

Imagine you are in a huge library looking for books similar to one you like:

         Brute force: you read every single book description. Exhaustive but slow.

         HNSW: the library has a smart card catalogue system with 3 levels:

         Level 2 (top): a few landmark books — big jumps across the library

         Level 1 (middle): more books — medium-range navigation

         Level 0 (bottom): all books — fine-grained comparison with neighbours

 

You start at the top, zoom in roughly, then refine at lower levels. Instead of checking 1,000,000 books you might only check 100 to 200. Accuracy is typically 95 to 99%.
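If your collection does grow into that range, the hnswlib library wraps this index in a few calls. A minimal sketch (the parameter values are common starting points, not tuned for any particular dataset):

import hnswlib
import numpy as np

dim = 384
vectors = np.random.rand(10_000, dim).astype(np.float32)   # stand-in for real embeddings

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=10_000, ef_construction=200, M=16)   # graph-building parameters
index.add_items(vectors, np.arange(10_000))

index.set_ef(50)   # search-time speed/accuracy trade-off
query = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=3)   # ids and cosine distances of the 3 nearest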

When Do You Need HNSW?

Documents           RAM Used   Recommendation
Under 5,000         ~30 MB     Brute force is fast enough. No HNSW needed.
5,000 to 100,000    ~600 MB    HNSW starts helping. Optional.
Over 100,000        ~1+ GB     HNSW is strongly recommended.
1,000,000+          ~6 GB      HNSW is essential.

 

For a toy or learning database, hnswlib is not required. Steps 1 and 2 work perfectly for thousands of documents. hnswlib requires a C++ compiler to build on Windows which can be difficult to install.

9. Step 4: REST API with FastAPI

Right now the vector database is just a Python script you run manually. A REST API turns it into a service that any app, language, or tool can talk to over HTTP.

What is a REST API?

Think of it like a waiter at a restaurant. You (the client app) give your order to the waiter (the API). The waiter takes it to the kitchen (the vector database). The waiter brings back your food (the results).

HTTP uses verbs to describe what you want to do:

         GET: retrieve information (like asking what is on the menu?)

         POST: send data to create or search (like placing an order)

         DELETE: remove something (like cancelling an order)
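
A minimal sketch of what the Step 4 server could look like (the endpoint shape and the call into the Step 1 and 2 database class are assumptions, not the finished API):

from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SearchRequest(BaseModel):
    query_text: str
    top_k: int = 3
    filter_metadata: Optional[dict] = None

@app.post("/search")
def search(req: SearchRequest):
    # In the real script this would call the Step 1-2 database, for example:
    # results = db.search(query_text=req.query_text, top_k=req.top_k,
    #                     filter_metadata=req.filter_metadata)
    results = []   # placeholder so the sketch runs on its own
    return {"query": req.query_text, "results": results}

# Run with:  uvicorn main:app --reload
# Then POST {"query_text": "How do plants make food?", "top_k": 3} to /search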

 

10. Peeping Inside: The Model Weights

The model that converts text to vectors (all-MiniLM-L6-v2) has 22.7 million learned numbers called weights. Here is actual output from our explore_and_idioms.py script showing these weights:

 

Layer : embeddings.word_embeddings.weight

Shape : (30522, 384)  (11,720,448 weights)

Sample values: [-0.0200, -0.0034, -0.0147, 0.0117, -0.0032, ...]

 

Layer : encoder.layer.0.attention.self.query.weight

Shape : (384, 384)  (147,456 weights)

Sample values: [-0.1486, 0.0436, 0.0856, 0.0242, -0.0399, ...]

 

Layer : encoder.layer.0.intermediate.dense.weight

Shape : (1536, 384)  (589,824 weights)

Sample values: [-0.0287, -0.0777, 0.0871, -0.0553, -0.1051, ...]

 

  TOTAL weights in this model: 22,713,216

  (That is 22.7 million numbers the AI learned!)
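
The listing above comes from the explore_and_idioms.py script; a comparable sketch that walks the same layers using the Hugging Face transformers library directly (an illustration, not the original script):

from transformers import AutoModel

model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

total = 0
for name, param in model.named_parameters():
    count = param.numel()   # number of weights in this layer
    total += count
    print(f"{name:55s} shape={tuple(param.shape)}  ({count:,} weights)")

print(f"TOTAL weights: {total:,}")   # roughly 22.7 million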

Where Are They Stored on Your Disk?

C:\Users\xxxx\.cache\huggingface\hub\

  models--sentence-transformers--all-MiniLM-L6-v2\

    snapshots\<hash>\

      pytorch_model.bin   <- the binary weight file

      config.json         <- model configuration

      tokenizer.json      <- vocabulary

Converting Weights to Human-Readable Form

The .bin file is pure binary and cannot be opened directly in Notepad or Excel. A Python script can translate it into two human-readable files:

         weights_summary.json: summary of all layers with statistics.

The contents of the json file will look something like this:

[

  {

    "layer_name": "embeddings.word_embeddings.weight",

    "shape": [

      30522,

      384

    ],

    "total_weights": 11720448,

    "minimum": -0.50732421875,

    "maximum": 1.049399971961975,

    "mean": 1.1778635780501645e-05,

    "std_deviation": 0.055623315274715424,

    "first_10_values": [

      -0.019989013671875,

      -0.0034027099609375,

      -0.014678955078125,

      0.0117034912109375,

      -0.0032482147216796875,

      0.012664794921875,

      0.015899658203125,

      0.007541656494140625,

      -0.0021839141845703125,

      -0.0034465789794921875

    ]

  },

  {

    "layer_name": "embeddings.position_embeddings.weight",

    "shape": [

      512,

      384

    ],

    "total_weights": 196608,

    "minimum": -0.20261888206005096,

    "maximum": 1.8740946054458618,

    "mean": 0.00021643542277161032,

    "std_deviation": 0.024651674553751945,

    "first_10_values": [

      -0.08555416762828827,

      -0.03291567042469978,

      -0.0170280858874321,

      0.10415853559970856,

      0.019368130713701248,

      -0.000353822426404804,

      0.028837377205491066,

      -0.009048297069966793,

      -0.0034039390739053488,

      0.017565004527568817

    ]

  },

  {

    "layer_name": "embeddings.token_type_embeddings.weight",

    "shape": [

      2,

      384

    ],

    "total_weights": 768,

    "minimum": -0.0997314453125,

    "maximum": 0.3629935383796692,

    "mean": 0.0006497848662547767,

    "std_deviation": 0.024186691269278526,

    "first_10_values": [

      0.014616478234529495,

      0.0037614137399941683,

      -0.012041018344461918,

      -0.017752939835190773,

      0.004878282081335783,

      0.013683986850082874,

      -0.015841402113437653,

      0.0014262089971452951,

      -0.014348766766488552,

      0.033464301377534866

    ]

  },

  And so on…

 

         one_layer_full.csv: every single weight value of one small layer (384 rows), which can be opened in Excel.

The contents would be like this:

index   weight_value
0       0.169540539
1       0.111124329
2       0.006444646
3       0.021633515
4       0.042796902
5       -0.03980178
6       -0.029967835
7       0.039178368
8       0.016793983

And so on, up to index 383: 384 rows in total, starting at index 0.

Editing weights is possible in Python, but changing them randomly breaks the model. Deliberately changing thousands of weights in a mathematically guided way is called fine-tuning, a whole field of AI research.

11. What Are the Six Transformer Layers?

The model we are using for this tutorial, all-MiniLM-L6-v2, has 'L6' in its name, meaning 6 transformer layers. Each layer is a stack of mathematical operations that progressively refines the meaning of the sentence.

The Multi-Floor Building Analogy

Think of the model as a six-floor building that a sentence travels through from bottom to top:

 

Layer 0, Embedding: converts each word to a starting vector by looking it up in a table of 30,522 words. Analogy: the lobby, where every visitor gets a name badge.

Layer 1, Transformer 1: notices word order and basic grammar patterns. Analogy: Floor 1, sort visitors by department.

Layer 2, Transformer 2: nearby words start influencing each other's meaning. Analogy: Floor 2, departments talk to neighbours.

Layer 3, Transformer 3: subject, verb, object relationships become clear. Analogy: Floor 3, who did what to whom.

Layer 4, Transformer 4: the meaning of each word is shaped by the full sentence. Analogy: Floor 4, everyone reads the full memo.

Layer 5, Transformer 5: high-level meaning structures emerge. Analogy: Floor 5, teams form around concepts.

Layer 6, Transformer 6: the richest, most context-aware final representation. Analogy: Floor 6, final decision, exit as one idea.

The Three Sub-Layers Inside Each Transformer Layer

Inside every transformer layer, three operations happen in sequence:

Attention (Query, Key, Value matrices)

Every word looks at every other word and asks: how much should I pay attention to you?

For "The Eiffel Tower is in Paris":

 

  Tower  -> pays HIGH attention to Eiffel  (they belong together)

  Tower  -> pays LOW attention to is       (grammatical filler)

  Paris  -> pays HIGH attention to in      (location relationship)

Feed Forward (expansion then compression)

After attention, each word vector expands from 384 to 1,536 dimensions to think more, then compresses back to 384. This is the processing step.

Layer Normalisation

Rescales values so the mean stays near 0 and standard deviation near 1. Without this, values would explode (become huge) or vanish (become near zero) after passing through many layers.
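A tiny numerical illustration of the idea (simplified: the real layer also applies learned scale and shift parameters):

import numpy as np

x = np.array([4.0, -2.0, 7.0, 1.0])          # values drifting away from zero
normalised = (x - x.mean()) / x.std()        # rescale to mean ~0, std ~1
print(normalised.mean(), normalised.std())   # ~0.0 and 1.0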

Why 6 Layers?

Model                            Layers   Weights       File Size
Our model (all-MiniLM-L6-v2)     6        22 million    80 MB
BERT-base                        12       110 million   440 MB
GPT-3                            96       175 billion   700 GB

 

More layers mean more capacity: longer context, subtler meaning, a better grasp of nuance. They also mean more memory, computation, and electricity. Our 6-layer model is deliberately small and fast: good enough for sentence similarity and light enough to run on a laptop.

12. Semantic Search on English and Hindi Idioms

We tested semantic search on two types of idiom databases. The results revealed important lessons about what the model does and does not understand.

English Idioms: Search by Word

Query: "stitch"

  #1  63.5%  A stitch in time saves nine

  #2  26.3%  Bite the bullet

  #3  22.1%  Every cloud has a silver lining

 

Query: "rolling stone"

  #1  63.8%  A rolling stone gathers no moss

  #2  23.7%  Don't judge a book by its cover

Searching by a word that appears in the idiom works well because the model has seen those words in training and associates them with the idiom.

English Idioms: Search by Meaning

Query: "don't worry about things that already happened"

  #1  29.1%  Don't cry over spilled milk

             Meaning: Don't waste time worrying about past mistakes

 

Query: "she looks rough but is actually a kind person"

  #1  16.2%  Don't judge a book by its cover

The scores are lower (16% to 30%) when searching by meaning rather than by words. The model finds the right idiom but less confidently. This is because idioms are deliberately indirect, their literal words do not match their meaning.

Hindi Idioms: English Model vs Multilingual Model

[English model on Hindi idioms]

Query: "actions have consequences"

  #1  14.7%  Aam ke aam, gutliyon ke daam  <- WRONG

             Meaning: Getting double benefit from one thing

 

[Multilingual model on Hindi idioms]

Query: "actions have consequences"

  #1  33.0%  Sau sunaar ke, ek lohaar ka  <- BETTER

             Meaning: One strong decisive action is worth a hundred weak ones

The English model sees Hindi words as unfamiliar character sequences and groups them by character patterns rather than meaning. The multilingual model (paraphrase-multilingual-MiniLM-L12-v2, 470 MB download) was trained on 50+ languages including Hindi and produces meaningfully better results. The model choice matters as much as the database design.
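Swapping models is a one-line change. A minimal sketch of the comparison (two romanised idioms stand in here for the full idiom list used in the tutorial's script):

from sentence_transformers import SentenceTransformer, util

idioms = ["Aam ke aam, gutliyon ke daam", "Sau sunaar ke, ek lohaar ka"]
query = "actions have consequences"

for name in ("all-MiniLM-L6-v2", "paraphrase-multilingual-MiniLM-L12-v2"):
    model = SentenceTransformer(name)
    query_vec = model.encode(query)
    scores = [util.cos_sim(query_vec, model.encode(idiom)).item() for idiom in idioms]
    print(name, [f"{s:.1%}" for s in scores])   # the multilingual model ranks these more sensibly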

13. Security and Privacy for a Banking Deployment

For a real banking application, the model weight files and customer data need protection. Here is the complete picture:

Encrypting the Model Weight Binary Files

The .bin file can be encrypted. However, raw RSA cannot encrypt large files directly:

         RSA is designed for small data: maximum 245 bytes per operation at 2048-bit key strength

         A model weight file is 80 MB. Encrypting it with RSA alone would mean hundreds of thousands of separate operations, which is painfully slow and not what RSA is designed for

 

The correct approach is Hybrid Encryption:

Step 1: Generate a random AES-256 key (just 32 bytes)

Step 2: Encrypt the MODEL FILE with AES-256  <- fast, handles large files

Step 3: Encrypt the AES KEY with RSA          <- small, RSA handles this perfectly

Step 4: Store encrypted file + encrypted key together

 

At runtime:

Step 1: Decrypt the AES key using RSA private key  <- milliseconds

Step 2: Decrypt the model file using AES key        <- seconds

Step 3: Load weights into RAM, wipe the decrypted key
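
A minimal sketch of the hybrid scheme with the Python cryptography library (file names and key handling are simplified; in a real deployment the RSA private key would live in an HSM or key vault):

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Step 1: random AES-256 key (32 bytes)
aes_key = AESGCM.generate_key(bit_length=256)

# Step 2: encrypt the large model file with AES-256-GCM (fast)
with open("pytorch_model.bin", "rb") as f:
    plaintext = f.read()
nonce = os.urandom(12)
ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)

# Step 3: encrypt the small AES key with RSA-OAEP
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
encrypted_key = public_key.encrypt(aes_key, oaep)

# Step 4 / at runtime: decrypt the key, then the file
recovered_key = private_key.decrypt(encrypted_key, oaep)
restored = AESGCM(recovered_key).decrypt(nonce, ciphertext, None)
assert restored == plaintext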

The RAM Exposure Problem

Encryption on disk does not protect data once it is loaded into RAM for inference. The decrypted weights sit in memory and are readable by anyone with OS-level access. Banks address this with:

         Hardware Security Modules (HSM): a physical tamper-proof chip. The decryption key never leaves the chip.

         Trusted Execution Environments (TEE): Intel SGX or AMD SEV create encrypted RAM enclaves that even the OS cannot read

         Confidential Computing: cloud providers offer VMs where even the hypervisor cannot read memory

Customer Data Protection

Concern                  Solution
Database file on disk    SQLCipher: a drop-in SQLite replacement with AES-256 encryption
Data in transit          HTTPS/TLS on the FastAPI server. Never plain HTTP.
Access control           OAuth2 tokens or mTLS certificates on the /search endpoint
Audit trail              Log every query, result, timestamp, and requester identity
Data minimisation        Store only vectors if raw text is not needed

 

[Figure: visualisation of a text string's journey through all 6 layers, showing the travel of the first 20 vector values.]

 

End of Tutorial

