WeClone: Fine-Tune an LLM on Your Chat History and Build an AI Digital Twin
Someone built a tool that reads your chat history, fine-tunes a language model on thousands of your actual conversations, and deploys a chatbot that talks exactly like you. Your slang. Your humor. Your tone. Your response patterns.
It’s called WeClone. It has 16,400 GitHub stars. It’s completely free and self-hosted. And it works.
Here’s what it actually does — and what it means.
What WeClone Is (and Isn’t)
WeClone is not a chatbot with your name on it. It’s not a persona you describe in a system prompt. It’s an end-to-end pipeline that:
- Exports your real chat history (Telegram natively; WeChat via integration; more platforms in development)
- Cleans, filters, and preprocesses that data automatically
- Fine-tunes an open-source LLM — by default, Qwen2.5-VL-7B — using LoRA
- Deploys the trained model as a live chatbot on Telegram, Discord, or Slack
The key distinction: the model learns from what you actually said, not from a description of how you speak. It reads your jokes. Your arguments. Your reactions to news, to bad news, to your friends being annoying. It learns your vocabulary distribution, your punctuation habits, your characteristic phrases.
After training, people can text “you.” And they often can’t tell the difference.
How the Pipeline Works
Step 1: Export Your Chat History
WeClone supports Telegram natively (WhatsApp and Discord are in active development). From Telegram Desktop: open any chat → top-right menu → Export Chat History → select Photos and JSON format.
Export multiple contacts into the ./dataset/telegram/ directory; each contact's ChatExport_* folder sits there side by side. Group chats are not recommended: the model learns your specific voice better from 1:1 conversations.
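Before preprocessing, it's worth sanity-checking that layout. A minimal sketch (assuming Telegram Desktop's standard per-chat result.json output; adjust paths to your setup):

```python
from pathlib import Path

def find_exports(root: str) -> list[Path]:
    """List ChatExport_* folders under root that contain a result.json,
    the file Telegram Desktop's JSON export produces for each chat."""
    return sorted(
        p for p in Path(root).glob("ChatExport_*")
        if (p / "result.json").is_file()
    )

# e.g. find_exports("./dataset/telegram") should list one folder per contact
```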
Step 2: Automatic Data Processing
WeClone handles the messy parts: stripping metadata, filtering private information (phone numbers, addresses, credentials), cleaning formatting artifacts, and converting your raw JSON exports into structured training pairs.
The pipeline is configured via a single settings.jsonc file. You copy the template, point it at your data, and the preprocessing runs automatically.
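WeClone's built-in filter is more thorough, but the core idea of the privacy pass can be sketched as pattern-based redaction (the patterns below are illustrative, not WeClone's actual rules):

```python
import re

# Illustrative patterns only; the real filter covers more categories
# (addresses, credentials, IDs) and more formats.
PATTERNS = {
    "PHONE": re.compile(r"\+?\d[\d\s\-]{7,}\d"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def redact(text: str) -> str:
    """Replace anything matching a sensitive pattern with a [LABEL] tag."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```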
Step 3: Fine-Tuning with LoRA
This is where the real work happens. WeClone uses LLaMA Factory under the hood for the supervised fine-tuning (SFT) stage. The default model is Qwen2.5-VL-7B-Instruct with LoRA — a parameter-efficient training method that modifies only a small subset of model weights, dramatically reducing VRAM requirements.
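To see why LoRA is so much cheaper, count trainable parameters. A full update to one d×d projection trains d² weights; LoRA trains two rank-r factors instead. A sketch, with 4096 standing in for a typical 7B-class hidden size:

```python
def lora_trainable(d_in: int, d_out: int, rank: int) -> int:
    """LoRA freezes the original weight W and learns a low-rank update
    B @ A, where B is (d_out x rank) and A is (rank x d_in)."""
    return d_out * rank + rank * d_in

full = 4096 * 4096                    # full fine-tune of one projection: 16,777,216
lora = lora_trainable(4096, 4096, 8)  # LoRA at rank 8: 65,536
ratio = full // lora                  # 256x fewer trainable parameters
```

Optimizer state scales with trainable parameters, which is where most of the VRAM savings come from.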
Honest VRAM requirements (LoRA, 16-bit precision):
| Model Size | VRAM Needed |
|---|---|
| 7B | 16 GB |
| 14B | 32 GB |
| 30B | 64 GB |
| 70B | 160 GB |
If you’re running QLoRA at 4-bit quantization, the requirements drop sharply: a 7B model needs roughly 6 GB, putting it within reach of a consumer RTX 3080 or 4070.
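The 16-bit rows of that table follow a simple rule of thumb: bytes per parameter times parameter count, plus headroom for activations and adapters. A back-of-envelope estimator (the 15% overhead factor is my assumption; real usage varies with sequence length and batch size, and 4-bit QLoRA carries extra dequantization buffers, which is why it lands nearer 6 GB than the naive 4 GB for 7B):

```python
def est_vram_gb(params_billions: float, bits: int = 16,
                overhead: float = 1.15) -> float:
    """Rough LoRA VRAM estimate: weight storage dominates; ~15% headroom
    covers activations, adapters, and CUDA context. Illustrative only."""
    return round(params_billions * (bits / 8) * overhead, 1)

# est_vram_gb(7)  -> ~16 GB, matching the table's 7B row
# est_vram_gb(14) -> ~32 GB
```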
One honest caveat from their own docs: 7B models produce only average results; 14B and above deliver noticeably better ones. The more data you have, and the larger the model, the more convincing the twin.
Step 4: Deployment
Once trained, the model binds to a chatbot interface. Telegram deployment is fully supported. Discord and Slack work via integration. WhatsApp support is in progress.
Your digital twin goes live. People can message it. It responds as you.
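If you expose the trained model through an API service as well, you can talk to it directly over HTTP. A minimal sketch of an OpenAI-style chat request (the endpoint path, model name, and schema here are assumptions; match them to whatever your deployment actually serves):

```python
import json

def chat_request_body(model: str, user_message: str) -> str:
    """Serialize a minimal OpenAI-style chat completion body. POST this
    to your server's /v1/chat/completions (placeholder path) with any
    HTTP client."""
    return json.dumps({
        "model": model,  # placeholder model name
        "messages": [{"role": "user", "content": user_message}],
    })

body = chat_request_body("my-twin", "hey, you around?")
```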
What Makes It Actually Work
The key insight is that personality is statistical. How you respond isn’t random — it’s a distribution over word choices, sentence lengths, reaction types, humor registers. Fine-tuning a sufficiently large model on enough of your conversations shifts its distribution toward yours.
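You can see a crude version of that statistical fingerprint with a few lines of counting (a toy illustration; fine-tuning shifts the model toward far richer structure than unigram frequencies):

```python
from collections import Counter

def style_fingerprint(messages: list[str]) -> Counter:
    """Token frequencies across someone's messages: one thin slice of
    the distribution that fine-tuning nudges the model toward."""
    return Counter(
        token.lower() for message in messages for token in message.split()
    )

# Two people answering the same question leave different fingerprints:
alice = style_fingerprint(["lol yeah ok", "ok ok sounds good"])
```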
It also supports images as a training modality: the pipeline can learn from photos you sent, not just text. This is genuinely novel for an open-source tool of this type.
Privacy filtering runs locally before training begins, and the entire pipeline is self-hosted. Your conversations never leave your machine.
Where It Falls Short
A few honest limits worth knowing:
- Data volume matters a lot. If you have only a few hundred messages per contact, the training signal will be thin. Thousands of messages per person produce meaningfully better results.
- 7B models underperform. Plan for 14B+ if you want results that hold up under pressure. That means a GPU with 32 GB of VRAM or a cloud GPU instance.
- Windows is not rigorously tested. Use WSL2 if you’re on Windows.
- Voice cloning is referenced but not yet integrated in the current open-source release.
- Group chat data degrades quality — the model learns a blended voice, not yours specifically.
The Deeper Question: Digital Legacy or Identity Risk?
WeClone’s homepage uses the phrase “digital immortality.” Your grandchildren could have conversations with an AI that learned from thousands of real exchanges you had in your twenties and thirties. Your patterns of reasoning. Your sense of humor. How you responded when someone you loved was struggling.
That’s genuinely moving.
It also raises a question no one has cleanly answered: what happens when someone builds a WeClone of you without your knowledge? The tool works on anyone’s exported chat history. If someone in your contact list exports your 1:1 thread with them, they have everything they need.
This isn’t a hypothetical. It’s a $0 operation that takes about an afternoon.
The technology is neutral. The use case spectrum runs from “preserving a grandparent’s voice for future generations” to “impersonating someone to manipulate their contacts.” WeClone adds a privacy filter and a note about ethical use. The rest is on you.
Getting Started
```bash
git clone https://github.com/xming521/WeClone.git && cd WeClone
uv venv .venv --python=3.12
source .venv/bin/activate
uv pip install --group main -e .
cp examples/tg.template.jsonc settings.jsonc
```
From there: export your Telegram data, drop it in ./dataset/telegram/, configure settings.jsonc, and run the training pipeline. Full documentation is at docs.weclone.love.
Requirements checklist:
- CUDA 12.6+ (or Apple Silicon with MPS — untested)
- Python 3.12
- 16+ GB VRAM for a useful 7B run; 32 GB for 14B
- A few thousand messages of chat data per contact
- Optional: FlashAttention for faster training
The Signal in the Stars
16,400 GitHub stars in this space is meaningful. The project isn’t riding a viral moment — it has 422 commits and 1,300+ forks. People are actually building with it.
What WeClone represents isn’t just a cool demo. It’s the leading edge of a shift in how we think about identity, memory, and who gets to speak after we’re gone — or after we’re just busy.
That question is worth taking seriously. The tool is worth watching.
Repo: github.com/xming521/WeClone
License: AGPL-3.0
Docs: docs.weclone.love
Frequently Asked Questions
What is WeClone and how does it work?
WeClone is an open-source tool that fine-tunes a large language model on your personal chat history — from Telegram, WeChat, or other messaging apps — to create a conversational AI twin that mimics your writing style, tone, and personality. The pipeline covers data export, preprocessing, LoRA fine-tuning, and chatbot deployment.
How much GPU memory do I need to run WeClone?
Running WeClone with a 7B parameter model using LoRA at 16-bit precision requires approximately 16 GB of VRAM. QLoRA at 4-bit quantization reduces this to about 6 GB, making it compatible with consumer GPUs like the RTX 3080 or 4070. For better results, a 14B model requires 32 GB VRAM.
What chat platforms does WeClone support for training data?
WeClone currently supports Telegram (fully) and WeChat (via a personal-account integration). WhatsApp, Discord, and Slack support are under active development. Telegram data comes from the Desktop app’s JSON export feature.
Is WeClone private and secure? Does my data leave my machine?
WeClone is fully self-hosted — your chat data never leaves your machine. It includes a built-in privacy information filter that strips sensitive data (phone numbers, addresses, credentials) before training begins. Fine-tuning runs locally on your hardware.
How much chat data do I need for WeClone to work well?
More data produces better results. A few thousand messages per contact is a reasonable minimum. The model also benefits from diverse conversations across multiple contacts. Thin datasets (a few hundred messages) will produce a less convincing twin. The WeClone team notes that model size matters as much as data volume — 14B+ parameter models significantly outperform 7B.
Can someone use WeClone to clone my personality without my consent?
Technically, yes: anyone with access to a chat export containing your messages could use WeClone to train a model on your conversational patterns. This is an emerging identity-and-consent issue that the tool’s ethical guidelines acknowledge but cannot technically prevent. It’s worth keeping in mind when deciding what you share, on which platforms, and with whom.
How is WeClone different from just using a chatbot with a custom system prompt?
A system prompt tells a model to “act like you” based on a description. WeClone trains the model on thousands of your actual messages, shifting its statistical distribution toward your real vocabulary, humor, sentence structure, and response patterns. The result is a model that has internalized your conversational behavior rather than approximating it from instructions.
What model does WeClone use by default?
WeClone defaults to Qwen2.5-VL-7B-Instruct, a multimodal model from Alibaba’s Qwen team. It supports other models via LLaMA Factory, including larger variants up to 70B. The VL (Vision-Language) architecture also enables training on image data you’ve shared in chats.