Download Qwen AI Models

Qwen (通义千问) is a family of large language models developed by Alibaba Cloud. The name “Tongyi Qianwen” roughly means “Truth from a Thousand Questions”, reflecting these models’ aim to answer diverse queries accurately. The Qwen series includes multiple model sizes – from 1.8 billion to 72 billion parameters – catering to use cases ranging from research prototypes to enterprise applications. All Qwen models are Transformer-based and trained on massive multilingual data (with a focus on Chinese and English), including web text, books, code, and more.

Alibaba has open-sourced several Qwen models under official licenses, making them available for developers to download and use. These include the base language models Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B, as well as their instruction-tuned Qwen-Chat counterparts. In addition, a vision-language model Qwen-VL (built upon the 7B model) extends Qwen’s capabilities to images and multimodal tasks. Each chat variant (e.g. Qwen-7B-Chat) is aligned with human preferences via fine-tuning (SFT/RLHF) so it can engage in helpful dialogue, follow instructions, assist with coding, math, translation, and more.

Below is a breakdown of all official Qwen AI models, with brief descriptions and direct download links from trusted sources like Hugging Face. Each link uses descriptive anchor text (for SEO) such as “Download Qwen-7B from Hugging Face.” Be sure to review the model’s license and requirements before use.

Official Qwen Model Lineup and Download Links

Qwen-1.8B (1.8 Billion Parameters)

Description: Qwen-1.8B is a lightweight 1.8B-parameter LLM offering a low-cost entry point into the Qwen family. Despite its smaller size, it was pretrained on an extensive 2.2 trillion token corpus (multi-language text, code, etc.). It supports up to an 8K context window and uses an expanded ~150K-token vocabulary to handle many languages. Qwen-1.8B is ideal for resource-constrained environments – with 4-bit or 8-bit quantization, inference fits in roughly 2–3 GB of GPU memory. An instruction-tuned assistant variant, Qwen-1.8B-Chat, is available for conversational AI tasks.

Download:
Download Qwen-1.8B from Hugging Face: https://huggingface.co/Qwen/Qwen-1_8B
Download Qwen-1.8B-Chat from Hugging Face: https://huggingface.co/Qwen/Qwen-1_8B-Chat

License: Qwen-1.8B is released under the Tongyi Qianwen Research License, allowing research use freely but requiring permission for commercial use.
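Because the model is so small, on-the-fly 4-bit quantization is a practical way to run it on modest hardware. Below is a minimal sketch using bitsandbytes via Transformers; it assumes transformers, accelerate, and bitsandbytes are installed and a CUDA GPU is available, and actual memory use will vary:

# Minimal sketch: load Qwen-1.8B-Chat in 4-bit to fit in roughly 2-3 GB of VRAM.
# Assumes: pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B-Chat",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
).eval()

# Qwen chat models expose a model.chat helper via their custom code.
response, history = model.chat(tokenizer, "Hello! What can you do?", history=None)
print(response)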

Qwen-7B (7 Billion Parameters)

Description: Qwen-7B is a 7B-parameter LLM and one of the flagship models of Alibaba’s Tongyi Qianwen series. It was trained on 2.4 trillion tokens of diverse data (Chinese, English, code, math, etc.), enabling strong performance on both English and Chinese tasks. Qwen-7B supports an 8K token context window (suitable for long documents) and an extensive ~150K token vocabulary for robust multilingual support. In evaluations, Qwen-7B outperforms other open-source models of similar size on reasoning, coding, commonsense, and more. A chat-enhanced version Qwen-7B-Chat is provided as a ready-to-use AI assistant aligned for dialogue (able to converse, answer questions, generate content, and use tools). This model is well-suited for chatbots, assistants, and general-purpose language tasks.

Download:
Download Qwen-7B from Hugging Face: https://huggingface.co/Qwen/Qwen-7B
Download Qwen-7B-Chat from Hugging Face: https://huggingface.co/Qwen/Qwen-7B-Chat

License: Qwen-7B is open-source under Alibaba’s Tongyi Qianwen License Agreement, which permits research and modification. Commercial use requires application/approval under this license. It is compatible with popular frameworks: for example, you can load Qwen-7B in Hugging Face Transformers (PyTorch) with AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True) as demonstrated in the model README.
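Expanding that one-liner into a runnable snippet (a sketch for the base model; the prompt and generation settings are illustrative):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B", device_map="auto", trust_remote_code=True
).eval()

# Base models do plain text completion (no chat alignment).
inputs = tokenizer("Alibaba Cloud's Qwen is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))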

Qwen-14B (14 Billion Parameters)

Description: Qwen-14B is a 14-billion parameter model offering greater capability and accuracy than its smaller counterparts. Pretrained on 3.0 trillion tokens of data (text, code, etc.), Qwen-14B achieves state-of-the-art performance for its size on many Chinese and English benchmarks, often matching or surpassing larger 30B+ models. It maintains the same expanded vocabulary (~150k tokens) for strong multilingual support. Qwen-14B uses an 8K context window in the initial release, sufficient for most dialogue and document tasks. Like other Qwen models, it has an aligned chat version: Qwen-14B-Chat, fine-tuned with human feedback to act as a helpful assistant. Qwen-14B (and its chat model) are well-suited for demanding NLP tasks requiring a mid-sized model – providing a balance between Qwen-7B’s efficiency and Qwen-72B’s performance.

Download:
Download Qwen-14B from Hugging Face: https://huggingface.co/Qwen/Qwen-14B
Download Qwen-14B-Chat from Hugging Face: https://huggingface.co/Qwen/Qwen-14B-Chat

License: Qwen-14B shares the same Tongyi Qianwen License as Qwen-7B. Non-commercial use is allowed freely, while commercial deployments require contacting Alibaba for authorization. Technically, Qwen-14B integrates with the PyTorch/Transformers ecosystem: it loads via the Hugging Face Transformers API with trust_remote_code=True, which fetches Qwen’s custom modeling code for its architecture components.
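The Qwen team also publishes prequantized Int4 (GPTQ) chat checkpoints that cut memory use substantially. A minimal loading sketch, assuming auto-gptq and optimum are installed as the model card instructs:

# Prequantized GPTQ checkpoint; needs: pip install auto-gptq optimum
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat-Int4", device_map="auto", trust_remote_code=True
).eval()

response, _ = model.chat(tokenizer, "Summarize the Qwen model family in one sentence.", history=None)
print(response)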

Qwen-72B (72 Billion Parameters)

Description: Qwen-72B is the largest open model in the Qwen lineup, boasting 72 billion parameters. It has been pretrained on ~3 trillion tokens of data, similar to Qwen-14B, but with a much larger model capacity for enhanced accuracy and knowledge. Qwen-72B includes support for 32k context length, enabling very long inputs/outputs (ideal for long document analysis and extended conversations). It also retains the 150k vocabulary for multi-language proficiency. On benchmarks, Qwen-72B achieves leading results among open models, excelling in reasoning, coding, math, and complex Q&A tasks. For usage, note that its size demands significant hardware: the authors recommend 2×80GB A100 GPUs (or equivalent) for FP16 inference, and at least ~48 GB GPU memory when using 4-bit quantization. A fine-tuned chat model Qwen-72B-Chat is available, which can engage in extended dialogues and complex instructions with high performance. Qwen-72B is targeted at advanced applications that need the extra power of a 70B+ model, such as research, enterprise analytics, or building top-tier chatbots.

Download:
Download Qwen-72B from Hugging Face: https://huggingface.co/Qwen/Qwen-72B
Download Qwen-72B-Chat from Hugging Face: https://huggingface.co/Qwen/Qwen-72B-Chat

License: Qwen-72B is released under Alibaba’s Tongyi Qianwen License (commercial use requires approval) similar to the 7B/14B models. The code and model integration are Apache-2.0, so developers can freely use the provided tools and Transformers integration. Given the model’s scale, distributed or low-precision inference frameworks (like DeepSpeed or vLLM) are recommended for practical deployment. The Qwen team’s documentation provides guidance on using Qwen-72B with techniques like GPTQ quantization and memory-optimized inference.
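Given those hardware demands, here is a high-throughput serving sketch with vLLM. It assumes vllm is installed and four GPUs are available; the tensor-parallel size, prompt, and sampling settings are all illustrative:

# Minimal vLLM sketch: shard Qwen-72B-Chat across 4 GPUs with tensor parallelism.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen-72B-Chat", trust_remote_code=True, tensor_parallel_size=4)
params = SamplingParams(temperature=0.7, max_tokens=256)

# For best results with a chat model, format prompts with its chat template;
# a raw string is used here only to keep the sketch short.
outputs = llm.generate(["Explain tensor parallelism in two sentences."], params)
print(outputs[0].outputs[0].text)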

Qwen-VL (Vision-Language Multimodal Model)

Description: Qwen-VL extends Qwen’s capabilities beyond text, incorporating vision-language understanding. Built upon the Qwen-7B base model, Qwen-VL adds a visual encoder and adapter to process image inputs alongside text. This enables it to perform tasks like image captioning, visual question answering (VQA), object detection, OCR text reading from images, and even video analysis in certain versions. Qwen-VL is multilingual and can handle inputs of multiple images interleaved with text, providing fine-grained visual understanding and multi-image comparison abilities. The base Qwen-VL model produces textual outputs (and bounding boxes for localization tasks) given image+text prompts. It has an instruction-tuned variant Qwen-VL-Chat, optimized for image-based dialogues, where the model can discuss images conversationally, describe content, answer questions about an image, and follow mixed image-text instructions. Qwen-VL and Qwen-VL-Chat have achieved state-of-the-art results on open vision-language benchmarks (e.g. ranking #1 on the OpenVLM leaderboard) and are considered competitive with proprietary models like GPT-4V. These models unlock use cases such as AI assistants that can see and understand images (for example, analyzing diagrams, screenshots, or photographs as part of a conversation).

Download:
Download Qwen-VL from Hugging Face: https://huggingface.co/Qwen/Qwen-VL
Download Qwen-VL-Chat from Hugging Face: https://huggingface.co/Qwen/Qwen-VL-Chat

License: Qwen-VL models are open-sourced under the Tongyi Qianwen License Agreement (similar terms as other Qwen models). They are provided on both Hugging Face and Alibaba ModelScope. Developers can load Qwen-VL using Transformers with a special tokenizer that supports image inputs (the model card provides examples). Both base and chat versions require PyTorch and support inference on GPUs with appropriate memory (for example, Qwen-VL-Chat 7B in int8 can run on a single high-end GPU).
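A sketch of image-grounded chat following the pattern shown in the Qwen-VL-Chat model card (the image URL here is a placeholder):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# from_list_format interleaves images and text into a single prompt.
query = tokenizer.from_list_format([
    {"image": "https://example.com/photo.jpg"},  # placeholder image URL
    {"text": "What is happening in this picture?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)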

Qwen for Code (Code-Specific Models)

All Qwen-Chat models are capable of programming assistance – they were trained on code data and can write or analyze code as part of their general capabilities. However, Alibaba has also released specialized code-focused Qwen models in the newer Qwen2.5 series. Known as Qwen2.5-Coder, these models range from 0.5B up to 32B parameters and are fine-tuned specifically for coding tasks. Notably, Qwen2.5-Coder-32B is reported to achieve code generation abilities on par with GPT-4 on certain benchmarks. The Qwen2.5-Coder models (including a 7B variant) are open-source under the Apache-2.0 license (the 3B variant is the exception, carrying a research license), making them free for commercial use. If your primary interest is code completion, debugging, or building an AI coding assistant, you may consider using these Qwen coder models. For example, you can Download Qwen2.5-Coder-7B-Instruct from Hugging Face (https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) for a 7B code model. These coder models integrate with Hugging Face Transformers (ensure transformers>=4.37 to avoid tokenization errors for “qwen2” model IDs).
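A minimal prompting sketch for the coder model using the standard chat template (requires transformers>=4.37 as noted above; the task and generation settings are illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))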

Frequently Asked Questions (FAQ)

What are the system requirements to run Qwen models?

The requirements depend on the model size. All Qwen models require Python 3.8+ and PyTorch 1.12+ (PyTorch 2.0 recommended) for the provided code and Transformer integration. A GPU with CUDA 11.4+ is highly recommended for faster inference (and necessary for larger models). In terms of memory: smaller models are quite lightweight, while larger ones need high-end hardware:
Qwen-1.8B: Can run on consumer GPUs – inference can be done in 8-bit or 4-bit quantized mode using <2 GB of GPU VRAM (and about 3 GB for generating ~2048 tokens). It’s feasible to run on a modern laptop or CPU-only (with reduced speed) for experimentation. Fine-tuning Qwen-1.8B (e.g. with LoRA) requires ~6 GB GPU memory.
Qwen-7B: Typically needs around 8–16 GB GPU memory. For example, generating 2048 tokens with Qwen-7B in 4-bit mode uses ~8.2 GB VRAM. Full FP16 inference might require ~16 GB. It can be loaded on a single NVIDIA 3090/4090 or A6000.
Qwen-14B: Requires roughly 14–20 GB GPU memory. Int4 inference uses ~13 GB for 2K tokens. Plan for at least a 24 GB GPU for comfortable use (e.g. an RTX 6000 Ada or dual smaller GPUs with sharding).
Qwen-72B: This model is resource-intensive. For FP16/bfloat16 inference, ~144 GB of GPU memory is needed (e.g. 2×80GB GPUs). Running it in 4-bit quantized mode still requires ≈48 GB of VRAM. Multi-GPU setups (such as 8×A100 40GB) or GPU cloud instances are recommended. Qwen-72B also uses significantly more CPU RAM for the model weights (over 140 GB), so ensure your system has sufficient memory if offloading.
In summary, GPU with high memory is recommended for Qwen-14B and above. For Qwen-7B and smaller, one consumer-grade GPU is usually enough, and for Qwen-1.8B even CPU usage is possible (albeit slow). Always install the appropriate dependencies: transformers, accelerate, einops, torch, etc., as listed in the model docs. Using optimized inference libraries like Hugging Face Accelerate or DeepSpeed can help run larger Qwen models by sharding across GPUs.
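As a quick sanity check before downloading weights, you can inspect the available GPU memory (a sketch; the thresholds in the comments mirror the 4-bit figures above):

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    # Rough guide from the figures above (4-bit quantized inference):
    # <3 GB -> Qwen-1.8B; ~8 GB -> Qwen-7B; ~13 GB -> Qwen-14B; ~48 GB -> Qwen-72B.
else:
    print("No CUDA GPU detected; only Qwen-1.8B is practical (CPU-only, and slow).")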

How can I perform inference with Qwen models? Which frameworks are supported?

Qwen models are fully compatible with Hugging Face Transformers, making it straightforward to load them in PyTorch. The official model repositories provide example code. For instance, to load Qwen-7B or Qwen-7B-Chat:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()
The trust_remote_code=True flag is required because Qwen uses some custom modeling code (for the rotary embeddings, etc.) which Transformers will fetch from the Qwen repository. Once loaded, you can use standard generation methods (model.generate or the provided model.chat interface for chat models).
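For example, with the Qwen-7B-Chat model and tokenizer loaded as above, the custom chat helper handles multi-turn dialogue (the prompts are illustrative):

# model.chat returns the reply plus an updated history for context carryover.
response, history = model.chat(tokenizer, "Give me a haiku about autumn.", history=None)
print(response)
response, history = model.chat(tokenizer, "Now translate it into Chinese.", history=history)
print(response)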
Beyond plain Transformers, Qwen models can be deployed with multiple inference frameworks:
ModelScope: Alibaba provides ModelScope support – Qwen models are available on ModelScope Hub and can be loaded via ModelScope APIs.
vLLM: For optimized text generation with continuous batching, vLLM supports Qwen (the Qwen team even provides guides for using vLLM with Qwen models for high-throughput inference).
DeepSpeed & Accelerate: These can be used to shard large Qwen models across GPUs or use CPU offloading. Qwen’s documentation includes guidance on using DeepSpeed for faster inference and lower memory usage; see the offloading sketch after this list.
Quantized Runtimes: Qwen-7B and 14B have been integrated into GGML/GGUF format (e.g., via projects like qwen.cpp similar to llama.cpp). The Qwen team released a tool called qwen.cpp for running Qwen-7B on CPU with quantization. Community efforts like Ollama also package Qwen variants (e.g., ollama run qwen:7b).
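A sketch of capping GPU memory and offloading the remainder to CPU RAM via Accelerate’s device_map (the model choice and memory caps are illustrative):

from transformers import AutoModelForCausalLM

# Cap GPU 0 usage and spill remaining weights to CPU RAM; requires accelerate.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat",
    trust_remote_code=True,
    device_map="auto",
    max_memory={0: "20GiB", "cpu": "60GiB"},
)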
In summary, you can use Qwen with any framework that supports HuggingFace Transformers models. The primary reference implementation is PyTorch-based. Make sure to use the official Hugging Face model checkpoints (or ModelScope copies) to ensure compatibility.

What license and usage terms apply to Qwen models?

The licensing for Qwen models varies by model and version:
Qwen-7B, 14B, 72B (Base & Chat) – Released under the Tongyi Qianwen License Agreement. This license allows free use, distribution, and modification of the model for research or non-commercial purposes. Commercial use requires contacting Alibaba for permission or an application process. In practice, this means you can experiment and build applications with these weights freely, but if you intend to monetize or deploy commercially, you need to obtain approval from Alibaba Cloud. This license was created by Alibaba specifically for these model releases. It is somewhat similar to licenses like LLaMA’s community license, in that it restricts unauthorized commercial use. The full text is available in the model repo’s “License Agreement” section.
Qwen-1.8B (Base & Chat) – Released under a slightly different Tongyi Qianwen Research License. This is even more restrictive for commercial usage: it is intended for research and non-commercial use only, and any commercial application explicitly requires written permission or a separate agreement with Alibaba. Essentially, treat Qwen-1.8B as research-only unless you’ve made arrangements with Alibaba Cloud. For academic and hobbyist use, it is free.
Qwen-VL and Qwen-VL-Chat – Use the Tongyi Qianwen License Agreement as well (the same terms as the text models: non-commercial by default). At the time of release, Qwen-VL was open-sourced for research with the expectation that commercial uses would go through Alibaba’s approval. Always double-check the specific model card’s “License” field; for example, Hugging Face lists Qwen-VL-Chat’s license as tongyi-qianwen-license (which signals the non-commercial clause).
Qwen2.5 Series (Omni, VL-32B, Coder, etc.) – Many of these newer models are under the Apache-2.0 License. Notably, Qwen2.5-Omni-7B and most Qwen2.5-Coder models are Apache-2.0, which permits commercial use out-of-the-box. Alibaba shifted to the Apache license for certain newer releases to encourage broader adoption. Always read the model card: for example, Qwen2.5-Coder-7B is clearly marked “License: apache-2.0” on Hugging Face. If you require a Qwen model for commercial products without a special agreement, consider using those Apache-licensed versions.
In summary, for the original Qwen models (1.8B, 7B, 14B, 72B, VL), you can freely use them for research, open-source projects, or internal R&D. But if you plan to integrate them into a paid service or product, you must seek permission from Alibaba (as per the Tongyi Qianwen license). The Qwen code (repository code, not weights) is under Apache-2.0, so any scripts or libraries provided are free to use in any context. It’s the model weights that carry the special license. When in doubt, consult the official Qwen GitHub and documentation for the latest licensing FAQs – and when needed, reach out to Alibaba Cloud for clarification or to obtain a commercial license.

What are common use cases for Qwen models?

Qwen models were designed as general-purpose AI assistants and language models with strengths in both Chinese and English. Some common use cases:
Chatbots and Virtual Assistants: Qwen-Chat models (e.g. Qwen-7B-Chat, Qwen-14B-Chat) excel at conversational dialogue, answering questions, and following user instructions in a chat format. They can be integrated into chat applications to provide automated customer support, tutoring, or personal assistant services (with the caveat of the license for commercial deployments).
Content Generation: These models can generate coherent text, making them useful for drafting emails, writing articles, summarizing documents, translating text, or creating creative content on demand. For example, Qwen has demonstrated strong performance in summarization and translation tasks during evaluation.
Coding Assistance: Qwen models were trained on a large volume of code (especially the chat versions and Qwen2.5-Coder), so they can write code, explain code, and help with debugging. Developers can use Qwen-7B-Chat or the specialized Qwen2.5-Coder models as AI pair programmers to suggest code snippets or algorithms in multiple programming languages.
Multimodal Applications: With Qwen-VL and Qwen-VL-Chat, you can build applications that require understanding images along with text. Use cases include an AI that can describe uploaded images, answer questions about an image’s content (visual Q&A), perform OCR on images, or help users analyze diagrams/screenshots. For example, Qwen-VL-Chat can take an image plus a question like “What is happening in this picture?” and produce a detailed answer. This opens possibilities for assistive tech, image search engines, or automating form data extraction.
Research and Benchmarking: Because Qwen models are open and powerful, researchers use them to study large LLM behavior, fine-tune them on domain-specific data, or benchmark them against other models. Qwen-72B in particular, being one of the largest open models, is used as a reference for cutting-edge LLM performance in academia.
Keep in mind that while Qwen models are high-performing, they are not immune to typical LLM limitations (e.g., potential to produce incorrect or nonsensical answers, sensitivity to prompt wording, etc.). Always evaluate a model’s outputs for your specific use case. If deploying publicly, implement appropriate filters or human review for sensitive tasks.

How do I cite or credit Qwen if I use it in my project or paper?

If you use Qwen models in a research project or application, it’s good practice to acknowledge the source. Alibaba has released an official technical report for Qwen (see arXiv:2309.16609), which you can cite in academic papers; the model cards (e.g., the Qwen-7B card, under Citation) provide a BibTeX entry. Typically, you can cite it as:
Bai et al., “Qwen Technical Report”, arXiv preprint arXiv:2309.16609, 2023.
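The corresponding BibTeX entry looks roughly like this (author list abbreviated here; use the full entry from the model card in a paper):

@article{qwen,
  title   = {Qwen Technical Report},
  author  = {Bai, Jinze and others},
  journal = {arXiv preprint arXiv:2309.16609},
  year    = {2023}
}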
In open-source projects or applications, you can credit “Qwen by Alibaba Cloud” and link to the official Qwen repository or model card. Also, if you’re using the model under the official license, ensure you include any required attribution or text from the license agreement. For instance, the Tongyi Qianwen License might ask for including a copy of the license and an acknowledgment that the model came from Alibaba Cloud’s Qwen.
Additionally, if you fork the code or weights, maintain the provenance (don’t remove the original license files). On Hugging Face, the model pages (such as Qwen-7B’s card) include the license and contact info. Following those guidelines will ensure you’re respecting the creators’ terms while using this open model.


For more detailed information, visit the official Qwen GitHub repository, read the technical report, or see Alibaba Cloud’s documentation. The Qwen community (see the Discord link on model cards) is active and can help with any troubleshooting or advanced use questions. Happy building with Qwen AI models!