Building Your Own AI Assistant with Qwen


아주까리동백기름 2025. 5. 15. 11:43

Introduction

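The Qwen family of language models provides powerful open-source LLMs that can be applied to a wide range of natural language processing tasks.

This post walks through setting up and running a personal assistant application based on the Qwen1.5-7B-Chat model in a Python environment.
With roughly 7 billion parameters, it is a relatively lightweight model tuned for chat, which makes it well suited to conversational use.

The example code is structured to run as-is in a Python notebook environment such as Google Colab,
and it can be ported to a local environment with little effort.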

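Implementation

Building a Qwen-based personal assistant requires installing several dependency packages and libraries.
The first step, therefore, is to install them and check their versions so that they stay as compatible as possible with anything already installed.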

pip install -q transformers accelerate bitsandbytes einops ipywidgets
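
The version check mentioned above can be done from Python itself. The following is a minimal sketch using only the standard library; the helper name installed_versions and the PACKAGES list are illustrative and simply mirror the packages in the install command.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages from the install command above
PACKAGES = ["transformers", "accelerate", "bitsandbytes", "einops", "ipywidgets"]

def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

versions = installed_versions()
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this before and after the install step makes it easy to see which packages were already present and which versions ended up in the environment.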

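Initial Setup: Using the GPU and Checking Dependencies

To speed up inference, the code automatically detects a GPU (CUDA) when one is available and uses it in preference to the CPU.
This applies from the very first model call and makes a substantial difference in Colab or on a local GPU machine.

The packages required to run the Qwen model are also installed and checked in the same step.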

# Install the required packages (skip this if they are already installed)
!pip install -q transformers accelerate bitsandbytes einops ipywidgets

# Core imports
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import time
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets
import sys
import os

# Verify that bitsandbytes imports; try reinstalling once if it does not
try:
    import bitsandbytes as bnb
    print("Successfully imported bitsandbytes")
except ImportError:
    print("Error importing bitsandbytes. Attempting to install again...")
    !pip install -q bitsandbytes --upgrade
    import bitsandbytes as bnb

# Device selection: prefer the GPU (CUDA), fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
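
Beyond the cuda/cpu decision, it can help to know which GPU was assigned and how much memory it has, since that determines which loading strategy is realistic. The sketch below is an illustration only: describe_device is a hypothetical helper, and it falls back gracefully when PyTorch or a GPU is unavailable.

```python
def describe_device():
    """Return a small summary of the compute device PyTorch would use."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing, so nothing can run on a GPU
        return {"device": "cpu", "name": "PyTorch not installed", "memory_gb": None}

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)  # first visible GPU
        return {
            "device": "cuda",
            "name": props.name,
            "memory_gb": round(props.total_memory / 1e9, 1),
        }
    return {"device": "cpu", "name": "CPU", "memory_gb": None}

info = describe_device()
print(info)
```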

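Loading and Configuring the Model

Now it is time to load and configure the model.

The model used here is Qwen/Qwen1.5-7B-Chat,
which is lighter and quicker to start generating than Qwen2.5-Omni, a more capable model from the same family.
The Omni model is very capable, but it is heavier in terms of resources.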

# Load Qwen1.5-7B-Chat; with quantization it can run on a Colab T4 GPU
from transformers import BitsAndBytesConfig

model_name = "Qwen/Qwen1.5-7B-Chat"

print(f"Loading {model_name}...")
start_time = time.time()

# 1. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# 2. Load the model: try 4-bit, then 8-bit, then unquantized loading
try:
    print("Attempting to load model with 4-bit quantization...")
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,  # bfloat16 for the non-quantized parts
        device_map="auto",
        trust_remote_code=True,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True)  # 4-bit quantization
    )
except Exception as e:
    print(f"4-bit quantization failed with error: {str(e)}")
    print("Falling back to 8-bit quantization...")
    try:
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
            quantization_config=BitsAndBytesConfig(load_in_8bit=True)  # 8-bit fallback
        )
    except Exception as e2:
        print(f"8-bit quantization failed with error: {str(e2)}")
        print("Falling back to standard loading (will use more memory)...")
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True
        )

# 3. Report how long loading took
load_time = time.time() - start_time
print(f"Model loaded in {load_time:.2f} seconds")
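
The order of the fallbacks (4-bit, then 8-bit, then unquantized) follows from simple arithmetic on the size of the weights. The sketch below is a back-of-the-envelope estimate only: it uses the rounded figure of 7 billion parameters from the introduction and counts weights alone, ignoring activations, the KV cache, and quantization overhead, so real usage will be higher.

```python
# Rounded parameter count from the introduction (an approximation)
PARAMS = 7e9

# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}

def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

estimates = {
    label: weight_memory_gb(PARAMS, size) for label, size in BYTES_PER_PARAM.items()
}
for label, gb in estimates.items():
    print(f"{label}: about {gb:.1f} GB of weights")
```

At 16 bits per parameter the weights alone come close to the memory of a 16 GB card, which is why the quantized paths are tried first.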

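Setting a Default Prompt

When building your own conversational assistant,
it is good practice to define a default prompt that is included with every request,
so that the model's response style and behavior stay consistent.

The default prompt sets the context of the conversation and tells the assistant explicitly which role it should play.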
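
The response function defined later reads this prompt from a variable named system_prompt. The wording below is only an example of what such a prompt might look like, not the author's original text; adjust it to the role you want the assistant to play.

```python
# Example default prompt (the wording is an assumption; edit freely)
system_prompt = (
    "You are a helpful, concise personal assistant. "
    "Answer clearly, ask for clarification when a request is ambiguous, "
    "and say so when you do not know something."
)

print(system_prompt)
```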

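Managing the Conversation Flow: Defining the Response Function

The next function handles the heaviest part of the execution flow:
the model takes the user's input, runs inference, and generates a response.

The function is designed to handle an ongoing conversational session rather than a one-off input.
It therefore has to keep track of the chat history and include it alongside the new input on every request.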

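def generate_response(user_input, chat_history=None):
    if chat_history is None:
        chat_history = []

    # 1. Start the message list with the system prompt
    messages = [{"role": "system", "content": system_prompt}]

    # 2. Append the existing chat history
    messages.extend(chat_history)

    # 3. Append the current user input
    messages.append({"role": "user", "content": user_input})

    # 4. Render the messages with the model's chat template (as text, not token IDs)
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    # Move the inputs to wherever device_map="auto" placed the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # 5. Run inference (the first call can take a while)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,          # maximum number of tokens to generate
            do_sample=True,              # sample instead of decoding greedily
            temperature=0.7,             # lower is more deterministic, higher more varied
            top_p=0.9,                   # nucleus sampling: cumulative probability cutoff
            pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id
        )

    # 6. Keep only the newly generated tokens so the prompt is not echoed back
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]

    # 7. Return just the assistant's reply.
    #    NOTE: the original listing is cut off at this step; steps 6-7 are a
    #    reconstruction and may differ from the author's extraction logic.
    assistant_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    return assistant_response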
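
The temperature=0.7 and top_p=0.9 settings passed to model.generate control how the next token is sampled. As a rough illustration, the pure-Python sketch below applies the same two ideas to a toy set of four logits. It is a simplified teaching example, not the transformers implementation, and the function names are made up for this post.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.1, -1.0]           # toy scores for four candidate tokens
probs = apply_temperature(logits, 0.7)   # same temperature as in generate()
nucleus = top_p_filter(probs, 0.9)       # same top_p as in generate()
print(nucleus)
```

A temperature below 1 sharpens the distribution toward the most likely token, and the top-p step then discards the unlikely tail before sampling.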

Building an Interactive UI: Exchanging Input and Responses in Real Time

With the core generate_response() function defined,
the next step is a simple user interface (UI) that puts it to use.

# Build a simple UI for the personal assistant
def create_assistant_ui():
    output = widgets.Output()  # widget that displays the conversation
    input_box = widgets.Text(
        value='',
        placeholder='Ask me anything...',
        description='Question:',
        layout=widgets.Layout(width='80%')
    )
    send_button = widgets.Button(description="Send")
    clear_button = widgets.Button(description="Clear Chat")

    chat_history = []  # stores the conversation history

    # Called when the Send button is clicked
    def on_send_button_clicked(b):
        user_input = input_box.value
        if not user_input.strip():
            return

        with output:
            print(f"You: {user_input}")
            print("Assistant: Thinking...", end="\r")  # placeholder shown while waiting

            start_time = time.time()
            try:
                response = generate_response(user_input, chat_history)
                end_time = time.time()

                clear_output(wait=True)  # clear the previous output
                print(f"You: {user_input}")
                print(f"Assistant: {response}")
                print(f"\n(Response generated in {end_time - start_time:.2f} seconds)")

                # Update the chat history
                chat_history.append({"role": "user", "content": user_input})
                chat_history.append({"role": "assistant", "content": response})
            except Exception as e:
                clear_output(wait=True)
                print(f"You: {user_input}")
                print(f"Error generating response: {str(e)}")
                import traceback
                traceback.print_exc()

        input_box.value = ''  # reset the input box

    # Called when the Clear Chat button is clicked
    def on_clear_button_clicked(b):
        with output:
            clear_output()
            print("Chat cleared!")
        chat_history.clear()

    # Wire up the button click handlers
    send_button.on_click(on_send_button_clicked)
    clear_button.on_click(on_clear_button_clicked)

    # Handle the Enter key (on_submit still works but is deprecated in ipywidgets 8)
    def on_enter(sender):
        on_send_button_clicked(None)
    input_box.on_submit(on_enter)

    # Lay out the UI
    input_row = widgets.HBox([input_box, send_button, clear_button])
    ui = widgets.VBox([output, input_row])

    return ui

As an alternative to the ipywidgets-based interface above, the following function provides a simple command-line (CLI) chat loop:

# A simple loop for chatting with the Qwen chatbot from the command line
def cli_chat():
    print("\n=== Starting CLI Chat (type 'exit' to quit) ===")
    chat_history = []

    while True:
        user_input = input("\nYou: ")
        if user_input.lower() in ['exit', 'quit', 'q']:
            print("Goodbye!")
            break

        print("Assistant: ", end="")
        try:
            start_time = time.time()
            response = generate_response(user_input, chat_history)
            end_time = time.time()

            print(f"{response}")
            print(f"(Generated in {end_time - start_time:.2f} seconds)")

            # Update the chat history
            chat_history.append({"role": "user", "content": user_input})
            chat_history.append({"role": "assistant", "content": response})
        except Exception as e:
            print(f"Error: {str(e)}")
            import traceback
            traceback.print_exc()
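
Both interfaces append a user/assistant pair to chat_history after every turn, so the prompt grows as the session goes on. One common way to keep it bounded is to send only the most recent turns. The sketch below is not part of the original code: trim_history, build_messages, and the max_turns parameter are hypothetical names used for illustration, and the message format mirrors the one used in generate_response.

```python
def trim_history(chat_history, max_turns=5):
    """Keep only the most recent `max_turns` user/assistant pairs."""
    if max_turns <= 0:
        return []
    return chat_history[-2 * max_turns:]  # two messages per turn

def build_messages(system_prompt, chat_history, user_input, max_turns=5):
    """Assemble the message list: system prompt, recent history, new input."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(trim_history(chat_history, max_turns))
    messages.append({"role": "user", "content": user_input})
    return messages

# Three earlier turns, of which only the last two are kept
history = []
for n in range(1, 4):
    history.append({"role": "user", "content": f"question {n}"})
    history.append({"role": "assistant", "content": f"answer {n}"})

messages = build_messages("You are a helpful assistant.", history, "question 4", max_turns=2)
for message in messages:
    print(message["role"], "->", message["content"])
```

The system prompt always stays in first position, so the assistant keeps its role even when older turns are dropped.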

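1. A Quick Test Function for Checking the Model and Environment

To confirm that the Qwen model has loaded correctly and that the required libraries work,
a short test function offers a quick check of the initial setup.

2. Running the Full Assistant: Choosing Between the UI and the CLI

To launch the assistant application as a whole,
a single entry-point function lets you choose which interface to run: the widget-based UI or the terminal-based CLI.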

# Check the model's state with a simple test query
def quick_test():
    test_question = "What can you help me with?"
    print(f"\nTest Question: {test_question}")

    start_time = time.time()
    try:
        response = generate_response(test_question)
        end_time = time.time()

        print(f"Response: {response}")
        print(f"Generation time: {end_time - start_time:.2f} seconds")
        return True
    except Exception as e:
        print(f"Test failed with error: {str(e)}")
        import traceback
        traceback.print_exc()
        return False

# Entry point for the full assistant, including interface selection
def run_assistant():
    print("\nRunning quick test...")
    test_success = quick_test()

    if test_success:
        # Ask the user which interface to use
        interface_choice = input("\nChoose interface (1 for UI, 2 for CLI): ")

        if interface_choice == "2":
            cli_chat()
        else:
            print("\nStarting the personal assistant UI...")
            assistant_ui = create_assistant_ui()
            display(assistant_ui)

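            # Usage instructions for the UI
            print("\n--- How to use ---")
            print("1. Type your question in the text box")
            print("2. Press the Enter key")
            # (the original listing is cut off at this point)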

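Try It Yourself: Start Chatting with the Qwen Assistant!

If everything has gone smoothly so far,
it is finally time to talk to the personal AI assistant we built.
Once installation, setup, and model loading have all completed without problems, all that remains is to enjoy it!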
