About me

I'm an experienced AI Scientist at Devz AI (US), where I develop AI applications. I enjoy researching cutting-edge LLMs and techniques and exploring their practical applications.

My job is to research AI models and techniques, then build them into applications. I have generated datasets to fine-tune models on our business tasks, built knowledge bases for Retrieval-Augmented Generation (RAG), developed tools for models to use, applied Chain-of-Thought (CoT) prompting and few-shot examples to improve model performance, integrated applications with our front-end and back-end pipelines, and ultimately empowered our products with advanced AI capabilities.
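In a minimal form, that RAG-plus-prompting workflow can be sketched as follows. This is an illustrative toy, not the production system: the documents and few-shot pairs are made up, keyword-overlap retrieval stands in for a real vector store, and the assembled prompt would normally be sent to an LLM.

```python
def retrieve(query, docs, k=2):
    """Rank documents by naive keyword overlap with the query (toy stand-in for a vector store)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query, docs, few_shots):
    """Assemble a RAG prompt: retrieved context + few-shot examples + a CoT instruction."""
    context = "\n".join(retrieve(query, docs))
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in few_shots)
    return (f"Context:\n{context}\n\n{shots}\n\nQ: {query}\n"
            "Think step by step before answering.\nA:")

# Hypothetical knowledge-base snippets and one few-shot example:
docs = ["Restart the cluster from the admin console.",
        "Billing questions go to the finance portal."]
shots = [("How do I reset a password?", "Use the self-service reset page.")]
prompt = build_prompt("How do I restart the cluster?", docs, shots)
```

The same skeleton scales by swapping the retriever for embedding search and the template for a model-specific chat format.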

What I'm doing

  • AI Applications Research & Development

    Modern, high-quality AI application design (RAG, agents, tool-calling, etc.).

  • LLM Deployment & Fine-tuning

    Improve models' performance on specific tasks (LoRA, rsLoRA, Ollama, etc.).

  • Cloud Computing

    Deploy services on elastic cloud servers and publish APIs for cross-functional teams.

  • Software Engineering

    Develop advanced services and integrate them into corporate products.

Resume

Education

  1. University of Chicago

    2022 — 2024

    Program: Master in Computer Science

    GPA: 3.9

    Core courses: Machine Learning, Natural Language Processing, Software Engineering, Data Science, Databases, etc.

    Honor: Phoenix Scholarship ($64,340)

  2. Xi'an Jiaotong-Liverpool University

    2018 — 2022

    Program: BSc in Economics and Finance

    GPA: 3.93 (Top 2%)

    Core courses: Quantitative Finance, Econometrics, Calculus, Microeconomics, Macroeconomics, Financial Management, Corporate Finance, etc.

    Honor: Excellence Academy Award (2021, ¥10,000), Excellent Student Scholarship (2020, ¥10,000)

Experience

  1. Devz AI, AI Scientist

    08/2024 — Present

    •    Developed and deployed LLM applications on AWS with RAG to automate IT operations by collecting QA data, integrating multi-source knowledge from internal databases and external sources (Google, Perplexity), and evaluating multiple models' solutions.


    •    Fine-tuned LLaMA models (70B, 11B) with optimization techniques (FlashAttention, Unsloth, DeepSpeed, quantization) and regularization methods (dropout, rsLoRA, DoRA) on specific business tasks; deployed with vLLM and Ollama to achieve a 20% improvement on knowledge tests, a 50% higher success rate, and 80% lower cost compared to GPT-4o and o1.


    •    Built a web agent system with RAG and CoT to enable end-to-end automated incident resolution by recursively interacting with the browser (e.g., adding a user to a group, creating a computing notebook, configuring/restarting a computing cluster) and validating the task result. Developed a caching system that stores steps with editable parameters for faster execution and customization. Continuously improved the web agent to cover more tasks on different platforms (e.g., PagerDuty, Databricks, Okta) to serve diverse customers. Successfully automated 90% of engineer workload and contributed to $10M in new orders.

  2. Prudential Financial, Machine Learning Engineer

    09/2023 — 03/2024

    •    Boosted speed by 50% and cut costs by 75% compared to GPT-4 on stock topics by deploying an advanced RAG LLM architecture in Docker, and achieved 62% accuracy in stock prediction. Automated keyword extraction, knowledge indexing (MongoDB), and requests for stock data and news with a customized ETL pipeline and RESTful API (Flask).


    •    Deployed a OneAPI transit service platform with account and cost management to integrate multiple LLM APIs (demos: OpenAI, Claude, Gemini, Llama 3, etc.), allowing developers to call different LLMs with a universal base URL and secret key.
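The unified-gateway pattern behind that transit service can be sketched as follows. The gateway URL, API key, and model names are placeholders, and the request is only constructed (never sent), so this is a shape illustration rather than the actual deployment:

```python
import json
import urllib.request

GATEWAY_URL = "https://llm-gateway.example.com/v1"  # hypothetical transit-service endpoint
API_KEY = "sk-..."  # one secret key covering every upstream provider

def chat_request(model, messages):
    """Build an OpenAI-compatible request against the single gateway URL.

    The gateway routes by the `model` field, so callers switch providers
    by changing one string instead of swapping SDKs or credentials.
    """
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{GATEWAY_URL}/chat/completions",
        data=payload,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# The call shape is identical for any provider behind the gateway:
req = chat_request("claude-3-5-sonnet", [{"role": "user", "content": "Hi"}])
```

Centralizing the credential and base URL is also what makes per-account cost accounting possible: every call passes through one choke point.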

  3. Shannon Investment, NLP Engineer

    07/2023 — 02/2024

    •    Designed a public-opinion analysis system that integrates 20+ LLMs with a mounted internal knowledge base for stakeholders; enabled fine-tuning, multi-task evaluation, and deployment of new LLMs; encapsulated all services as APIs for the front-end team.


    •    Developed APIs and ETL pipelines (Kafka) to link LLMs with the internal knowledge base (MongoDB, Elasticsearch), providing complementary data for RAG in LLM conversations; built LangChain pipelines for multi-turn LLM conversations.


    •    Improved LLM performance by fine-tuning with P-Tuning v2 and (Q)LoRA and by creating a prompt management system that matches LLMs and tasks with customized prompt templates. Boosted the F1 score in sentiment analysis by 20%, which yielded better sentiment factors and a Sharpe ratio of 3, and sped up data processing 10x on trillion-scale data.
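A prompt management system of that kind can be sketched as a small template registry that matches a (model family, task) pair to a template, with a task-level default fallback. The model names and templates below are illustrative, not the production ones:

```python
# Minimal prompt registry: map (model family, task) to a template string,
# falling back to a task-level default when no model-specific entry exists.
TEMPLATES = {
    ("chatglm", "sentiment"): "[Round 1] Classify the sentiment of: {text}\nAnswer:",
    ("llama", "sentiment"): "[INST] Is the sentiment of '{text}' positive or negative? [/INST]",
    ("default", "sentiment"): "Sentiment of '{text}' (positive/negative):",
}

def render_prompt(model_family, task, **fields):
    """Pick the best-matching template for (model, task) and fill in the fields."""
    template = TEMPLATES.get((model_family, task)) or TEMPLATES[("default", task)]
    return template.format(**fields)

p = render_prompt("llama", "sentiment", text="Earnings beat expectations")
```

Keeping templates in data rather than code lets each newly onboarded LLM reuse existing tasks immediately and get model-specific overrides only where they measurably help.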

My skills

  • AI Applications
    100%
  • LLMs
    90%
  • Programming
    90%
  • Cloud Computing
    80%

Blog

Contact

Contact Form