
How to use NEFTune

You can read the article in detail here.

QuickStart

First, set up the environment (assuming you have conda installed):

conda create -n hug python=3.11
conda activate hug
pip install -r requirements.txt

Please use the PyTorch version specified in the requirements; otherwise, you may run into problems when loading the model in train.py.
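
To confirm that the environment picked up the pinned build, a quick check like the following (plain Python, nothing repo-specific) is enough:

# Print the installed PyTorch version and CUDA availability to verify the setup
import torch
print(torch.__version__, torch.cuda.is_available())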

Converting checkpoints from Hugging Face to FSDP
The script convert_hf_to_fsdp.py transforms a Hugging Face checkpoint into a format compatible with FSDP (Fully Sharded Data Parallel). This conversion lets the model be loaded in a distributed fashion, significantly reducing memory usage. Typically, loading a Hugging Face model onto N GPUs requires instantiating N copies of the model in CPU memory before transferring them to the GPUs; for large models, this can exhaust CPU memory quickly. Running the command below converts the model efficiently. This process was tested on four A5000 GPUs, each with 24GB of GPU memory.

python convert_hf_to_fsdp.py --load_path $HF_CHECKPOINT_PATH --save_path $SAVE_PATH --add_tokens $NUM_TOKENS
# `$NUM_TOKENS` is the number of new tokens to add to the pretrained model. For Llama, we add an additional padding token since it doesn't have one originally. For OPT, no new tokens are needed, since it already contains all special tokens.

If you want to use a model that is on the Hugging Face Hub, you can run the command below. We will use Llama-2-7b-hf as an example:

SAVE_PATH_SHARDED=pretrained_models/Llama2_7b_sharded
SAVE_PATH_HF=pretrained_models/Llama2_7b_hf
python convert_hf_to_fsdp.py --load_path meta-llama/Llama-2-7b-hf \
--save_path $SAVE_PATH_SHARDED \
--save_path_hf $SAVE_PATH_HF

The script above saves a sharded version of the model in $SAVE_PATH_SHARDED and a Hugging Face checkpoint at $SAVE_PATH_HF. The sharded file only contains the weights; the Hugging Face checkpoint is still needed to initialize the architecture and tokenizer.
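
For intuition, here is a rough sketch of what a conversion like this can look like. This is not the repo's actual convert_hf_to_fsdp.py, just an illustration using standard transformers/torch calls; the output file name and the exact handling of added tokens are assumptions.

# Rough sketch (not the actual convert_hf_to_fsdp.py): load the checkpoint once on
# CPU, optionally grow the embedding table for new tokens, and save only the weights.
# The Hugging Face checkpoint is still needed later for the config and tokenizer.
import torch
from transformers import AutoModelForCausalLM

load_path = "meta-llama/Llama-2-7b-hf"                # hub name or local path
save_path = "pretrained_models/Llama2_7b_sharded.pt"  # hypothetical file name
num_added_tokens = 1                                  # e.g. a padding token for Llama

model = AutoModelForCausalLM.from_pretrained(load_path, torch_dtype=torch.bfloat16)
if num_added_tokens > 0:
    model.resize_token_embeddings(model.config.vocab_size + num_added_tokens)

torch.save(model.state_dict(), save_path)  # weights only, no config/tokenizer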

Training
Vanilla Fine-tuning

python train.py --init_checkpoint_path <fsdp_model_path> \
--model_config_path <hf_model_path> --wrapped_class_name LlamaDecoderLayer \
--data_path datasets/alpaca-train.jsonl --added_tokens 1 \
--act_checkpointing --lr 5e-5 --accumulation_steps 8 --batch_size 4 \
--checkpoint_path ./checkpoints/naive --hack --wandb --wb_name naive
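
The --wrapped_class_name flag presumably tells train.py which transformer block FSDP should shard around. In PyTorch, that usually translates into an auto-wrap policy along these lines (a sketch of the standard FSDP API, not the repo's train.py):

# Sketch: turn a layer class such as LlamaDecoderLayer into an FSDP auto-wrap policy
import functools
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer},
)
# model = FSDP(model, auto_wrap_policy=wrap_policy, device_id=local_rank)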

NEFTune with a noise magnitude of 5

python train.py --init_checkpoint_path <fsdp_model_path> \
--model_config_path <hf_model_path> --wrapped_class_name LlamaDecoderLayer \
--data_path datasets/alpaca-train.jsonl --added_tokens 1 \
--act_checkpointing --lr 5e-5 --accumulation_steps 8 --batch_size 4 \
--checkpoint_path ./checkpoints/neftune --hack --wandb --wb_name neftune \
--neftune_alpha 5
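
For reference, the trick that --neftune_alpha controls is the one described in the NEFTune paper: during training, uniform noise scaled by alpha / sqrt(seq_len * hidden_dim) is added to the token embeddings. A minimal sketch of that idea (not the repo's implementation):

# Minimal NEFTune sketch: add uniform noise to the embedding-layer output,
# scaled by alpha / sqrt(L * d), where L is sequence length and d is hidden size
import torch

def add_neftune_noise(embeds: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    # embeds: (batch, seq_len, hidden_dim) output of the embedding layer
    seq_len, hidden_dim = embeds.shape[1], embeds.shape[2]
    scale = alpha / (seq_len * hidden_dim) ** 0.5
    noise = torch.zeros_like(embeds).uniform_(-1.0, 1.0)
    return embeds + noise * scale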

Evaluation
You may use the script here: scripts/alpaca_eval.sh
