
How to Use NEFTune

You can read the article in detail here.


First, set up the environment (assuming you have conda installed):

conda create -n hug python=3.11
conda activate hug
pip install -r requirements.txt

Please use the PyTorch version specified in the requirements; otherwise, you may run into problems when loading the model.

Converting checkpoints from Hugging Face to FSDP
The script is designed to transform a Hugging Face checkpoint into a format compatible with FSDP (Fully Sharded Data Parallel). This conversion enables the model to be loaded in a distributed fashion, significantly reducing memory usage. Typically, loading a Hugging Face model onto N GPUs requires instantiating N models in the CPU memory first, before transferring them to the GPUs. For large models, this process can exhaust CPU memory quickly. By running the command provided below, the model can be efficiently converted. This process was tested using four A5000 GPUs, each equipped with 24GB of GPU memory.

python convert_hf_to_fsdp.py --load_path $HF_CHECKPOINT_PATH --save_path $SAVE_PATH --add_tokens $NUM_TOKENS

`$NUM_TOKENS` is the number of new tokens to add to the pretrained model. For LLaMA, we add an extra padding token, since it does not have one originally; for OPT, no new tokens are needed, since it already contains all the special tokens.
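Under the hood, adding a token amounts to growing the embedding matrix by one row. Here is a toy sketch in plain PyTorch (the sizes and the mean-initialization heuristic are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Toy sketch: extend a pretrained embedding matrix by num_new_tokens rows,
# which is what adding a padding token to LLaMA amounts to.
# Sizes here are illustrative, not LLaMA's real ones.
old_emb = nn.Embedding(1000, 64)
num_new_tokens = 1

new_emb = nn.Embedding(old_emb.num_embeddings + num_new_tokens,
                       old_emb.embedding_dim)
with torch.no_grad():
    # Copy the pretrained rows, then initialize the new row(s) to the
    # mean of the existing embeddings (a common heuristic).
    new_emb.weight[: old_emb.num_embeddings] = old_emb.weight
    new_emb.weight[old_emb.num_embeddings :] = old_emb.weight.mean(dim=0)
```

With a real model, the same effect is what `resize_token_embeddings` achieves in the transformers library.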

If you want to use a model that is on the Hugging Face Hub, you can run the command below. We will use Llama-2-7b-hf as an example.

python convert_hf_to_fsdp.py --load_path meta-llama/Llama-2-7b-hf \
--save_path $SAVE_PATH_SHARDED \
--save_path_hf $SAVE_PATH_HF

The script above will save a sharded version of the model in $SAVE_PATH_SHARDED and a Hugging Face checkpoint at $SAVE_PATH_HF. The sharded file only contains the weights; the Hugging Face checkpoint is still needed to initialize the architecture and tokenizer.
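The reason the weights-only file is not enough on its own can be seen in a minimal PyTorch illustration (the file name is hypothetical): a saved state dict holds just tensors, so the module must be rebuilt from its architecture definition before the weights can be loaded back.

```python
import torch
import torch.nn as nn

# A weights-only checkpoint (like the sharded file) stores just tensors.
model = nn.Linear(16, 4)
torch.save(model.state_dict(), "weights_only.pt")

# Later, e.g. on another rank: re-create the architecture first...
rebuilt = nn.Linear(16, 4)
# ...then load the saved tensors into it.
rebuilt.load_state_dict(torch.load("weights_only.pt"))
```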

Vanilla Fine-tuning

python train.py --init_checkpoint_path <fsdp_model_path> \
--model_config_path <hf_model_path> --wrapped_class_name LlamaDecoderLayer \
--data_path datasets/alpaca-train.jsonl --added_tokens 1 \
--act_checkpointing --lr 5e-5 --accumulation_steps 8 --batch_size 4 \
--checkpoint_path ./checkpoints/naive --hack --wandb --wb_name naive

NEFTune with a noise magnitude of 5

python train.py --init_checkpoint_path <fsdp_model_path> \
--model_config_path <hf_model_path> --wrapped_class_name LlamaDecoderLayer \
--data_path datasets/alpaca-train.jsonl --added_tokens 1 \
--act_checkpointing --lr 5e-5 --accumulation_steps 8 --batch_size 4 \
--checkpoint_path ./checkpoints/neftune --hack --wandb --wb_name neftune \
--neftune_alpha 5
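NEFTune itself is a small intervention: during training, uniform noise scaled by alpha / sqrt(L * d) is added to the embedding outputs, where L is the sequence length and d the embedding dimension. A minimal sketch as a PyTorch forward hook (shown on a bare nn.Embedding for clarity; the training script wires this into the full model):

```python
import torch
import torch.nn as nn

NEFTUNE_ALPHA = 5.0  # corresponds to --neftune_alpha 5

def neftune_hook(module, inputs, output):
    # Only perturb during training; inference stays noise-free.
    if module.training:
        # output has shape (batch, seq_len, embed_dim)
        dims = output.size(1) * output.size(2)
        mag = NEFTUNE_ALPHA / dims ** 0.5
        # Add uniform noise in [-mag, mag] to the embeddings.
        output = output + torch.empty_like(output).uniform_(-mag, mag)
    return output

emb = nn.Embedding(100, 32)
emb.register_forward_hook(neftune_hook)

tokens = torch.randint(0, 100, (2, 10))
emb.train()
noisy = emb(tokens)   # embeddings plus bounded uniform noise
emb.eval()
clean = emb(tokens)   # plain embeddings, no noise
```

Because the hook checks `module.training`, nothing changes at evaluation time, which is why the noise magnitude is purely a training-time hyperparameter.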

You may use the script here: scripts/
