Trap-Style Rap Lyric Generator

Yue Wang

Yue Wang 2021.4.28

Introduction

Are you struggling to write lyrics? Would you like a preview of the kind of songs that may become popular in the future? Do you have a novel idea but do not know how to start building a generator model? This article may help.

This blog aims to build an automated rap-lyric generator for people who are interested in rap, or for amateur rap lyricists who lack creativity and inspiration. Through an automated and systematic process, not only can reasonable trap-style rap lyrics be produced quickly, but rap songs in other styles can also be generated with the same model, given an appropriately chosen training data set. This blog is therefore intended as a building block for rap-lyric production in the era of artificial intelligence.

The model used in this blog is mainly GPT-2, following a pretrain-and-fine-tune approach. Moreover, additional attributes will be added to the GPT-2 input to improve the performance of conditional text generation.

Image source: https://images.app.goo.gl/w3RpRfnRhqLPhicX8

Data Crawling

First, a list of top artists in the field of trap rap is crawled from the website Last.fm. Then, by leveraging the Genius API, up to 30 of the most popular songs of each trap artist are downloaded. Here, I use a library called lyricsgenius.
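The Last.fm crawl itself is not shown in the snippets of this post; a minimal sketch using requests and BeautifulSoup might look like the following (the CSS selector is an assumption about the current page layout and may need adjusting, and the official Last.fm API is an alternative):

import requests
from bs4 import BeautifulSoup

def get_top_artists(tag='trap', pages=3):
    # walk the paginated tag listing and collect artist names
    artists = []
    for page in range(1, pages + 1):
        url = f'https://www.last.fm/tag/{tag}/artists?page={page}'
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, 'html.parser')
        for node in soup.select('.big-artist-list-title a'):
            artists.append(node.get_text(strip=True))
    return artists

top_artists = get_top_artists()

Each name collected this way can then be passed as a to the lyricsgenius snippet below.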

import pandas as pd
import lyricsgenius

# assumed setup: a Genius client built from your own API access token,
# `a` is the current artist name, and lyrics_df collects all songs
genius = lyricsgenius.Genius(GENIUS_TOKEN)
lyrics_df = pd.DataFrame()

artist = genius.search_artist(a, max_songs=30, sort='popularity', include_features=True)
count = 0
for i in artist.songs:
    lyrics = artist.song(i.to_dict()['title']).lyrics
    lyrics_df = pd.concat([lyrics_df,
                           pd.DataFrame({'singer': [a],
                                         'title': [i.to_dict()['title']],
                                         'lyrics': [lyrics]})])
    count += 1
# log which artists have been crawled and how many songs were collected
with open('data/singer.txt', 'a', encoding='utf8') as f:
    f.write(a + ", " + str(count) + '\n')

Note that keeping a file that records the artists you have already crawled is useful, especially when the crawling takes a long time.

Data Cleaning

There are 418 different artists and 8,548 songs in total. On average, each artist has about 20 songs, with a standard deviation of 10.7.

I found that previous work omits the structural properties of lyrics. Thus I do the following data processing: I extract the lyrics by the section they belong to and label each section together with the song title. The results are shown in the following chart:

The distribution of words across different sections

The data after cleaning is as follows:

An example of the cleaned input data

Therefore, we have strong evidence that the section a lyric belongs to matters within a song. The following methods will make use of this property.
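The splitting code itself is not shown above; the following is a minimal sketch of how raw Genius lyrics could be split into labeled sections, assuming the usual bracketed headers such as [Verse 1] or [Chorus] (the exact column layout used in the project may differ):

import re
import pandas as pd

SECTION_KEYS = ['intro', 'verse', 'chorus', 'bridge', 'hook']

def split_sections(lyrics):
    # map each section type to the concatenated text of that section
    sections = {k: ' ' for k in SECTION_KEYS}
    # split on bracketed headers like [Verse 1] and keep the headers
    parts = re.split(r'(\[[^\]]+\])', lyrics)
    current = None
    for part in parts:
        header = re.match(r'\[([^\]]+)\]', part)
        if header:
            name = header.group(1).lower()
            current = next((k for k in SECTION_KEYS if k in name), None)
        elif current:
            sections[current] = (sections[current] + ' ' + part).strip()
    return sections

# one row per song: the title plus one column per section type
rows = [{'title': row.title, **split_sections(row.lyrics)}
        for _, row in lyrics_df.iterrows()]
clean_df = pd.DataFrame(rows)

Sections that never occur in a song are left as a single space, which is what the dataset class later checks for.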

Methods

The training code is in the model folder. I choose around 1,785 songs and split the input data into training, validation, and test sets in a 0.6:0.2:0.2 ratio. As a baseline, I use an n-gram model (n = 8) on lyrics without structure. Its output is not influenced by the test data at all; that is to say, apart from the necessary start tokens, it samples the generated words without conditioning on any input words.

from random import random

def generate_word(lm, history, order):
    # lm maps an order-length history tuple to a list of (word, probability) pairs
    history = tuple(history[-order:])
    dist = lm[history]
    x = random()
    for word, v in dist:
        x = x - v
        if x <= 0:
            return word

def generate_text(lm, order, max_words=500):
    '''Generates a sequence of up to max_words in length'''
    history = ["~"] * order
    out = []
    for i in range(max_words):
        word = generate_word(lm, history, order)
        if word == '[STOP]':
            break
        if order == 0:
            history = tuple()
        else:
            history = history[-order:]
            history.append(word)
        out.append(word)
    return " ".join(out)

Then I use a pretrained GPT-2 model and fine-tune it. The library I use is Hugging Face transformers (https://huggingface.co/transformers/training.html). In this process, I use the small version of GPT-2 with 12 decoder layers.

from transformers import GPT2Tokenizer, GPT2LMHeadModel
from transformers import TextDataset, DataCollatorForLanguageModeling

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
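The construction of the datasets and the collator is not shown in the post; a minimal sketch, assuming the training and validation lyrics have been written to the (hypothetical) plain-text files data/train.txt and data/dev.txt, might be:

# hypothetical file paths; block_size is the chunk length in tokens
train_dataset = TextDataset(tokenizer=tokenizer, file_path='data/train.txt', block_size=128)
dev_dataset = TextDataset(tokenizer=tokenizer, file_path='data/dev.txt', block_size=128)

# causal language modeling, so the masked-LM objective is turned off
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

model = GPT2LMHeadModel.from_pretrained('gpt2')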

We use the Trainer and TrainingArguments classes from transformers.

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",           # the output directory
    overwrite_output_dir=True,       # overwrite the content of the output directory
    num_train_epochs=4,              # number of training epochs
    per_device_train_batch_size=32,  # batch size for training
    per_device_eval_batch_size=64,   # batch size for evaluation
    eval_steps=400,                  # number of update steps between two evaluations
    save_steps=800,                  # save the model every # steps
    warmup_steps=500,                # number of warmup steps for the learning rate scheduler
    prediction_loss_only=True,
)


trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=dev_dataset,
)
trainer.train()
trainer.save_model()

After training, output is generated by taking a few intro words from the test dataset as prompts.
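The generation helper called in the loop below is defined elsewhere in the project; a rough sketch of what it might look like (the sampling parameters and the returned dict keyed by the song index are assumptions) is:

import torch

def generation(prompt, num, max_length=200):
    # sample one continuation for the prompt with the fine-tuned model
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs,
                                    do_sample=True,   # sampling rather than greedy decoding
                                    top_k=50,
                                    top_p=0.95,
                                    max_length=max_length,
                                    pad_token_id=tokenizer.eos_token_id)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return {num: text}

The loop then walks over the last 20% of the songs and prompts the model with their intro lines: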

import json
import re

# d is the full list of song strings and length = len(d); the last 20% form the test set
new_songs = []
num = 0
for i in d[int(length * 0.8):]:
    try:
        # take roughly the first 20 characters after the [intro ...] header as the prompt
        m = re.match(r"\[intro.*\] <Newline>.{20}?", i)
        prompt = re.sub(r'\[.*\] <Newline>', '', m.group())
        if prompt == '':
            prompt = 'I only love'
        new = generation(prompt, num)
        new_songs.append(list(new.values()))
    except:
        # fall back to a fixed prompt when no intro header is found
        prompt = 'be a savage'
        new = generation(prompt, num)
        new_songs.append(list(new.values()))
    num += 1
    print(num)
new_dict = {'GPT2-no-structure': new_songs}

Finally, I take the title and section attributes into consideration. The idea for this process comes from:

https://www.ivanlai.project-ds.net/post/conditional-text-generation-by-fine-tuning-gpt-2

First, special tokens should be added, and we need to combine the attributes and the lyrics and feed them into the model as a single sequence:

import re
import torch
from torch.utils.data import Dataset

SPECIAL_TOKENS = {"bos_token": "<|BOS|>",
                  "eos_token": "<|EOS|>",
                  "unk_token": "<|UNK|>",
                  "pad_token": "<|PAD|>",
                  "sep_token": "<|SEP|>"}

class myDataset(Dataset):

    def __init__(self, data, tokenizer):
        keywords = ['intro', 'verse', 'chorus', 'bridge', 'hook']
        inputs = []
        for index, row in data.iterrows():
            key_i, text_i = [], []
            for k in keywords:
                if row[k] != ' ':
                    key_i.append(k)
                    text_i.append(re.sub(r'(\n)+', ' <Newline> ', row[k]))
            # sequence layout: <|BOS|> title <|SEP|> section names <|SEP|> section texts <|EOS|>
            input_i = SPECIAL_TOKENS['bos_token'] + row.title + \
                      SPECIAL_TOKENS['sep_token'] + ', '.join(key_i) + SPECIAL_TOKENS['sep_token'] + \
                      ' <Newsong> '.join(text_i) + SPECIAL_TOKENS['eos_token']
            inputs.append(input_i)

        self.tokenizer = tokenizer
        self.text = inputs

    #---------------------------------------------#

    def __len__(self):
        return len(self.text)

    #---------------------------------------------#

    def __getitem__(self, i):
        encodings_dict = self.tokenizer(self.text[i],
                                        truncation=True,
                                        max_length=MAXLEN,
                                        padding="max_length")

        input_ids = encodings_dict['input_ids']
        attention_mask = encodings_dict['attention_mask']

        # for causal LM fine-tuning the labels are the input ids themselves
        return {'label': torch.tensor(input_ids),
                'input_ids': torch.tensor(input_ids),
                'attention_mask': torch.tensor(attention_mask)}
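MAXLEN is the maximum sequence length in tokens and needs to be defined; constructing the datasets from the cleaned training and validation DataFrames (hypothetical names train_df and dev_df) would then look like this, using the special-token-aware tokenizer built in the next snippet:

MAXLEN = 768  # assumed value

train_dataset = myDataset(train_df, tokenizer)
dev_dataset = myDataset(dev_df, tokenizer)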

The special tokens should be added to both the tokenizer and the model's config, and we can define which layers of the GPT-2 model we want to use.

import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForPreTraining

def get_tokenier(special_tokens=None):
    tokenizer = AutoTokenizer.from_pretrained(MODEL)  # GPT2Tokenizer

    if special_tokens:
        tokenizer.add_special_tokens(special_tokens)
        print("Special tokens added")
    return tokenizer

def get_model(tokenizer, special_tokens=None, load_model_path=None):

    # GPT2LMHeadModel
    if special_tokens:
        config = AutoConfig.from_pretrained(MODEL,
                                            bos_token_id=tokenizer.bos_token_id,
                                            eos_token_id=tokenizer.eos_token_id,
                                            sep_token_id=tokenizer.sep_token_id,
                                            pad_token_id=tokenizer.pad_token_id,
                                            output_hidden_states=False)
    else:
        config = AutoConfig.from_pretrained(MODEL,
                                            pad_token_id=tokenizer.eos_token_id,
                                            output_hidden_states=False)

    #----------------------------------------------------------------#
    model = AutoModelForPreTraining.from_pretrained(MODEL, config=config)

    if special_tokens:
        # special tokens were added, so the embedding matrix must be resized accordingly
        model.resize_token_embeddings(len(tokenizer))

    if load_model_path:
        model.load_state_dict(torch.load(load_model_path))

    model.cuda()
    return model
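With the special tokens defined above, the tokenizer and model for this conditional variant could be created as follows (MODEL being the small 'gpt2' checkpoint mentioned earlier):

MODEL = 'gpt2'

tokenizer = get_tokenier(special_tokens=SPECIAL_TOKENS)
model = get_model(tokenizer, special_tokens=SPECIAL_TOKENS)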

Results and Discussion

I use MS-Jaccard as the main evaluation metric and BLEU as a secondary one, both comparing the generated lyrics with the original ones (the test data). The idea for calculating MS-Jaccard comes from:

https://github.com/IAmS4n/TextGenerationEvaluationMetrics/blob/789aa6141784293984ead53aea3eb579952d3f46/README.md
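Conceptually, MS-Jaccard compares the n-gram frequency multisets of the generated corpus and the reference corpus. A simplified sketch of the core idea is below (the linked repository implements the full metric, including the combination across n-gram orders; this version only computes a single order):

from collections import Counter

def ngram_freqs(corpus, n):
    # corpus: list of token lists; returns n-gram frequencies normalized by corpus size
    counts = Counter()
    for tokens in corpus:
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.items()}

def ms_jaccard(generated, reference, n=3):
    # simplified multiset Jaccard over n-gram frequencies (higher is better)
    p, q = ngram_freqs(generated, n), ngram_freqs(reference, n)
    grams = set(p) | set(q)
    num = sum(min(p.get(g, 0.0), q.get(g, 0.0)) for g in grams)
    den = sum(max(p.get(g, 0.0), q.get(g, 0.0)) for g in grams)
    return num / den if den else 0.0

# score = ms_jaccard([s.split() for s in generated_songs],
#                    [s.split() for s in test_songs], n=3)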

The resulting evaluation scores

The table shows that GPT-2 with structure attributes achieves a higher MS-Jaccard score than plain GPT-2. The higher MS-Jaccard-3 indicates that, compared with text generated by the standard GPT-2 model, it performs better overall in both diversity and quality.

But there is still plenty of room for improvement, and the generated lyrics are not yet good enough to present. Also, we only ran the generation and evaluation once; comparing over multiple runs would give a fairer result with less variance.

The current trend in neural language generation is to generate words conditioned on a large amount of context. For example, machine translation, summarization, and question answering all have information that can be extracted from the context. When there is no such context, as in our project, I need some auxiliary attributes, but how to combine those attributes still needs to be explored.

The reason the improvement is so small may be the way lyrics are segmented, which is hard to improve on. Unlike prose paragraphs made of full sentences, or poetry with fixed-length lines, lyrics can stop at any word and insert single filler words like “yeah” or “oh my”. The method chosen here is to add new tokens, which turns out to add a lot of computation while having little effect; I have tried it many times. I think GPT-2 is best at generating long paragraphs, so for this aspect an RNN may be worth a try.

What’s Next

Combining this with the BLEU-4 score, I conclude that the improved model still needs more exploration to raise its quality. We may need to find more attributes and figure out a better way to combine them with the lyrics. An RNN could also be explored.

Note that all of my code can be found at https://github.com/wyjessica860/sechs-hundert-drei-ig_project/blob/main/
