RDP Database

Converting the RDP database for use with the QIIME 2 feature-classifier plugin

Data source – http://rdp.cme.msu.edu/misc/resources.jsp – 16S Bacteria unaligned fasta format

Other databases – https://eukref.org/databases/

Taxonomy file:

cp current_Bacteria_unaligned.txt RDP_tax.txt # copy the database to start the taxonomy file (optional: the grep below overwrites this file)
grep '^>' current_Bacteria_unaligned.txt > RDP_tax.txt # keep only the header lines, which start with ">" and hold the reference ID and the taxonomic assignment
sed -i 's/Lineage=/\n/' RDP_tax.txt # place the taxonomic assignment on a new line (GNU sed is needed for \n in the replacement)
sed -i 's/\(>\w*\).*/\1/' RDP_tax.txt # keep only the ID on header lines; the description after it is not needed
tr -d '\n' < RDP_tax.txt > RDP_tax_new.txt # delete every newline, so the file becomes one long line
sed -i 's/Root;rootrank;/\t/g' RDP_tax_new.txt # replace the root prefix with a tab between each ID and its assignment
sed -i 's/>/\n/g' RDP_tax_new.txt # put every ID and its assignment on a separate line
sed -i '1{/^$/d}' RDP_tax_new.txt # delete the first line if it is blank
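
To see what the taxonomy steps produce, here is a small sketch that runs on a made-up two-record sample. The file names sample_unaligned.txt and sample_tax.txt, the IDs, and the lineages are all hypothetical. The sketch folds the steps into one pipeline and assumes GNU sed (for \t in the replacement) and that every header contains "Lineage=Root;rootrank;"; headers without it would pass through unchanged.

```shell
# Build a hypothetical two-record sample in the RDP header layout:
# ">ID description<TAB>Lineage=Root;rootrank;..." followed by the sequence.
printf '>S000000001 example strain A\tLineage=Root;rootrank;Bacteria;domain;Firmicutes;phylum\nacgtacgt\n' > sample_unaligned.txt
printf '>S000000002 example strain B\tLineage=Root;rootrank;Bacteria;domain;Proteobacteria;phylum\nttgacca\n' >> sample_unaligned.txt

# Keep header lines only, then rewrite each one as "ID<TAB>lineage".
# \1 is the ID (everything after ">" up to the first whitespace),
# \2 is the lineage with the "Root;rootrank;" prefix removed.
grep '^>' sample_unaligned.txt \
  | sed 's/^>\([^[:space:]]*\).*Lineage=Root;rootrank;\(.*\)$/\1\t\2/' \
  > sample_tax.txt

cat sample_tax.txt
```

Each output line should hold an ID, a tab, and the lineage, which is the two-column layout that RDP_tax_new.txt ends up with.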

Fasta file:

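cp current_Bacteria_unaligned.txt RDP.fa # copy the database to become the fasta file
sed -i 's/\(>\w*\).*/\1/' RDP.fa # remove the taxonomic assignment and any extra info from the header lines, leaving only the ID
sed -i '/^>/!s/.*/\U&/' RDP.fa # convert the sequence lines to uppercase (GNU sed); header lines are skipped so the IDs stay identical to those in the taxonomy file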
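
Since the fasta file and the taxonomy file are built separately, it may be worth checking that both contain the same set of IDs. The sketch below does this on tiny stand-in files; sample.fa, sample_tax.txt, fa_ids.txt, tax_ids.txt, and the IDs are hypothetical, and the real check would use RDP.fa and RDP_tax_new.txt instead.

```shell
# Tiny stand-in files with hypothetical IDs; swap in RDP.fa and RDP_tax_new.txt.
printf '>S000000001\nACGTACGT\n>S000000002\nTTGACCA\n' > sample.fa
printf 'S000000001\tBacteria;domain\nS000000002\tBacteria;domain\n' > sample_tax.txt

# Sorted list of IDs from the fasta headers (leading ">" stripped).
grep '^>' sample.fa | sed 's/^>//' | sort > fa_ids.txt

# Sorted list of IDs from the first tab-separated column of the taxonomy file.
cut -f 1 sample_tax.txt | sort > tax_ids.txt

# comm -3 prints only the lines unique to one of the two sorted lists,
# so empty output means both files hold the same IDs.
mismatches=$(comm -3 fa_ids.txt tax_ids.txt)
if [ -z "$mismatches" ]; then
  echo "IDs match"
else
  echo "IDs differ:"
  echo "$mismatches"
fi
```

Any ID reported here appears in only one of the two files and would need fixing before the files are used together.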