How necessary is feature-classifier extract-reads?

While this may not be an issue for many, I thought it worthwhile to mention here. I had been struggling to refine the reference database I was using for my COI gene: I let the pipeline run for a week and it still had not finished!

The issue turned out to be the feature-classifier extract-reads step; my primers are probably too degenerate and are just causing mayhem.
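To put a number on "too degenerate", here is a minimal sketch that counts how many concrete sequences each primer expands to under the standard IUPAC nucleotide codes. The primer strings are the ones quoted in the forum post below; the function name `degeneracy` and the lookup table name are my own, and this only measures degeneracy. It does not show that degeneracy is what makes extract-reads slow, which remains my guess.

```python
# Number of concrete bases each IUPAC nucleotide code can stand for.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer can represent."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_CHOICES[base]
    return total


# Primers as quoted in the forum post (labels here are illustrative).
primers = {
    "forward": "GGWACWGGWTGAACWGTWTAYCCYCC",
    "reverse": "TANACYTCNGGRTGNCCRAARAAYCA",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, {degeneracy(seq)} possible sequences")
# forward: 26 nt, 128 possible sequences
# reverse: 26 nt, 2048 possible sequences
```

The reverse primer carries three N positions, which alone account for a factor of 64.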

As per the QIIME 2 forum:

Hi there

Previously I brought up this topic, but it could not be resolved:

I am quite a newbie to QIIME 2 and I seem to have run into a potential problem with qiime feature-classifier extract-reads.

The extract-reads step is taking a very long time, which I have not experienced using my other primers (18S) with other databases (PR2 and SILVA). I am running my pipeline on a high-performance computing cluster (so computing power is not a problem; I am currently using mem-per-cpu=10GB and cpus-per-task=16), and it generally takes ~2 hours to run. The database I am using now (Midori) is double the size of the others, but I do not understand how 48 hours is not sufficient for it to run.

Primers:
p-f-primer GGWACWGGWTGAACWGTWTAYCCYCC
p-r-primer TANACYTCNGGRTGNCCRAARAAYCA

The memory is fine; the Slurm output says the job just ran out of time. I have been running other jobs since then and there are no problems regarding memory.

It is just very frustrating when I am tweaking the pipeline and have to wait for the end result to see how it influences the results.

Update: it did not complete running in a week either.

I have found that if I skip the feature-classifier extract-reads step, I can get the results within 24 hours for COI. As 18S worked with the previous method, I tested and compared the results, and they do differ (generally at genus/species level, so for 90% of these cases I can see how they could potentially be compared). So I would just like advice on whether it is advisable to do this, as I see no other option.

Appreciate the help :upside_down_face:

Hi @Aimee,
extract-reads is definitely not necessary, and the advantages that we see for 16S may not generalize to other marker genes (as we note here). It gives a small boost in accuracy, but that is not worth the wait time you are experiencing for your COI database.

I would definitely recommend just proceeding without trimming — at worst, there will be a slight accuracy decrease at species level.

Good luck!

As always, the guys at QIIME 2 respond without delay and are super helpful!