Comments on Bioinformatics: filtering paired end reads (high throughput sequencing)
Blog author: brentp

Anonymous (2011-11-14 14:27 -08:00):
Hi BrentP,
Thanks for this tool. I came here after checking the fastx_clipper tool manually. As far as I can tell, it did NOT work as expected or as praised.

For sure, it pulled out all the reads with adapter sequences. But in addition, it also pulled out reads that originally came from my genome of interest. This was very obvious when I aligned those reads (flagged as containing adapter by the fastx toolkit) against the NCBI_NT db or my genome of interest.

The following is the command I used to see which reads fastx_clipper treats as adapter-containing:

    fastx_clipper -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG -l 1 -d 60 -k -i ES003_carlos_1220_sequence_1.10K.fasta > mp1_index1containg_reads_onlyadapter.fa

I use -d 60 so I can pull out the entire read again and check whether it did a good job of finding adapters.

Am I doing anything wrong here? Or has something been overlooked?

I will appreciate your help.

brentp (2011-10-21 12:10 -07:00):
Hi, to any commenters having trouble: I have updated the first sentence of the post to indicate that this should not be used.
I have not been maintaining this and don't intend to do so in the near future.
There are multiple versions about (my fault) with various different bugs.

Anonymous (2011-10-20 07:27 -07:00):
Hi, can you confirm whether either of the scripts mentioned works properly? I tried pair_fastx_clip_trim.py and fastq_pair_filter.py.
Both scripts crash with the adaptors option. Without the -a option, pair_fastx_clip_trim.py produces files that are out of sync (one shorter than the other), and fastq_pair_filter.py produces files with very few reads. With my 9 GB input files, the first script produces 7 GB files and the second one 150 MB files. Thanks for your help.

Habib R (2011-10-10 23:23 -07:00):
Hi, I just used your script on about 1 giga paired reads. It has been a couple of hours since the script output:

    writing read1.fastq.trim and read2.fastq.trim

Is that unusual? What is the expected running time of this script on such data, when looking for 5 different adaptors?

My machine has 24 cores.

Yue (2011-08-03 16:37 -07:00):
I tried the one downloaded from https://gist.github.com/588841. I found that the output pairs DO NOT MATCH. The other version does not work: https://github.com/brentp/bio-playground/blob/master/reads-utils/fastq_pair_filter.py

nike00 (2011-07-18 10:11 -07:00):
@brentp

thanks a lot,
I don't want to use the adaptors part; I'm interested in the quality and pairing control.
I have just started to use it.
I will tell you if it works or not, if you like :)

nike00

brentp (2011-07-18 09:30 -07:00):
@nike00
the latest is here:
https://github.com/brentp/bio-playground/blob/master/reads-utils/fastq_pair_filter.py

but there are problems with the way fastx trims adaptors (often it removes the entire read) that I haven't dealt with. The quality trimming and pairing stuff should work fine though.

Also check out the scythe and sickle projects from the group at UC Davis.

nike00 (2011-07-18 09:00 -07:00):
Dear brentp, I'm interested in your script, but now I really don't know which is the latest version, the best working one :)

Can you put the link again?

Thanks so much,
nike

Anonymous (2011-06-08 09:46 -07:00):
It worked, thanks!

brentp (2011-06-07 13:32 -07:00):
no. that's a different one.

Anonymous (2011-06-07 13:21 -07:00):
I am pretty sure that was the one that I used. I am not sure if there is actually a problem. I used the output files to do an assembly and it worked fine.

Also, the original untrimmed files were 7 g and 6 g and the files after using the script were 6.6 g and 5.8 g.
That seems like what they should be, but I am not sure.

brentp (2011-06-07 13:16 -07:00):
@Anon,
use this script. I haven't updated the one in this post, will do so shortly.
https://github.com/brentp/methylcode/blob/master/bench/scripts/fastq_pair_filter.py

Anonymous (2011-06-07 13:09 -07:00):
Hi! Thanks so much for this script. I ran it and the files seemed to come out OK, but I got the following message:

    Traceback (most recent call last):
      File "./pairedtrim", line 81, in <module>
        main(adaptors, opts.M, opts.t, opts.l, fastqs, opts.sanger)
      File "./pairedtrim", line 46, in main
        for ra, rb in gen_pairs(procs[0].stdout, procs[1].stdout):
      File "./pairedtrim", line 36, in gen_pairs
        assert not all(b), ("files not same length")
    AssertionError: files not same length
    fastq_quality_trimmer: writing nucleotides failed: Broken pipe

The command line I ran was:

    ./pairedtrim -t 18 -l 10 ../s11utegl.txt ../s12utegl.txt &

I did not use any of the adapter options because I am new to this and was a bit unclear whether we had adaptor sequences. We did Illumina paired-end sequencing.
Any help would be great!

Anonymous (2011-05-26 00:46 -07:00):
If read1 or read2 is trimmed to length zero, could you add a function that writes the other read (the one not trimmed to zero) to a third output file?

Martijn Vermaat (2011-02-18 06:55 -08:00):
So, you made 'fhr_headers' a generator, which I think has the advantage that the headers don't have to be loaded into memory all at once.

However, in the end you still store them all in the 'seen' bitmap, right?

I don't think you need to do this; something like this pseudo code (http://pastebin.com/3TYa9rAM) should do. Or am I missing something here?

Furthermore, I don't have a Fastq file handy at the moment, but if the headers occur in the files according to some order, you don't need an original file at all. But probably this is not the case.

brentp (2011-02-11 13:49 -08:00):
i suspect you could run the barcode splitter on both ends of the reads independently, then run each pair of resulting files through this program. it should remove any reads that are unpaired, either because of the barcode or because of quality filtering.

Anonymous (2011-02-11 13:41 -08:00):
fastx Barcode splitter is useful for SE reads directly.
For paired-end reads, how do I eliminate the reads without barcodes from the other mate?

It would be great if you could implement it. It would be a complete package for PE reads: filtering, trimming and BC split.

brentp (2011-02-11 13:31 -08:00):
thanks for finding and reporting the errors. i have updated the gist.

for barcode splitting (which i haven't done), i think you can use fastx directly:

http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_barcode_splitter_usage

Anonymous (2011-02-11 13:26 -08:00):
@brentp
Thanks for your time. It seems to be working...

BTW, how do I de-multiplex (separate barcodes) paired samples and place the reads in separate bins?

brentp (2011-02-11 11:58 -08:00):
try specifying -l 0 on the command line. it should be optional, but it must be defaulting to None. i will fix this shortly.

Anonymous (2011-02-11 11:54 -08:00):
Thanks
No errors without inputs now
But with input it still has errors:

    $ python2.7 fastq_pair_filter.py -a ACCC,CGTA,CAGT,TTAG -t 24 SRR018062_1.fastq RR018062_2.fastq

    Traceback (most recent call last):
      File "fastq_pair_filter.py", line 96, in <module>
        main(adaptors, opts.M, opts.t, opts.l, fastqs, opts.sanger)
      File "fastq_pair_filter.py", line 22, in main
        trim_cmd = "%s -t %i -l %i" % (FASTQ_QUALITY_TRIMMER, t, l)
    TypeError: %d format: a number is required, not NoneType

brentp (2011-02-11 11:31 -08:00):
please download from here:
http://gist.github.com/588841

i edited the gist manually and put the new bracket in the wrong place in the gist and in my comment. the "not" should be outside the ().

Anonymous (2011-02-11 11:10 -08:00):
thanks, I followed your suggestion and edited line 92:

    if (not fastqs and len(fastqs)) == 2:

still have the same error

    $ python2.7 fastq_pair_filter.py
    Traceback (most recent call last):
      File "fastq_pair_filter.py", line 96, in <module>
        main(adaptors, opts.M, opts.t, opts.l, fastqs, opts.sanger)
      File "fastq_pair_filter.py", line 41, in main
        fha, fhb = procs[0].stdout, procs[1].stdout
    IndexError: list index out of range

brentp (2011-02-11 10:53 -08:00):
@Anonymous. you are right! thanks for your patience. I fixed the gist so it should work now. just added parentheses at line 92 so it looks like this:

    if (not fastqs and len(fastqs)) == 2:

let me know if you have any more problems.

Anonymous (2011-02-11 10:46 -08:00):
@brentp

I posted the command itself in the posts above.

    $ python2.7 fastq_pair_filter.py

Even with input files (paired-end files), same error.

If possible, can you also put up the usage options of the program?

Thanks
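
For readers following the 2011-02-11 exchange, here is a minimal sketch of the two option-handling fixes discussed in this thread: the argument-count check with the "not" outside the parentheses, and an integer default for -l so the "%i" formatting never receives None. It is not the script's actual code. The use of optparse, the help strings, the value of FASTQ_QUALITY_TRIMMER, and the function names parse_args and build_trim_cmd are all assumptions made for illustration; only the flag names -a, -t, -l and the format string come from the comments.

```python
from optparse import OptionParser

# Assumed value; the thread only shows the constant's name.
FASTQ_QUALITY_TRIMMER = "fastq_quality_trimmer"


def parse_args(argv):
    """Parse options; exit with a usage message unless two files are given."""
    parser = OptionParser(usage="%prog [options] reads_1.fastq reads_2.fastq")
    # Integer defaults mean "%i" formatting never sees None.
    parser.add_option("-t", type="int", default=0, help="quality threshold")
    parser.add_option("-l", type="int", default=0, help="minimum read length")
    parser.add_option("-a", default="", help="comma-separated adaptor sequences")
    opts, fastqs = parser.parse_args(argv)
    # The "not" wraps the whole test: reject anything but exactly two files.
    if not (fastqs and len(fastqs) == 2):
        parser.error("expected exactly two FASTQ files")
    return opts, fastqs


def build_trim_cmd(opts):
    """Format the trimmer command with the format string from the traceback."""
    return "%s -t %i -l %i" % (FASTQ_QUALITY_TRIMMER, opts.t, opts.l)


# Why the misplaced parenthesis never triggers: for an empty list,
# (not [] and 0) evaluates to 0, and 0 == 2 is False.
fastqs = []
buggy = (not fastqs and len(fastqs)) == 2
fixed = not (fastqs and len(fastqs) == 2)
```

With no input files, the buggy expression is False, so the script carries on and fails later with the IndexError shown above, while the fixed expression is True and the usage message is printed instead.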