Metadata
- Source
- DECA-287
- Type
- Bug
- Priority
- Major
- Status
- Open
- Resolution
- N/A
- Assignee
- N/A
- Reporter
- Jonathan Hung
- Created
- 2012-06-27T15:51:37.201-0400 
- Updated
- 2013-01-27T12:05:45.948-0500 
- Versions
- 0.5, 0.6, 0.7
- Fixed Versions
- Future
- Component
- genpdf
Description
For some images, large sections of text are omitted when generating a Type 3 or Type 4 PDF. Typically the top few lines of text are missing.
To reproduce, run the following on the relevant image:
./decapod-genpdf.py -d test-t4 -t 4 -p test-t4.PDF filename.png/jpeg 
./decapod-genpdf.py -d test-t3 -t 3 -p test-t3.PDF filename.png/jpeg
The following two images reproduce this error:
2-1-1.jpg
faithful-to-the-book-page-4-copy.jpeg (the attached PDF shows the results of a Type 3 export)
The following two images do not produce this error (despite being somewhat similar):
4-1-01-grey.jpg
Image_0016-grey.png
Format and colour do not appear to play a role, as colour or TIFF versions of the problematic images exhibit the same behaviour.
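The reproduction steps above can be scripted so that all four images are run through both export types in one go. The following is only a sketch: the flags (-d, -t, -p) and the script name are copied from the commands in this report, while the helper names and the per-image naming of the -d and -p values are hypothetical choices made here so that runs do not overwrite each other.

```python
import os
import subprocess

# Images named in this report: the first two reproduce the bug,
# the last two do not.
IMAGES = [
    "2-1-1.jpg",
    "faithful-to-the-book-page-4-copy.jpeg",
    "4-1-01-grey.jpg",
    "Image_0016-grey.png",
]


def genpdf_command(image, pdf_type, script="./decapod-genpdf.py"):
    """Build the argument list for one image and one PDF type.

    The -d and -p values are derived from the image name; this naming
    scheme is an assumption of this sketch, not part of genpdf.
    """
    stem = os.path.splitext(os.path.basename(image))[0]
    name = "%s-t%d" % (stem, pdf_type)
    return [script, "-d", name, "-t", str(pdf_type), "-p", name + ".pdf", image]


def build_commands(images, types=(3, 4)):
    """Return one command per (image, type) pair."""
    return [genpdf_command(img, t) for img in images for t in types]


def run_all(images):
    """Run every command and return (command, exit code) pairs."""
    return [(cmd, subprocess.call(cmd)) for cmd in build_commands(images)]


if __name__ == "__main__":
    # Print the commands only; call run_all(IMAGES) to execute them.
    for cmd in build_commands(IMAGES):
        print(" ".join(cmd))
```

Comparing the resulting PDFs for the first two images against the last two should show the missing top lines only in the former.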
Attachments
Comments
- tamir@tamirhassan.com commented 2013-01-27T12:03:52.545-0500: The cause is that the line-finding stage of layout analysis failed, so the text lines were not found and could not be used for further processing. I've tried it out with the current version and get a much better result: only the page number at the top is missing. Ideally, all content not recognized as text lines would be included as part of a background image.
- tamir@tamirhassan.com commented 2013-01-27T12:05:08.641-0500: (This comment relates to the file test-t3.pdf.) This is the output that I got when running genpdf on the same pdf (t3). Only the page number at the top (not recognized as text?) is missing. Tamir