PDF to PDF/A Conversion, Enhanced Text Search & Text Extraction in PDF Files inside .NET Apps

The Aspose team is pleased to announce the release of Aspose.Pdf for .NET 17.4.0. This release focuses on the stability of existing features for creating and manipulating PDF files. Aspose.Pdf for .NET is widely used, and customers trust it for its robustness and reliability on various platforms. Besides the standard desktop versions of Windows, it is equally supported on the Server editions of Windows, and it can be used in desktop and web enterprise applications.

With every new release, the Aspose team introduces new features and works to bring more stability to existing features of the API. In this release, the PDF to PDF/A conversion feature has been improved to cover more customer scenarios and to handle documents of differing structure and complexity. The API also handles text manipulation, whether text addition, text extraction, text search or text replacement. This release specifically improves text extraction, and many customer-reported issues related to that feature have been fixed. Conversion of PDF files to other supported file formats has also been improved.

Ever since the first merged API version, Aspose.Pdf for .NET 6.0, was released in July 2011, the product has included the classes and enumerations of legacy Aspose.Pdf for .NET under the Aspose.Pdf.Generator namespace, the classes and enumerations of legacy Aspose.Pdf.Kit for .NET under the Aspose.Pdf.Facades namespace, and a new Document Object Model (DOM) for creating and manipulating PDF files under the Aspose.Pdf namespace. The legacy Aspose.Pdf.Generator only provided the ability to create PDF documents from scratch, and the legacy Aspose.Pdf.Kit for .NET provided the functionality to manipulate existing PDF files. The new DOM approach of the Aspose.Pdf namespace supports both creating PDF files from scratch and manipulating existing ones.

To eliminate customer confusion between the legacy Aspose.Pdf.Generator and the new Aspose.Pdf namespace, the Aspose team is going to discontinue the Aspose.Pdf.Generator namespace starting with the next release, Aspose.Pdf for .NET 17.5. The new DOM approach contains all the features offered by the legacy Aspose.Pdf.Generator. This change may affect customers who haven't yet migrated their code from the legacy Aspose.Pdf.Generator to the new DOM approach, and the Aspose team is glad to help them with the migration. Please also note that this change will reduce the size of the product binaries. The important new and improved features are listed below:

• The flattened file from XFA form cannot be opened in Chrome or Firefox.
• Create separate local links for duplicate text
• NullReferenceException is thrown when trying to get artifact text
• ArgumentException is thrown when trying to get artifact text
• Exception when trying to load PDF document
• Exception when trying to get signature names
• Exception thrown when trying to get the names of the fields in a form
• PDF to PDFA conversion performance issue
• Failed to validate PDF_X_3 and PDF_X_1A
• Stamp looks incorrect when size and rotate angle are set
• Convert Web Page to PDF - bad layout
• When PDF is converted to PDF_A_1B, the text looks different
• For Helvetica and Courier fonts, page number in the TOC entry is missing
• When PDF is converted to DOCX, the text is missing (converted to image)
• Images not rendered to next page
• PDF to PDFA1b conversion results in a non-compliant PDF/A document

Other most recent bug fixes are also included in this release.
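
As a rough illustration of the feature areas mentioned above, the following sketch creates a PDF with the DOM classes of the Aspose.Pdf namespace (the job the legacy Aspose.Pdf.Generator used to do), converts a PDF to PDF/A-1b, and extracts its text. It assumes the Aspose.Pdf NuGet package is referenced. The class name, method names and file names are hypothetical, and the calls shown are the commonly documented DOM API rather than anything specific to 17.4.0.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper class for illustration; all file names are placeholders.
public static class PdfDomSketch
{
    // Create a one-page PDF from scratch using the DOM API.
    public static void CreateFromScratch(string outputPath)
    {
        using (Document document = new Document())
        {
            Page page = document.Pages.Add();
            page.Paragraphs.Add(new TextFragment("Hello from the DOM API"));
            document.Save(outputPath);
        }
    }

    // Convert an existing PDF to PDF/A-1b. Problems found during conversion
    // are written to logPath; elements that cannot be converted are deleted.
    public static bool ConvertToPdfA(string inputPath, string outputPath, string logPath)
    {
        using (Document document = new Document(inputPath))
        {
            bool converted = document.Convert(logPath, PdfFormat.PDF_A_1B, ConvertErrorAction.Delete);
            document.Save(outputPath);
            return converted;
        }
    }

    // Extract the text of all pages with TextAbsorber.
    public static string ExtractText(string inputPath)
    {
        using (Document document = new Document(inputPath))
        {
            TextAbsorber absorber = new TextAbsorber();
            document.Pages.Accept(absorber);
            return absorber.Text;
        }
    }

    public static void Main()
    {
        CreateFromScratch("hello.pdf");
        bool ok = ConvertToPdfA("hello.pdf", "hello_pdfa.pdf", "conversion_log.xml");
        Console.WriteLine("PDF/A conversion reported success: " + ok);
        Console.WriteLine(ExtractText("hello.pdf"));
    }
}
```

The conversion log is worth checking even when the call reports success, since it records which elements were changed or removed to reach compliance.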


  • The Aspose team is very excited to announce the new version of Aspose.PDF for .NET 18.6. This release introduces new features related to text manipulation and PDF/UA validation, and it fixes bugs reported in earlier versions of the API.

    Extracting highlighted text from PDF documents has been an essential requirement. Earlier, it was possible to extract text from PDF documents by matching a regular expression or by specifying a string to search for, and the TextFragmentAbsorber and TextAbsorber classes of the API have been used often and efficiently for that purpose. To cover highlighted text, the team investigated the requirement and introduced the TextMarkupAnnotation.GetMarkedText() and TextMarkupAnnotation.GetMarkedTextFragments() methods. Users can extract highlighted text from a PDF document by filtering for TextMarkupAnnotation and calling these methods. An example demonstrating the feature is also available in the API documentation.

    When removing text from PDF documents with earlier versions of the API, users needed to set the found text to an empty string. This invoked a number of checks and text position adjustment operations, which is why several performance issues were observed in such scenarios. The number of checks and adjustment operations could not be reduced, because they are essential in text editing scenarios. Moreover, it cannot be determined in advance how many text fragments will be removed and adjusted when they are processed in a loop. In Aspose.PDF for .NET 18.6, a new Aspose.Pdf.Operators.TextShowOperator() method has been introduced in order to remove all text from PDF pages. The team recommends using it to remove all text from a PDF document, as it takes considerably less time.

    In the latest release of Aspose.PDF for .NET, all descendants of Aspose.Pdf.Operator were moved into the Aspose.Pdf.Operators namespace. Thus 'new Aspose.Pdf.Operators.GSave()' should be used instead of 'new Aspose.Pdf.Operator.GSave()'. When upgrading to the latest version of the API, users will need to update existing code that uses the previous Aspose.Pdf.Operator namespace.

    The team has also worked on accessibility features: as part of the work on 508 compliance (WCAG), a PDF/UA validation feature and Tagged PDF support were added. The important new and improved features are listed below:

    • Add feature "Extract Highlighted Text from HighlightTextMarkUpAnnotations" to the TextFragmentAbsorber class
    • Add support of OTF font when embedding in PDF
    • Text Extraction - Spaces are improperly embedded inside words
    • TableAbsorber throws exception while trying to access any row other than first row of first table or any other table than first
    • PDF to Image - Some contents are overlapping
    • PDF to JPEG - Incorrect output
    • TableAbsorber: incorrect table count in PDF
    • Text is overlapped when saving particular document as image or HTML
    • PDF to HTML - Object reference not set to an instance of an object
    • Conversion HTML to PDF produces incorrect output
    • PDF to PDFA - Comments are broken in resultant document
    • Flattening Fields is not flattening the Print button inside PDF
    • The output is too big after conversion to PDFA_1B format
    • After conversion PDF-to-PDFA the output contains corrupted diagram
    • The document loaded from HTML file looks different than original
    • PDF to PDF/A-1b - the output PDF does not pass compliance test
    • PDF to PDF/A-1b - the output PDF does not pass compliance test
    • PDF to JPG - Blue gradient is darker in the JPG compared to the PPT slide PDF
    • PDF to JPG - Objects fading to transparent
    • PDF to JPG - transparent turns to white
    • PDF to JPG - Objects fading to transparent causes image differences
    • PDF to JPG - Objects fading to transparent causes image differences
    • Yellow background not same after converting PDF to PDF/A
    • JPEG output loses the fade effect on the source document
    • The document image loses fading to transparent in PDF output
    • Blank pages added after HTML to PDF rendition
    • PDF to PDF/A-2b - the chart labels are rotated
    • PDF to PDF/A-2b - some labels get blurred
    • Duplicated evaluation watermarks when saving EPUB document
    • Output image or html is filled with black color
    • HTML to PDF - exception thrown
    • Flattening Fields is not flattening the buttons inside PDF
    • Multi byte characters not displayed in PDF
    • Header added but footer is missing (HTML->PDF)
    • The header and the footer exist only on the first page.
    • Missing table after adding to Footer
    • PDF to PDF/A-2b
    • Unable to load OTF Font from a resource stream

    Other most recent bug fixes are also included in this release.

    Read complete release notes here: https://goo.gl/YbFzWn
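
The two text features described above can be sketched roughly as follows. The first method filters annotations for TextMarkupAnnotation and reads the marked fragments. The second selects every text-showing operator on each page and deletes it, passing a TextShowOperator instance to an OperatorSelector, which is how the documented usage appears to work. This assumes the Aspose.Pdf NuGet package (18.6 or later) is referenced; the class name, method names and file names are hypothetical.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

// Hypothetical helper class for illustration; all file names are placeholders.
public static class TextFeaturesSketch
{
    // Print the text covered by text markup annotations (e.g. highlights).
    public static void PrintHighlightedText(string inputPath)
    {
        using (Document document = new Document(inputPath))
        {
            foreach (Page page in document.Pages)
            {
                foreach (Annotation annotation in page.Annotations)
                {
                    // Keep only text markup annotations; skip everything else.
                    TextMarkupAnnotation markup = annotation as TextMarkupAnnotation;
                    if (markup == null)
                    {
                        continue;
                    }

                    foreach (TextFragment fragment in markup.GetMarkedTextFragments())
                    {
                        Console.WriteLine(fragment.Text);
                    }
                }
            }
        }
    }

    // Remove all text from every page by deleting the text-showing operators,
    // instead of replacing each found fragment with an empty string.
    public static void RemoveAllText(string inputPath, string outputPath)
    {
        using (Document document = new Document(inputPath))
        {
            // Page indexes in Aspose.Pdf start at 1.
            for (int i = 1; i <= document.Pages.Count; i++)
            {
                Page page = document.Pages[i];
                OperatorSelector selector =
                    new OperatorSelector(new Aspose.Pdf.Operators.TextShowOperator());
                page.Contents.Accept(selector);          // collect text operators
                page.Contents.Delete(selector.Selected); // delete them in one call
            }
            document.Save(outputPath);
        }
    }

    public static void Main()
    {
        PrintHighlightedText("highlighted.pdf");
        RemoveAllText("input.pdf", "input_without_text.pdf");
    }
}
```

Deleting operators works at the content-stream level, so it avoids the per-fragment position checks that the empty-string approach triggers. It suits removing all text rather than selective edits.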
