Reading Out File Contents and File Properties

EvalKositValidatorReport()

This function evaluates a KoSIT validator report previously created with the file macro CreateKositValidatorReport(). It returns the validation result, i.e., the recommendation to accept or reject the document, taken from the report's "assessment" node.

Optionally, additional information about the recognized scenario and further details can be read out and assigned to specific target fields.

Return type: Boolean

Parameter	Data Type	Description
1	Text	Name filter for the file attachment to be processed; only the first attachment found is taken into account. Default value: `*.report.xml`
2	Text	Name of a target field in the document to which the name of the recognized validation scenario is assigned (optional). Return type: text
3	Text	Name of a target field in the document to which detailed information about the validation result extracted from the report is assigned (optional). Return type: text. Syntactically, this information is provided as a JSON expression.

Examples

EvalKositValidatorReport("*.report.xml", "Scenario", "ReportDetails") returns the validation result from the report file attachment found (e.g., TRUE). The name of the recognized scenario and checked details are written to the Scenario and ReportDetails fields.
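The accept/reject decision can be pictured with the following Python sketch. It is not the program's implementation: it assumes that the report's "assessment" node has a child element named `accept` or `reject`, and it ignores XML namespaces by comparing local element names only. The function names are hypothetical, and the scenario and detail outputs (parameters 2 and 3) are not modeled.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace-uri}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def eval_assessment(report_xml: str) -> bool:
    """Return True for an 'accept' and False for a 'reject' child (assumed structure)."""
    root = ET.fromstring(report_xml)
    for elem in root.iter():  # iterates the root element and all descendants
        if local_name(elem.tag) == "assessment":
            children = {local_name(child.tag) for child in elem}
            if "accept" in children:
                return True
            if "reject" in children:
                return False
    raise ValueError("no assessment node with an accept/reject child found")
```

For example, `eval_assessment('<r:report xmlns:r="urn:example"><r:assessment><r:accept/></r:assessment></r:report>')` returns `True`; the namespace URI here is a placeholder, not the real report namespace.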

ExtractFullTextOcr()

By performing OCR, this function recognizes the text content of an image file attachment or of a PDF file attachment containing raster images.

Common image file formats are supported, including multi-page TIFF files. For PDF files, the native text content is extracted and, in addition, OCR is performed on the embedded images.

Return type: text

Parameter	Data Type	Description
1	Text	Name filter for the file attachment to be processed; only the first attachment found is taken into account. Default value: `*.tif|*.tiff|*.jpg|*.jpeg|*.pdf`
2	Text	Pages to be included: `First`: first page only; `Last`: last page only; `All`: all pages (default value); or a free specification of individual page numbers or page ranges (e.g., `1;2;3` or `1-3`)
3	Text	Language of the OCR dictionary to be used (e.g., `German` [default value] or `English`); multiple languages can be specified as a comma-separated list if required. The appropriate dictionary file for each language must be available in the program directory (e.g., `deu.traineddata` or `eng.traineddata`). These two files are supplied with the program; further dictionary files can be provided on request.
4	Bool	Boolean value determining whether OCR includes only the full-page images contained in the PDF (e.g., scanned pages embedded in a PDF). Default value: `TRUE`. If `FALSE`, all images embedded in a PDF page are processed by OCR.
5	Number	Timeout in seconds after which OCR processing of a single page is aborted if no result is available (optional). The text content of such a page is then not adopted, and the program may continue with the next page.

Examples

ExtractFullTextOcr("*.tif", "1-3", "German") returns the text content of the first three pages of a TIFF file.
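The page specification used by the second parameter can be illustrated with a small Python sketch that resolves it to a list of page numbers. The helper `select_pages` is hypothetical. Case-insensitive keywords, combining numbers and ranges in one specification, and silently skipping pages beyond the document length are assumptions, not documented behavior.

```python
def select_pages(spec: str, page_count: int) -> list[int]:
    """Resolve First/Last/All or "1;2;3" / "1-3" to sorted 1-based page numbers."""
    keyword = spec.strip().lower()
    if keyword == "first":
        return [1] if page_count >= 1 else []
    if keyword == "last":
        return [page_count] if page_count >= 1 else []
    if keyword == "all":
        return list(range(1, page_count + 1))
    pages: set[int] = set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "1-3" includes both end points.
            start, end = (int(p) for p in part.split("-", 1))
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Assumption: page numbers outside the document are ignored.
    return sorted(p for p in pages if 1 <= p <= page_count)
```

For example, `select_pages("1-3", 10)` yields `[1, 2, 3]`, matching the pages read in the example call above.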

ExtractFullTextPdf()

This function reads the native text content of a PDF file attachment.

Return type: text

Parameter	Data Type	Description
1	Text	Name filter for the file attachment to be processed; only the first attachment found is taken into account. Default value: `*.pdf`
2	Text	Pages to be included: `First`: first page only; `Last`: last page only; `All`: all pages (default value); or a free specification of page numbers or page ranges (e.g., `1;2;3` or `1-3`)

Examples

ExtractFullTextPdf("*.pdf", "First") returns the text content of the first page of a PDF file.

FindEInvoiceFileByFormat(), FindEInvoiceFilesByFormat()

The FindEInvoiceFileByFormat() function returns the name of the first file attachment found that corresponds to a specific electronic invoice format (return type: text). If no matching file attachment is found, the return value is an empty string.

The FindEInvoiceFilesByFormat() function returns the names of all file attachments found that correspond to a specific electronic invoice format (return type: array of text values). If no matching file attachments are found, the return value is an empty array.

XML files are supported; for the ZUGFeRD format, PDF files are supported as well. ZUGFeRD files in the outdated 1.x format cannot be processed by the program and therefore do not appear in the search results.

Parameter	Data Type	Description
1	Text	Name filter for the file attachments to be processed. Default value: `*.xml|*.pdf`
2*	Text	Identification of the e-invoice format to search for, or a comma-separated list of multiple formats: `XRechnung`, `ZUGFeRD`

Examples

FindEInvoiceFileByFormat("*.xml", "XRechnung") returns the name of the first XRechnung file attachment found (e.g., "invoice1.xml" or "" if there is no match).

FindEInvoiceFilesByFormat("*.xml", "XRechnung") returns the names of all XRechnung file attachments found (e.g., ["invoice1.xml", "invoice2.xml"] or [] if there are no hits).
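The difference between taking the first match and collecting all matches can be sketched in Python for the name filter alone. This is an illustration under assumptions: the filter is treated as `|`-separated shell-style wildcard patterns matched case-insensitively with `fnmatch`, which may differ from the program's Name Filter Syntax, and the e-invoice format check itself is not modeled. `find_first` and `find_all` are hypothetical names.

```python
from fnmatch import fnmatch


def matches_filter(name: str, name_filter: str) -> bool:
    """True if the name matches any '|'-separated wildcard pattern (case-insensitive)."""
    return any(
        fnmatch(name.lower(), pattern.strip().lower())
        for pattern in name_filter.split("|")
        if pattern.strip()
    )


def find_first(names: list[str], name_filter: str) -> str:
    """First matching attachment name, or '' if nothing matches."""
    return next((n for n in names if matches_filter(n, name_filter)), "")


def find_all(names: list[str], name_filter: str) -> list[str]:
    """All matching attachment names in their original order, or [] if nothing matches."""
    return [n for n in names if matches_filter(n, name_filter)]
```

With the attachments `["scan.pdf", "invoice1.xml", "notes.txt"]`, `find_first(names, "*.xml")` gives `"invoice1.xml"` and `find_all(names, "*.json")` gives `[]`.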

GetEInvoiceFileFormat()

This function determines for an XML file attachment or a (ZUGFeRD) PDF file attachment with an embedded XML file which known e-invoice format this file corresponds to.

For this purpose, only the relevant XML node with the format identifier is evaluated; no further validation of the XML content is performed. Optionally, additional information about the recognized format and version can be extracted and assigned to specific target fields.

Return type: text (PeppolInvoice, PeppolPintInvoice, XRechnung, Zugferd, or Unknown)

Parameter	Data Type	Description
1	Text	Name filter for the file attachment to be processed; only the first attachment found is taken into account. Default value: `*.xml|*.pdf`
2	Text	Name of a target field in the document (optional). The recognized PEPPOL document type, XRechnung syntax, or ZUGFeRD version is assigned to the target field. Return type: text
3	Text	Name of a target field in the document (optional). The recognized PEPPOL version, XRechnung version, or ZUGFeRD profile is assigned to the target field. Return type: text
4	Text	Name of a target field in the document (optional). An error message text is assigned to the target field if no known e-invoice format is recognized. Return type: text

Examples

GetEInvoiceFileFormat("*.xml", "Syntax", "Version", "Error") returns the type of an XML file attachment (e.g., "XRechnung"). The syntax and version of the file attachment are also written to the relevant target fields (e.g., "UblInvoice" and "Version_3_0"). If the type is not recognized (type Unknown), the cause of the error may be written to the Error field.
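The identifier-based detection can be approximated with the Python sketch below. The element locations and identifier substrings are assumptions based on common UBL and CII conventions (`CustomizationID` in UBL, `GuidelineSpecifiedDocumentContextParameter/ID` in CII), not the program's actual rules. PDF attachments with embedded XML, syntax and version extraction, and error texts are not handled, and `classify_einvoice` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip a '{namespace-uri}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def _format_identifier(root: ET.Element) -> str:
    """Return the format identifier text from its assumed location."""
    if _local(root.tag) == "CrossIndustryInvoice":
        # CII syntax: .../GuidelineSpecifiedDocumentContextParameter/ID
        for el in root.iter():
            if _local(el.tag) == "GuidelineSpecifiedDocumentContextParameter":
                for child in el:
                    if _local(child.tag) == "ID":
                        return (child.text or "").strip()
        return ""
    # UBL syntax: CustomizationID directly below the root element
    for child in root:
        if _local(child.tag) == "CustomizationID":
            return (child.text or "").strip()
    return ""


def classify_einvoice(xml_text: str) -> str:
    """Map XML to PeppolInvoice, PeppolPintInvoice, XRechnung, Zugferd, or Unknown."""
    root = ET.fromstring(xml_text)
    ident = _format_identifier(root).lower()
    if "xrechnung" in ident:
        return "XRechnung"  # XRechnung may use UBL or CII syntax
    if _local(root.tag) == "CrossIndustryInvoice":
        return "Zugferd"
    if "peppol:pint" in ident:
        return "PeppolPintInvoice"
    if "peppol" in ident:
        return "PeppolInvoice"
    return "Unknown"
```

The checks are ordered so that an XRechnung identifier wins over the generic CII root test.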

GetExternalFileContent()

This function reads the content of an external file from the file system as a text value.

Return type: text

Parameter	Data Type	Description
1*	Text	Full path of the file in the file system

Examples

GetExternalFileContent("c:/test.txt") returns the text content of the specified file.

GetExternalFileProperty()

This function reads a property of an external file from the file system.

The return type depends on the requested property.

Parameter	Data Type	Description
1*	Text	Full path of the file in the file system
2*	Text	Name of the property to be read: `Created`: local creation date; `Modified`: local modification date; `ReadOnly`: whether the file is read-only; `Size`: size in bytes
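A rough Python equivalent of the four properties, using only `os.stat`, is sketched below; the function name is hypothetical. Note the platform caveats: `st_ctime` reports the creation time on Windows but the metadata change time on Unix, and `ReadOnly` is approximated here by the owner write permission bit.

```python
import os
import stat
from datetime import datetime


def get_external_file_property(path: str, prop: str):
    """Return Created/Modified (local datetime), ReadOnly (bool), or Size (int)."""
    st = os.stat(path)
    key = prop.lower()
    if key == "created":
        # Creation time on Windows; metadata change time on Unix.
        return datetime.fromtimestamp(st.st_ctime)
    if key == "modified":
        return datetime.fromtimestamp(st.st_mtime)
    if key == "readonly":
        # Approximation: read-only if the owner write bit is not set.
        return not (st.st_mode & stat.S_IWUSR)
    if key == "size":
        return st.st_size
    raise ValueError(f"unsupported property: {prop}")
```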

Examples

GetExternalFileProperty("c:/test.txt", "Size") returns the size of the specified file in bytes.

GetFileContent()

Reads the content of a file attachment as a text value.

Return type: text

Parameter	Data Type	Description
1*	Text	Name filter for the file attachment to be read in; only the first attachment found is taken into account. Default value: `*`
2*	Text	Restriction of the search to file attachments of a certain type (optional; default value: all attachments): `Original`: original files from the input system; `Processed`: files added by the program through extraction or conversion

Examples

GetFileContent("*.txt", "Original") returns the content of a text file attachment of the type "Original".

GetImageProperty()

This function reads a property of an image file attachment. The common raster image file formats are supported.

The return type depends on the requested property.

Parameter	Data Type	Description
1	Text	Name filter for the file attachment to be processed; only the first attachment found is taken into account. Default value: `*.jpg|*.jpeg|*.tif|*.tiff`
2*	Text	Name of the property to be read: `BitDepth`: color depth in bits (return type: number); `Format`: file format, e.g., "TIFF" or "JPEG" (return type: text); `HRes` and `VRes`: horizontal and vertical resolution in DPI, respectively (return type: number); `PageCount`: number of pages for multi-page TIFF files, otherwise always 1 (return type: number); `Width` and `Height`: page width and height in pixels (return type: number)

Examples

GetImageProperty("*.tif", "PageCount") returns the number of pages of a TIFF file attachment (e.g., 3).

GetJsonProperty()

This function reads the value of a JSON property from a field value or a file attachment whose content is a JSON document.

The return type depends on the data type of the value. Date values stored as strings in the JSON document are implicitly converted to date values only if they use a common syntax (e.g., ISO format).

If multiple values are read (see the third parameter), they are returned in an array, even if only a single value is found. If a property is not found or has the value null, the return value is an empty string or an empty array.

Parameter	Data Type	Description
1	Text	Value or name of a field, or name filter for the file attachment to be processed; only the first attachment found is taken into account. The program first attempts to interpret the value directly as a JSON document, then looks for a field with that name, and finally searches for a matching file attachment. Default value: `*.json`
2*	Text	JSONPath expression addressing the JSON property to be read. Use the same syntax as for the index data reader "Json".
3	Bool	Boolean value determining whether all values are read for a property with potentially multiple values (i.e., an array). If not, only the first value is adopted. Default value: `TRUE`

Examples

GetJsonProperty("JsonData", "$.Name") returns the value of a Name property from the JSON data in a field called JsonData (e.g., "Value1").

GetJsonProperty("JsonData", "$.Name") is comparable to the previous example. Here, the source field is not addressed as a variable, but by its name.

GetJsonProperty("*.json", "$.Names[*]", TRUE) returns all values of the Names array from a JSON file attachment (e.g., ["Value1", "Value2"]).
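To illustrate how the third parameter shapes the result, the following Python sketch evaluates a very small JSONPath subset (`$`, `.name`, `[n]`, `[*]`) on a JSON string. It is a hypothetical helper, not the program's parser: it assumes that a list is returned only when the path contains the wildcard `[*]` and all values are requested, treats JSON `null` like a missing property, and performs no date conversion.

```python
import json
import re

# One path step: ".name", "[n]" or "[*]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\*|\d+)\]")


def get_json_property(document: str, path: str, read_all: bool = True):
    """Evaluate a tiny JSONPath subset ($, .name, [n], [*]) against a JSON string."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    values = [json.loads(document)]
    pos = 1
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            raise ValueError(f"unsupported syntax at position {pos}: {path!r}")
        name, index = step.groups()
        next_values = []
        for value in values:
            if name is not None:
                if isinstance(value, dict) and name in value:
                    next_values.append(value[name])
            elif index == "*":
                if isinstance(value, list):
                    next_values.extend(value)
            elif isinstance(value, list) and int(index) < len(value):
                next_values.append(value[int(index)])
        values = next_values
        pos = step.end()
    # Treat JSON null like a missing property.
    values = [v for v in values if v is not None]
    # Assumption: a list is returned only for wildcard paths when all values are requested.
    if read_all and "[*]" in path:
        return values
    return values[0] if values else ""
```

With `read_all=False`, a wildcard path such as `$.Names[*]` yields only the first value, mirroring the "first value only" behavior of the third parameter.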

GetPdfProperty()

This function reads a property of a PDF file attachment.

The return type depends on the requested property.

Parameter	Data Type	Description
1	Text	Name filter for the file attachment to be processed, whereby only the first attachment found is taken into account Default value: `*.pdf`
2*	Text	Name of the property to be read: `Author`, `CreationDate`, `Creator`, `Keywords`, `Metadata`, `ModificationDate`, `Producer`, `Subject`, `Title`, `Version`: standard properties of a PDF file (return type: text); `EmbeddedCount`: number of embedded file attachments (return type: number). As a first additional parameter, a name filter can be defined for the attachments to be included in the count. As a second additional parameter, a Boolean value can specify whether attachments included as annotations at page level are also counted (default value: `FALSE`, meaning that only global attachments are included); the second additional parameter only takes effect if a name filter is also set. `EmbeddedNames`: names of embedded attachments (return type: array of text values); the two additional parameters are used in the same way as for `EmbeddedCount`. `HasText`: whether native PDF text is included on any page (return type: Boolean); `IsEncrypted`: whether the PDF file is encrypted (return type: Boolean); `IsImage`: whether the PDF file consists exclusively of full-page images (return type: Boolean); `IsPdfA`: whether the PDF file is in a valid PDF/A format (return type: Boolean); `PageCount`: number of pages in the PDF file (return type: number); `PdfLevel`: PDF type to which the file corresponds, in the form `PDF#_#` or `PDF_A_#x`, e.g., `PDF1_7` or `PDF_A_2a` (return type: text); `Width` and `Height`: page width and height in millimeters (return type: number)
3	(variable)	First additional parameter that applies only to certain properties (optional)
4	(variable)	Second additional parameter that applies only to certain properties (optional)

Examples

GetPdfProperty("*.pdf", "EmbeddedNames", "*.xml") returns the names of embedded XML files in a PDF file attachment (e.g., ["factur-x.xml"]).

GetXmlNode()

This function reads the text content of a node from a field value or file attachment whose content is an XML document.

The return value is always of type text; if required, it must be converted to the desired target type. If multiple nodes are read (see the fourth parameter), the values are returned in an array, even if only a single node is found. If a node is not found, the return value is an empty string or an empty array.

Parameter	Data Type	Description
1	Text	Value or name of a field, or name filter for the file attachment to be processed; only the first attachment found is taken into account. The program first attempts to interpret the value directly as an XML document, then looks for a field with that name, and finally searches for a matching file attachment. Default value: `*.xml`
2*	Text	XPath expression addressing the XML node to be read. Use the same syntax as for the index data reader "Xml".
3	Bool	Boolean value determining whether namespace information is removed from the XML document. Removing namespace information avoids parsing problems; the XPath expression must then also reference nodes without a namespace prefix. Default value: `TRUE`
4	Bool	Boolean value determining whether all values of a potentially multiple node are read (as an array). If not, only the value of the first node is adopted. Default value: `TRUE`

Examples

GetXmlNode("XmlData", "/ubl:Invoice/cbc:ID") returns the value of the ID node from the XML data in a field named XmlData (e.g., "00004711").

GetXmlNode("XmlData", "/ubl:Invoice/cbc:ID") is similar to the previous example. Here, the source field is not addressed as a variable, but by its name.

GetXmlNode("*.xml", "/Invoice/InvoiceLine/ID", TRUE, TRUE) returns the values of all ID nodes from an XML file attachment (e.g., ["1", "2"]).
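The effect of removing namespaces can be sketched with Python's `xml.etree.ElementTree`. The helper is hypothetical and supports only simple absolute paths such as `/Invoice/InvoiceLine/ID`, which it rewrites to paths relative to the root element because ElementTree evaluates only a limited XPath subset. In this sketch, namespaces are always removed, and a list is always returned when all values are requested.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root: ET.Element) -> ET.Element:
    """Remove '{namespace-uri}' prefixes from all element tags, in place."""
    for el in root.iter():
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.rsplit("}", 1)[1]
    return root


def get_xml_node(xml_text: str, xpath: str, read_all: bool = True):
    """Read node text via a simple absolute path after removing namespaces."""
    root = strip_namespaces(ET.fromstring(xml_text))
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return [] if read_all else ""
    # ElementTree paths are relative to the root element, so drop the root step.
    relative = "/".join(steps[1:]) or "."
    texts = [el.text or "" for el in root.findall(relative)]
    if read_all:
        return texts
    return texts[0] if texts else ""
```

Because the namespaces are gone after `strip_namespaces`, the path uses plain element names without prefixes, as the description of the third parameter requires.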

IsPeppolXml(), IsPeppolPintXml(), IsUblInvoiceXml(), IsXRechnungXml(), IsZugferdXml(), IsZugferdPdf()

These functions determine whether an XML file attachment or a (ZUGFeRD) PDF file attachment with an embedded XML file corresponds to a known PEPPOL, UBL, XRechnung, or ZUGFeRD format.

For this purpose, only the relevant XML node with the format identifier is evaluated. No further validation of the XML content is performed. Optionally, additional information about the recognized format and version can be read out and assigned to specific target fields.

The syntax of the version identifier may vary. Only detection patterns for main versions are stored in the program, so the program does not have to be adapted for each new minor version. For example, if main version 2.x is known to the program, the returned identifier is Version_2_x. For unknown main versions, the entire version number is read dynamically from the ID string of the e-invoice; in this case, the identifier contains the minor version number instead of x.
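The versioning rule can be sketched in Python as follows. The set of known main versions and the assumption that the version is the last `major.minor` number in the ID string are illustrative only; `version_identifier` is a hypothetical name.

```python
import re


def version_identifier(id_string: str, known_main_versions: set[int]) -> str:
    """Build Version_<major>_x for known main versions, else Version_<major>_<minor>."""
    # Assumption: the version is the last "major.minor" number in the ID string.
    matches = re.findall(r"(\d+)\.(\d+)", id_string)
    if not matches:
        return ""
    major, minor = matches[-1]
    if int(major) in known_main_versions:
        return f"Version_{major}_x"
    return f"Version_{major}_{minor}"
```

With `{2}` as the hypothetical set of known main versions, an ID string ending in `xrechnung_2.3` yields `Version_2_x`, while one ending in `xrechnung_3.0` yields `Version_3_0`.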

Return type: Boolean

Parameter	Data Type	Description
1	Text	Name filter for the file attachment to be processed; only the first attachment found is taken into account. Default value for `FindEInvoiceFileByFormat()` and `FindEInvoiceFilesByFormat()`: `*.pdf|*.xml`. Default value for `IsZugferdPdf()`: `*.pdf`. Notice: If a name pattern that also matches PDF files is passed to the `IsZugferdXml()` function, the function also processes these PDF files in addition to XML files, so the specialized `IsZugferdPdf()` function does not necessarily have to be called.
2	Text	Name of a target field in the document to which the recognized PEPPOL/UBL document type, XRechnung syntax, or ZUGFeRD version is assigned (optional) Return type: text
3	Text	Name of a target field in the document to which the recognized PEPPOL/UBL version, XRechnung version, or recognized ZUGFeRD profile is assigned (optional) Return type: text
4	Text	Name of a target field in the document to which an error message text is assigned if the file attachment was not recognized as the desired format (optional) Return type: text

Examples

IsXRechnungXml("*.xml", "Syntax", "Version", "Error") returns the check result for an XML file attachment as to whether the file attachment is an XRechnung (e.g., TRUE). The syntax and version are also written to the relevant target fields (e.g., "UblInvoice" and "Version_3_0"). In the event of a negative result, an error cause may be written to the Error field.

ReadBarcode()

This function reads barcode values from a (multi-page) TIFF or PDF file attachment. If a single value is read, the return type is text; if multiple values are read, the return type is an array.

A single (scalar) value is returned if only one page is included (First, Last, or a single page number) and only one barcode location on that page is used (First or Last), as set by the parameters below. If no value is found, an empty string or an empty array is returned.

Parameter	Data Type	Description
1	Text	Name filter for the file attachment to be processed; only the first attachment found is taken into account. Default value: `*.tif|*.tiff|*.pdf`
2	Text	Type of barcode to search for: `Simple`: normal 1D barcode (default value); `DM`: Data Matrix code; `QR`: QR code
3	Text	Pages to be included: `First`: first page only (default value); `Last`: last page only; `All`: all pages; or a free specification of individual page numbers or page ranges (e.g., `1;2;3` or `1-3`)
4	Text	Barcode locations to be used within a page: `First`: first barcode only (default value); `Last`: last barcode only; `All`: all barcodes
5	Text	Filter to limit the search to barcodes with specific content or structure (optional). Syntax: see Name Filter Syntax. Default value: `*`
6	Number	Resolution (dpi) for the implicit conversion of PDF pages to raster images required before barcode recognition. Default value: `300`

Examples

ReadBarcode("*.tif", "Simple", , , "A#######") returns the value of a barcode of a given pattern on the first page of a TIFF file attachment (e.g., "A0000001" or "" if there is no match).

ReadBarcode("*.tif", "Simple", "All", "All") returns the values of all barcodes on all pages of a TIFF file attachment (e.g., ["A0000001", "A0000002", "B0000001"] or [] if there is no match).
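The rule deciding between a single value and an array can be sketched in Python. The input is a hypothetical mapping from page numbers to the barcode values already detected on each page; actual barcode recognition, page ranges, and the content filter (parameter 5) are omitted, and the function name is an assumption.

```python
def read_barcodes(found: dict[int, list[str]], page_count: int,
                  pages: str = "First", locations: str = "First"):
    """Select barcode values by page and location; return one value or a list.

    found maps 1-based page numbers to the values detected on that page,
    in detection order (hypothetical input; no real recognition here).
    """
    page_mode = pages.strip().lower()
    if page_mode == "first":
        selected = [1]
    elif page_mode == "last":
        selected = [page_count]
    elif page_mode == "all":
        selected = list(range(1, page_count + 1))
    else:
        selected = [int(page_mode)]  # only a single page number in this sketch
    loc_mode = locations.strip().lower()
    values: list[str] = []
    for page in selected:
        on_page = found.get(page, [])
        if not on_page:
            continue
        if loc_mode == "first":
            values.append(on_page[0])
        elif loc_mode == "last":
            values.append(on_page[-1])
        else:  # "all"
            values.extend(on_page)
    # Scalar only if exactly one page and one location per page are requested.
    if page_mode != "all" and loc_mode in ("first", "last"):
        return values[0] if values else ""
    return values
```

Requesting `"All"` pages or `"All"` locations always produces a list, even when only one barcode is found.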

xSuite Interface Windows Prism 5.x – Online Help