How to Install and Uninstall perl-HTML-TableExtract Package on openSUSE Leap
Last updated: November 07, 2024
1. Install "perl-HTML-TableExtract" package
This is a short guide on how to install perl-HTML-TableExtract on openSUSE Leap:
$ sudo zypper refresh
$ sudo zypper install perl-HTML-TableExtract
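To confirm that the installation worked, one option is to check whether Perl can actually load the module. The sketch below assumes `perl` is on the PATH; the helper name `perl_module_status` is made up for illustration and is not part of zypper or the package. Querying the RPM database with `rpm -q perl-HTML-TableExtract` is another way to check.

```shell
# Hypothetical helper: print "available" if the named Perl module can be
# loaded, "missing" otherwise (also "missing" if perl itself is absent).
perl_module_status() {
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "available"
    else
        echo "missing"
    fi
}

# Check the module provided by the perl-HTML-TableExtract package.
perl_module_status HTML::TableExtract
```

The same helper should report "missing" again after the package has been removed, unless the module is also installed from another source such as CPAN.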
2. Uninstall "perl-HTML-TableExtract" package
Here is a brief guide to show you how to uninstall perl-HTML-TableExtract on openSUSE Leap:
$ sudo zypper remove perl-HTML-TableExtract
3. Information about the perl-HTML-TableExtract package on openSUSE Leap
Information for package perl-HTML-TableExtract:
-----------------------------------------------
Repository : Main Repository
Name : perl-HTML-TableExtract
Version : 2.15-bp155.2.8
Arch : noarch
Vendor : openSUSE
Installed Size : 103.9 KiB
Installed : No
Status : not installed
Source package : perl-HTML-TableExtract-2.15-bp155.2.8.src
Upstream URL : https://metacpan.org/release/HTML-TableExtract
Summary : Perl module for extracting the content contained in tables within an HTM[cut]
Description :
HTML::TableExtract is a subclass of HTML::Parser that serves to extract the
information from tables of interest contained within an HTML document. The
information from each extracted table is stored in table objects. Tables
can be extracted as text, HTML, or HTML::ElementTable structures (for
in-place editing or manipulation).
There are currently four constraints available to specify which tables you
would like to extract from a document: _Headers_, _Depth_, _Count_, and
_Attributes_.
_Headers_, the most flexible and adaptive of the techniques, involves
specifying text in an array that you expect to appear above the data in the
tables of interest. Once all headers have been located in a row of that
table, all further cells beneath the columns that matched your headers are
extracted. All other columns are ignored: think of it as vertical slices
through a table. In addition, TableExtract automatically rearranges each
row in the same order as the headers you provided. If you would like to
disable this, set _automap_ to 0 during object creation, and instead rely
on the column_map() method to find out the order in which the headers were
found. Furthermore, TableExtract will automatically compensate for cell
span issues so that columns are really the same columns as you would
visually see in a browser. This behavior can be disabled by setting the
_gridmap_ parameter to 0. HTML is stripped from the entire textual content
of a cell before header matches are attempted -- unless the _keep_html_
parameter was enabled.
_Depth_ and _Count_ are more specific ways to specify tables in relation to
one another. _Depth_ represents how deeply a table resides in other tables.
The depth of a top-level table in the document is 0. A table within a
top-level table has a depth of 1, and so on. Each depth can be thought of
as a layer; tables sharing the same depth are on the same layer. Within
each of these layers, _Count_ represents the order in which a table was
seen at that depth, starting with 0. Providing both a _depth_ and a _count_
will uniquely specify a table within a document.
_Attributes_ match based on the attributes of the html tag, for
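The _Headers_, _Depth_ and _Count_ constraints described above can be tried from the shell once the package is installed. This is a minimal sketch, not an official example: the function name `tableextract_demo`, the sample HTML and the variable names are assumptions made for illustration, while `new`, `parse`, `eof`, `tables`, `coords` and `rows` come from the module's documented interface. The demo is skipped if the module cannot be loaded.

```shell
# Hypothetical demo wrapper: runs a small Perl script that uses
# HTML::TableExtract, or prints a notice if the module is not installed.
tableextract_demo() {
    if ! perl -MHTML::TableExtract -e 1 2>/dev/null; then
        echo "HTML::TableExtract is not installed; skipping demo"
        return 0
    fi
    perl - <<'EOF'
use strict;
use warnings;
use HTML::TableExtract;

# Sample document (made up for this sketch): one top-level table.
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Qty</th><th>Price</th></tr>
  <tr><td>apple</td><td>3</td><td>0.50</td></tr>
  <tr><td>pear</td><td>1</td><td>0.75</td></tr>
</table>
HTML

# Print every table an extractor matched, one row per line.
sub dump_tables {
    my ($label, $te) = @_;
    for my $table ($te->tables) {
        print "$label: table at depth,count = ",
            join(',', $table->coords), "\n";
        for my $row ($table->rows) {
            print "  ", join(' | ', map { defined $_ ? $_ : '' } @$row), "\n";
        }
    }
}

# Headers constraint: keep only the columns under these headers.
my $by_headers = HTML::TableExtract->new(headers => ['Price', 'Name']);
$by_headers->parse($html);
$by_headers->eof;
dump_tables('headers', $by_headers);

# Depth and Count constraint: first table seen at the top level.
my $by_position = HTML::TableExtract->new(depth => 0, count => 0);
$by_position->parse($html);
$by_position->eof;
dump_tables('depth/count', $by_position);
EOF
}

tableextract_demo
```

Passing `automap => 0` to `new` should leave the columns in document order rather than in the order of the requested headers, as the description notes.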