.TH HTML2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
.SH NAME
html2markdown \- converts HTML to markdown-formatted text
.SH SYNOPSIS
\fBhtml2markdown\fR [\fIpandoc\-options\fR]
[\-\- \fIspecial\-options\fR] [\fIinput\-file\fR or \fIURL\fR]
.SH DESCRIPTION
\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
from STDIN) from HTML to markdown\-formatted plain text.
If a URL is specified, \fBhtml2markdown\fR uses an available program
(e.g. wget, w3m, lynx or curl) to fetch its contents. Output is sent
to STDOUT unless an output file is specified using the \fB\-o\fR
option.
.PP
\fBhtml2markdown\fR uses the character encoding specified in the
"Content-type" meta tag. If this is not present, or if input comes
from STDIN, UTF-8 is assumed. A character encoding may be specified
explicitly using the \fB\-e\fR special option.
.SH OPTIONS
.PP
\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR, so all of
\fBpandoc\fR's options may be used. See \fBpandoc\fR(1) for
a complete list. The following options are most relevant:
.TP
.B \-s, \-\-standalone
Include title, author, and date information (if present) at the
top of markdown output.
.TP
.B \-o \fIFILE\fB, \-\-output=\fIFILE\fB
Write output to \fIFILE\fR instead of STDOUT.
.TP
.B \-\-strict
Use strict markdown syntax, with no extensions or variants.
.TP
.B \-\-reference\-links
Use reference-style links, rather than inline links, in writing markdown.
.TP
.B \-R, \-\-parse\-raw
Parse untranslatable HTML codes as raw HTML.
.TP
.B \-H \fIFILE\fB, \-\-include\-in\-header=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the header. Implies
\fB\-s\fR.
.TP
.B \-B \fIFILE\fB, \-\-include\-before\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the beginning of the document body.
.TP
.B \-A \fIFILE\fB, \-\-include\-after\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the document body.
.TP
.B \-C \fIFILE\fB, \-\-custom\-header=\fIFILE\fB
Use contents of \fIFILE\fR
as the document header (overriding the default header, which can be
printed using '\fBpandoc \-D markdown\fR'). Implies
\fB\-s\fR.
.SH "SPECIAL OPTIONS"
.PP
In addition, the following special options may be used. The special
options must be separated from the \fBhtml2markdown\fR command and any
regular \fBpandoc\fR options by the delimiter `\-\-', as in
.IP
.B html2markdown \-o foo.txt \-\- \-g 'curl \-u bar:baz' \-e latin1
.B www.foo.com
.TP
.B \-e \fIencoding\fB, \-\-encoding=\fIencoding\fB
Assume the character encoding \fIencoding\fR in reading HTML.
(Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
available encodings may be obtained using `\fBiconv \-l\fR'.)
If this option is not specified and input is not from
STDIN, \fBhtml2markdown\fR will try to extract the character encoding
from the "Content-type" meta tag. If no character encoding is
specified in this way, or if input is from STDIN, UTF-8 will be
assumed.
.TP
.B \-g \fIcommand\fB, \-\-grabber=\fIcommand\fB
Use \fIcommand\fR to fetch the contents of a URL. (By default,
\fBhtml2markdown\fR searches for an available program or text-based
browser to fetch the contents of a URL.)
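.SH EXAMPLES
.PP
The following invocations are illustrative sketches that combine the
options described above. The file names \fIpage.html\fR and
\fIpage.txt\fR are placeholders, and the input in the second example
is assumed to be Latin-1 encoded.
.PP
Convert a local file to a standalone markdown document:
.IP
.B html2markdown \-s \-o page.txt page.html
.PP
Convert a Latin-1 encoded file using reference-style links:
.IP
.B html2markdown \-\-reference\-links \-\- \-e latin1 page.html
.PP
Convert HTML read from STDIN (UTF-8 is assumed unless \fB\-e\fR is
given):
.IP
.B cat page.html | html2markdown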
.SH "SEE ALSO"
\fBpandoc\fR(1),
\fBiconv\fR(1)
.SH AUTHOR
John MacFarlane and Recai Oktas