.TH HTML2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
.SH NAME
html2markdown \- converts HTML to markdown-formatted text
.SH SYNOPSIS
\fBhtml2markdown\fR [\fIpandoc\-options\fR] [\-\- \fIspecial\-options\fR] [\fIinput\-file\fR or \fIURL\fR]
.SH DESCRIPTION
\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text from STDIN) from HTML to markdown\-formatted plain text.
If a URL is specified, \fBhtml2markdown\fR uses an available program (e.g. wget, w3m, lynx or curl) to fetch its contents.
Output is sent to STDOUT unless an output file is specified using the \fB\-o\fR option.
.PP
\fBhtml2markdown\fR uses the character encoding specified in the "Content-type" meta tag.
If this is not present, or if input comes from STDIN, UTF-8 is assumed.
A character encoding may be specified explicitly using the \fB\-e\fR special option.
.SH OPTIONS
.PP
\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR, so all of \fBpandoc\fR's options may be used.
See \fBpandoc\fR(1) for a complete list.
The following options are most relevant:
.TP
.B \-s, \-\-standalone
Include title, author, and date information (if present) at the top of markdown output.
.TP
.B \-o \fIFILE\fB, \-\-output=\fIFILE\fB
Write output to \fIFILE\fR instead of STDOUT.
.TP
.B \-\-strict
Use strict markdown syntax, with no extensions or variants.
.TP
.B \-R, \-\-parse\-raw
Parse untranslatable HTML codes as raw HTML.
.TP
.B \-H \fIFILE\fB, \-\-include\-in\-header=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the header.
Implies \fB\-s\fR.
.TP
.B \-B \fIFILE\fB, \-\-include\-before\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the beginning of the document body.
.TP
.B \-A \fIFILE\fB, \-\-include\-after\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the document body.
.TP
.B \-C \fIFILE\fB, \-\-custom\-header=\fIFILE\fB
Use contents of \fIFILE\fR as the document header (overriding the default header, which can be printed using '\fBpandoc \-D markdown\fR').
Implies \fB\-s\fR.
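.PP
As an illustration, the following command would convert a local HTML file and write a standalone markdown document, using strict markdown syntax, to \fIpage.txt\fR.
The file names \fIpage.html\fR and \fIpage.txt\fR are placeholders chosen for this example only:
.IP
.B html2markdown \-s \-\-strict \-o page.txt page.html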
.SH "SPECIAL OPTIONS"
.PP
In addition, the following special options may be used.
The special options must be separated from the \fBhtml2markdown\fR command and any regular \fBpandoc\fR options by the delimiter `\-\-', as in
.IP
.B html2markdown \-o foo.txt \-\- \-g 'curl \-u bar:baz' \-e latin1
.B www.foo.com
.TP
.B \-e \fIencoding\fB, \-\-encoding=\fIencoding\fB
Assume the character encoding \fIencoding\fR when reading HTML.
(Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of available encodings may be obtained using `\fBiconv \-l\fR'.)
If this option is not specified and input is not from STDIN, \fBhtml2markdown\fR will try to extract the character encoding from the "Content-type" meta tag.
If no character encoding is specified in this way, or if input is from STDIN, UTF-8 will be assumed.
.TP
.B \-g \fIcommand\fB, \-\-grabber=\fIcommand\fB
Use \fIcommand\fR to fetch the contents of a URL.
(By default, \fBhtml2markdown\fR searches for an available program or text-based browser to fetch the contents of a URL.)
.SH "SEE ALSO"
\fBpandoc\fR(1), \fBiconv\fR(1)
.SH AUTHOR
John MacFarlane and Recai Oktas