74 Commits

Author SHA1 Message Date
Austin Ziegler 40f28ee022 Prevent generated header collisions, less naively.
> This is a rework of an earlier version of this code.

The automatic header ID generation code submitted in #125 has a subtle
bug where it will use the same ID for multiple headers with identical
text. In the case below, all the headers are rendered as `<h1
id="header">Header</h1>`.

  ```markdown
  # Header
  # Header
  # Header
  # Header
  ```

This change is a simple but robust approach that uses an incrementing
counter and pre-checking to prevent header collision. (The above would
be rendered as `header`, `header-1`, `header-2`, and `header-3`.) In
more complex cases, it will append a new counter suffix (`-1`), like so:

  ```markdown
  # Header
  # Header 1
  # Header
  # Header
  ```

This will generate `header`, `header-1`, `header-1-1`, and `header-1-2`.

This code has two additional changes over the prior version:

1.  Rather than reimplementing @shurcooL’s anchor sanitization code, I
    have imported it from
    `github.com/shurcooL/go/github_flavored_markdown/sanitized_anchor_name`.

2.  The markdown block parser is now only interested in *generating* a
    sanitized anchor name, not in ensuring its uniqueness. That code
    has been moved to the HTML renderer. This means that if the HTML
    renderer is modified to identify all unique headers prior to
    rendering, the hackish nature of the collision detection can be
    eliminated.
2014-11-23 20:35:43 -05:00
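The counter-plus-pre-check scheme described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `uniqueIDs` type and its `ensure` method are hypothetical names, and the input is assumed to be an already sanitized anchor name.

```go
package main

import "fmt"

// uniqueIDs (hypothetical) records every ID handed out so far. The value
// is the highest numeric suffix already issued for that ID.
type uniqueIDs map[string]int

// ensure returns id unchanged the first time it is seen. On a collision it
// tries id-<n+1>; if that candidate is also taken, it falls back to
// appending "-1" and repeats the check against the longer ID.
func (u uniqueIDs) ensure(id string) string {
	for {
		count, seen := u[id]
		if !seen {
			break
		}
		candidate := fmt.Sprintf("%s-%d", id, count+1)
		if _, taken := u[candidate]; !taken {
			u[id] = count + 1
			id = candidate
		} else {
			id += "-1"
		}
	}
	u[id] = 0
	return id
}

func main() {
	// The two header sequences from the commit message, already sanitized.
	for _, headers := range [][]string{
		{"header", "header", "header", "header"},
		{"header", "header-1", "header", "header"},
	} {
		ids := uniqueIDs{}
		for _, h := range headers {
			fmt.Print(ids.ensure(h), " ")
		}
		fmt.Println()
	}
}
```

Keeping the state in a map owned by the caller mirrors the point made in the message: uniqueness is a rendering concern, so a renderer that knows all headers up front could replace this check entirely.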
Austin Ziegler 8cc40f8e07 Use supplied header ID for TOC rendering.
- Fixes #112 so that `#header {#header-id}` renders the TOC with
  `#header-id` instead of `#toc_1`.
2014-10-27 16:49:28 -04:00
Vytautas Saltenis cf6bfc9d6d Rip off all blackfriday's html sanitization effort
As per discussion in issue #90.
2014-09-19 21:25:23 +03:00
tummychow 67002b01b6 Use HTML5 recommended style of language on code blocks
For code blocks that contain a certain language of code, the recommended
attribute structure is <pre><code class="language-foo">. This also
corresponds to the behavior expected by various JS syntax highlighters.

The GitHub code block implementation was obsolete, and identical to the
normal implementation except for its attribute structure, so it was
removed.

Closes #108.
2014-08-28 18:01:06 -04:00
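As a rough illustration of the attribute structure named in the commit above, the sketch below emits `<pre><code class="language-foo">` for a fenced block. The `renderCodeBlock` helper is a hypothetical name, not the renderer's actual method, and the escaping relies on the standard library's `html.EscapeString` rather than the project's own routines.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// renderCodeBlock (hypothetical) wraps code in <pre><code>, adding a
// class="language-<lang>" attribute only when a language was given.
func renderCodeBlock(code, lang string) string {
	var b strings.Builder
	b.WriteString("<pre><code")
	if lang != "" {
		b.WriteString(` class="language-` + html.EscapeString(lang) + `"`)
	}
	b.WriteString(">")
	b.WriteString(html.EscapeString(code))
	b.WriteString("</code></pre>\n")
	return b.String()
}

func main() {
	fmt.Print(renderCodeBlock("if a < b { return }\n", "go"))
}
```

Because the class sits on `<code>` rather than `<pre>`, one code path can serve both plain and language-tagged blocks, which fits the commit's removal of the separate GitHub variant.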
Brian Goff 539b27a624 Add titleblock support 2014-08-04 14:08:22 -04:00
Daniel Imfeld 5bf00efe39 Remove unnecessary HTML_ABSOLUTE_LINKS flag 2014-05-29 09:17:20 -05:00
Daniel Imfeld 10f1dc6358 Fix spelling error 2014-05-28 23:52:45 -05:00
Daniel Imfeld 628c02d37b Move footnote prefix to a better place 2014-05-24 14:28:37 -05:00
Daniel Imfeld c7f4b178c2 Use parameters object for extra options. Enhance footnote support.
Option to add return links.
Option to make footnote prefixes unique, for rendering multiple
documents per page.
2014-05-24 13:29:39 -05:00
Daniel Imfeld ec41294bc4 Add footnote prefix option. Needs testing 2014-05-24 02:55:13 -05:00
Daniel Imfeld 5c12499aa1 Add ability to convert relative links to absolute 2014-05-18 01:28:15 -05:00
Vytautas Šaltenis 3dba5bc56e Merge branch 'master' of github.com:gihnius/blackfriday into gihnius-master
Conflicts:
	html.go
	inline_test.go
2014-05-01 21:43:42 +03:00
Martin Probst 41251715ad Use go.net/html's parser to sanitize HTML.
Use an HTML5 compliant parser that interprets HTML as a browser would to parse
the Markdown result and then sanitize based on the result.
Escape unrecognized and disallowed HTML in the result.
Currently works with a hard coded whitelist of safe HTML tags and attributes.
2014-04-27 23:40:44 +02:00
willnix be9cbc634a tagWhitelist allows alignment attribute now
This is the closest I could get to removing everything "unsafe" without introducing an additional regex.
2014-04-19 21:59:04 +00:00
willnix c1e4996787 Add table tags to the whitelist.
Fixing:
55cd82008e

This commit introduced an HTML tag whitelist which does not include any table tags (<td>, <tr>, <thead>...). Therefore even tables the markdown parser itself generated will be removed.
2014-04-17 15:44:40 +00:00
Vytautas Šaltenis c5ece173ad Merge pull request #59 from johnsto/master
Header ID specifiers
2014-04-11 21:31:27 +03:00
Dave Johnston 2dff0864f0 Add header ID support and tests: # Header {#myid} 2014-04-05 20:42:58 +01:00
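The `# Header {#myid}` syntax from the commit above could be recognized along these lines. This is only a sketch under assumptions: the log does not show the parser's exact rules, so the `splitHeaderID` helper and its handling of edge cases (empty IDs, IDs containing whitespace or braces) are illustrative choices.

```go
package main

import (
	"fmt"
	"strings"
)

// splitHeaderID (hypothetical) separates a trailing "{#id}" specifier from
// the header text. It returns an empty id when no valid specifier is found.
func splitHeaderID(text string) (title, id string) {
	text = strings.TrimSpace(text)
	if !strings.HasSuffix(text, "}") {
		return text, ""
	}
	open := strings.LastIndex(text, "{#")
	if open < 0 {
		return text, ""
	}
	id = text[open+2 : len(text)-1]
	// Assumption: an ID must be non-empty and free of whitespace and braces.
	if id == "" || strings.ContainsAny(id, " \t{}") {
		return text, ""
	}
	return strings.TrimSpace(text[:open]), id
}

func main() {
	title, id := splitHeaderID("Header {#myid}")
	fmt.Printf("title=%q id=%q\n", title, id)
}
```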
Kjetil Mehl 786aed6213 Explicit return byte array at end of function. 2014-04-05 16:59:28 +02:00
Vytautas Šaltenis 55bb56bf9b Merge pull request #55 from rtfb/master
Autolink fixes
2014-03-30 19:58:39 +03:00
Vytautas Šaltenis d643453f1e Merge pull request #50 from rtfb/master
Better protection against JavaScript injection
2014-03-30 19:52:13 +03:00
gihnius 93484b1424 add nofollow ref for non internal links only 2014-03-21 11:14:58 +08:00
gihnius ecf59d4a55 add target blank attr 2014-03-21 10:52:46 +08:00
Graham Miller d71c759108 add HTML_NOFOLLOW_LINKS 2014-02-25 09:21:57 -05:00
Vytautas Šaltenis b0bdfbec4c Fix bug in autolink overescaping html entities
If autolink encounters a link which already has an escaped html entity,
it would escape the ampersand again, producing things like these:
    &amp;  --> &amp;amp;
    &quot; --> &amp;quot;
This commit solves that by first looking for all entity-looking things
in the link and copying those ranges verbatim, only considering the rest
of the string for escaping.
Doesn't seem to have considerable performance impact.
The mailto: links are processed the old way.
2014-02-17 21:09:04 +02:00
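The approach described in the autolink commit above (find entity-looking ranges, copy them verbatim, escape only the rest) might look something like this. It is a sketch, not the committed code: `entityRe` and `escapePreservingEntities` are hypothetical names, the entity pattern is an assumption, and the standard library's `html.EscapeString` stands in for the project's own escaping.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// entityRe (assumed pattern) matches named, decimal, and hexadecimal
// character references such as &amp;, &#38;, and &#x26;.
var entityRe = regexp.MustCompile(`&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);`)

// escapePreservingEntities escapes s for HTML but copies ranges that
// already look like entities through untouched, so "&amp;" does not
// become "&amp;amp;".
func escapePreservingEntities(s string) string {
	var b strings.Builder
	last := 0
	for _, loc := range entityRe.FindAllStringIndex(s, -1) {
		b.WriteString(html.EscapeString(s[last:loc[0]])) // text before the entity
		b.WriteString(s[loc[0]:loc[1]])                  // the entity, verbatim
		last = loc[1]
	}
	b.WriteString(html.EscapeString(s[last:]))
	return b.String()
}

func main() {
	fmt.Println(escapePreservingEntities("http://example.com/?a=1&amp;b=2&c=3"))
}
```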
Vytautas Šaltenis cc0d56d092 Extract a chain of ifs into separate func
This gives a ~10% slowdown of a full test run, which is tolerable.
Switch statement is still slightly slower (~5%). Using map turned out to
be unacceptably slow (~3x slowdown).
2014-02-17 21:09:04 +02:00
Vytautas Šaltenis 31a96c6ce7 go fmt 2014-02-17 21:09:03 +02:00
Vytautas Šaltenis 2f50a53f8e Rename HTML_SKIP_SCRIPT to HTML_SANITIZE_OUTPUT 2014-01-22 01:23:43 +02:00
Vytautas Šaltenis 55cd82008e Rewrite protection against JavaScript injection
This drops the naive approach to <script> tag stripping and resorts to
full sanitization of html. The general idea (and the regexps) is grabbed
from Stack Exchange's PageDown JavaScript Markdown processor[1]. Like in
PageDown, it's implemented as a separate pass over resulting html.

Includes a metric ton (but not all) of test cases from here[2]. Several
are commented out since they don't pass yet.

Stronger (but still incomplete) fix for #11.

[1] http://code.google.com/p/pagedown/wiki/PageDown
[2] https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
2014-01-22 01:14:35 +02:00
Vytautas Šaltenis e02c392dc6 Extract useful code to separate func 2014-01-22 00:45:43 +02:00
David Kitchen 6e6572e913 Added th to table headers so that styling with things like Twitter Bootstrap and typeset.css works as expected. Cells in headers should always be TH unless they are advisory cells within headers, in which case TD is acceptable (but this being Markdown, a user with such needs could just enter HTML for this) 2013-10-16 11:36:33 +01:00
moshee c23099e5ee Implementation and some tests for inline footnotes. Also I noticed the list items had the wrong ids, that was silly of me. 2013-07-01 01:37:52 +00:00
moshee 7bdb82c53a new tests pass but old tests now fail... 2013-06-26 15:57:51 +00:00
moshee be082a1ef2 First attempt at supporting Pandoc-style footnotes. The existing tests have not broken but the new functionality does not work yet. 2013-06-25 01:18:47 +00:00
Vytautas Šaltenis 8226238289 Improve html element stripping code 2013-04-18 03:15:47 +03:00
Vytautas Šaltenis dcaaa9b5dc More <script> stripping
Partially addresses issue #11.
2013-04-13 23:24:30 +03:00
Vytautas Šaltenis fb923cdb78 Add an option to strip <script> elements
Partially addresses issue #11.
2013-04-13 22:57:16 +03:00
Vytautas Šaltenis b79e720a36 Make isHtmlTag() case insensitive 2013-04-13 22:34:37 +03:00
Vytautas Šaltenis a2fda5e98f Extract repetitive code to a func 2013-04-13 22:26:29 +03:00
Vytautas Šaltenis d5a8df164b Fix bug in isHtmlTag()
Fix what seems to be a typo. j should iterate through all of tagname, so
it should be initialized to zero. The test exposes this bug.
2013-04-13 22:21:47 +03:00
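The two `isHtmlTag()` fixes above (start `j` at zero, compare case-insensitively) can be illustrated with a simplified matcher. This is a sketch, not the project's function: `isHTMLTag` here is a hypothetical reimplementation that assumes `tagname` is passed in lowercase and that a tag name must be followed by whitespace, `/`, or `>`.

```go
package main

import "fmt"

// skipSpace advances i past spaces and tabs in tag.
func skipSpace(tag []byte, i int) int {
	for i < len(tag) && (tag[i] == ' ' || tag[i] == '\t') {
		i++
	}
	return i
}

// isHTMLTag (hypothetical) reports whether tag is an opening or closing
// tag named tagname. tagname is assumed to be lowercase.
func isHTMLTag(tag []byte, tagname string) bool {
	if len(tag) == 0 || tag[0] != '<' {
		return false
	}
	i := skipSpace(tag, 1)
	if i < len(tag) && tag[i] == '/' {
		i = skipSpace(tag, i+1)
	}
	// j starts at zero so that every byte of tagname is compared.
	for j := 0; j < len(tagname); i, j = i+1, j+1 {
		if i >= len(tag) {
			return false
		}
		c := tag[i]
		if c >= 'A' && c <= 'Z' {
			c += 'a' - 'A' // fold ASCII upper case for the comparison
		}
		if c != tagname[j] {
			return false
		}
	}
	// The name must end here; otherwise "<scripts>" would match "script".
	if i >= len(tag) {
		return false
	}
	switch tag[i] {
	case ' ', '\t', '\n', '/', '>':
		return true
	}
	return false
}

func main() {
	fmt.Println(isHTMLTag([]byte("<SCRIPT>"), "script"))
	fmt.Println(isHTMLTag([]byte("<scripts>"), "script"))
}
```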
Caleb Spare a25d9a543f Fix html tag ordering in doc string. 2012-11-22 12:52:56 -08:00
Caleb Spare d0d854958e Fix up method documentation formatting. 2012-11-22 12:12:08 -08:00
moshee 8a86b6d6be HTML5 doctype, Wrap TOC with <nav>
<nav> makes the TOC more easily identifiable and workable with CSS.
2012-10-21 21:23:44 -07:00
Russ Ross a5441fd99f updates for go 1 2012-03-07 21:36:31 -07:00
Russ Ross 530123dd9f additional doc comments 2011-07-07 12:05:29 -06:00
Russ Ross bb8ee591d1 doc improvements, commenting 2011-07-07 11:56:45 -06:00
Russ Ross bd60e3691b removing more redundant checks, additional cleanup of block parsing 2011-07-01 14:13:26 -06:00
Russ Ross 689f6cb79b more consistent spacing of block-level elements 2011-07-01 11:19:42 -06:00
Russ Ross ae9562f685 move whitespace stripping to parser, not renderers 2011-06-29 15:38:35 -06:00
Russ Ross d3c8225096 corner case spacing issue with table of contents 2011-06-29 13:24:15 -06:00
Russ Ross 2aca667078 simplify inline callback interface 2011-06-29 13:00:54 -06:00