Read-only operations:
- Frequent:
- Front page display.
- Article display.
- Viewing comments (expanding bubbles).
- Less frequent:
- Request comment content for editing window.
- Rare:
- View last valid parent.
Write actions:
- Posting comment.
- Editing comment.
- Approving changed parent for comment.
- Deleting comment,
Rendering webmored text
This is what performed by webmore_text (/apps/webmore/templatetags/webmore.py) template tag, and required by almost any request.
Database queries
Tag is given Part object as argument. It calls children_grouped_by_anchors method of Part, which returns dictionary (anchor => list of comments).
SELECT *, (EXISTS(SELECT * FROM front_validparent WHERE
front_validparent.part_id = front_part.id AND
front_validparent.valid_from <= ''parent version'' AND
front_validparent.valid_to >= ''parent version'')) AS approved
FROM front_part WHERE
(latest = True AND parent_id = ''parent'') ORDER BY date_created ASC;
Then starts a procedure of bubble linearization, in which up to N (currently 10) nested comments shown for each anchor. See Part.linearize. In this process for each bubble that appears in linearized view, executed the same query as shown above, but with LIMIT N addition, where N is the number of bubbles remaining to the per anchor limit.
So a total of 1 + (number of bubbles displayed in text) queries executed. This is by far the most complex SELECT executed in system, but most of them are executed in 1-2 ms on test database with 100,000+ entries. EXPLAIN output:
| id | select_type | table | type | key | Extra |
| 1 | PRIMARY | front_part | index_merge | front_part_parent_id,front_part_latest | Using intersect(front_part_parent_id,front_part_latest); Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | front_validparent | ref | front_validparent_part_id | Using where |
Measured over a large number of generated articles, execution time of children_grouped_by_anchors is linear to the total number of comments returned, with average time per bubble 3 ms.
Parsing webmored text
For rendering text should be converted to Python data structure (accessible as Part.tree property): list of dictionaries, each describing either text sentence, XHTML tag or webmore anchor tag. This task is performed by Expat XML parser (/lib/webmore/expat_converter.py), calling sentence splitter (/lib/text/splitter.py).
Parsing time is linear to the length of text (no spikes were detected), measured average parsing speed is 70 KB/s.
Profiling has shown:
- 80-85% of total parsing time is spent is SentenceSplitter.split.
- From this sentence splitting, 80% of time spent in remove_false_end_of_sentence function, which involves a lot of re.sub operations, and thus string copying.
- Also 25% of splitting time spent in compiling regular expression (re.compile).
TODO: precompile regular expressions and merge some patterns.
However, experiments have shown that it's hard to achieve even 2 times speedup on this way, so caching is absolutely required.
Using memcached cache backend, Part.tree access time in case of cache hit is reduced to a hard to measure 1-2 ms. Cache is invalidated on post_save signal of Part model.
Serialized tree typically consumes 2 times more memory than plain text (in case we're going to estimate memory amount required for memcached).
Template rendering
Rendering webmore is a fairly complex process (see /templates/webmore/text.html) involving nested loops, conditionals, and so on, which are known to be not so blazingly fast. It is very tricky to find out how much time is spent in what part of template.
For commentless articles: rendering time is linear to the length of text, average 150 KB/sec (measured for content length, not resulting HTML).
It's hard to establish a correlation between rendering time and number of comments, although it seems to be linear too.
For heavily commented text template rendering takes from 1/2 to 2/3 of total execution time, including database queries. These figures were measured with hotshot profiler, inserted into mod_python processing chain. However, accurate timing of rendering process isolated from HTTP server burden shows it takes 35-45% of (database requests + template rendering).
Caching
Webmore rendering complexity is inevitable. No tweaks will give us major speedup, but 2-3 reqs/sec for heavily commented (also likely to be frequently visited) articles is inacceptable. The only way to go is to cache entire rendered XHTML.
Two question arise: what to cache and when to invalidate, and the latter one is probably more confusing. On a highly interactive site, such as Thiblo, we cannot allow even a few minutes delay in article/comments view refreshing. That means: no cache timeouts, only event-driven invalidation. But webmore appearance of some part depends not only on this part itself, but also on its descendants.
We can somehow precompute and store, which bubbles appear on which parts, and notify parents appropriately. But invalidating whole article because of a single comment probably reduces caching benefits too much. Most probably we'll need to assemble content from cached parts (don't confuse with Part model), treating comments and anchors separately.
Another viable strategy is to show highly dynamic content only to logged-in users, and use timeout invalidation for anonymous users.
Displaying article
Viewing article is basically showing rendered webmore, described above, but with a few extra steps involved:
- Querying article itself
SELECT * FROM front_article WHERE name = ''article slug'' AND date_publshed = ''date from url'';
- Authentication engine queries, in case user is logged in:
-
SELECT * FROM django_session WHERE session_key = ''cookie'' AND expire_data = ''current timestamp'';
- SELECT * FROM auth_user WHERE id = id;
- SELECT * FROM auth_message WHERE user_id = id;
- SELECT * FROM account_userprofile WHERE user_id = id;
-
- Querying article content: SELECT * FROM front_part WHERE id = article content id;
All these steps are very primitive, and introduce no noticeable overhead.
Actual performance
Assuming that content was pre-parsed and cached, otherwise you should add parsing time as described above.
Measured with Apache benchmark, one parameter seems to be rather stable in various scenarios: transfer speed. That is, response time is proportional to the length of HTML returned. 500±50 KB/s was measured.
That means, for 10 KB original text article 10-11 reqs/s are typical, and for heavily commented (~100 of visible bubbles) article of the same length 2.5-3 reqs/sec.
The same article:
- on my development machine: 15.2 reqs/sec
- on our slicehost server: 16.9 reqs/sec
Two examples of request time distribution (per 300 requests):
Not commented, 11 KB length:
50% 98 66% 98 75% 99 80% 100 90% 110 95% 147 98% 156 99% 157 100% 178 (longest request)
13 KB length, 71 comments, 115 visible bubbles:
50% 348 66% 350 75% 353 80% 358 90% 441 95% 520 98% 721 99% 727 100% 1405 (longest request)
Front page display
Queries
- Blog selection by name.
- Boxes selection by blog id.
- For each box:
- For each tag in box query:
- Query articles
SELECT * FROM front_article LEFT OUTER JOIN front_tag_articles AS m2m_front_article__tag ON front_article.id = m2m_front_article__tag.article_id INNER JOIN front_tag AS front_article__tag ON m2m_front_article__tag.tag_id = front_article__tag.id WHERE (front_article.date_published IS NOT NULL AND front_article__tag.name = ''tag name'')For each article content is retrieved by separate simple SELECT.
- Query articles
- For each tag in box query:
- Rendering: retrieved content is sent to pre-built blog template stored in file system. Most of processing time is rendering webmored text for each article.
TODO: currently system requests all comments for each article shown on frontpage. Change this to query only anchors in abstract/frontquote part.
Detailed profiling shows that percentage of time spent in database queries is even lower than for articles, and rendering tooks most of it.
Unexpectedly, about 50 ms of execution time is spend in LastVisitedMiddleware (/apps/account/middleware.py), which calls costly URL resolver.
Performance data:
For /blog/alapos/ after default initial data insertion: 8-9 reqs/sec
50% 105 66% 105 75% 112 80% 117 90% 147 95% 156 98% 480 99% 482 100% 482 (longest request)
Viewing comments
General considerations
Not only this tends to be one of the most frequent operations, also becuase of its AJAX nature, it should be ideally performed lightning fast, with minimum visible delay. I estimate upper bound for comfortable webmore usage as 0.3 sec.
Sending little data chuncks is bad. Probably server should return not only bubbles for requested anchor, but also 2-3 levels deeper (but no more than N KB). These bubbles should be cached by client javascript with a few minutes timeout.
Operations
- Server is sent POST request at (/articles/anchor/) with the following data:
- parent part id
- anchor
- comma-separated list of anchors to show expanded in tree
- Requests:
- First, parent part is queried.
- The same bubble querying and linearization procedure is performed, as described above, but children are filtered by requested anchor on the first level.
- The same authentication queries are executed as in article view.
- Answer is rendered. Same considerations apply.
It's worth noting, that entire anchor tree is rendered to response, so the deeper is comment, the more work is required.
Actual performance
Short (a few words) top-level comment, with no nested comments: 70 reqs/sec for anonymous user, 60 reqs/sec for logged-in user.
The same anchor tree:
- on my development machine: 69 reqs/sec
- on our slicehost server: 78 reqs/sec
Time distribution for anonymous request (500 times):
50% 12 66% 12 75% 13 80% 13 90% 20 95% 23 98% 25 99% 25 100% 39 (longest request)
3-level deep comments, total 5 expanded bubbles shown: 15-17 reqs/sec, difference between anonymous and logged-in user is within measurement errors.
Request comment content for editing
POST data with comment id is sent to /articles/comments/edit/.
Extremely simple operation, requires two queries to authentication engine (session key, and user object), and one query for actually retrieving text. Takes just a few milliseconds.
Viewing last valid parent
Comment id is passed in URL. Queries:
- Query comment.
- Query last valid parent (3 simple queries, see Part.last_valid_parent).
- In case user is logged in, four authentication queries, same as for articles).
- Children query for rendering. Because only one bubble is shown, only one query is always issued.
Response time should be estimated from parent size, as if it has no comments.
Versioning overhead
Not detected. Regardless of how many versions of articles/comments were created, response times are roughly the same, within error range.
Posting comment
- System recieves following data in POST (/articles/comments/post/):
- id of part to comment (part)
- index of sentence to comment (index)
- text of sentence to comment (text)
- text of comment (comment)
- Optionally, if commenting comment, not article:
- root anchor of tree to recieve comment (root_anchor)
- id of article (root_parent)
- comma-separated list of anchors expanded on the way to comment (expanded)
- Authentication queries performed.
- Commented part requested by id.
- default_journal value requested (currently hardcoded "alapos")
- Received sentence text is checked against stored in database. Sentence retrieveal is linear to the length of article (see Part.sentences property), yet quite fast. Comment rejected on mismatch.
- Comment text sanitized (see separate section).
- New anchor requested (Part.anchor_for). This process is rather costly. First, it requires access to tree property (see above about its caching). Then, it scans linearly for sentence with index requested. Finally, if no anchor for this sentence was found, it generates a new one, inserts it into tree, and assembles part content from tree. Time for isolated anchor_for operation was found to be linear to the number of sentences, with average 0.5 ms per sentence. If anchor exists, time is greatly reduced.
- Article referenced in part.article requested by id. This seems to be pretty much redundant step performed by Django ORM.
- New global version created:
INSERT INTO version(id) VALUES (NULL); SELECT LAST_INSERT_ID();
- New comment is inserted.
- PartManager intercepts it, and updates part_id field:
SELECT 1 FROM front_part WHERE id = newly_created; (???) UPDATE front_part SET part_id = newly_created WHERE id = newly_created;
- Updated text of commented part is stored in database (UPDATE query).
- Newly created comment is approved (see below for details).
- Webmore rendered for root anchor tree (see above).
All in all, for top-level posting average 5 reqs/sec was measured, although time depends on too many factors, so this is very rough estimation.
TODO: optimize anchor generation.
Editing comment
- Following data is passed in POST request to (/articles/comments/update/):
- Comment id.
- Updated content.
- Parent id.
- Comment anchor.
- List of expanded anchors (see explanation in comment posting section, although last three params seem to be not used on server side, and appear only because of generalized client-side process.)
- Authentication queries performed.
- Edited comment requested by id.
- Anchors of its descendants queried for further usage in webmore sanitizing.
- Webmore sanitized. If some anchors were deleted, new anchor with special reattached=1 attribute is created at the end of comment, and all deleted descendants are updated to have this anchor.
- New version created.
- Comment content updated.
- Old version inserted.
- References to old version in validity table updated.
UPDATE front_validparent SET part_id=new_id WHERE part_id=old_id
- Response webmore rendered.
Once again, too many factors involved, but measurements for top-level comment editing show about 5 reqs/sec
Webmore sanitizing
It is performed by Expat-based XHTML parser, which checks XHTML validity, strips out invalid tags and <webmore /> tags with invalid anchors, and returns list of deleted anchors.
Mesured speed is linear to the length of content (although some spikes were detected, probably due to very short times), and is about 3.5 MB/sec.
Approving comment
- System receives id of comment to approve in POST request.
- Comment requested by id.
- Authentication queries.
- Approval itself:
- Request parent by id.
- Request last outdated version of parent:
SELECT * FROM front_part WHERE (part_id = parent_id AND latest = False) ORDER BY version DESC LIMIT 1
- Request if it was approved for this particular comment:
SELECT COUNT(*) FROM front_validparent WHERE part_id = comment_id AND valid_to >= version AND valid_from <= version
- If yes:
- Request entry to update
SELECT * FROM front_validparent WHERE part_id = comment_id AND valid_to >= version AND valid_from <= version
(???)SELECT 1 FROM front_validparent WHERE id=entry_id
- Update it
UPDATE front_validparent SET valid_to = new_version WHERE id=entry_id
- Request entry to update
- Else insert new entry:
INSERT INTO front_validparent VALUES (...)
- If yes:
- Rendered anchor tree is returned, see above.
Overall performance is limited by response webmore rendering. Approval queries are executed in a few milliseconds.
TODO: Django ORM seems to issue duplicate query in the first scenario (update existing validity entry). Not too much to care about, however, maybe will rewrite somewhere in future.
Comment deletion
Server recieves comment id to delete in POST request and, well, deletes it :-)
Measurements configuration
Software
Ubuntu 7.04, Apache 2.2.3, MySQL 5.0.38, Python 2.5.1, mecached 1.1.12,
Hardware
Athlon 64 3200+ with 1 GB of RAM
