Releases 0.9.181 - 0.9.185: Change Log

We've fixed CSV downloads for gift exchange sign-ups, corrected fandom counts in collections, and made a whole heap of behind-the-scenes changes, test improvements, and other minor fixes.


  • Coders: Ariana, Cesy, cosette, cyrilcee, David Stump (Littlelines), DNA, james_, potatoesque, redsummernight, Sammie Louise, Sarken, Scott, tickinginstant
  • Code reviewers: Ariana, bingeling, james_, Naomi, potatoesque, redsummernight, Sarken
  • Testers: Betsy, Lady Oscar, mumble, Rebecca Sentance, redsummernight, Runt, Sammie Louise

Special thanks to redsummernight, who has contributed their first pull request as an AD&T volunteer and completed their training!


Bug Fixes & Enhancements

  • [AO3-4844] - We've started using the Devise gem to handle admin logins.
  • [AO3-4834] & [AO3-4835] - In our tag set code, we had two places where users would get a 500 error instead of the nicer, more specific message we meant to give them. Now they'll get a "What Tag Set did you want to look at?" error instead.
  • [AO3-4877] - Following a recent release, it was no longer possible to download gift exchange sign-up CSVs. We've fixed that, and we've also added some tests that will hopefully keep it from happening again.
  • [AO3-4808] - Editing a work and removing its fandom used to save the work anyway but return a 500 error, leaving behind an invalid work and a confused user. Now, trying to save a work without a fandom won't save the work at all -- the user will see an error message instead.
  • [AO3-4045] - If your chapter was over 500,000 characters long, you'd get an error message that included the oh-so-helpful suggestion, "Maybe you want to create a multi-chaptered work?" Since you were already trying to do that, we removed that suggestion from the error message.
  • [AO3-2431] - A lot of collections were showing fandom counts that were higher than the actual number of fandoms in that collection. We realized that was because the code was also counting meta tags, so we made it stop doing that.
  • [AO3-4858] & [AO3-4922] - As detailed in Issues With Posting Works (And What We're Doing to Solve Them), we deployed some new caching code to help speed up work posting. Unfortunately, the code didn't work and we had to revert it.
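The fandom-count fix in AO3-2431 can be sketched roughly like this (hypothetical data structures, not the Archive's actual models): meta tags were being counted alongside the regular fandom tags, so the total came out too high.

```ruby
# Illustrative sketch of the AO3-2431 counting bug and fix.
Fandom = Struct.new(:name, :meta)

FANDOMS = [
  Fandom.new('Sherlock Holmes & Related Fandoms', true), # meta tag
  Fandom.new('Sherlock (TV)', false),
  Fandom.new('Elementary (TV)', false)
]

# Counting every tag gives the inflated number users were seeing...
inflated_count = FANDOMS.size                 # => 3
# ...while skipping meta tags gives the correct one.
fandom_count   = FANDOMS.reject(&:meta).size  # => 2
```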


  • [AO3-4883] - A security vulnerability was discovered for one of the gems we use, so we quickly updated to the patched version. (We only use the gem for our automated tests and don't believe we were at risk, but better safe than sorry!)
  • [AO3-4895] - The tool we use to check our code style and syntax was giving us suggestions that only worked in a newer version of the Ruby language than what we're currently using. We changed the tool's settings so it will only suggest things for the version of Ruby we're using.
  • [AO3-4780] & [AO3-4782] - We've added strong parameters to FAQ categories and invitation requests.
  • [AO3-4918] & [AO3-4920] - In order to deploy the caching changes for AO3-4858, we temporarily amended our deploy script so the deploy process would take less time, but require us to briefly put the Archive into maintenance mode. After we were done, we reverted those changes.
  • [AO3-4825] - We had some help files that were outdated and no longer in use, so we removed them.
  • [AO3-4851] & [AO3-4933] - We updated the database schema file in our repository, since recent changes to our database structure meant it was out of date.
  • [AO3-4443] - We've updated our version of Pry, a gem that provides a number of development tools.
  • [AO3-4856] - We had some unused code in the tag set nominations controller, so we deleted it.


  • [AO3-4830], [AO3-4897], [AO3-4908], [AO3-4901] - We've extended the automated tests for tag sets to cover more lines in the controller and more use cases. We've also reorganized the tests into smaller files in their own directory.
  • [AO3-4726] - We've brought test coverage of the comments controller up from 71% to almost 94%.
  • [AO3-4914] - We now have tests to cover all the types of tags you can use on a bookmark of an external work.
  • [AO3-4887] - We've begun improving the test coverage of the challenge assignments controller.
  • [AO3-4810] - Our test coverage for the prompts controller is now at 93%, which is much better than the 65% it started at.
  • [AO3-4889] - The series controller now has 96% of its lines covered by automated tests.
  • [AO3-4916] - We've added more tests for the external authors controller.

Known Issues

See our Known Issues page for current issues.

We're Back on the Wayback Machine

We're pleased to announce that after seven months, the Archive of Our Own is once again available on the Internet Archive's Wayback Machine!

Late last year, the AO3 suddenly vanished from the Wayback Machine, a non-profit archiving service. We reached out to its maintainers several times during this period to find out why AO3 pages were no longer being archived. The project's director contacted us this week and explained the problem.

Rather than excluding only pages belonging to users who had asked for their content to be taken down (e.g. their profile page or specific works), the entire domain had been mistakenly excluded. The folks at the Wayback Machine have corrected this problem and the AO3 is available there once more. (Check out the Archive homepage from 2010!)

While the Wayback Machine is a great service, and another useful tool in the efforts to preserve fanworks and fan history, this is a good reminder not to keep all your eggs in the same basket. Download works you might want to read again in a year, crosspost your own works to other sites, and be sure you save back-ups locally and/or with a trusted online service.

If you're concerned about the public availability of your works, check our "How can I hide my works from non-Archive users?" FAQ for information that can help protect your privacy.

The OTW is Recruiting Abuse and Elections Staff!

OTW recruitment banner

Are you interested in volunteering for the Organization for Transformative Works as Abuse Committee Staff or Elections Committee Staff?

We would like to thank everyone who responded to our previous call for Tag Wrangling Volunteers and Fanlore Staff.

Today, we're excited to announce the opening of applications for:

  • Abuse Committee Staff - closing 29 March 2017 23:59 UTC
  • Elections Committee Staff: Team Coordinator - closing 29 March 2017 23:59 UTC
  • Elections Committee Staff: Voting Process Architect - closing 29 March 2017 23:59 UTC

We have included more information on each role below. Open roles and applications will always be available at the volunteering page. If you don't see a role that fits with your skills and interests now, keep an eye on the listings. We plan to put up new applications every few weeks, and we will also publicize new roles as they become available.

All applications generate a confirmation page and an auto-reply to your e-mail address. We encourage you to read the confirmation page and to whitelist volunteers -(at)- transformativeworks -(dot)- org in your e-mail client. If you do not receive the auto-reply within 24 hours, please check your spam filters and then contact us.

If you have questions regarding volunteering for the OTW, check out our Volunteering FAQ.

Abuse Committee Staff

The Abuse Committee is dedicated to helping users deal with the various situations that may arise. We also handle any complaints that come in about content uploaded to the Archive of Our Own. The team determines if complaints are about legitimate violations of the Terms of Service, and what to do about them if they are; our major goals are to adhere to the TOS, to make our reasoning and processes as clear and transparent as possible, and to keep every individual case completely confidential. We work closely with other AO3 related committees such as Support and Content.

We are seeking people who can keep in close contact, be patient in rephrasing explanations, make and document decisions, cooperate within and outside of their team, and ask for help when it's needed. Staffers need to be able to handle complex and sometimes-disturbing content, and must be able to commit a sufficient amount of time to the team on a regular basis.

Applications are due Wednesday 29 March 2017 23:59 UTC

Elections Committee Staff: Team Coordinator

The Elections Committee is responsible for running OTW Board elections. We ensure the fairness, timeliness, and confidentiality of the process. As a team, we update the elections process, communicate with members and other committees about the process, help candidates prepare for and carry out their tasks, and run the election itself.

We are currently looking for Team Coordinators to organize our efforts and document procedures.

Applications are due Wednesday 29 March 2017 23:59 UTC

Elections Committee Staff: Voting Process Architect

The Elections Committee is responsible for running OTW Board elections. We ensure the fairness, timeliness, and confidentiality of the process. As a team, we update the elections process, communicate with members and other committees about the process, help candidates prepare for and carry out their tasks, and run the election itself.

We are currently looking for Voting Process Architects to run the election itself and preserve our data security.

Applications are due Wednesday 29 March 2017 23:59 UTC

Apply at the volunteering page!

Issues With Posting Works (And What We're Doing to Solve Them)

You may have noticed the Archive has been slow or giving 502 errors when posting or editing works, particularly on weekends and during other popular posting times. Our Development and Systems teams have been working to address this issue, but our March 17 attempt failed, leading to several hours of downtime and site-wide slowness.


Whenever a user posts or edits a work, the Archive updates how many times each tag on the work has been used across the site. While a tag's count is being updated, its database record is locked, and no other changes to that tag can be processed. This can result in slowness or even 502 errors when multiple people are trying to post using the same tag. Because all works are required to use rating and warning tags, works' tags frequently overlap during busy posting times.
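The contention can be illustrated with a small sketch (not AO3's actual code): each tag's use count lives behind a lock, standing in for the locked database record, so posts that share a tag must take turns updating it.

```ruby
# Hypothetical model of the per-tag locking described above.
TAGS       = ['Teen And Up Audiences', 'No Archive Warnings Apply'].freeze
TAG_LOCKS  = TAGS.to_h { |tag| [tag, Mutex.new] }
TAG_COUNTS = TAGS.to_h { |tag| [tag, 0] }

# Posting a work bumps the count for every tag it uses, holding each
# tag's lock while it does.
def post_work(tags)
  tags.each do |tag|
    TAG_LOCKS[tag].synchronize { TAG_COUNTS[tag] += 1 }
  end
end

# 50 simultaneous posts all share the same rating and warning tags,
# so every update to those two counts is serialized.
50.times.map { Thread.new { post_work(TAGS) } }.each(&:join)
TAG_COUNTS['Teen And Up Audiences'] # => 50
```

With real database rows instead of in-memory locks, that serialization is what shows up as slow posting and 502 errors at peak times.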

Unfortunately, the only workaround currently available is to avoid posting, editing, or adding chapters to works at peak times, particularly Saturdays and Sundays (UTC). We strongly recommend saving your work elsewhere so changes won’t be lost if you receive a 502.

For several weeks, we’ve had temporary measures in place to decrease the number of 502 errors. However, posting is still slow and errors are still occurring, so we’ve been looking for more ways to use hardware and software to speed up the posting process.

Our Friday, March 17, downtime was scheduled so we could deploy a code change we hoped would help. The change would have allowed us to cache tag counts for large tags (e.g. ratings, common genres, and popular fandoms), updating them only periodically rather than every time a work was posted or edited. (We chose to cache only large tags because the difference between 1,456 and 1,464 is less significant than the difference between one and nine.) However, the change led to roughly nine hours of instability and slowness and had to be rolled back.

Fixing this is our top priority, and we are continuing to look for solutions. Meanwhile, we’re updating our version of the Rails framework, which is responsible for the slow counting process. While we don’t believe this upgrade will be a solution by itself, we are optimistic it will give us a slight performance boost.

March 17 incident report

The code deployed on March 17 allowed us to set a caching period for a tag’s use count based on the size of the tag. While the caching period and tag sizes were adjusted throughout the day, the code used the following settings when it was deployed:

  • Small tags with less than 1,000 uses would not be cached.
  • Medium tags with 1,000-39,999 uses would be cached for 3-40 minutes, depending on the tag’s size.
  • Large tags with at least 40,000 uses would be cached for 40-60 minutes, but the cache would be refreshed every 30 minutes. Unlike small and medium tags, the counts for large tags would not update when a work was posted -- they would only update during browsing. Refreshing the cache every 30 minutes would prevent pages from loading slowly.
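The size bands above can be sketched as a small helper (a hypothetical illustration, not the code that was deployed): it returns a caching period in minutes, or nil when the count should not be cached at all. The 30-minute background refresh for large tags is not modeled here.

```ruby
# Illustrative mapping from a tag's use count to its caching period.
def cache_period_minutes(uses)
  case uses
  when 0...1_000
    nil # small tags: always use the live count
  when 1_000...40_000
    # medium tags: scale linearly from 3 minutes up to 40
    (3 + (uses - 1_000) * 37.0 / 39_000).round
  else
    # large tags: 40-60 minutes, capped at the top of the band
    [40 + (uses - 40_000) / 10_000, 60].min
  end
end

cache_period_minutes(500)    # => nil
cache_period_minutes(1_000)  # => 3
cache_period_minutes(39_999) # => 40
cache_period_minutes(80_000) # => 44
```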

We chose to deploy at a time of light system load so we would be able to fine-tune these settings before the heaviest weekend load. The deploy process itself went smoothly, beginning at 12:00 UTC and ending at 12:14 -- well within the 30 minutes we allotted for downtime.

By 12:40, we were under heavy load and had to restart one of our databases. We also updated the settings for the new code so tags with 250 or more uses would fall into the “medium” range and be cached. We increased the minimum caching period for medium tags from three minutes to 10.

At 12:50, we could see we had too many writes going to the database. To stabilize the site, we made it so only two out of seven servers were writing cache counts to the database.

However, at 13:15, the number of writes overwhelmed MySQL. It was constantly writing, making the service unavailable and eventually crashing. We put the Archive into maintenance mode and began a full MySQL cluster restart. Because the writes had exceeded the databases' capabilities, the databases had become out of sync with each other. Resynchronizing the first two servers by the built-in method took about 65 minutes, starting at 13:25 and completing at 14:30. Using a different method to bring the third recalcitrant server into line allowed us to return the system to use sooner.

By 14:57, we had a working set of two out of three MySQL servers in a cluster and were able to bring the Archive back online. Before bringing the site back, we also updated the code for the tag autocomplete, replacing a call that could write to the database with a simple read instead.

At 17:48, we were able to bring the last MySQL server back and rebalance the load across all three servers. However, the database dealing with writes was sitting at 91% load rather than the more normal 4-6%.

At 18:07, we made it so only one app server wrote tags’ cache values to the database. This dropped the load on the write database to about 50%.

At 19:40, we began implementing a hotfix that significantly reduced writes to the database server, but having all seven systems writing to the database once more put the load up to about 89%.

At 20:30, approximately half an hour after the hotfix was finished, we removed the writes from three of the seven machines. While this reduced the load, the reduction was not significant enough to resolve the issues the Archive was experiencing. Nevertheless, we let the system run for 30 minutes so we could monitor its performance.

Finally, at 21:07, we decided to take the Archive offline and revert the release. The Archive was back up and running the old code by 21:25.

We believe the issues with this caching change were caused by underestimating the number of small tags on the Archive and overestimating the accuracy of their existing counts. With the new code in place, the Archive began correcting the inaccurate counts for small tags, leading to many more writes than we anticipated. If we're able to get these writes under control, we believe this code might still be a viable solution. Unfortunately, this is made difficult by the fact that we can't simulate production-level load on our testing environment.

Going forward

We are currently considering five possible ways to improve posting speed going forward, although other options might present themselves as we continue to study the situation.

  1. Continue with the caching approach from our March 17 deploy. Although we chose to revert the code due to the downtime it had already caused, we believe we were close to resolving the issue with database writes. We discovered that the writes overwhelming our database were largely secondary writes caused by our tag sweeper. These secondary writes could likely be reduced by putting checks in the sweeper to prevent unnecessary updates to tag counts.
  2. Use the rollout gem to alternate between the current code and the code from our March 17 deploy. This would allow us to deploy and troubleshoot the new caching code with minimal interruption to normal Archive function. We would be able to study the load caused by the new code while being able to switch back to the old code before problems arose. However, it would also make the new code much more complex. This means the code would not only be more error-prone, but would also take a while to write, and users would have to put up with the 502 errors longer.
  3. Monkey patch the Rails code that updates tag counts. We could modify the default Rails code so it would still update the count for small tags, but not even try to update the count on large tags. We could then add a task that would periodically update the count on larger tags.
  4. Break work posting into smaller transactions. The current slowness comes from large transactions that are live for too long. Breaking the posting process into smaller parts would resolve that, but we would then run the risk of creating inconsistencies in the database. In other words, if something went wrong while a user was updating their work, only some of their changes might be saved.
  5. Completely redesign work posting. We currently have about 19,000 drafts and 95,000 works created in a month, and moving drafts to a separate table would allow us to only update the tag counts when a work was finally posted. We could then make posting from a draft the only option. Pressing the "Post" button on a draft would set a flag on the entry in the draft table and add a Resque job to post the work, allowing us to serialize updates to tag counts. Because the user would only be making a minor change in the database, the web page would return instantly. However, there would be a wait before the work was actually posted.

The unexpected downtime that occurred around noon UTC on Tuesday, March 21, was caused by an unusually high number of requests to Elasticsearch and is unrelated to the issues discussed in this post. A temporary fix is currently in place and we are looking for long-term solutions.
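The monkey-patch approach in option 3 could look roughly like this (an illustrative sketch with hypothetical names, not the actual Rails internals or AO3's models): large tags skip the per-post count update and rely on a periodic recount instead.

```ruby
# Hypothetical threshold above which a tag's count is only updated by
# a scheduled recount task, never during posting.
LARGE_TAG_THRESHOLD = 40_000

class Tag
  attr_reader :name, :taggings_count

  def initialize(name, taggings_count)
    @name = name
    @taggings_count = taggings_count
  end

  # Called whenever a work using this tag is posted or edited. In the
  # real proposal this logic would be patched over Rails' counter
  # update; a periodic task would recount the large tags.
  def increment_taggings_count
    return if taggings_count >= LARGE_TAG_THRESHOLD
    @taggings_count += 1
  end
end

SMALL_TAG = Tag.new('Rare Fandom', 12)
LARGE_TAG = Tag.new('Explicit', 150_000)
[SMALL_TAG, LARGE_TAG].each(&:increment_taggings_count)
SMALL_TAG.taggings_count # => 13
LARGE_TAG.taggings_count # => 150000
```

The trade-off is the one described above: large tags' counts would lag slightly between recounts, but posting would no longer contend for their database rows.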