Now Live! ScribbleHub Explorer – An AI-Powered Novel Recommendation Engine: Feedback Needed!

UnknownNovelist · Feb 16, 2023

K5Rakitan said:
How is it going to handle an outlier like mine?

I checked out your reading list, and it actually isn't that bad. I mean, I think the top user has around 3302 novels in their primary reading list.

K5Rakitan · Feb 16, 2023

UnknownNovelist said:
I checked out your reading list, and it actually isn't that bad. I mean, I think the top user has around 3302 novels in their primary reading list.

I was thinking more about how I've asked for one-star ratings on the story I wrote. How will it be recommended to others once ratings are factored into the system?

UnknownNovelist · Feb 16, 2023

Oh, then you're F***ed...

Kidding.

The CF Engine uses implicit feedback, i.e., what users have actually added to their reading lists, not how they rated the novels they added. This was a limitation of the initial dataset that I scraped using the browser plugin. The plan, however, was to use ratings in a future version, and if that happened, your novel would surely drop to the bottom.
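
To make the implicit-feedback idea concrete, here is a minimal sketch, not the actual RecSys code. It assumes a toy set of reading lists (the user and novel names are made up) and uses plain cosine similarity between novels. The only signal is whether a novel is on a reading list, so ratings never enter the picture.

```python
import numpy as np

# Hypothetical toy data: each user's reading list as a set of novel IDs.
reading_lists = {
    "user_a": {"novel_1", "novel_2", "novel_3"},
    "user_b": {"novel_1", "novel_2"},
    "user_c": {"novel_2", "novel_3"},
    "user_d": {"novel_4"},
}

def build_interaction_matrix(reading_lists):
    """Return a binary user x novel matrix (1.0 = novel is on the list)."""
    users = sorted(reading_lists)
    novels = sorted({n for items in reading_lists.values() for n in items})
    col = {n: j for j, n in enumerate(novels)}
    matrix = np.zeros((len(users), len(novels)))
    for i, user in enumerate(users):
        for novel in reading_lists[user]:
            matrix[i, col[novel]] = 1.0
    return matrix, novels

def item_similarity(matrix):
    """Cosine similarity between novel columns."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unread novels
    normalized = matrix / norms
    return normalized.T @ normalized

def similar_novels(novel, matrix, novels, top_n=3):
    """Novels most often co-listed with `novel`, best match first."""
    sim = item_similarity(matrix)
    idx = novels.index(novel)
    order = np.argsort(-sim[idx], kind="stable")
    return [(novels[j], float(sim[idx, j])) for j in order if j != idx][:top_n]

matrix, novels = build_interaction_matrix(reading_lists)
result = similar_novels("novel_1", matrix, novels)
print(result)
```

In this toy data, novel_2 comes out as the closest match to novel_1 because two of the three users who list novel_2 also list novel_1. A one-star rating on either novel would change nothing.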

But don't get your hopes up about the Content-based Engine, as that factors in synopsis, genre, tags, and yes, ratings. So unless you have some amazing other metrics (e.g., reviews, readers, and favorites) to balance this out, the low rating will count against you there.

But remember, it works by finding clusters and similarities between novels. How similar a given novel is to the input is the primary measure it ranks on; the rating is only secondary. This means that if you can get your fellow authors to also have their readers give 1-star ratings, then you won't be alone for long!
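
As a rough illustration of "similarity first, rating second", here is a small sketch. It is an assumption-heavy toy: it compares tags only (the real engine also uses synopsis and genre), uses Jaccard overlap as the similarity measure, and the catalog entries are invented.

```python
def jaccard(a, b):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_by_content(query_tags, catalog):
    """Sort by similarity to the query; rating only breaks ties."""
    scored = [
        (jaccard(query_tags, novel["tags"]), novel["rating"], novel["title"])
        for novel in catalog
    ]
    scored.sort(key=lambda t: (-t[0], -t[1]))
    return [title for _, _, title in scored]

# Hypothetical catalog entries.
catalog = [
    {"title": "One-Star Wonder", "tags": {"comedy", "isekai", "magic"}, "rating": 1.0},
    {"title": "Well Liked", "tags": {"comedy", "isekai", "magic"}, "rating": 4.5},
    {"title": "Unrelated Hit", "tags": {"horror", "mystery"}, "rating": 4.9},
]

ranked = rank_by_content({"comedy", "isekai", "magic"}, catalog)
print(ranked)
```

Under these assumptions the one-star novel still ranks above a highly rated but unrelated one, and only loses out to an equally similar novel with a better rating.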

greyblob · Feb 16, 2023

UnknownNovelist said:
I'm planning on deploying it, but it will not be updated, as Tony has asked me to stop scraping.

Edit: To expand on my last comment. The project is *dead*. And I will not be scraping for the new dataset to further train the AI. In short, I'm back to square one and I don't know if I can continue the project.

I don't mind either way. I'd like to tinker with it and glance behind the curtains if possible.

I'm curious why you didn't take a much simpler approach and analyze tags, genres, and maybe even a few chapters of all novels. It would certainly have been less costly than scraping every user's data.

UnknownNovelist · Feb 16, 2023

greyblob said:
I don't mind either way. I'd like to tinker with it and glance behind the curtains if possible.

Sure thing. The source code for the Streamlit app is available here: https://github.com/alexkahler/ScribbleHub-Explorer You will also find links to the actual AI (called RecSys) in a separate project. I have made the Spider private as SH is against scraping. I'd be willing to share the source code for the Spider with @Tony, so he can see the parameters for scraping, but not publicly.
Finally, the code is extremely bad. I haven't done any refactoring as it was still in early alpha-soon-to-be-beta. So I hope you can live through that.

greyblob said:
I'm curious why you didn't take a much simpler approach and analyze tags, genres, and maybe even a few chapters of all novels. It would certainly have been less costly than scraping every user's data.

It's due to the principles that I described earlier. I did not want to scrape any copyrighted material. Furthermore, even if I did scrape the content, it wouldn't have given me the User-Item relations that a Collaborative Filtering Recommendation System would need to calculate the similarity between users. The only way to do that is to get the data on what users have added to their reading lists.

melchi · Feb 16, 2023

Python? Dirty snake language :D

M.G.Driver · Feb 17, 2023

So this project is already dead right? Seems like SH doesn't want this.

UnknownNovelist · Feb 17, 2023

M.G.Driver said:
So this project is already dead right? Seems like SH doesn't want this.

Tony expressed interest in the project, but he wasn't happy about scraping the site. So that lands me in a kind of catch-22: I can't continue with the project if I can't scrape. I'm still in brainstorming mode, trying to figure out a way to get users' reading lists (to train the AI) and the latest novel information, but it has to be done in a way that doesn't involve scraping (or possibly any SH server interaction).

Sabruness · Feb 17, 2023

interesting. how would this handle reading lists when a profile has multiple? would it aggregate them all as one or would there be flexibility for users to designate a particular list(s) to be used as datapoints?

UnknownNovelist · Feb 17, 2023

Sabruness said:
interesting. how would this handle reading lists when a profile has multiple? would it aggregate them all as one or would there be flexibility for users to designate a particular list(s) to be used as datapoints?

That is currently a design limitation. I made the assumption that the first reading list you see is your default reading list. I did notice that very few users had a first reading list made up of dropped novels. To handle those cases, I had planned to use fuzzy match logic to identify the type of reading list the Spider was looking at, and to check whether there were any better matches (for currently reading or favorites) among the other reading lists on that user's profile.
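
The planned fuzzy matching was never built, so the following is only a sketch of how it could work. The keyword table, the 0.7 threshold, and the function names are all assumptions; it uses the standard library's `difflib.SequenceMatcher` to tolerate misspelled list names.

```python
from difflib import SequenceMatcher

# Hypothetical keywords per list type; a real table would need tuning.
LIST_TYPES = {
    "reading": ["reading", "currently reading", "ongoing"],
    "favorites": ["favorites", "favourites", "best"],
    "dropped": ["dropped", "abandoned"],
}

def classify_list(name, threshold=0.7):
    """Guess a list type from its name, or None if nothing is close enough."""
    name = name.strip().lower()
    best_type, best_score = None, 0.0
    for list_type, keywords in LIST_TYPES.items():
        for keyword in keywords:
            score = SequenceMatcher(None, name, keyword).ratio()
            if score > best_score:
                best_type, best_score = list_type, score
    return best_type if best_score >= threshold else None

def pick_primary_list(list_names):
    """Prefer a 'reading' list, then 'favorites', else fall back to the first."""
    classified = [(classify_list(name), name) for name in list_names]
    for wanted in ("reading", "favorites"):
        for kind, name in classified:
            if kind == wanted:
                return name
    return list_names[0] if list_names else None

print(pick_primary_list(["Droped", "Currently Reading", "Favs"]))
```

With this ordering, a profile whose first list is a misspelled "Droped" would still have its "Currently Reading" list picked as the primary one.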

UnknownNovelist · Feb 19, 2023

Hi everyone!

I'm back with an update: @Tony has given the OK to post the link to the web app I made!

You can find ScribbleHub Explorer here: https://sh-explorer.streamlit.app/

Please note that this is still an early beta, so expect crashes, bugs, and inconsistent behavior. Also, the dataset is from the end of January/early February, so it is not updated with the latest information. Also note that the AI recommendations are not at their best due to the poor dataset they were trained on.

Since @Tony asked me to stop scraping, I've been thinking of ways to get the required novel data and reading lists without scraping from ScribbleHub. Then, it struck me that I don't have to scrape ScribbleHub, since someone else is already doing so on ScribbleHub's request - Google! Every time Google visits ScribbleHub (which is quite often), they save a copy of the page that they visited and make it available as a cached page. This means I can simply adapt what I've made to crawl Google's cached pages instead of ScribbleHub.

The next problem is regarding users' reading lists. As I mentioned in my opening post, reading lists are fundamental in terms of training the AI. Although I have some rudimentary data that I can use, it is not optimal. Since I can no longer scrape ScribbleHub, I've thought of a few ways to get around this:

1. Copy-paste the RSS feed. When users want a recommendation, they'll first have to enter their reading list. For users with many novels in the reading list, this might not be sustainable, so I thought of making it possible to copy-paste the reading list's RSS feed into an input box. However, the RSS feed is limited to 25 items, which means that novels that update less regularly won't be included.
2. Copy-paste the raw text of the reading list. The advantage here is that you'll get everything in the reading list, but it is also more error-prone. What if users do not copy-paste correctly? There are also instances of novels with duplicate titles, so a title alone won't always map 1-to-1 to a single novel.
3. Copy-paste the page source. This would be a better option, but it's also dependent on the user's skill. Not many people would know how to correctly view the page source of their reading list.
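
To show where the raw-text option gets tricky, here is a minimal sketch of matching pasted titles against a novel index. The index contents, IDs, and function names are made up for illustration, and it assumes one title per pasted line. Titles that map to more than one novel are flagged as ambiguous instead of being guessed.

```python
from collections import defaultdict

# Hypothetical novel index: (novel_id, title).
NOVEL_INDEX = [
    (101, "Reborn as a Sword"),
    (102, "The Last Mage"),
    (103, "The Last Mage"),  # a different novel with the same title
]

def normalize(title):
    """Lowercase and collapse whitespace so sloppy pastes still match."""
    return " ".join(title.lower().split())

def match_pasted_list(raw_text, index=NOVEL_INDEX):
    """Split pasted text into matched IDs, ambiguous titles, and unknown titles."""
    by_title = defaultdict(list)
    for novel_id, title in index:
        by_title[normalize(title)].append(novel_id)

    matched, ambiguous, unknown = [], [], []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines from the paste
        ids = by_title.get(normalize(line), [])
        if len(ids) == 1:
            matched.append(ids[0])
        elif len(ids) > 1:
            ambiguous.append((line.strip(), ids))
        else:
            unknown.append(line.strip())
    return matched, ambiguous, unknown

pasted = "  reborn as a  sword\n\nThe Last Mage\nSome Typo Novel\n"
matched, ambiguous, unknown = match_pasted_list(pasted)
print(matched, ambiguous, unknown)
```

The ambiguous and unknown buckets would then need some follow-up step, such as asking the user to pick the right novel.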

Since I cannot directly retrieve the user's reading list, I'll also have to figure out a way to save the input data, which means opening up a whole other can of worms. I'll need to figure out user registration, how to save users' reading lists to a database, how users can sync or update their reading lists with ScribbleHub, and so on.

A much better option would be if ScribbleHub could offer an RSS feed of a user's entire reading list (not the latest chapter releases, but which novels are on the reading list). This would make it easier to retrieve the latest novels.

Another way would be to get approval from Tony/ScribbleHub to access users' reading lists and retrieve the necessary data upon user request. This would be different from the broad web-scraping that I did before, as it would only request data from the ScribbleHub servers when users requested it.

The ideal scenario would be for Tony/ScribbleHub to make an API available - one where you could access a novel index, novel information, public users, and public reading lists. Of course, such an API should be protected with an API key. I don't know if this is at all feasible, and it would depend solely on Tony.

Please let me know what you think, and feel free to give me some feedback on the web app!

Edit: Changed the post title, as I'm now looking for feedback :)

Lire · Feb 23, 2023

I tried it out, and it seems pretty neat and easy to use.

Problem? I am seeing a surprising amount of recommendations that are... well, on HIATUS.

Still, this is pretty neat since I found some interesting novels because of it. I just have to cope with the fact that they'll be unfinished forever.

Corty · Feb 23, 2023

I tried it out too. As @Lire said, many of the recommendations showed that they are on hiatus. An option would be greatly preferred to exclude recommendations like those. Other than that, looks cool so far!

Edit:
A question came to mind. It may have been written down already, but I don't remember reading it.

Does it show that the novel is on hiatus because the author flagged it, or does it determine that itself? Like, if it wasn't updated in, say, a month? Or two months?

This question came to me because many novels are simply left abandoned without any warning or word from the author. Is there an option or function that allows the AI to put stories into the hiatus category by simply checking the last chapter's release date? To tie it to a defined time interval? Like, if a story is not updated in 2 months, it goes into the "on-hiatus" section?

Or is this an undesired extra and/or too much extra work to implement anyway? Just thinking out loud here.

WinterTimeCrime · Feb 23, 2023

The first book recommendation is always excellent, like the ones I wish I'd actually see more of when browsing. The ones that follow are , but that may be a personal opinion.

UnknownNovelist · Feb 23, 2023

Lire said:
I tried it out, and it seems pretty neat and easy to use.

Problem? I am seeing a surprising amount of recommendations that are... well, on HIATUS.

Still, this is pretty neat since I found some interesting novels because of it. I just have to cope with the fact that they'll be unfinished forever.

Thanks for the feedback. I'm planning to add more filters so you can get recommendations for only ongoing and completed novels. I'm glad you found something to read.

Which recommendation system did you use? The general one or the personalized?

Corty said:
I tried it out too. As @Lire said, many of the recommendations showed that they are on hiatus. An option would be greatly preferred to exclude recommendations like those. Other than that, looks cool so far!

Noted. It is on the to-do list to add a filter option for status.

Corty said:
Edit:
A question came to mind. It may have been written down already, but I don't remember reading it.

Does it show that the novel is on hiatus because the author flagged it, or does it determine that itself? Like, if it wasn't updated in, say, a month? Or two months?

Corty said:
This question came to me because many novels are simply left abandoned without any warning or word from the author. Is there an option or function that allows the AI to put stories into the hiatus category by simply checking the last chapter's release date? To tie it to a defined time interval? Like, if a story is not updated in 2 months, it goes into the "on-hiatus" section?

Or is this an undesired extra and/or too much extra work to implement anyway? Just thinking out loud here.

It should show the status (whether it's on hiatus, completed, or ongoing) as well as the last update date. It's in the greyed-out text under the title.
Which recommendation engine did you use? The general one or the personalized?

WinterTimeCrime said:
The first book recommendation is always excellent, like the ones I wish I'd actually see more of when browsing. The ones that follow are , but that may be a personal opinion.

Thanks for the feedback. I'm working on making the recommendations more relevant, but I first have to figure out a way to get the dataset without scraping ScribbleHub.
What recommendation system did you use?

Lire · Feb 23, 2023

UnknownNovelist said:
Which recommendation system did you use? The general one or the personalized?

Both.

Corty · Feb 24, 2023

UnknownNovelist said:
Thanks for the feedback. I'm planning to add more filters so you can get recommendations for only ongoing and completed novels. I'm glad you found something to read.
Which recommendation system did you use? The general one or the personalized?

General.

Also, let me rephrase my question. The system marks a novel to be on hiatus because the author marked it as such, yes?

Now, my next point was if there is a possibility to include an option that also excludes novels that were not updated in, let us say, two months. Simply because there are authors who do not mark the novels as "on hiatus" but simply leave them be. They would still be shown as ongoing. Yes, one can look at the date of the last released chapter, but we all know people can't really even read the tags for a novel and go complaining in the comments.

UnknownNovelist · Feb 27, 2023

Corty said:
General.

Also, let me rephrase my question. The system marks a novel to be on hiatus because the author marked it as such, yes?

Now, my next point was if there is a possibility to include an option that also excludes novels that were not updated in, let us say, two months. Simply because there are authors who do not mark the novels as "on hiatus" but simply leave them be. They would still be shown as ongoing. Yes, one can look at the date of the last released chapter, but we all know people can't really even read the tags for a novel and go complaining in the comments.

That's a good suggestion. Yes, it's the author who has to mark the novel as "on hiatus". I could "easily" make a feature that automatically marks a novel as being on hiatus if it hasn't been updated in more than X months.
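
A minimal sketch of what such a feature could look like, combined with the status filter discussed above. Everything here is an assumption: the field names, the status strings, and the 60-day cutoff standing in for "X months". It does not reflect how the app stores its data.

```python
from datetime import date, timedelta

def effective_status(status, last_update, today, max_idle_days=60):
    """Treat an 'ongoing' novel as on hiatus if it has been idle too long."""
    if status == "ongoing" and (today - last_update) > timedelta(days=max_idle_days):
        return "hiatus"
    return status

def filter_recommendations(novels, today, allowed=("ongoing", "completed"),
                           max_idle_days=60):
    """Keep only novels whose effective status is in `allowed`."""
    return [
        novel for novel in novels
        if effective_status(novel["status"], novel["last_update"], today,
                            max_idle_days) in allowed
    ]

# Hypothetical recommendations.
novels = [
    {"title": "A", "status": "ongoing", "last_update": date(2023, 2, 20)},
    {"title": "B", "status": "ongoing", "last_update": date(2022, 11, 1)},
    {"title": "C", "status": "completed", "last_update": date(2021, 5, 5)},
    {"title": "D", "status": "hiatus", "last_update": date(2023, 2, 1)},
]

kept = filter_recommendations(novels, today=date(2023, 2, 27))
print([novel["title"] for novel in kept])
```

Completed novels are left alone regardless of their last update, since an old date there just means the story is finished.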

Shianelle · Mar 15, 2023

This is fantastic, and I love it. Thank you. I just found a bunch more novels I hadn't read before that sound right up my alley.
