Hey there Spanz! Smart.fm is shutting down so I decided to get a move on and get the rest of the… - Feed Post from StonerPenguin to spanz
Hey there Spanz! Smart.fm is shutting down so I decided to get a move on and get the rest of the Core 6000. I'm also trying to get stuff from iKnow; the only problem is that when I copy it, the example sentences get copied too. In Notepad++, is there a command to delete every 4th line or something like that?
posted by StonerPenguin February 10, 2011 at 1:15pm
- And how did you put commas (,) between the kanji and the kana readings without having to do it by hand? ><
February 10, 2011 at 2:12pm
- I thought iK & Sm.fm shared the same content.
Are there some differences?
iK doesn't use html tables, so it's not as easy to filter the lines as Sm.fm is.
For now, I've captured the whole of iK in pure html format. It's still unfiltered, so it contains the example sentences. I have all the audio files, too. We'll think about what to do with all that stuff. Please, don't waste your time capturing iK. Instead, grab Sm.fm's c6000 if you want.
The number of example sentences per word in iK is not constant, so deleting lines at constant intervals won't do (at least in the first levels).
Fortunately, all the fields inside the html are labeled, which makes automated filtering possible. Unfortunately, filtering with replace alone (even with regexps) would be long and painful, because Notepad++ regexps don't work across several lines. You could convert all the text into one huge line and process it, but you would have to perform a replace for each word.
You could use a Macro to do the repeated work, but I think the easier solution (waaay easier) is writing a small filter program to do the whole job. A simple and short script will suffice. Don't worry, I'll do it.
Why do you want to delete every 4th line?
Just in case you've seen another way of filtering, this would be the easiest way to delete every 4th line:
-Put the cursor at the start of the first line.
-Start recording a macro.
-Press the Down key 3 times.
-Select the line. (Shift+End)
-Press Del 2 times. (Once for the line, once for the end-of-line)
-Stop the macro recording.
-Save the macro if you plan to use it with other files. Don't bother to give it a key shortcut.
-Click "Run a Macro Multiple Times" and select your macro.
-Mark "Run until the end of file".
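If you'd rather script it than record a macro, the same every-4th-line deletion can be sketched in a few lines of Python. This is only a minimal sketch: the function name `drop_every_nth_line` and the sample lines are made up for illustration, and it assumes, like the macro, that the unwanted line really is always the 4th one.

```python
def drop_every_nth_line(lines, n=4):
    """Return the lines with every n-th one (4th, 8th, ...) removed."""
    # enumerate from 1 so that line numbers match what the editor shows
    return [line for number, line in enumerate(lines, start=1) if number % n != 0]


# Made-up sample: three lines to keep, then one example sentence to drop.
sample = [
    "word 1", "reading 1", "meaning 1", "example sentence 1",
    "word 2", "reading 2", "meaning 2", "example sentence 2",
]
cleaned = drop_every_nth_line(sample)
print("\n".join(cleaned))
```

For a real file you would read it with `open(path, encoding="utf-8").read().splitlines()` and write the result back the same way.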
About the commas between Kanji & Kana, please, give me an example with a couple of complete lines.
See you.
February 11, 2011 at 6:22pm
- Thanks a ton! That (the macro) worked like a charm! With the macro I made, it's actually easier for me to get stuff from iKnow than from Smart.fm... (And you don't have to make a filter)
When I copy from Smart.fm to Notepad++, the item's kana and kanji are stuck together like this:
"洗剤せんざい detergent, cleanser"
I had to manually separate them >< I like iKnow better because it comes out like this;
Which is easier to process and with my macro for deleting the example sentences it goes super fast =D In fact I'm already done with the first 3000 words (they're private now)
If you've got the Smart.fm format figured out maybe you should do their lists and I'll do iKnow ;)
February 12, 2011 at 7:36am
- Haha! Thanks, but I've already made a simple PHP script (27 lines) that filters iK directly from the huge html mess. Please, don't work on iK! All the work is already done!
My results are formatted as I usually do:
As usual, the script also replaces [verbal noun] and [adjectival noun] with [suru-verb] and [na-adj.] respectively.
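The label swap itself is just a pair of text replacements. Here is a minimal Python sketch of that one step (not the PHP script): the two label pairs come from above, while the function name `rename_pos` and the example line are invented for illustration.

```python
# Old label -> new label, as described in the post.
POS_RENAMES = {
    "[verbal noun]": "[suru-verb]",
    "[adjectival noun]": "[na-adj.]",
}


def rename_pos(line):
    """Replace the long part-of-speech labels with the short ones."""
    for old, new in POS_RENAMES.items():
        line = line.replace(old, new)
    return line


# Hypothetical entry; the real output format may differ.
example = "study [verbal noun]"
print(rename_pos(example))
```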
I can send you the 60 individual files, or a big one with the 6002 lines.
Yes, there are:
6002 entries in total. (*)
5998 different ones. (K+k+M)
5943 different ones. (K+k)
5882 different ones. (K)
(*) Core3000_10 and Core6000_2 have 101 entries, dunno why.
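Counts like these can be obtained by collecting the entries into sets. A minimal Python sketch follows, assuming K, k and M stand for kanji, kana and meaning, and that each entry is a 3-field tuple; the function name `count_distinct` and the sample entries are made up and are not taken from the real list.

```python
def count_distinct(entries):
    """entries: list of (kanji, kana, meaning) tuples."""
    return {
        "total": len(entries),
        "K+k+M": len(set(entries)),
        "K+k": len({(kanji, kana) for kanji, kana, _ in entries}),
        "K": len({kanji for kanji, _, _ in entries}),
    }


# Invented sample data, only to show how the three counts diverge.
sample_entries = [
    ("辛い", "からい", "spicy"),
    ("辛い", "つらい", "painful"),
    ("洗剤", "せんざい", "detergent, cleanser"),
    ("洗剤", "せんざい", "detergent, cleanser"),
    ("洗剤", "せんざい", "cleanser"),
]
counts = count_distinct(sample_entries)
print(counts)
```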
About Sm.fm, if the fields are stuck together, you can't separate them in an automated way.
This happens when you copy & paste directly from the browser. Blame Sm's programmers!
If you use a table plugin (I use TableTools (T2F3) 0.28 for Firefox) to copy the table as tab-delimited text, you get a cleaner result and Kanji & Kana get separated by a space. This makes processing much easier, although the Kana repetition for entries without Kanji is still tricky (I think you can't do it without regular expressions).
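Once Kanji and Kana are separated by a space, splitting a row is simple enough to script. The Python sketch below rests on guesses about the copied text: that the first tab-delimited cell holds "kanji kana", that the meaning follows the first tab, and that a kana-only entry shows up as a single token that should be reused as the reading. The function name `split_row` is made up.

```python
def split_row(row):
    """Split one tab-delimited row into (word, reading, meaning)."""
    head, _, meaning = row.partition("\t")
    parts = head.split()
    if not parts:
        raise ValueError("row has no word field: %r" % row)
    if len(parts) >= 2:
        word, reading = parts[0], parts[1]
    else:
        # Assumption: a kana-only entry has one token, reused as its own reading.
        word = reading = parts[0]
    return word, reading, meaning.strip()


# Hypothetical rows in the assumed layout.
rows = ["洗剤 せんざい\tdetergent, cleanser", "ここ\there"]
for row in rows:
    print(", ".join(split_row(row)))
```

Joining the three fields with ", " gives the comma between kanji and kana that was asked about earlier in the thread.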
Of course, you can also save the pages as html files and make a true filter, like I did for iK. It's really simpler!
If you don't want to copy the stuff from Sm, I can do it for you, but my question is still unanswered:
Is it worth it? That is, are iK and Sm really different?
P.S. Have you made a 3000-word Note?
Good work! But remember that right now, long Notes crash Flash Player. Beeant wants to change the player but until the change is done, you shouldn't make very long Notes.
February 12, 2011 at 8:42am
- Ack! You already got it done? When I said I got 3000 words done it wasn't from a single note-- I got the first 30 steps from iKnow done! And I've already done most of iKnow so I might as well finish >< Sorry, I just noticed that the steps were organized differently because my info from my smart.fm account was transferred to iKnow but it said I wasn't done with their goals because some of the words had changed. So I just wanted to get 'em myself for study when I changed over to iKnow. I can keep my iKnow notes private though since there's nearly 60 of them now D: Sorry for the redundancy :(
February 12, 2011 at 9:55am
- You can finish the iK work on your own, if you want. You're a very hard worker, Muscle Man!
Are you sure you're only twee years old? =D
I've compared my own big Note against the first 20 steps of iK, and this is what I've found:
-The order has changed.
-Most entries are the same, although many of them have been edited in some way (slightly corrected translations, changes in some POS).
-There are new entries (~10%), and of course, there is the same number of lost entries.
I'm pretty sure the lost entries are somewhere in other Steps, due to the changed order.
Truly new / lost entries are very possible, but I guess there won't be many of them. If the list is based on frequency of use, global changes are very improbable.
All in all, I don't feel the need to save Sm at all, because I think iK is basically an improved / polished version of Sm.
Do you have a reason to save all of Sm's cores?
Tell me if you want my files and the meaning and POS format you want (commas, slashes, POS position...).
Bye-bye!
February 12, 2011 at 6:18pm
- Hi there, be careful: before you save the page in HTML format, expand it to its full size, or you will get just the first 20 lines/examples.
I am trying to get the audio in MP3 format and it's taking forever. Do you have it in MP3 format???
February 12, 2011 at 10:39pm
- Hi, Ela. Yes, I did all the expansions before saving them. I've already extracted all the words (6002) from the 60 html files so I'm sure all is ok.
And yes, I've got all the 12050 .mp3 (words and also examples)
Just a little detail:
All the sound links in the saved .html files still point to the original server.
If you want the sounds to work from the local .html files, all the links must be edited somehow. The original page seems to use Flash, but I don't know how to adapt the links. I guess it must be easy, but I don't know html/js.
February 13, 2011 at 1:53am
- Even if it takes a lot of time, saving the audio files one by one seems to be the only way. Too bad there's no easier way to do it...
Thanks a lot anyway.^-^V
February 13, 2011 at 11:12am
- What? One by one? NO!!!
I saved the 60 lessons (including the 12050 mp3 files) in 60 steps, only one for each .htm!
3-Save as .htm
The last operation downloads many files already saved when you save as .htm, (mainly .jpg thumbnails), but it downloads also all the .mp3 (about 200 mp3/lesson) in a single, yet slow, operation. The repeated files are not overwritten, but named with a (2) mark, so you can delete them easily afterwards.
This is not a fast procedure, but it's a lot better than doing it one by one.
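To find the repeated files afterwards, something like the Python sketch below could list them before you delete anything. It assumes the mark sits at the very end of the file name, right before the extension (for example `name(2).jpg` or `name (2).jpg`); the exact naming depends on the download tool, and the function name `find_marked_copies` is made up.

```python
import tempfile
from pathlib import Path


def find_marked_copies(folder, mark="(2)"):
    """Return the files in `folder` whose name, minus extension, ends with `mark`."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.stem.rstrip().endswith(mark)
    )


# Demo on a throwaway folder with invented file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["thumb.jpg", "thumb(2).jpg", "word.mp3", "icon (2).png"]:
        (Path(tmp) / name).write_bytes(b"")
    marked_names = [path.name for path in find_marked_copies(tmp)]
    print(marked_names)
```

Check the printed list first; once it looks right, calling `path.unlink()` on each returned path removes the copies.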
At the end, you should have ~5MB of .mp3 (~200 small files) for each lesson. About 300MB (12050 files) in total.
February 13, 2011 at 5:48pm
- Thank you very much. I don't use Mozilla Firefox and I had no idea about FlashGot.
I had to search the internet to find out about FlashGot. Then I realized that it's a Firefox extension, so I installed both and now I can get the audio much faster.
A thousand thanks!
Happy Valentine's Day!
February 14, 2011 at 1:23pm
- You're welcome!
Remember you have to save the .mp3 in their own folder (one lesson, one folder = 60 folders), because sometimes there are different files across lessons with the same name.
I know it because I tried to copy all of them together, and I got overwriting warnings for files with different size.
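If you want to check for such clashes before merging anything, here is a minimal Python sketch. It assumes one sub-folder per lesson under a common root folder and treats a different file size as a sign of different content; the function name `find_name_collisions` and the demo folder and file names are invented.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def find_name_collisions(root):
    """Map each .mp3 file name to its paths when same-named copies differ in size."""
    by_name = defaultdict(list)
    for path in sorted(Path(root).rglob("*.mp3")):
        by_name[path.name].append(path)
    return {
        name: paths
        for name, paths in by_name.items()
        if len({p.stat().st_size for p in paths}) > 1
    }


# Demo: two made-up lesson folders holding a same-named file of different size.
with tempfile.TemporaryDirectory() as tmp:
    for lesson, data in [("lesson01", b"aaaa"), ("lesson02", b"bb")]:
        folder = Path(tmp) / lesson
        folder.mkdir()
        (folder / "word.mp3").write_bytes(data)
    (Path(tmp) / "lesson01" / "only.mp3").write_bytes(b"x")
    clashing_names = sorted(find_name_collisions(tmp))
    print(clashing_names)
```

Two different recordings can happen to have the same size, so this only catches the obvious clashes; keeping one folder per lesson stays the safe option.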
Happy Valentine's Day to you too!
February 15, 2011 at 2:40am